If we made a copy deref, then we need to do dead-write elimination for the
pervious writes or we'll just emit the same copy deref again next time
around. And, at the end of the opt loop, we need to lower copy derefs
because later passes (locals_to_regs, notably) depend on it.
Fixes infinite opt loop on fs-function-inout-array with virgl on NTT.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15899>
rusticl (and clover) would like to get a graceful fail here so they can
fall back to a shadow copy instead of us asserting. We also start
rejecting arrayed surface because isl doesn't allow selecting a QPitch
yet. Even if it did, QPitch is horribly restrictive, even for linear
surfaces, that it likely wouldn't be that useful.
Fixes: e81f3edf76 ("iris: Allow userptr on 1D and 2D images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15903>
Even if we're the first job on some queue, there may be no wait
semaphores but we still need to ensure things happen in-order. (See
the "Implicit Synchronization Guarantees" section of the Vulkan spec.)
The client can submit back-to-back command buffers with no semaphores
between them and it needs to adt the same as if there were a semaphore.
If job->serialize is set because of a barrier or something, we still
need to synchronize across HW queues by waiting on last_job_syncs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
In order to properly wait for a query to be complete, we need to first
wait for the end query job to flush through on the queue. Since query
end is always handled on the CPU, we can do this with a condition
variable. The 2s timeout is taken from ANV.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Instead of having the CPU job execute the CSD job, put both jobs on the
list with the CPU job first which modifies the GPU job which gets kicked
off next. This gives the queue code more visibility into what types of
jobs are actually in the list. In particular, if an indirect compute
job is the last job in a batch buffer, it currently appears as if the
batch ends with CPU work which isn't true because it kicks off GPU work.
In that case, the last job on the list is now a GPU job, which better
matches reality.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
The v3dv kernel driver doesn't support timelines yet but we want
threaded submit and that requires WAIT_PENDING. Fortunately, it should
never sit in this loop for long in practice. The primary use-case is
sorting out dependencies and these checks will always trivially succeed
for non-shared semaphores because v3dv only has a single queue.
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
c78be5da30 ("intel/fs: lower ray query intrinsics") introduced a
helper function using nir_(push|pop)_if which invalidated dominance &
block_index for the replacement of nir_intrinsic_rt_trace_ray.
We can still keep dominance/block_index metadata for the lowering of
nir_intrinsic_rt_execute_callable though.
This change uses 2 different lowering function with correct metadata
preservation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
FLUSH_HDC is sufficient to flush things out to L3, so we'd rather
use that where possible. It's also emulated via DATA_CACHE_FLUSH
on platforms where it isn't supported, so we can use it unconditionally.
We still use DATA_CACHE_FLUSH for invalidating the data cache, and to
flush the DC-tagged cachelines in L3 to be globally-observable.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Push constant loading is not coherent with L3 according to the document
that describes the hardware change for the vertex buffer L3 Bypass
Disable field.
If we've updated a push constant buffer with say, a blorp_buffer_copy,
we may need to flush both the render cache and the tile cache.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
When stream output is active, we need to let the cache tracker know
about any SO buffers, which we access via IRIS_DOMAIN_OTHER_WRITE.
In particular, we may have written to those buffers via another
mechanism, such as BLORP buffer copies. In that case, previous writes
happened via IRIS_DOMAIN_RENDER_WRITE, in which case we'd need to flush
both the render cache and the tile cache to make that data globally-
observable before we begin writing via streamout, which is incoherent
with the earlier mechanism.
Fixes misrendering in Ryujinx.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6085
Fixes: d8cb76211c ("iris: Fix MOCS for buffer copies")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Most clients are L3-coherent these days. However, there are some
notable exceptions, such as push constants, stream output, and command
streamer memory reads and writes.
With the advent of the tile cache, flushing the render or depth caches
alone are no longer sufficient for memory to become globally-observable.
For those, we need to flush the tile cache as well. However, we'd like
to avoid that for L3-coherent clients, as it shouldn't be necessary,
and is expensive.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The render, depth, sampler, and data (HDC) caches are all coherent
with L3. We consider OTHER_READ and OTHER_WRITE to be non-coherent,
as they're kitchen-sink domains which include non-L3-clients.
Starting with Tigerlake, the VF cache is coherent with L3 (because we
set the L3BypassDisable bit in the vertex/index buffer packets).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
On Tigerlake, we use the data cache for reading indirect UBOs instead
of the sampler. But we still use the constant cache for direct UBO
access, so unfortunately we may access it through two different domains.
To work around this, we add a new domain for pull constants (UBOs),
which will be either constant+texture or constant+data.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The bulk of IRIS_DOMAIN_OTHER_READ domain usage was the 3D sampler, but
there were also a few oddball cases like command streamer reads, blitter
access, and so on. The sampler is definitely L3 coherent, but some off
the more esoteric reads may not be, so I'd like to separate them, so
that OTHER_READ can become a non-L3-coherent kitchen-sink domain.
The sampler cases only need TEXTURE_CACHE_INVALIDATE, and can skip the
CONSTANT_CACHE_INVALIDATE we had on IRIS_DOMAIN_OTHER_READ.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
We were using IRIS_DOMAIN_OTHER_READ for read-only depth/stencil access
in an attempt to avoid unnecessary flushing; IRIS_DOMAIN_DEPTH_WRITE
could indicate read-write access.
However, IRIS_DOMAIN_OTHER_READ is clearly the wrong domain. Depth and
stencil data is read via the depth cache, while IRIS_DOMAIN_OTHER_READ
currently corresponds to the sampler cache and constant cache together
(although this will change in future patches).
It's unclear whether this hack was useful. For now, just drop it and
use the correct depth cache domain, even if it's marked as read-write.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
For texture fetches and buffer load the fix is not needed,
and the override creates faulty TGSI.
In addition remove all modifiers from the src in the additional mov
instruction.
Fixes: d1c7a7b131
virgl: Add an extra mov for int outputs from constant and immediate inputs
v2: Move workaround after the use of
virgl_tgsi_rewrite_src_for_input_temp (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15896>
This patch adds support for transcoding ASTC to BC7 (BPTC) and prefers
it over BC3 (DXT5) when hardware supports that format.
BC7 is a much newer format (~2009 vs. ~1999) and offers higher quality
than the older BC3 format. Furthermore, our encoder seems to be faster.
Tapani put together a small benchmark for transcoding a 1024x1024 ASTC
texture, and switching from BC3 to BC7 improves performance of that
microbenchmark by 25% on my Tigerlake NUC (with hardware ASTC disabled
so we can test this path). Presumably, this isn't fundamental to the
formats, but rather reflects the speed of our in-tree compressors.
So, we should use BC7 where possible.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15875>
now zink will init using a priority system if multiple devices are available
multiple devices will ONLY be available if:
* the user does not specify VK_ICD_FILENAMES as they should
* the user does not specify LIBGL_ALWAYS_SOFTWARE
* multiple drivers exist
I've prioritized the virtualized gpu here with the assumption that if
such a thing is detected, the environment is most likely virtualized
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15857>
It can happen that a single vector constant is referenced multiple times
by the same node, with different swizzles.
This needs to be taken into account by checking and updating the
swizzles for all the srcs of a target node when inserting the const
node to the same instruction.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15726>
When emitting the AR forces splitting an ALU group, and at the same time
a new CF instruction is started, then the last instrcution in the finished
CF block might not have the "last" bit set, which results in an invalid
shader that might hang, or crash SB.
So when a new CF is started, force the last bit in the last ALU instruction.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
ALU_SRC_PARAM_BASE is an inline constant that defines the
address for pulling data from LDS memory for interpolation
and not a value from the kcache, so there is no need to
take these values into account when allocating kcache
load slots.
v2: Fix the constant range check to not exclude the translated
ranges for kcache banks 2 and 3.
v3: limit range check to only include kcache values and and
rename relevant function (Emma).
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
We don't need to disable this path if there are indirect or 8/16/64-bit
push constant loads. We can just use the default path for them.
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 2028 -> 1884 (-7.10%)
Instrs: 366 -> 363 (-0.82%); split: -2.46%, +1.64%
Latency: 6630 -> 6579 (-0.77%)
InvThroughput: 26520 -> 26316 (-0.77%)
Copies: 84 -> 102 (+21.43%)
PreSGPRs: 141 -> 222 (+57.45%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
NIR doesn't propagate precise through moves, and with NTT the
last output is usually preceded by a move, so that we no longer
see that the evaluation of some value is supposed to be exact,
and, hence we can't decorate the outputs accordingly.
Fixes with NTT:
dEQP-GLES31.functional.tessellation.common_edge.
triangles_equal_spacing_precise
triangles_fractional_odd_spacing_precise
triangles_fractional_even_spacing_precise
quads_equal_spacing_precise
quads_fractional_odd_spacing_precise
quads_fractional_even_spacing_precise
v2: Don't clear the precise flag when we hit a mov, because we may
hit a if/else construct like below and we don't track branches
IF X
TEMP[0] = OP_PRECICE ...
ELSE
TEMP[0] = MOV CONST[]
ENDIF
Thanks Emma for pointing out the problem.
v2: allocate precise handling flags to transform_prolog (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
drm_shim.c undefines the _FILE_OFFSET_BITS macro, so plain off_t might
be 32 bits, while it's 64 bits in device.c. To avoid this mismatch,
use off64_t which will always be 64 bits in both source files.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
loader_open_render_node returns the first device in /dev/dri that it
can use. To make sure the drm-shim device always gets chosen, return
the fake entries in readdir before returning the real ones.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
We've wanted to remove destructors from ralloc's API for a long time (it's
an extra storage cost per ralloc for a rarely-used feature), and for the
suballoc change we'd need to spend more storage on storing the tu_device
pointer per result since destructors don't get anything else but the
pointer passed into them.
Fixes use-after-frees:
=================================================================
==2383==ERROR: AddressSanitizer: heap-use-after-free on address 0xffff88fe1940 at pc 0xffff934f427c bp 0xfffff5481e90 sp 0xfffff5481ea8
WRITE of size 8 at 0xffff88fe1940 thread T0
#0 0xffff934f4278 in list_del ../src/util/list.h:108
#1 0xffff934f4278 in result_destructor ../src/freedreno/vulkan/tu_autotune.c:237
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#5 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#6 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#7 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
[...]
0xffff88fe1940 is located 80 bytes inside of 112-byte region [0xffff88fe18f0,0xffff88fe1960)
freed by thread T0 here:
#0 0xffff9c1c90d8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
#5 0xffff935cf2ac in tu_queue_submit_locked ../src/freedreno/vulkan/tu_drm.c:997
[...]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
We don't return unused space to the suballocator, so it's a little useful
to limit how much we overallocate to reduce memory footprint. I took a
look through the tu_cs_emit_array() calls and accounted for a couple of
them in the variant-specific space calculation, then dropped the base
allocation by factors of 2 until we started throwing asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
Allocating a BO for each pipeline meant that for apps with many pipelines
(such as Asphalt9 under ANGLE), we would end up spending too much time in
the kernel tracking the BO references.
Looking at CS:Source on zink, before we had 85 BOs for the pipelines for a
total of 1036 kb, and now we have 7 BOs for a total of 896 kb.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
gallium queries have to be mapped onto multiple vulkan queries,
this can happen for two reasons.
1. primitives generated and overflow any don't map directly, and
multiple vulkan queries are needs per iteration. These are stored
inside the "starts" as zink_vk_query ptrs.
2. suspending/resuming queries uses multiple queries, these are
the "starts". Every suspend/resume cycle adds a new start.
Vulkan also requires that multiple queries of the same time don't
execute at once, which affects the overflow any vs xfb normal
queries, so vk_query structs are refcounted and can be shared
between starts. Due to this when the draw state changes, it's
simple to just suspend/resume all queries so the shared vulkan
queries get handled properly.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15785>
This enables a lower power mode in the sampler hardware in certain
common scenarios. On Tigerlake, SAMPLER_MODE is not programmable by
userspace but the kernel already sets this bit for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
While this register still exists, it's no longer a per-context register.
Instead, on Gfx12+, SAMPLER_MODE exists per dual-subslice and is
accessed as a "multicast" register, where you write control which
version is accessed by the "steering control register".
At any rate, userspace cannot write it any longer, and so there's not
much point to it existing in our genxml (which was missing most of the
fields anyway).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
This allows the sampler to perform faster filtering of 8-bit UNORM
textures by filtering them at a different precision. The filtering
is intended to still be OpenGL and DirectX spec compliant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
according to spec, a barrier is required any time the pixels of the
framebuffer are changed. since zink defers clears and runs them at
a later time, it must also be responsible for handling the required
synchronization for such operations
fixes (radv):
KHR-GL46.blend_equation_advanced.blend_all*
fixes#5572
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
according to spec, for fbfetch this should match the subpass self-dependency of
* stage FRAGMENT_STAGE -> FRAGMENT_STAGE
* access 0 -> INPUT_ATTACHMENT_READ
zs fbfetch doesn't seem to be a thing, so that code is left for historical
and/or future purposes
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
this is super annoying since it means that a build of zink cannot
be mix-and-matched with an existing build of lavapipe, e.g., for faster
bisecting
the env var should be sufficient to handle this, and if someone sets it
and doesn't have swrast enabled then they can deal with it
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15844>
Previously, the caller allocated storage and tgsi_transform_shader() would
emit into that, returning how many tokens it emitted. All the callers had
to guess at how much storage was necessary, trying not to over-allocate
but also getting enough that you wouldn't (effectively) silently run out
of space.
Instead, make tgsi_transform_shader() do the allocation for you, taking
just a hint of how much space you think you need, and internally double
size when necessary. Fixes failures on virgl with fp64 since we've added
more fp64 virglrenderer workarounds and its old "XXX: is this enough?"
allocation wasn't any more.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15782>
virglrenderer maps atomic accesses to atomic counter declarations using
the .Index field. We were previously emitting a .Index of 0 for array
accesses, so virglrenderer would emit
atomicIncrement(first_counter[counter_offset+array_index]). This would
mostly work because hardware doesn't care about the bounds of counter
declarations, but if the first counter was a non-array, then the [] GLSL
emit gets dropped (can't array access a scalar!) and you'd access the
non-array first_counter instead.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15824>
this is an illegal alignment, so clamp the range to the nearest
texel offset since the shader should be hitting the robustness
case for the partial texel
affects:
dEQP-GLES31.functional.texture.texture_buffer.modify.mapbuffer_readwrite.range_size_65537
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15827>
This uses WRITE_DATA on the ME engine to reset the register, to match what
PAL does on GFX9+.
This fixes
KHR-GL45.transform_feedback_overflow_query_ARB.multiple-streams-one-buffer-per-stream
on zink/radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15812>
The array is allocated for VkDrmFormatModifierPropertiesEXT, so
writring entried with type VkDrmFormatModifierProperties2EXT is
bogus.
It seems this was a mistake added with a change intended to get
rid of VK_OUTARRAY_MAKE, that changed the type of the write by
mistake.
Fixes: 56a2ccf058 ('v3dv: Stop using VK_OUTARRAY_MAKE()')
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15819>
There are reliability problems with the RTL8153 ethernet driver under
certain network loads, related to incompatibility of the device with
Link Power Management.
Add usbcore.quirks=0bda:8153:k to the kernel command line to enable the
USB_QUIRK_NO_LPM option.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15791>
This happens when we have a sequence of multiple beginQuery / endQuery
with the same target and query identifier.
As a BO is created on beginQuery but not free on endQuery (we need to
wait until either explicitly deleting the query or querying the
results), in the above sequence we are basically leaking the BO.
This ensures that any BO created before is unreference before creating
the new one.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15796>
We don't know if an application developer wants error-dialog by default
or not, even when using a Mesa OpenGL driver.
So let's not assume this is a good thing, and instead make an option for
this behavior that we can enable for CI testing.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15485>
These dialogs only exist on Windows, so let's not even expose a util
function for this on other platforms.
The code is only ever called from Windows specific code anyway.
While we're at it, clean up the name a bit as well.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15485>
Move one of the is_supported() check before we start filling the
structure so we don't end up with a partially filled object when
we return VK_ERROR_FORMAT_NOT_SUPPORTED (which deqp doesn't seem to like,
so it's probably coming from a spec requirement).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15698>
VK_REMAINING_ARRAY_LAYERS maps to -1 in the D3D12 world. Let's make sure
we set WSize to -1 in that case, because the layer_count calculated by
dzn_get_layer_count() won't work for 3D images which never have more
than one layer (in case of 3D images, we treat slices as layers).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15698>
We have been reconstructing/rematerializing uniforms for a while, but we
can do this in more scenarios, namely instructions which result is
immutable along the execution of a shader across all channels.
By doing this we gain the capacity to eliminate TMU spills which not
only are slower, but can also make us drop to a fallback compilation
strategy.
Shader-db results show a small increase in instruction counts caused
by us now being able to choose preferential compiler strategies that
are intended to reduce TMU latency. In some cases, we are now also
able to avoid dropping thread counts:
total instructions in shared programs: 12658092 -> 12659245 (<.01%)
instructions in affected programs: 75812 -> 76965 (1.52%)
helped: 55
HURT: 107
total threads in shared programs: 416286 -> 416412 (0.03%)
threads in affected programs: 126 -> 252 (100.00%)
helped: 63
HURT: 0
total uniforms in shared programs: 3716916 -> 3716396 (-0.01%)
uniforms in affected programs: 19327 -> 18807 (-2.69%)
helped: 94
HURT: 50
total max-temps in shared programs: 2161796 -> 2161578 (-0.01%)
max-temps in affected programs: 3961 -> 3743 (-5.50%)
helped: 80
HURT: 24
total spills in shared programs: 3274 -> 3266 (-0.24%)
spills in affected programs: 98 -> 90 (-8.16%)
helped: 6
HURT: 0
total fills in shared programs: 4657 -> 4642 (-0.32%)
fills in affected programs: 130 -> 115 (-11.54%)
helped: 6
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15710>
Bit 20 isn't actually MERGEDREGS, the mode for the entire geometry
pipeline is controlled by SP_VS_CTRL_REG0::MERGEDREGS and it appears to
be something preamble-related instead since writing any register in the
preamble hangs if it's set. This fixes those hangs on freedreno and
turnip since we no longer set it.
Fixes: fccc35c2de ("ir3: Add preamble optimization pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15801>
since 1.0 is used in nearly every case, drivers requiring this exporting
can avoid potential shader variants by adding a 1.0 export to the base
shader variant and the only using the ubo upload when pointsize is explicitly
set for wide point functionality
drivers can then be responsible for removing unused pointsize exports
as needed (or desired)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15699>
We don't need to restrict our timeout to 5 seconds, because the kernel's
hangcheck will ensure that the wait completes in finite time if the GPU
gets wedged. If the GPU is making progress, we don't want to time out
early and have pipe_transfer_map() return an error, causing glReadPixels()
to throw a confusing GL_OOM even though we're not out memory.
The INFINITE arg to this function isn't actually infinite, it's limited to
an hour. But an hour of GPU processing to wait on is probably plenty.
This 5s timeout has caused problems with the CTS on freedreno at high
parallelism, and I suspect is the cause of recent issues in the closed
traces replay jobs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15805>
Much less noisy, and provides a path to further improvements. There is a slight
behaviour change: int-to-float conversions now use RTE instead of RTZ. For
32-bit opcodes, this affects conversions of integers with magnitude greater than
2^23 by at most 1 ulp. As this behaviour is unspecified in GLSL, this change is
believed to be acceptable.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>
To prepare for defeaturing round modes, replace uses of round-to-positive with
round-to-even in our unit tests. This doesn't meaningfully impact test coverage;
there is no way to generate that round mode.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>
There are new data structures that we need to (dirty) track. Add the
corresponding fields so we can proceed as with Bifrost dirty tracking.
Trivial increase in memory usage, but that should not matter as batches are
few.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15797>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. The
maximum number of gradient components and coordinate components should
be 2. In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment
The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit. I just want to make
sure this commit gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
Valhall removed the ability to set an explicit primitive restart index as
required by desktop OpenGL, in favour of fixed primitive restart indices only as
required by OpenGL ES.
Set the CAPs accordingly so that mesa/st lowers unusual primitive restart
indices at draw call time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15795>
Surface descriptors have been replaced by plane descriptors, which facilitate
the intermediate layout of textures. This allows for more sophisticated handling
of texture compressions, of particular to interest to copy_image. However, it
requires a considerable amount of new logic to handle.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15795>
Issue: When a video is decoded where the max_references is updated the
decoder keeps using same old value. This results into green patches and
decoding is not proper.
Root Cause: The max_references is updated only once when the instance is
created for the first time.
Fix: Added a check along with the context->decoder to check if the
context->templat.max_references has changed. If yes, then we go ahead
and create the decoder again.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Krunal Patel <krunalkumarmukeshkumar.patel@amd.corp-partner.google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15750>
Based on hardware behaviour, it appears vertex_id is zero-based with the legacy
geometry flow but not with the new malloc IDVS flow. Since the geometry flow is
per-shader (not per-machine), there's not a good way to communicate this to NIR.
Rather than trying to shoehorn this obscure detail into NIR, just do the
lowering ourselves instead of in NIR. It's not much more code anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Our swizzle lowering optimizations depend on replication of scalar fp16. This
holds on Bifrost (at least for now), but not on Valhall which has proper support
for write masks. For now, enforce Bifrost-compatible behaviour as we do not make
use of the write masks on Valhall yet.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Varying stores was changed in Valhall. Rather than using attribute descriptors
like on Bifrost and Midgard, on Valhall we store to memory directly with
hardware-allocated buffers. This requires a new implementation of store_output,
with special provisions for writing gl_PointSize from a position shader.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Something an instruction has two logic flow controls, namely wait + reconverge.
These are orthogonal -- we need to insert a NOP to handle this. Add a lowering
pass that works out flow control to replace the ad hoc previous va_pack_flow.
Fixes dEQP-GLES31.functional.ssbo.layout.single_basic_type.shared.lowp_vec3.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15756>
Similar to pipeline statistics but done for a single counter.
We use REG_A6XX_RBBM_PRIMCTR_7 to get generated primitives
and not PRIMCTR_8 because PRIMCTR_7 counts pre-clipped prims
while PRIMCTR_8 counts them after clipping.
OpenGL spec for GL_PRIMITIVES_GENERATED says:
"Subsequent rendering will increment the counter once for every
vertex that is emitted from the geometry shader, or from the
vertex shader if no geometry shader is present."
Passes tests:
dEQP-VK.transform_feedback.primitives_generated_query.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15746>
This has one negative side-effect; we're no longer able to print
validation errors without dxcompiler.dll. I doubt that's a real problem,
but if it is, we should add this ability to dxil_validator instead of
having a second implementation here.
The reasons I didn't try adding this in the first place is:
1. This code seems a bit janky; it doesn't consult the "known"-variable
to figure out if the encoding is OK, and it's lacking a fallback path
in that case.
2. It seems unlikely that the compiler varies the encoding of the output
in the first place; one of the two code-paths in here is probably
untested.
3. Since dxil_validator leaves reporting to the call-site, we'd need to
either add and output-encoding to the API (yuck), or re-encode the
string to UTF-8 using WinAPI.
Right now, it seems questionable if fixing all of the above is worth it.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15751>
by utilizing a separate slot map for patch variables, the entire i/o
assignment mechanism can be simplified to accurately manage i/o for all
types of variables and avoid location conflicts
affects:
KHR-Single-GL46.enhanced_layouts*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15770>
the previous methodology triggered warnings any time a rasterizer state
was created with unsupported features without determining whether those
features would actually be used
a more optimal process is to check for missing features at pipeline creation,
as all the necessary info is now available, and spurious warnings can be avoided
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15778>
deqp-runner uprevved to reduce memory usage on HW runners, let us
experiment with shader cache on tmpfs, and hopefully provide a tool for
virgl to be able to plausibly run piglit under crosvm instead of vtest.
piglit uprevved to avoid a flake in softpipe in glx-multithread-texture,
and improve performance of the test, too. This also brings in the
fbo-blending-format-quirks fix to properly initialize the buffers, fixing
some fails/flakes.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15419>
When porting lavapipe to the common sync framework, I stole the dummy
sync signal_for_memory idea from RADV but didn't actually do it
correctly. Unless you set wsi_device::signal_semaphore_with_memory and
wsi_device::signal_fence_with_memory, it doesn't actually signal
anything. If you do set those, it works but also results in dummy
syncs being created for present fences which we see as signals. We
could choose to just skip those like RADV does but that's too magic.
Instead, have our own AcquireNextImage2() again which sets dummy syncs.
Fixes: 3b547a9b58 ("lavapipe: Switch to the common sync framework")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15774>
Cube descriptors require a different sampling instruction in shader,
however we don't know whether image is a cube or not until the start
of a renderpass. We have to patch the descriptor to make it compatible
with how it is sampled in shader.
For the reference subpassLoad is currently translated into isaml.a
Blob v615 also doesn't handle this case correctly.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15734>
When we don't have a good dxcompiler.dll that we can load IDxcLibrary
from to help with diagnostics, we currently return true for validation
even if the validation actually failed.
Let's fix that, and also add a debug-message explaining what went wrong
for those who are debugging and wondering what's up.
Fixes: 2ea15cd661 ("d3d12: introduce d3d12 gallium driver")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15744>
In order to ensure consistent results when running performance tests,
lock the frequency for Intel GPUs to ~70% of the maximum allowed by
hardware.
This seems to offer a good balance between execution speed and results
consistency.
An increase of the frequency will also increase the rate of throttling
events, with a negative impact on consistency. Such events are logged,
as in the following example:
GPU throttling detected: act=200 min=850 cur=850 RPn=100
This shows the actual GPU frequency (200 MHz) dropped below the minimum
requested (850 MHz).
For more details about the various frequency information sources, please
see the script header comments in ".gitlab-ci/common/intel-gpu-freq.sh".
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
Add script to manage Intel GPU frequencies.
It can be used for debugging performance problems or to lock a stable
frequency while executing benchmark tests.
Typical use cases:
- Get all available GPU frequency information
$ ./intel-gpu-freq.sh -g all
* Hardware capabilities
RP0: 1350 MHz
RPn: 100 MHz
RP1: 400 MHz
* Enforcements
max: 1350 MHz
min: 100 MHz
boost: 1350 MHz
* Actual
act: 100 MHz
cur: 400 MHz
- Lock frequency to 80% of the maximum allowed by hardware and enable
throttling detection
$ ./intel-gpu-freq.sh -s 80% -d
GPU throttling detected: act=1050 min=1350 cur=1350 RPn=100
GPU throttling detected: act=1100 min=1350 cur=1350 RPn=100
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
Vulkan spec says:
"When an image view of a depth/stencil image is used as a depth/stencil
framebuffer attachment, the aspectMask is ignored and both depth and
stencil image subresources are used."
Since we use two planes for D32S8 format we have to add a special
case for depth in addition to already existing case for stencil.
Fixes hang in CTS:
dEQP-VK.renderpass.depth_stencil_write_conditions.stencil_kill_write_d32sf_s8ui
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15532>
- When resolving d32s8 to s8 we stored stencil with a wrong format.
- For unaligned multi-sample store we used wrong gmem offset for stencil.
If unaligined store is forced this change fixes a hang in:
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d32_sfloat_s8_uint_separate_layouts.compatibility_depth_zero_stencil_zero_testing_stencil
Fixes: b157a5d0d6
("tu: Implement non-aligned multisample GMEM STORE_OP_STORE")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15532>
this interaction requires read-only sampler access from
depth component with writes to the stencil component, which can only
be done in the GENERAL layout
affects:
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_stencil_blit
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15716>
ssbo binds are always at least readable, so without deeper shader
inspection, never assume that write binds mean no read access
this is different than image descriptors which can explicitly get
write-only access set
cc: mesa-stable
fixes#6231
THANKS TO @daniel-schuermann FOR SOLVING THIS WITH HIS INCREDIBLE TALENT AND WIT
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15697>
The common Vulkan sync framework will do most of the queueing for us.
It will even sort out timeline semaphore dependencies and ensure
everything executes in-order. All we have to do is make sure our
vk_sync type implements VK_SYNC_FEATURE_WAIT_PENDING. This lets us get
rid of a big pile of code.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15651>
dEQP-VK.binding_model.buffer_device_address.set3.depth3.basessbo.convertuvec2.nostore.multi.scalar.vert
runtime -24.4002% +/- 1.94375% (n=7). The win (I think) is in LLVM not
having to chew through handling the extra loops on every constant-offset
SSBO load, not in actual rendering time.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
If we know the offset is constant, we don't have ask LLVM to loop over the
elements pulling the same value out over and over.
This doesn't seem to have produced a win in the testcase I was looking at,
but it was an easier entrypoint to figuring out how to do scalar memory
access than load_memory, and will probably affect some workload.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
In a fragment shader or inside of control flow, invocation 0 might be
inactive, and so our use-first-invocation-and-broadcast optimizations
would be invalid, and the loop logic of an emit_read_invocation would
defeat the point of these optimized paths.
For load_kernel_input, I didn't guard the uniform path with the check
because some CL tests that are currently passing are doing the
load_kernel_input under (presumably) uniform control flow. Instead, I
dropped in a once warning for the next person to be debugging CL.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
You're not supposed to include a '\n' in mesa_log*() messages because
android logging will log what you provide on its own line anyway, so each
mesa_log() should be the body of a log line. But also, getting everyone
to consistently not do that is hopeless because we're all so trained by
printf(). So, just detect an existing \n and don't add a new one.
Cleans up deqp-vk debug output a bunch from turnip.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15332>
We do require C++20 still, because designated initializers is part of
that standard. This is almost a revert, but conditionally selecting
between c++latest or c++20 when available, as that's what we really want.
Fixes: 55ca1c8db3 ("vulkan/microsoft: Remove `override_options: ['cpp_std=c++latest']` option for visual studio")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
Designated initializers are a C++20 feature, but we don't use C++20. In
fact, enabling C++20 for ACO triggers new compiler errors due to some
equality semantics details.
So let's instead stop using designated initializers here.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
We can now use the same cache tracking mechanism for synchronizing QBO
writes instead of the unconditional PIPE_CONTROL performed currently,
which is unable to invalidate any incoherent caches which may contain
stale data for the buffer object.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15738>
The unconditional flushing performed by
iris_flush_and_dirty_for_history() is now redundant with the memory
barriers introduced previously in this series, which should be in a
better position to determine from which domain the buffer will
actually be used in the future, and whether an additional flush or
invalidation is required or redundant with other PIPE_CONTROL commands
emitted elsewhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15738>
We don't support 'Update After Bind', however, the limits for this
model also include the ones without it. See the with or without remark
in the spec below:
"maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks is similar to
maxPerStageDescriptorInlineUniformBlocks but counts descriptor bindings
from descriptor sets created with or without the
VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit set."
Fixes:
dEQP-VK.api.info.vulkan1p2_limits_validation.ext_inline_uniform_block
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15732>
Since the packing functions generated by csbgen use a void pointer
for the buffer in which to pack, it's possible to easily write out
of bounds. This commits attempts to reduce the chances by
having the pack macro check that the pointer passed points to an
element sized equally to the state word being packed. Catching
these errors earlier.
As can be seen in this commit, there already was a case of this:
"pds_ctrl". The word size is meant to be 64 bits but the pointer
was pointing to a 32 bit field.
Although it's fine for the word size to be smaller than the
storage pointed to by the pointer, this is not allowed just to
be extra careful.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15687>
If we know the height is 1, then it would be a waste to align each
miplevel to tile height. For non-mipmapped textures, it doesn't save us
memory (since you still align to 4 on the last miplevel), but it should be
better cache locality by not loading those unused lines.
Incidentally, this gets us some more coverage of swap != WZYX cases in CTS
tests, which often use optimal tiling without also testing linear.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This field does appear to work as expected: with 1D/1DArray turnip storage
images switched to be always linear, it fixes the dEQP-VK.image.*store*
tests using a color swapped format (once we allow color swap).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This reports all of our storage formats as supporting read/write without
format, since we don't have any in-shader format conversions. Similarly,
shadow comparisons were already supported on all the depth formats.
This extension is required for VK 1.3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This is a potential performance optimization, from the docs:
The PVS Instruction which updates the clip coordinatea position for
the last time. This value is used to lower the processing priority
while trivial clip and back-face culling decisions are made. This
field must be set to valid instruction.
Right now it is set at the last instruction. However, there is no
measurable performance effect in either Unigine Sanctuary, Lightsmark
or GLmark.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6045
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15677>
This ensures that any channels used for helper invocations are
also spilled/filled correctly.
Alternatively, we could recursively track all temps that get
involved in computing values that are then used in explicit
(dfdx,dfdy) or implicit (texture coordinates for mipmap or
anisotropic filtering, etc) derivatives, and only enable
per-quad on these (or disable spilling of any of these
values).
Fixes:
dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15705>
This extension has been promoted to Vulkan 1.2 which means it has been
silently enabled when we implemented Vulkan 1.2.
Enable it explicitely to make mesamatrix happy and also for consistency.
This extension was designed for potential performance improvements of
MSAA depth/stencil images but it's currently a no-op in RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15665>
The current code was allowing multiple query pools of the same type to be
active on the same commmand buffer which creates validation warnings.
Restructure the query handling, so the query pools are allocated on the
context on first use, then have queries allocate query_id from those pools.
This approach creates a new start dynamic array that contains each time a
vulkan query is started for a gallium query, and holds the flags for it rather
than storing them in a massive array.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15702>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube
maps and 3D surfaces were already handled, but 1D array and 2D array
surfaces were not.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment
The Fixes: tag below is a bit misleading. This commit adds another
lowering, similar to the one in the Fixes: commit, that probably should
have been added at the same time. I just want to make sure this commit
gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
Moving duplicate vk_format helper functions to common
vulkan/util/vk_format.h and also renaming
vk_format_get_component_size_in_bits to match how amd and
freedreno name the same function. Not moving this function
to common code as freedreno's implementation is a bit different.
Signed-off-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15696>
In cases when no immutable samplers were present but sampler
descriptor set layout bindings were, a seg fault was being caused
by an attempt to get the immutable sampler from within the
immutable sampler array while the array was not allocated.
This commit also remove the binding type check since only those
specific types can have immutable samplers. The check is covered
by the descriptor set layout creation and assignment of
has_immutable_samplers.
This commit also makes the immutable samplers const, since they're
meant to be immutable.
This commit also adds has_immutable_samplers field to descriptor
set layout. Previously to check whether immutable
samplers were present or not you'd check for the layout binding's
descriptor count to be 0 and the immutable sampler offset to be 0.
This doesn't tell you everything. If you have a descriptor of the
appropriate type (VK_DESCRIPTOR_TYPE_{COMBINED_,}IMAGE_SAMPLER).
So descriptor count of >1. The offset can be 0 if you don't have
immutable sampler, or have them at the beginning of the immutable
samplers array. So you can't determine if you really have immutable
samplers or not. One could attempt to perform a NULL check on the
array but this would not work in cases where you have following set
layout bindings with immutable samplers as the array stores the
whole layout's immutable samplers.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15688>
radv_image_queue_family_mask was still using queue family index values
for the special cases, while being passes a radv_queue_family enum.
This adds the bits to the enum and vk_queue_to_radv so we can implement
radv_image_queue_family_mask completely in terms of the enum.
Fixes: 1ec4e568 ("radv: abstract queue family away from queue family index.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15423>
This is for compatibility with loaders that don't know about
__DRI_DRIVER_GET_EXTENSIONS. xserver has supported that since 1.15.0,
which is almost nine years old now.
While we're about it, fix a comment in meson.build that used to be about
megadrivers to reflect reality.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15649>
In the case where u_index_generator returns zero new vertices, we never
filled tmp_indices before trying to duplicate the last veretx. This
causes us to read unitialized memory.
This fixes a Valgrind issue triggering in glxgears on Zink:
---8<---
==296461== Invalid read of size 2
==296461== at 0x570F335: compile_vertex_list (vbo_save_api.c:733)
==296461== by 0x570FEFB: wrap_buffers (vbo_save_api.c:1021)
==296461== by 0x571050A: upgrade_vertex (vbo_save_api.c:1134)
==296461== by 0x571050A: fixup_vertex (vbo_save_api.c:1251)
==296461== by 0x57114D1: _save_Normal3f (vbo_attrib_tmp.h:315)
==296461== by 0x10B750: ??? (in /usr/bin/glxgears)
==296461== by 0x10A2CC: ??? (in /usr/bin/glxgears)
==296461== by 0x4B3F30F: (below main) (in /usr/lib/libc.so.6)
==296461== Address 0x11ca23de is 2 bytes before a block of size 1,968 alloc'd
==296461== at 0x4845899: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==296461== by 0x570E647: compile_vertex_list (vbo_save_api.c:604)
==296461== by 0x570FEFB: wrap_buffers (vbo_save_api.c:1021)
==296461== by 0x571050A: upgrade_vertex (vbo_save_api.c:1134)
==296461== by 0x571050A: fixup_vertex (vbo_save_api.c:1251)
==296461== by 0x57114D1: _save_Normal3f (vbo_attrib_tmp.h:315)
==296461== by 0x10B750: ??? (in /usr/bin/glxgears)
==296461== by 0x10A2CC: ??? (in /usr/bin/glxgears)
==296461== by 0x4B3F30F: (below main) (in /usr/lib/libc.so.6)
---8<---
Fixes: dcbf2423d2 ("vbo/dlist: add vertices to incomplete primitives")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15633>
Now that we have a threading mode in the device, we can set that based
on the environment variable instead of delaying it to submit time. This
allows us to avoid the static variable trickery we use to avoid reading
environment variables over and over again. We also move the enabling of
the submit thread up a level or two and give it a bit more obvious
condition.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15566>
Fossilize used the git:// protocol for fetching submodules
but as of January 11, 2022, GitHub has tempirarily disabled
acceptance of the Git protocol until March 15, 2022 whereby
it will be permanantly disabled.
This patch uprevs to the Fossilize commit that switches
submodule URLs to https:// instead. Otherwise, we would get
an error stating that "The unauthenticated git protocol on
port 9418 is no longer supported." when trying to clone
submodules.
See https://github.blog/2021-09-01-improving-git-protocol-security-github
for more info.
Signed-off-by: Omar Akkila <omar.akkila@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15626>
if the driver requires pointsize uploads, only flag the last vertex
stage for updates, not all vertex stages
this should be functionally equivalent but without the unnecessary overhead
of also scanning the other stages
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15590>
the vulkan spec doesn't explicitly state whether this function progresses
a given semaphore's timeline, and apparently there are some cases where
it's assumed that progress occurs if this function is called in a loop
instead of WaitSemaphores, so check the current fence for completion
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15453>
according to KHR_gl_texture_3D_image:
If <target> is EGL_GL_TEXTURE_3D_KHR, <buffer> must be the name of a
complete, nonzero, GL_TEXTURE_3D (or equivalent in GL extensions) target
texture object, cast
into the type EGLClientBuffer. <attr_list> should specify the mipmap
level (EGL_GL_TEXTURE_LEVEL_KHR) and z-offset (EGL_GL_TEXTURE_ZOFFSET_KHR)
which will be used as the EGLImage source; the specified mipmap level must
be part of <buffer>, and the specified z-offset must be smaller than the
depth of the specified mipmap level.
thus a 2d view of a 3d surface is not only legal, it's part of the spec and
must be supported when available
cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15584>
Right now we always copy pos to wpos regardless of if the fragment
shader needs it or not. Instead just do it only for the shaders
that really need it.
Shader-db is not able to measure preciselly the effect as we
build only the non-wpos variants, but given that majority of fragment
shaders don't needs the wpos, the real effect should be close:
total instructions in shared programs: 105427 -> 104313 (-1.06%)
instructions in affected programs: 53098 -> 51984 (-2.10%)
total temps in shared programs: 13991 -> 13958 (-0.24%)
temps in affected programs: 114 -> 81 (-28.95%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Make the r300_vertex_shader structure resemble the r300_framgment_shader
in order to allow support for shader variants. There is no functional
change, but it will be used in the next commit to only output wpos
when needed from vertex shaders.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Right now we always write to a temp and than emit movs to both
POS and WPOS.
When the result is trivial, like:
MOV temp[0], in[0]
MOV output[0], temp[0]
MOV output[1], temp[0]
the dataflow analysis passes can take care of it, this is however
one of the last optimizations we need the pass for. Additionally
it fails even for some still trivial cases like:
MAD temp[2], const[3], temp[0].wwww, temp[1];
MOV output[0], temp[2];
MOV output[2], temp[2];
This patch will just duplicate the output instuction when there
is only a single output write.
The long-term plan would be to just trust NIR, do as little in
the backend as possible and optimize the remaining backend passes
to not need any additional cleanups.
total instructions in shared programs: 105862 -> 105427 (-0.41%)
instructions in affected programs: 20527 -> 20092 (-2.12%)
total temps in shared programs: 14029 -> 13991 (-0.27%)
temps in affected programs: 152 -> 114 (-25.00%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Instead just emit both outputs as soon as possible.
If the last write is inside a loop or a branch, emit it after
the ENDLOOP or ENDIF. This saves some temps and also allows us
to potentially benefit from R300_PVS_XYZW_VALID_INST as right
now the position output write is always penultimate with the
WPOS output being the last.
total temps in shared programs: 14101 -> 14029 (-0.51%)
temps in affected programs: 435 -> 363 (-16.55%)
helped: 72
HURT: 0
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Now that we're using 3DSTATE_BINDING_TABLE_POOL_ALLOC to set the base
address for the binding table pool separately from surface states, we
don't actually need to update surface state base address anymore.
Instead, we can just set STATE_BASE_ADDRESS once at context creation,
and never bother updating it again, saving some heavyweight flushes
and freeing us from the need for address offsetting trickery.
This patch was originally written by Jason Ekstrand, with fixes from
Lionel Landwerlin, but was targeting Icelake. Doing it there requires
additional changes (15:5 -> 18:8 binding table pointer formats) which
also involve some trade-offs, whereas the XeHP change is purely a win,
so we'll do it here first.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15616>
You should probably resize the sources array before accessing entries
that might be out of bounds. inst->resize_sources() always allocates
enough space for at least 3 sources, so this is really only an issue
when there are 4+ sources.
Fixes: a920979d4f ("intel/fs: Use split sends for surface writes on gen9+")
Fixes: 4f86a70599 ("intel/fs: Lower DW untyped r/w messages to LSC when available")
Fixes: d372abe397 ("intel/fs: Add surface OWORD BLOCK opcodes")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>
Thix fixes two bugs. First, we stop leaking in/out fences with
multisync. Because the in_syncs and out_syncs parameters to
set_multisync were arrays and not pointers to arrays, the caller's
in_syncs and out_syncs pointers never got set and remained NULL so
multisync_free() always sees to NULL pointers and does nothing, leaking
both arrays. Not sure how this isn't showing up in the dEQP leak check
tests.
Second, the struct drm_v3d_multi_sync was in the scope of the then
clause of the `if (device->pdevice->caps.multisync)` so it goes out of
scope before the ioctl. This is, effectively, a use-after-free and,
depending on stack allocation details, may result in the multisync
extension struct getting stompped before the ioctl.
Fixes: ff8586c345 ("v3dv: enable multiple semaphores on cl submission")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15512>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
Based on a similar fix for RADV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>
iris may rely on 3DSTATE_BINDING_TABLE_POOL_ALLOC's address being
inherited from a previous batch. So, we need to tell the decoder
about the address in case it sees binding tables to decode before
it encounters such a command in the batch.
We used to update surface_base, now we update bt_pool_base instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15625>
Use a unified si_create_copy_image_cs() to generate both the 1D and 2D
copy shaders; the shaders themselves remain separate.
Also:
- add a helper deref_ssa() for nir deref
- add a helper set_work_size() for setting workgroup and grid sizes
- pass si_context (instead of pipe_context) to the copy_image generator
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15268>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In queue_create = queue_create = &pCreateInfo->pQueueCreateInfos[0], queue_create is written twice with the same value.
Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15604>
It seems that a660 has the same bug. Without the workaround there
are a lot of flakes with depth-stencil tests, e.g. in:
dEQP-VK.pipeline.extended_dynamic_state.*
dEQP-VK.renderpass.depth_stencil_write_conditions.*
dEQP-VK.pipeline.stencil.format.d24_unorm_s8_uint.states.*
Or guaranteed failures like of:
dEQP-VK.pipeline.render_to_image.core.2d.huge.width.r8g8b8a8_unorm_d32_sfloat_s8_uint
Enabling the workaround fixes all of them.
cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15548>
in scenarios like:
vec1 32 ssa_151 = deref_var &shadow_map (uniform sampler2D)
vec1 32 ssa_152 = deref_var &shadow_map (uniform sampler2D)
vec2 32 ssa_153 = vec2 ssa_151, ssa_152
vec1 32 ssa_154 = deref_var ¶m@4 (function_temp uvec2)
intrinsic store_deref (ssa_154, ssa_153) (wrmask=xy /*3*/, access=0)
vec1 32 ssa_160 = deref_var ¶m@4 (function_temp uvec2)
vec2 32 ssa_164 = intrinsic load_deref (ssa_160) (access=0)
vec1 32 ssa_167 = mov ssa_164.x
vec1 32 ssa_168 = deref_cast (texture2D *)ssa_167 (uniform texture2D) /* ptr_stride=0, align_mul=0, align_offset=0 */
vec1 32 ssa_169 = mov ssa_164.y
vec1 32 ssa_170 = deref_cast (sampler *)ssa_169 (uniform sampler) /* ptr_stride=0, align_mul=0, align_offset=0 */
vec1 32 ssa_172 = (float32)tex ssa_168 (texture_deref), ssa_170 (sampler_deref), ssa_171 (coord), ssa_166 (comparator)
the real variable is stored to a function_temp and then loaded back again,
which means it isn't a direct deref and lower_vri_instr_tex_deref() will
crash because the variable can't be found
BUT running only the passes needed to eliminate derefs will break other tests,
so just run the whole optimize loop again here to avoid such issues
for #5945
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15511>
Previously we were just doing a group barrier for both membar and barrier.
This sometimes worked out, because atomics and reads waited for ack
already, but writes were not waiting for ack. Use the need_wait_ack
pattern that scratch writes used, with a little refactoring for
reusability.
The refactor also incidentally fixes the atomics waiting for outstanding
acks to be > 1 instead of > 0.
Cc: mesa-stable
Fixes: #6028
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
Prevents several regressions when NIR-to-TGSI is enabled where it was
allocating arrays on top of each other.
Fixes vec3 fails on RV770,
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1 and 2
in general, and fixes another piglit but breaks two others. Still, this
seems to be a win.
Cc: mesa-stable
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
The two types of instructions get added to the same CF list, but not the
same instr list within the CF list. So, if you SSBO fetched your
texcoord, the emission of the SSBO fetch would come *after* the texcoord
fetch.
Avoids regressions when NIR-to-TGSI starts optimizing more.
Cc: mesa-stable
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
Avoids a regression when enabling shader precompilation, where the
precompile would happen with MSAA disabled (so no sample mask export) but
we'd never catch up to the shader being rendered with MSAA.
Doesn't fix any current testcases, though.
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14427>
This test was never supposed to be skipped, and the referenced commit
just exposed a bug in turnip fixed by the previous commit. It was
hanging due to a CTS bug making the submit take way too long, which will
be fixed once the CTS change lands.
Also, add it to the a630 skips.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15563>
In this case we should relax checks based on the format, since the user
will be responsible for them when creating an image view.
This gets dEQP-VK.image.sample_texture.*_bit_compressed_format_* not
skipping again after VK-GL-CTS 736eec57dc0c ("Fix checkSupport in
compressed texture sampling tests").
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15563>
We have to add WFM to pending bits when we are flushing into CP
for indirect draw to know when they should apply WFM workaround.
Fixes CTS tests:
dEQP-VK.draw.renderpass.indirect_draw.*_data_from_compute.indirect_draw_count*
Fixes: abf0ae014a
("tu: Properly handle waiting on an earlier pipeline stage")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15577>
We handle uniforms by copying them into the uniform stream to be
consumed with ldunif when they have a constant offset. Otherwise
we fallback to general TMU access, which has more latency.
However, just like we did for UBOs and read-only SSBOs, we can
also try to use the unifa mechanism to handle indirect accesses
in certain cases instead of the TMU fallback.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
Inline uniform blocks store their contents in pool memory rather
than a separate buffer, and are intended to provide a way in which
some platforms may provide more efficient access to the uniform
data, similar to push constants but with more flexible size
constraints.
We implement these in a similar way as push constants: for constant
access we copy the data in the uniform stream (using the new
QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from
which we need to copy and for indirect access we fallback to
regular UBO access.
Because at NIR level there is no distinction between inline and
regular UBOs and the compiler isn't aware of Vulkan descriptor
sets, we use the UBO index on UBO load intrinsics to identify
inline UBOs, just like we do for push constants. Particularly,
we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this,
however, unlike push constants, inline buffers are accessed
through descriptor sets, and therefore we need to make sure
they are located in the first slots of the UBO descriptor map.
This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS
slots of the map, with regular UBOs always coming after these
slots.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
shader_storage_blocks_write_access was computed using the buffer indices
in the program but ShaderStorageBlocksWriteAccess is used with the shader
buffers.
So if a VS had 3 SSBOs and a FS had 4, the mask for VS was 0x3 (correct) but
the mask for the FS was 0x78 instead of 0x15.
Fix this by substracting the index of the first shader buffer in the program's
buffers.
Fixes: 79127f8d5b ("glsl: set ShaderStorageBlocksWriteAccess in the nir linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6184
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15552>
In general, an atomic intrinsic may perform separate atomics for every
enabled SIMD channel, as each channel may operate on different memory.
However, an extremely common case is for all channels to access the same
memory location. In this case, we can simply perform a reduction/scan
across the subgroup, and perform one atomic for the whole subgroup,
rather than one per channel. For example, if an intrinsic says to take
the minimum value of the existing memory and the value in each channel,
we can do a thread-local minimum of all enabled channels, then do a
single atomic to take the minimum of that and the existing memory.
Our hardware doesn't optimize the case where multiple channels ask for
atomics on the same memory location; it assumes the compiler will do so.
nir_opt_uniform_atomics() uses divergence analysis to detect this case,
adds the necessary subgroup operations, and moves the atomic inside a
conditional that disables all but a single invocation. It even detects
cases where the shader code already performs this kind of optimization,
and avoids doing it a second time.
This may not be the optimal solution for us. In the backend, we could
detect this case and emit send(1) instructions with NoMask, rather than
generating if...send(16)...endif, and a lot of unnecessary ALU ops. But
it's simple to do, reuses the same path as ACO, and still provides most
of the benefit by cutting up to 16x atomics down to a single atomic,
which is more merciful to the memory bus.
Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP.
Improves performance of a customer-internal benchmark on XeHP at
3840x2160 and low settings by approximately 30%.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V. However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().
We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
- load_reloc_const is just an immediate constant load, it's convergent.
- nir_intrinsic_load_global_const_block_intel should be convergent,
it says the address must be uniform, and we uniformize the predicate
- Lowered image intrinsics: image_deref_load_param_intel just reads
information about an image, as long as the image variable is
convergent it should be too. load_raw_intel...if the address we
come up with is convergent, it ought to be as well.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
Instead of reusing the in/out slot mechanism, use a separated NIR
variable mode. This will make easier later to implement staging the
output in shared memory (and storing all at the end to the URB).
Note to get 64-bit type support we currently rely on the
brw_nir_lower_mem_access_bit_sizes() pass.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>
The only ways that function can return NULL are:
- the xcb connection was closed
- the window for the swapchain was destroyed
- the special event listener was unregistered from another thread
- malloc failure
All of these are permanent errors, the swapchain is no longer in a
usable state, so we should treat this as VK_ERROR_SURFACE_LOST_KHR.
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15558>
Logic is lifted from bi_layout.c, adapted to work on instructions (not
clauses) and for Valhall's off-by-one semantic which is annoyingly
different than Bifrost. (But the same as Midgard -- Bifrost was
annoyingly different than Midgard!)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
We already collect enums in the ISA description XML. Export them for use in the
compiler backend, particularly the packing code.
Usually we'd use Mako for templating. In this case, the script is so trivial a
template engine didn't seem worth it. (The obvious version with Mako was about
10 lines longer than just prints and f-strings used here.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
If there are modifiers only used by pseudo instructions, not the real
instructions, bi_packer can get out-of-sync with bi_opcodes, causing
hard-to-debug issues. Do the stupid-simple thing to ensure this doesn't happen.
This may be a temporary issue, depending whether ISA.xml and the IR get split
out for better Valhall support.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
NIR load_consts/inputs tend to happen together at the top of the program.
In the TGSI backend the loads got emitted at use time, while the NIR
backend was emitting the loads at load intrinsic time. By sinking the
intrinsics, we can greatly reduce register pressure.
nv92 NIR results:
total local in shared programs: 2024 -> 2020 (-0.20%)
local in affected programs: 4 -> 0
total gpr in shared programs: 790424 -> 735455 (-6.95%)
gpr in affected programs: 215968 -> 160999 (-25.45%)
total instructions in shared programs: 6058339 -> 6051208 (-0.12%)
instructions in affected programs: 410795 -> 403664 (-1.74%)
total bytes in shared programs: 41820104 -> 41660304 (-0.38%)
bytes in affected programs: 7147296 -> 6987496 (-2.24%)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15542>
Fix 'error C4576: a parenthesized type followed by an initializer
list is a non-standard explicit type conversion syntax' errors by
declaring an actual variable and returning it in
vk_image_view_subresource_range().
All those MSVC/c++ related-constraints are quite annoying to be honest,
but it looks like the D3D12 headers have been updated to plain C
recently, which will allow us to write the driver in C, and hopefully
get all this sort of issues behind us.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14766>
Implements vkCreateSampler and vkDestroySampler APIs.
Also fixes maxSamplerLodBias value from 15.0f to 16.0f
as it's the max supported by our hardware.
Also changing maxSamplerAnisotropy from 16.0f to 1.0f to
temporarily disable anisotropy as we are missing software
support for it.
Signed-off-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15557>
the previous method of using affected_states to trigger constant updates
was ineffectual in the scenario where a ubo pointsize was needed on
the first time a non-precompiled shader was used after being the not-last
vertex stage:
* have vs+gs -> gs precompiles with pointsize lowering -> gs constants get updated
* remove gs -> vs was precompiled without pointsize lowering -> vs constants broken
now just do a quick check as in st_atom_shader.c and set the flag manually to
ensure the update is done correctly every time
cc: mesa-stable
fixes#6207
fixes (radv):
KHR-GL46.texture_cube_map_array.image_op_fragment_sh
KHR-GL46.texture_cube_map_array.sampling
KHR-GL46.texture_cube_map_array.texture_size_fragment_sh
KHR-GL46.constant_expressions*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15570>
Freedreno will check if the virtgpu supports the pass-thru context, and
if not will bail, falling back to virgl.
TODO this requires that virgl is also enabled in the mesa build, even if
it is not needed.. maybe there is a better way to handle this?
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Add a new backend to enable using native driver in a VM guest, via a new
virtgpu context type which (indirectly) makes host kernel interface
available in guest and handles the details of mapping buffers to guest,
etc.
Note that fence-fd's are currently a bit awkward, in that they get
signaled by the guest kernel driver (drm/virtio) once virglrenderer in
the host has processed the execbuf, not when host kernel has signaled
the submit fence. For passing buffers to the host (virtio-wl) the egl
context in virglrenderer is used to create a fence on the host side.
But use of out-fence-fd's in guest could have slightly unexpected
results. For this reason we limit all submitqueues to default priority
(so they cannot be preepmted by host egl context). AFAICT virgl and
venus have a similar problem, which will eventually be solveable once we
have RESOURCE_CREATE_SYNC.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
We are going to want basically the identical thing, other than
flush_submit_list, for virtio backend. Now that we've moved various
other dependencies into the base classes, extract out an abstract base
class for submit/ringbuffer.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Add a hint for buffers that we won't need to mmap. With the virtio
backend, virglrenderer needs to create a dmabuf fd for mapping into
the host, which we want to avoid when possible.
Low hanging fruit is to use this hint for anything tiled/ubwc. There
are probably more bo's that can be flagged as such.
TODO add fd_bo_upload() for memcpy to bo.. this would be useful for
uploads, for example, shaders which we just write once and never touch
again.. for virtio this could be implemented with a TRANSFER_TO_HOST
ioctl.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Decoupling handle and fd_bo creation simplifies things for "normal" drm
drivers, avoiding duplication for the create vs import paths. But this
is awkward for the virtio backend when wants to do multiple things in
the same guest<->host round trip.
So instead, split the paths in the interface backend and move the code
sharing for the two different paths into the msm backend itself.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Based on profiling, these 10us sleeps are behaving closer to ~60us
and causing higher-than-necessary CPU overhead. 120us seems like a
good balance between latency management and overhead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15565>
This transitions the resource to TRANSFER_SRC_OPTIMAL, but
never updates the res->layout field, so subsequent transitions
are wrong and throw validation errors.
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x6660f60, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x6660f60[] expects VkImage 0x2f000000002f[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15545>
This patch adds the PowerVR driver to the following shared builds:
* debian-arm64
* debian-clang
* debian-release
* debian-vulkan
* fedora-release
It also adds the associated "tools" to these builds:
* debian-arm64-build-test
* debian-release (already uses -Dtools=all)
* fedora-release
And removes them from these builds by expanding -Dtools=all:
* debian-gallium
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15284>
unreachable() doesn't lead to executing the code that follows it,
neither in debug nor release builds. So falling through doesn't make any
sense.
This fixes a compile-error on clang.
Let's move the default-block to the end to make it clearer that there's
no intended fallthrough.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15531>
unreachable() doesn't lead to executing the code that follows it,
neither in debug nor release builds. So falling through doesn't make any
sense.
This fixes a compile-error on clang.
Let's move the default-block to the end to make it clearer that there's
no intended fallthrough.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15531>
First, there is a problem if you do the following
vkCmdSetColorWriteEnableEXT(attachmentCount = 8)
vkCmdBindPipeline(GFX, with attachmentCount = 4)
vkCmdDraw()
vkCmdBindPipeline(GFX, with attachmentCount = 8)
vkCmdDraw()
Because in the dynamic state emission code we rely on the first
pipeline to figure the number of BLEND_STATE entries to prepare. This
is wrong, we should fill all entries so that the dynamic state works
regardless of the number of attachments in the pipeline. With regard
to the dynamic values, we should retain enable/disable values that do
not concern the current pipeline.
Second, 3DSTATE_WM was not always reemitted when the pipeline changed.
But since it is not emitted as part of the pipeline, this results in
inconsistent state being programmed.
Third, we end up disabling the fragment stage completely in some
cases. And that is programming the pipeline inconsistently and
triggering a hang on TGL.
v2: Fix comment (Tapani)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b15bfe92f7 ("anv: implement VK_EXT_color_write_enable")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
The problem is that we missed looking at pipeline changes. Pipelines
hold bits of dynamic states and when it changes we might need to
reemit a packet.
v2: fix comment (Tapani)
Add missing anv_cmd_buffer_needs_dynamic_state() use (Tapani)
Cc: mesa-stable
Fixes: 505d176a8e ("anv: disable baked in pipeline bits from dynamic emission path")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
error log:
```
[2/43] Compiling C object src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj
FAILED: src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj
"cc" "-Isrc/gallium/drivers/llvmpipe/libllvmpipe.a.p" "-Isrc/gallium/drivers/llvmpipe" "-I../../src/gallium/drivers/llvmpipe" "-I../../src/gallium/include" "-Isrc/gallium/auxiliary" "-I../../src/gallium/auxiliary" "-Iinclude" "-I../../include" "-Isrc" "-I../../src" "-Isrc/compiler/nir" "-I../../src/compiler/nir" "-Isrc/util" "-I../../src/util" "-IC:/CI-Tools/msys64/clang64/include" "-fvisibility=hidden" "-fcolor-diagnostics" "-Wall" "-Winvalid-pch" "-std=c11" "-O0" "-g" "-DPACKAGE_VERSION=\"22.0.0-devel\"" "-DPACKAGE_BUGREPORT=\"https://gitlab.freedesktop.org/mesa/mesa/-/issues\"" "-DHAVE_WINDOWS_PLATFORM" "-DHAVE_SURFACELESS_PLATFORM" "-DUSE_ELF_TLS" "-DUSE_TLS_BEHIND_FUNCTIONS" "-DENABLE_ST_OMX_BELLAGIO=0" "-DENABLE_ST_OMX_TIZONIA=0" "-DEGL_NO_X11" "-DDEBUG" "-DHAVE___BUILTIN_BSWAP32" "-DHAVE___BUILTIN_BSWAP64" "-DHAVE___BUILTIN_CLZ" "-DHAVE___BUILTIN_CLZLL" "-DHAVE___BUILTIN_CTZ" "-DHAVE___BUILTIN_EXPECT" "-DHAVE___BUILTIN_FFS" "-DHAVE___BUILTIN_FFSLL" "-DHAVE___BUILTIN_POPCOUNT" "-DHAVE___BUILTIN_POPCOUNTLL" "-DHAVE___BUILTIN_UNREACHABLE" "-DHAVE___BUILTIN_TYPES_COMPATIBLE_P" "-DHAVE_FUNC_ATTRIBUTE_CONST" "-DHAVE_FUNC_ATTRIBUTE_FLATTEN" "-DHAVE_FUNC_ATTRIBUTE_MALLOC" "-DHAVE_FUNC_ATTRIBUTE_PURE" "-DHAVE_FUNC_ATTRIBUTE_UNUSED" "-DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT" "-DHAVE_FUNC_ATTRIBUTE_WEAK" "-DHAVE_FUNC_ATTRIBUTE_FORMAT" "-DHAVE_FUNC_ATTRIBUTE_PACKED" "-DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL" "-DHAVE_FUNC_ATTRIBUTE_ALIAS" "-DHAVE_FUNC_ATTRIBUTE_NORETURN" "-DHAVE_FUNC_ATTRIBUTE_VISIBILITY" "-DHAVE_UINT128" "-D_WINDOWS" "-D_WIN32_WINNT=0x0A00" "-DWINVER=0x0A00" "-DPIPE_SUBSYSTEM_WINDOWS_USER" "-D_USE_MATH_DEFINES" "-DUSE_SSE41" "-DUSE_GCC_ATOMIC_BUILTINS" "-DHAS_SCHED_H" "-DHAVE_CET_H" "-DHAVE_STRTOF" "-DHAVE_STRTOK_R" "-DHAVE_QSORT_S" "-DHAVE_ZLIB" "-DHAVE_ZSTD" "-DHAVE_COMPRESSION" "-DLLVM_AVAILABLE" "-DMESA_LLVM_VERSION_STRING=\"13.0.0\"" "-DLLVM_IS_SHARED=1" "-DDRAW_LLVM_AVAILABLE" "-DMESA_EXECMEM" "-DVK_USE_PLATFORM_WIN32_KHR" "-Werror=implicit-function-declaration" "-Werror=missing-prototypes" "-Werror=return-type" "-Werror=empty-body" "-Werror=incompatible-pointer-types" "-Werror=int-conversion" "-Wimplicit-fallthrough" "-Wno-missing-field-initializers" "-fno-math-errno" "-fno-trapping-math" "-Qunused-arguments" "-fno-common" "-Wno-microsoft-enum-value" "-Werror=format" "-Wformat-security" "-Werror=thread-safety" "-ffunction-sections" "-fdata-sections" "-pthread" "-D_FILE_OFFSET_BITS=64" "-D__STDC_CONSTANT_MACROS" "-D__STDC_FORMAT_MACROS" "-D__STDC_LIMIT_MACROS" "-Werror=pointer-arith" "-Werror=gnu-empty-initializer" -MD -MQ src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj -MF "src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj.d" -o src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj "-c" ../../src/gallium/drivers/llvmpipe/lp_setup_tri.c
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup_tri.c:37:
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup_context.h:38:
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup.h:31:
In file included from ../../src/gallium/drivers/llvmpipe/lp_jit.h:40:
In file included from ../../src/gallium/auxiliary/gallivm/lp_bld_limits.h:37:
In file included from ../../src/util/u_cpu_detect.h:41:
In file included from ../../src/util/u_thread.h:35:
In file included from ../../include/c11/threads.h:64:
In file included from ../../include/c11/threads_win32.h:58:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/windows.h:69:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/windef.h:9:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/minwindef.h:163:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/winnt.h:1555:
In file included from C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/x86intrin.h:15:
In file included from C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/immintrin.h:37:
C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/tmmintrin.h:582:1: error: redefinition of '_mm_shuffle_epi8'
_mm_shuffle_epi8(__m128i __a, __m128i __b)
^
../../src/gallium/auxiliary/util/u_sse.h:159:1: note: previous definition is here
_mm_shuffle_epi8(__m128i a, __m128i mask)
^
1 error generated.
```
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14077>
Because of incomplete codepaths further down in this function, we end up
trying to use this variable without initialization. Let's initialize it
to zero, just to keep the compiler happy.
This avoids a compile-time warning, which would get treated as an error
on CI.
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15525>
There's no guarantee that we don't have more than 128 PHI values either,
so let's stop asuming so.
We do this by changing dxil_phi_set_incoming to dxil_phi_add_incoming,
which lets us add more incoming phi-values to the current one instead of
setting a new set of them.
This also lets us reduce these stack-arrays a bit, down to something
much more reasonable.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15519>
As this function doesn't check for any control-flow
dependence, it only returns true for statically
(or globally) uniform values.
The same holds true for is_binding_dynamically_uniform()
in nir_opt_gcm().
Rename to better reflect that property.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14994>
the fastpath here can only be taken if there is exactly one stream active,
as this will otherwise break nonzero stream primitives generated queries
in truth, this num_vertex_streams thing should be a bitmask so that the case
of num_streams=1,stream_id!=0 could also be fastpathed, but the complexity
probably isn't worth it given the infrequency of use
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15506>
this can't be determined from pipe_shader_state::stream_output,
as this only contains xfb info, which is not the same as the vertex
stream info, and may break primitives generated queries
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15506>
When using cubemaps and the u/v values are 0, then this point
can be arrived at with rho = nan, and if rho is NaN, then lod
calculations end up at the max lod, whereas the spec suggests
they should end up at the most negative lod.
This fixes
dEQP-VK.glsl.texture_functions.query.texturequerylod.samplercube_float_zero_uv_width_fragment
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15335>
Otherwise, if batch->scissor_culls_everything is set for a single draw,
every draw after it in the batch will be skipped because the new
scissor/viewport state will never be processed. Process scissor state
early in draw_vbo to fix this interaction.
We do need to be careful: setting something on the batch can only happen when
we've decided on a batch. If we have to select a fresh batch due to too many
draws, that must happen first. This is pretty clear in the code but worth noting
for the diff.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Reviewed-by: Icecream95 <ixn@disroot.org>
Fixes: 79356b2e ("panfrost: Skip rasterizer discard draws without side effects")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5839
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6136
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15365>
There is an out-of-sync approach regarding the location of the results
folder: some scripts refer to it via $CI_PROJECT_DIR/results, while
others just assume it is located in the current working directory.
Usually $PWD points to $CI_PROJECT_DIR, but in some cases this is not
the case, hence let's ensure the 'results' folder can always be found
in the current working directory.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
In order to run a VM (e.g. crosvm) through HWCI_TEST_SCRIPT on a LAVA
target, it's necessary to download a kernel image on the target device.
When HWCI_KVM is set to 'true', we can safely assume HWCI_TEST_SCRIPT
contains a command or the path to a script which expects the kernel
image to be available under /lava-files/${KERNEL_IMAGE_NAME}.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
Having CI_JOB_JWT_FILE pointing to path on /tmp makes it difficult to be
managed in VM contexts (e.g. crosvm) because the /tmp mountpoint usually
refers to a local filesystem rather than the host one.
Additionally, there is another restriction for 'piglit-traces-test' job
to have the file available on the root filesystem.
To avoid amending all the jobs that might be affected, let's just set
the variable to a fixed path '/minio_jwt'. Note we also need to do this
in the 'variables:' section instead of 'before_script:' in order to be
able to reference it in CI job variables, e.g. PIGLIT_REPLAY_EXTRA_ARGS
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
Use a dedicated DEQP_RUNNER_CARGO_ARGS variable instead of
EXTRA_CARGO_ARGS in build-deqp-runner.sh to pass custom arguments when
invoking 'cargo install'.
This is to avoid modifications of EXTRA_CARGO_ARGS which might have
negative side-effects in the scripts which rely on this variable and
import build-deqp-runner.sh instead of executing it in a subshell.
Fixes: 8729c6e981 ("ci: Support building and installing deqp-runner from source")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
R8G8 have a different block width/height and height alignment from other
formats that would normally be compatible (like R16), and so if we are
trying to, for example, sample R16 as R8G8 we need to demote to linear.
Follows the fix in Freedreno: b97e3bb2e1
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15465>
Replaced the existing internal TGSI compute shader, which clears
a read-modify-write buffer, with its NIR equivalent. The disassembly
shader generated by the new NIR variant is identical to the previous
implementation. These changes remove the additional conversion step
from TGSI to NIR for the shader at runtime. Tested on a Navi 23 card.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15356>
When CCS compression first came out on Skylake, we referred to it as
"renderbuffer compression", or RBC for short. However, that name has
long since fallen out of favor, and we refer to it as CCS nearly
everywhere.
This patch renames DEBUG_NO_RBC to DEBUG_NO_CCS inside the codebase
for clarity, and adds INTEL_DEBUG=noccs. The legacy INTEL_DEBUG=norbc
name continues to work, because it's one line of code and having both
names makes our lives easier in the interim.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15447>
by disabling color and depth write, the side effects of force-disabling discard can
be mitigated
fixes:
KHR-GL46.tessellation_shader.single.isolines_tessellation
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through
KHR-GL46.tessellation_shader.tessellation_invariance.invariance_rule3
KHR-GL46.tessellation_shader.tessellation_shader_point_mode.points_verification
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.degenerate_case
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.inner_tessellation_level_rounding
KHR-GL46.tessellation_shader.tessellation_shader_tessellation.gl_InvocationID_PatchVerticesIn_PrimitiveID
KHR-GL46.tessellation_shader.vertex.vertex_spacing
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15392>
From docs:
The PVS Instruction which uses the Input Vertex Memory for the last
time. This value is used to free up the Input Vertex Slots ASAP.
This field must be set to a valid instruction.
Right now it is set to the last instruction. When the last read is
inside a loop, set it on the outhermost ENDLOOP. This could in theory
help performance, but none of my usual benchmarks including GLmark,
Unigine Sanctuary or Lightsmark show any measurable performance difference.
Suggested in: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6045
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15252>
this is a lot of machinery to propagate the block sizes down from the
descriptor layout to the pipeline layout to the rendering_state
block data is appended to ubo0 immediately following push constant
data (if it exists), which requires that a new buffer be created and
filled any time either type of data changes
shader handling is done by propagating the offset of each block relative
to the start of its descriptor set, then accumulating the sizes of
every uniform block in each preceding descriptor set into the offset,
then adding on the push constant size, and finally adding that on to
the existing load_ubo deref offset
update-after-bind is no longer an issue since each instance of pc+block
data is its own immutable buffer that can never be modified
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15457>
now instead of having static per-stage buffer regions and letting llvmpipe
do the upload, lavapipe creates a new pipe_resource and chucks it away
with take_ownership=true to allow it to be destroyed once it's no longer
in use
this also alters ubo0 mechanics such that the buffer is now sized exactly to
the size of the push constants in the pipeline and push constants are only
updated when the appropriate shader stage is flagged
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15457>
the converted number of vertices is 2x that of the "real" number of vertices,
so divide the results by 2
this works nicely since zink performs cpu readback for all queries, and shader
aggregation should be simple in the future as well
fixes:
KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15401>
Without this, 128-bit or 64-bit register spills can generate unaligned
loads and stores if tlsBase is unaligned.
Fixes glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd
with NV50_PROG_USE_NIR=1 on kepler
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13693>
ir3_compile_shader_nir() should set local_size[] and local_size_variable
fields not only for compute shaders, but for the OpenCL kernels too.
v2: use gl_shader_stage_is_compute() instead of explicit comparison with
MESA_SHADER_[COMPUTE,KERNEL].
Signed-off-by: Andrey Konovalov <andrey.konovalov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14863>
bare-metal can reboot boards into an existing rootfs on intermittent
device failure, but traces-db doesn't do any sanity-checking of the local
downloads of traces and would proceed to just trying to replay them.
Nuke any existing trace db so that it re-downloads every time, same as
LAVA or docker container tests do.
Fixes: #5585
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15440>
We get a lot of useful coverage from running graphicsfuzz with spilling
enabled, but it's also pretty slow and can cause intermittent hangcheck
failures. I thought I'd categorized them when merging !14839 (device loss
on reset), but it looks like not all of them and we're now more likely to
have flakes take out the whole test run when a single flake makes the rest
of the caselist a flake.
This is a little unfortunate in that it means our test environment is not
the same as a stock system you would want to run deqp on to submit
conformance, but I think it's an improvement in the test maintenance work
vs needing to fix things up later.
We have some other tests besides turnip that can trigger hangchecks which
we might also like this increase for (some disabled traces, for example).
However, freedreno GL has a 5-second timeout waiting for idle when
mapping, and a couple of 2-second timeouts in a row can result in spurious
failures in other tests!
Fixes: #6163
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15435>
We use bi_dontcare() to specify any encoding where we don't care about
the value, with a preference for power-efficient encodings. On Bifrost,
a (possibly nonexistant) FAU read is the best encoding. On Valhall, that
encoding doesn't exist so just use a zero. That should be good enough in
practice.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15461>
When emitting code during or after register allocation, we need to be able to
emit constants without running the constant->{LUT, move, uniform} pass running
after. In particular, we need to access the constant 0 to implement spill code.
Add a Valhall-specific zero for this purpose.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15461>
Instead of one MANUAL_COMMANDS, we now have two deny-lists:
MANUAL_COMMANDS and NO_ENQUEUE_COMMANDS. The former is for things which
have a manually typed implementation in vk_cmd_enqueue.c and the later
is for things we want to ignore entirely. This lets us auto-generate
vk_cmd_enqueue_unless_primary_Cmd* entrypoints for the manually typed
vk_cmd_enqueue_Cmd* entrypoints.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14406>
This fixes bugs with lavapipe's hand-rolled pCount handling. The driver
is supposed to set *pCount to the number of queues actually written in
the case where it's initialized to a value that's too large. It's also
supposed to handle *pCount being too small.
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15459>
The Vulkan spec was recently clerified to say that transitions only
happen to the bound layers:
"Automatic layout transitions apply to the entire image subresource
attached to the framebuffer. If multiview is not enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the number of layers specified by
VkFramebufferCreateInfo::layers. If multiview is enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the layers corresponding to views which are
used by some subpass in the render pass, even if that subpass does
not reference the given attachment."
This is in the context of render passes but it applies to dynamic
rendering because the implicit layout transition stuff is a Mesa pseudo-
extension and inherits those rules.
For clears, the Vulkan spec says:
"renderArea is the render area that is affected by the render pass
instance. The effects of attachment load, store and multisample
resolve operations are restricted to the pixels whose x and y
coordinates fall within the render area on all attachments. The
render area extends to all layers of framebuffer."
Again, this is in the context of render passes but the same principals
apply to dynamic rendering where the layerCount and renderArea are
specified as part of the vkCmdBeginRendering() call.
Fixes: 3501a3f9ed ("anv: Convert to 100% dynamic rendering")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15441>
Any thread we create may end up creating/submitting at least a
noop job, which is a shared object. Before multisync, this was
an issue only for the creation of the job itself, but with
multisync we can also modify parameters of the noop job
every time it is used (for signaling and serialization
configuration).
This change adds a noop mutex that all threads (main, wait and
master) take before submitting a noop job to ensure concurrent
access is not an issue.
Fixes flakyness observed with multisync with the following test:
dEQP-VK.api.command_buffers.secondary_execute_twice
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
If a CPU job comes first in a command buffer with a semaphore wait operation
we need to wait on the CPU for the semaphore to be signaled before we process
the job.
We have been doing this with a WaitForIdle operation, but that only works
if the semaphore has been submitted for signaling from the same instance
of the driver. If we have an imported payload from another instance in our
semaphore however, waitForIdle may return too early since the submission
to signal the semaphore may have been submitted by a different instance
of the driver as well, and our wait for idle checks only know about this
instance submissions.
To fix this, we always submit a noop job from our instance that waits on
the semaphores on the GPU and follow up with WaitForIdle to wait for that
to complete.
Fixes test failures and/or assert crashes in:
dEQP-VK.synchronization.cross_instance.*
(when enabling support for semaphore imports)
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
When we have a wait thread we can't ensure that the last job in the last
command buffer will be the one to signal semaphores because in this case
there is no gurantee that jobs from command buffers in the batch will be
submitted to the GPU in order, as those put in a wait thread will be
submitted later when the event wait operation is completed.
Instead, we need to wait for all outstanding wait threads to complete
and only then we should signal any semaphores or fences.
This also fixes a bug where the wait for events was the last job in
the command buffer. In this case, once the event wait is completed
we have no additional jobs to submit and thus would never try to
signal semaphores or fences.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
This is preparatory work to expose support for importing semaphores, which
was waiting on kernel multisync support.
When we implemented user-space multisync support we didn't handle
temporary fence/semaphore payload imports at all, so we fix that here.
Also, we add a has_temp boolean flag to identify the case where we have
a temporary payload in a fence/sempahore instead of just checking if
temp_sync is not 0. This is necessary to support semaphore imports
(for which we are not exposing support yet) because these need to drop
the temporary payload when they are used as wait semaphores in a submit,
but we can't destroy the underlying temp_sync at that point because it
needs to survive at least until the submit is finished, so instead
we use a flag to tell if we have an active temporary payload or not,
and we simply destroy any temp_sync on a semaphore destroy or any new
import on the same semaphore. We only strictly need this flag for
semaphores because fences drop the temporary payload when they are
reset, which happens in the CPU and can only be done if the GPU is not
using the fence, but we add the same flag for the fence for consistency.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
This path uses a shader blit to implement the copy which is only
supported for tiled images (except 1D). While blit_shader() already
checks for this, this path does a lot of heavy lifting to prepare for
the blit_shader call so we rather avoid that if possible when we know
blit_shader won't be able to implement the blit.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
We had some code that considered the possibility that the destination
might be linear when configuring TFU jobs, but we never actually allow
for this to happen since we avoid hitting these paths in that case, as
the TFU always produces UIF results. Instead, add an assert when
producing the TFU packet to ensure we are expecting a UIF result.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
this fixes desync+crash when:
1. usage is added for bs A
2. tracking is added for bs B
3. tracking is removed for bs B
4. context is destroyed
5. usage A is now dangling and will crash if accessed
as seen in glmark2
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15429>
In addition to use the helper, we also remove some of the lowering we
had at preprocess_nir, as they are called now by the helper.
As we are here we also move the call to nir_lower_sysvals_to_varyings,
that for some reason we were calling it before preprocess_nir.
It is worth to note that with this change we lose the ability to debug
the NIR just after spirv_to_nir using V3D_DEBUG, as now this is done
on vk_spirv_to_nir, and as mentioned that includes several lowerings
now. The workaround to that is to use NIR_DEBUG.
We also needed to change how to check the entrypoint on the broadcom
compiler, checking just if it is an entrypoint, instead of assuming
that the name will be "main".
v2: tweak comment, squash v3dv and compiler change (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15449>
This fixes the test_resolve_non_issued_query_data vkd3d-proton test.
This change is required on TGL+ (maybe ICL?) because on all platforms
3D pipeline writes are not coherent with CS. On previous platform we
fixed this by flushing the render cache to make sure data is visble to
CS before it writes to memory. But on more recently platforms,
flushing the render cache leaves the data in the tile cache which is
still not coherent with CS, so we need to flush that one too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14552>
It stores the compiled shaders on disk so further executions will load
them from disk instead of compiling, improving the overall performance.
This is noticeable in games where they suddenly get stuck for a while
because they start to compile one or more shaders.
v2:
- Remove comment (Alejandro)
- Use malloc() + helper to simplify code (Alejandro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15380>
This has been implemented for a while but we could not expose it on
Vulkan 1.0 because the extension declares a dependency on
VK_KHR_sampler_ycbcr_conversion, which we don't implement, and
CTS would complain.
On Vulkan 1.1 however, VK_KHR_sampler_ycbcr_conversion was promoted
to core as an optional feature, and this is enough for the the
dependency to be satisfied, even if the feature is not supported,
meaning that we can now expose the extension.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15426>
Most of the time, this doesn't matter. On the versions with _sat, if
the destination type is incorrect, the clamping will not happen
correctly.
Fixes the following CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_uu_v4i8_out32
v2: Update anv-tgl-fails.txt.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15417>
The previous alignment of 64 bytes, which we got from the blob,
indicates that single-texel alignment isn't supported. So just do a
trivial no-op implementation that returns the same alignment as before.
This matches what newer blobs that expose this extension do.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15427>
Now all special FAU handling is unified, which makes both assembler and
disassembler considerably nicer. This adds some more special FAU indices from
page 0 that were previously missing, allowing them to be assembled and
disasembled.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15364>
FAU pages do not need to be specified explicitly in the assembly. Rather, they
should be inferred by the assembler by the instructions used. Rewrite the code
handling this in alignment with new information about the hardware.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15364>
This reuses the same UBO analysis to do the pushing in the shader
preamble via the ldc.k instruction instead of in the driver via
CP_LOAD_STATE6. The const_data UBO is exempted as it uses a different
codepath that isn't as critical.
Don't do this on gallium because there are some regressions. Aztec Ruins
in particular regresses a bit, and nothing I've benchmarked benefits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Since the NIR pass runs very late, it needs to be aware of preambles,
and when creating the instruction we need to move it to the start block
so that RA doesn't overwrite it in the preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Add in the type, even though it turns out to not be that useful. Add
in support for assembling it. Add some notes based on computerator
experiments. And add support for the indirect a1.x mode that's needed
for storing c64.x and later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
These will be used to implement the ir3-specific shader preamble
lowering in NIR. shps is conceptually similar to getone (although it
technically can't be duplicated) and shpe is similar to other barriers,
since it has to happen after any stores to the constant file in the
preamble. Add NIR intrinsics and plumbs them through ir3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Previously we included the reserved user consts (for Vulkan push
constants) as part of the pushed UBO contents, but that led to a problem
because when calculating the worst-case space for UBOs we didn't factor
in the reserved user consts. We'll have the same problem when doing the
same thing in the preamble optimization pass. Stop including the
reserved size in ubo_state::size, and have ir3_setup_consts() add it in
instead, so we won't forget to add it anywhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
This pass tries to move computations that are uniform for the entire
draw to the preamble. There's also an API for backends to insert their
own instructions into the preamble, for porting existing UBO pushing
passes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
These are functions that run before the entrypoint at least once per
draw and write their results via store_preamble, and then are loaded in
the rest of the shader via load_preamble.
We will add users in the following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
If you hadn't already called wsi_GetPhysicalDeviceDisplayProperties2KHR or
wsi_GetDrmDisplayEXT before calling
GetPhysicalDeviceDisplayPlaneProperties2KHR, then the connectors list
wouldn't be populated and you'd get no plane properties. Fixes failure of
dEQP-VK.wsi.display.get_display_plane_capabilities when run on its own.
Fixes: #4575
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15353>
ANGLE-on-venus-on-turnip and zink-on-turnip want real data here for EGL's
reset tests.
This required moving the remaining GPU-reset-causing tests from flakes or
xfails to skips. Otherwise, the rest of the caselist associated with them
ends up being marked as fails as well. The alternative would be to put
these tests in their own test groups with tests_per_group = 1, but that
didn't seem worth the effort. Or, we could finally do something with
https://gitlab.freedesktop.org/anholt/deqp-runner/-/issues/14.
Fixes: #5955
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14839>
It's tricky to always get the render area to the viewport code. In
particular, it's not provided to secondary command buffers as part of
the inheritance info so we have to bend over backwards and look for a
framebuffer. With VK_KHR_dynamic_rendering, there is no framebuffer and
this approach won't work and we'll need something better if we want
competent guardbands in secondary command buffers.
The good news is that any client that's sloppily rendering and trusting
the clipper to keep things inside the render area will set a scissor and
that's something they have to set inside the secondary. We can dig
through the scissor state and also include the corresponding scissor (if
any) and use that for our render area. This should give us the same
secondary command buffer performance with VK_KHR_dynamic_rendering.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This is about the only "small" change we can make in the process of
converting from render-pass-based to dynamic-rendering-based. Make
everything in pipeline creation work in terms of dynamic rendering and
create the dynamic rendering structs from the render pass as-needed.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This was ill-conceived at best. Yes, it checks for a few error
conditions but it doesn't check much and what checks it has are very far
away from the code that relies on those invariants. If we care about
these invariants, we should add asserts near the code that makes those
assumptions rather than pretending to be the validation layers.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
These helpers are used by vkCreateGraphicsPipelines to get the
VkPipelineRenderingCreateInfo and in vkCmdBeginCommandBuffer to get the
VkCommandBufferInheritanceRenderingInfo. This is required because the
Vulkan runtime code can't yet hook and modify calls made to driver-
provided functions. Instead, we just provide a helper to be used in leu
of vk_find_struct_const(). The structs themselves are stored in the
render pass so we can pass back a pointer and there's no need to
construct one on the stack or stuff it in the pipeline.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This implements vkCmdBeginRenderPass, vkCmdEndRenderPass, and
vkCmdNextSubpass in terms of the new vkCmdBegin/EndRendering included in
VK_KHR_dynamic_rendering and Vulkan 1.3. All subpass dependencies and
implicit layout transitions are turned into actual barriers. It does
require VK_KHR_synchronization2 because it always uses the 64-bit
version of the pipeline stage and access bitfields.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
this draw mode in particular requires driver-specific conversions
for queries (e.g., number of vertices), so pass that info through
the only limitation is that it doesn't work for dlists,
but I have yet to see a real use case of a statistics query being used with dlists
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15326>
If the base register is killed, it may be reused as the destination of a
ldp. In that case we should just skip resetting it afterwards.
Fixes regressions in dEQP-VK.ssbo.layout.random.scalar.38 later.
Fixes: 9912c61362 ("ir3/spill: Support larger spill slot offset")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15288>
We need to prepare for storage buffers having different sizes from
uniform buffers. This switches dynamic_offset_offset to have units of
bytes, the same as offset, and as a nice bonus we can more easily
combine the dynamic and non-dynamic paths in various different places.
This also entails rewriting the code that patches dynamic descriptors,
since we can no longer assume a linear mapping between indices in
dynamicOffsets and descriptor locations which the previous approach
heavily relied on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15288>
From the Vulkan spec:
"drmFormatModifierTilingFeatures is a bitmask of
VkFormatFeatureFlagBits that are supported by any image created
with format and drmFormatModifier. The returned
drmFormatModifierTilingFeatures must contain at least one bit."
This fixes recent CTS dEQP-VK.drm_format_modifiers.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15383>
We're nowhere close to even having Vulkan 1.0 working yet, there's no
reason to get too excited about 1.1. It just means piles more test
crashes for features we're claiming to support but don't. If we want to
enable more tests, we can turn on the extensions for those features once
we actually have them working.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15334>
cmd_buffer->pool should never be NULL. Even if AllocateCommandBuffers()
fails, the successfully created cmdbuffers would have it set correctly.
From the Vulkan spec:
"VUID-vkFreeCommandBuffers-pCommandBuffers-parent
Each element of pCommandBuffers that is a valid handle must have
been created, allocated, or retrieved from commandPool."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15164>
For weird reasons, the COPY_DATA packet doesn't seem to copy anything
while on the compute queue. Instead, use PKT3_LOAD_SH_REG_INDEX which
seems to work as expected.
Note that LOAD_SH_REG_INDEX on the compute queue is only supported by
the CP on GFX10.3, so we need to implement a different solution (load
from the indirect BO in the shader) for older generations.
This should fix the Control RT GPU hang.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15053>
Handle arrays generically by using the last component of the coordinate source
as the array index. That works for both 2D arrays and cube arrays, fixing cube
arrays. Cube arrays were already handled correctly in core Panfrost code.
This code path is not tested in dEQP-GLES31 without exposing OES_cube_map_array,
which depends on OES_geometry_shader, which we don't have. Yet we do expose
PIPE_CAP_CUBE_ARRAY, so ARB_cube_map_array is exposed.
Disabling PIPE_CAP_CUBE_ARRAY would be an easy band-aid fix, but it's easy
enough to handle correctly.
dEQP-GLES31 passes with a hack enabling OES_cube_map_array [without geometry
shaders].
Also fixes 1D arrays on Bifrost for the same reasons.
Fixes: 70d6c5675d ("pan/bi: Emit TEXC with builder")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15254>
Hardware support was removed with Midgard. Use mesa/st to emulate GL_CLAMP with
nir_lower_tex automatically (the Zink lowering), and disable GL_MIRROR_CLAMP
which isn't lowered correctly.
Fixes *texwrap* Piglit tests on G52.
Fixes: f9ceab7b23 ("panfrost: Fix CLAMP wrap mode")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15253>
Instead of two magic ternary operations, define a new `axis` temporary
which is 1 for X and 2 for Y. Then define everything else in terms of
this variable. In particular, the mask operation we do on LANE_ID is a
mask so it makes more sense to use ~axis than 1/2 but in the other
order.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15352>
The Vulkan spec for VK_KHR_depth_stencil_resolve allows a format
mismatch between the primary attachment and the resolve attachment
within certain limits. In particular,
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03181
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED and VkFormat of
pDepthStencilResolveAttachment has a depth component, then the
VkFormat of pDepthStencilAttachment must have a depth component with
the same number of bits and numerical type
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03182
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED, and VkFormat of
pDepthStencilResolveAttachment has a stencil component, then the
VkFormat of pDepthStencilAttachment must have a stencil component
with the same number of bits and numerical type
So you can resolve from a depth/stencil format to a depth-only or
stencil-only format so long as the number of bits matches.
Unfortunately, this has never been tested because the CTS tests which
purport to test this are broken and actually test with a destination
combined depth/stencil format.
Fixes: 5e4f9ea363 ("anv: Implement VK_KHR_depth_stencil_resolve")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15333>
This changes our error handling to be compatible with upcoming
specification change that unifies glTex[Sub]Image error handling
between OpenGL ES 2.0 vs ES 3.0+ specifications, see:
https://gitlab.khronos.org/opengl/API/-/issues/147
OpenGL ES 2.0.25 spec states:
"Specifying a value for internalformat that is not one of
the above values generates the error INVALID_VALUE. If
internalformat does not match format, the error
INVALID_OPERATION is generated."
This fixes following new tests:
KHR-GLES31.core.compressed_format.*
KHR-GLES32.core.compressed_format.*
v2: GL_INVALID_OPERATION -> GL_INVALID_VALUE in extension
checks, remove (now overlapping) extension checks from
_mesa_gles_error_check_format_and_type (Eric Anholt)
v3: take GLES version in to account in internalformat checks
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12936>
In 4001d9ce1a we started lazily allocating surface states in the
descriptor sets rather than upfront in the descriptor pool. This was
to workaround vkd3d-proton allocating more than we could handle at the
HW level.
The issue introduced in that change is that we didn't protect the
descriptor pool free list as well as the anv_state_stream which are
now potentially used from different threads through the descriptor set
write functions.
This reverts the lazy allocation part of that change. Host only
descriptor sets changes remain.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4001d9ce1a ("anv: Handle VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE for descriptor sets")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15241>
When I switched iris over to use 3DSTATE_BINDING_TABLE_POOL_ALLOC, I
stopped flagging things dirty when allocating a new binder, because
the contents of the binding table were still valid, thanks to us not
having to subtract Surface State Base Address anymore.
This unfortunately missed the point that the old binding table is in the
old buffer, which is no longer what the binder pool base address points
to. So we'd either need to copy it over, or just flag it dirty and
re-emit it on the next draw.
Fixes misrendering in Ryujinx.
Fixes: 8b9045e7a4 ("intel: Use 3DSTATE_BINDING_TABLE_POOL_ALLOC exclusively on Gfx11+")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15314>
If we introduce another queue type (video decode) we can have a
disconnect between the RADV_QUEUE_ enum and the API queue_family_index.
currently the driver has
GENERAL, COMPUTE, TRANSFER which would end up at QFI 0, 1, <nothing>
since we don't create transfer.
Now if I add VDEC we get
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, 1, <nothing>, 2
or if you do nocompute
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, <nothing>, <nothing>, 1
This means we have to add a remapping table between the API qfi
and the internal qf.
This patches tries to do that, in theory right now it just adds
overhead, but I'd like to exercise these paths.
v2: add radv_queue_ring abstraction, and pass physical device in,
as it makes adding uvd later easier.
v3: rename, and drop one direction as unneeded now, drop queue_family_index
from cmd_buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13687>
in the initial implementation, a stream like:
* CmdBeginTransformFeedbackEXT
* CmdSetRasterizerDiscardEnableEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
* CmdBeginTransformFeedbackEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
would never enable transform feedback, as it only checked for the change
in rasterizer_discard state
Fixes: 4d531c67df ("anv: support rasterizer discard dynamic state")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15269>
for genuine early depth tests, the samplecount must be updated after depth
test but before samplemask is applied
for inferred-early or regular depth tests, the samplemask can be applied
before the depth test
Fixes: d9276ae965 ("llvmpipe: handle gl_SampleMask writing.")
fixes:
dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_samples_4
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15319>
If a driver sets driver_data but not driver_free_cb, driver_data will
get freed along with the command. If a driver sets driver_free_cb,
driver_data will not get automatically freed but the callback will get
called before the rest of the data structure is freed.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15329>
we can effectively skip any kind of checks here and just assume that one
of two scenarios is in effect:
* the user is about to attempt some incredibly illegal behavior that VVL will catch
* the user is about to attempt a pro gamer move and we'll be fine
in either case, it's EXTENDED_USAGE, so hopefully we're about to make a texture
view from a compatible and supported format
cc: mesa-stable
fixes:
dEQP-VK.image.extended_usage_bit_compatibility.image_format_properties*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15320>
subpass->color_count is (obviously) not set yet, so this would just clobber
the color attachments any time resolves were used
Fixes: 8a6160a354 ("lavapipe: VK_KHR_dynamic_rendering")
fixes:
dEQP-VK.draw.dynamic_rendering.multiple_interpolation.structured.with_sample_decoration.4_samples
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15330>
Normally this wouldn't matter, but it will matter for the upcoming scan
macro because the running tally is communicated through a shared
register across a physical edge. It may also matter if a live-range
split occurs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
We weren't considering the other destinations when allocating a
destination, so we could allocate overlapping destinations. This wasn't
done before because we never had a need for it, but the subgroup
reduction macros will need it.
The trickiest part of this is that we have to rewrite the
compress_regs_left fallback, because we may have to move around the
other already-allocated destinations. We now have a list of destinations
to (re)allocate in addition to the popped live intervals. For the rest
of the destination handling, we can just bail out if the proposed spot
for something overlaps another destination, but for the fallback we have
to handle all the cases gracefully. I also added support for odd
combinations of multiple destinations where some of them are tied, which
we'll use in the next commit to handle early-clobber destinations and
which will actually be used because one of the destinations of the
subgroup reduction macro will be early-clobber. The result is that the
order of intervals to allocate is now a lot more complicated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
For pcopies we only care about the register's type, i.e. whether its a
half-register and whether it's an array (plus its size). Copying over
other flags like IR3_REG_RELATIV just leads to sadness and validator
assertions.
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Before, we were careful to
1. Get the source physreg.
2. Allocate the destination.
3. Insert a copy with the source being the physreg from step 1.
and this guaranteed that if the tied source were moved in step 2 we'd
still insert a copy from the correct place. However this won't work with
multiple destinations because an earlier destination could've already
moved the tied source around. Instead flip steps 2 and 3 (we'll insert
the copy before we allocate the interval, but that's ok) and run the
first two steps in a separate loop before any destinations are
allocated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Note: this is a behavior change for arrays, because it will count the
entire array instead of just the components written in the register
pressure calculation. However this is more accurate since this matches
how RA works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Re-use auto-generated vk_cmd_enqueue entrypoints instead of generating
our own version doing the same thing. In order to effectively do this,
we also add an allow-list of which entrypoints lavapipe actually handles
to avoid issues where the autogenerated one stomps a vkCmdFoo2 wrapper.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15311>
The test suite is full of flakes around transform feedback, atomics, and
tess. But, I hope it can be useful for regression testing core Mesa
reworks.
This required updating the kernel to 5.16.12 to get a more stable boot
process. That kernel rebuild caused an update of the container with
piglit which that was missed in a previous MR, so we got new xfails in x86
swrast.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> (nouveau)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15201>
This required updating the kernel to 5.16.12 to get a more stable boot
process. That kernel rebuild caused an update of the container with
piglit which that was missed in a previous MR, so we got new xfails in x86
swrast. Also, including modules on arm64 exposed a bug in v3d's
poe-powered.sh rsyncing of modules.
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15201>
this hardware won't return the correct value from dmod instructions,
so lower it to ensure that cts passes
nobody else will ever hit this, so perf isn't an issue and regular fmod
can be left alone
fixes (amd):
KHR-GL46.gpu_shader_fp64.builtin.mod_d*
Fixes: 5fae35fb17 ('zink: fix 64bit float shader ops ')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15306>
From the VK_KHR_shader_float_controls extension:
"5) Do any of the “Pack” GLSL.std.450 instructions count as
conversion instructions and have the rounding mode applied?"
"RESOLVED: No, only instructions listed in “section 3.32.11.
Conversion Instructions” of the SPIR-V specification count as
conversion instructions."
This is also the same logic as the LLVM backend.
No fossils-db changes on Sienna Cichlid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15301>
The multiViewport feature isn't required for GL 4.3, it's required for
GL 4.1. Technically speaking, we could have just dropped it because we
already list the maxViewports requirement. But it seems better to be
very clear here to me.
Fixes: 29f8f21bff ("docs: document zink GL 4.3 requirements")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15300>
We have twice the registers in this case so it makes sense to double
this as well. While this causes slight regressions in shader-db
stats (due to additional register pressure), it helps us hide latency
of memory reads better on 2-thread compiles, where the thread switch
mechanism will be less effective. This shows a ~3% performance
improvement on the UE4 SunTemple demo.
total instructions in shared programs: 12642413 -> 12656164 (0.11%)
instructions in affected programs: 2272652 -> 2286403 (0.61%)
helped: 2924
HURT: 3389
total uniforms in shared programs: 3703861 -> 3704776 (0.02%)
uniforms in affected programs: 213729 -> 214644 (0.43%)
helped: 823
HURT: 1272
total max-temps in shared programs: 2150686 -> 2153505 (0.13%)
max-temps in affected programs: 191332 -> 194151 (1.47%)
helped: 1900
HURT: 1891
total spills in shared programs: 3255 -> 3274 (0.58%)
spills in affected programs: 166 -> 185 (11.45%)
helped: 3
HURT: 6
total fills in shared programs: 4630 -> 4656 (0.56%)
fills in affected programs: 367 -> 393 (7.08%)
helped: 7
HURT: 15
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This can add quite a bit of register pressure so it makes sense to disable it
to prevent us from dropping to 2 threads or increase spills:
total instructions in shared programs: 12672813 -> 12642413 (-0.24%)
instructions in affected programs: 256721 -> 226321 (-11.84%)
helped: 719
HURT: 77
total threads in shared programs: 415534 -> 416322 (0.19%)
threads in affected programs: 788 -> 1576 (100.00%)
helped: 394
HURT: 0
total uniforms in shared programs: 3711370 -> 3703861 (-0.20%)
uniforms in affected programs: 28859 -> 21350 (-26.02%)
helped: 204
HURT: 455
total max-temps in shared programs: 2159439 -> 2150686 (-0.41%)
max-temps in affected programs: 32945 -> 24192 (-26.57%)
helped: 585
HURT: 47
total spills in shared programs: 5966 -> 3255 (-45.44%)
spills in affected programs: 2933 -> 222 (-92.43%)
helped: 192
HURT: 4
total fills in shared programs: 9328 -> 4630 (-50.36%)
fills in affected programs: 5184 -> 486 (-90.62%)
helped: 196
HURT: 0
Compared to the stats before adding scheduling of non-filtered
memory reads we see we that we have now gotten back all that was
lost and then some:
total instructions in shared programs: 12663186 -> 12642413 (-0.16%)
instructions in affected programs: 2051803 -> 2031030 (-1.01%)
helped: 4885
HURT: 3338
total threads in shared programs: 415870 -> 416322 (0.11%)
threads in affected programs: 896 -> 1348 (50.45%)
helped: 300
HURT: 74
total uniforms in shared programs: 3711629 -> 3703861 (-0.21%)
uniforms in affected programs: 158766 -> 150998 (-4.89%)
helped: 1973
HURT: 499
total max-temps in shared programs: 2138857 -> 2150686 (0.55%)
max-temps in affected programs: 177920 -> 189749 (6.65%)
helped: 2666
HURT: 2035
total spills in shared programs: 3860 -> 3255 (-15.67%)
spills in affected programs: 2653 -> 2048 (-22.80%)
helped: 77
HURT: 21
total fills in shared programs: 5573 -> 4630 (-16.92%)
fills in affected programs: 3839 -> 2896 (-24.56%)
helped: 81
HURT: 15
total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%)
sfu-stalls in affected programs: 8993 -> 7564 (-15.89%)
helped: 1808
HURT: 1038
total nops in shared programs: 324894 -> 323685 (-0.37%)
nops in affected programs: 30362 -> 29153 (-3.98%)
helped: 2513
HURT: 2077
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
We do a few changes over NIR's defaults:
1. Lower delay for texture reads. Empirically, we don't observe any
benefits with delays over 50 and since this delay value is still
used by the scheduler in the "favor register pressure" case it is
benefitial to avoid overestimating it too much.
2. Adjust delay for non-filtered TMU reads to the delay selected for
texture reads.
3. In our case, UBO reads from dynamically uniform addresses don't
use the TMU and have a latency of 1 instruction in the best case
scenario or 4 at worse, so we go with 1 so we don't try to move
this early.
This helps us get back some of what we lost when updating the
default scheduler configuration to add a delay for non-filtered
memory reads:
total instructions in shared programs: 13126587 -> 12671765 (-3.46%)
instructions in affected programs: 3764097 -> 3309275 (-12.08%)
helped: 14664
HURT: 4244
total threads in shared programs: 407208 -> 415522 (2.04%)
threads in affected programs: 8716 -> 17030 (95.39%)
helped: 4224
HURT: 67
total uniforms in shared programs: 3812698 -> 3711224 (-2.66%)
uniforms in affected programs: 335170 -> 233696 (-30.28%)
helped: 2816
HURT: 3551
total max-temps in shared programs: 2318430 -> 2159345 (-6.86%)
max-temps in affected programs: 539991 -> 380906 (-29.46%)
helped: 13173
HURT: 1440
total spills in shared programs: 49086 -> 5966 (-87.85%)
spills in affected programs: 48306 -> 5186 (-89.26%)
helped: 1655
HURT: 28
total fills in shared programs: 55810 -> 9328 (-83.29%)
fills in affected programs: 54821 -> 8339 (-84.79%)
helped: 1659
HURT: 22
LOST: 0
GAINED: 3
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This has been pending for a long time. It is not very consistent to
add a significant delay for textures and not do it for UBOs, etc
The reason we have not been doing this so far is the accumulated effect
on register pressure for V3D as shown by shader-db results below, but
from the point of view of a generic scheduler it makes sense to do this.
Later patches will address V3D specific issues with register pressure
derived from this by letting the driver control its instruction delay
settings.
total instructions in shared programs: 12662138 -> 13126587 (3.67%)
instructions in affected programs: 1813091 -> 2277540 (25.62%)
helped: 2410
HURT: 10499
total threads in shared programs: 415858 -> 407208 (-2.08%)
threads in affected programs: 17348 -> 8698 (-49.86%)
helped: 8
HURT: 4333
total uniforms in shared programs: 3711483 -> 3812698 (2.73%)
uniforms in affected programs: 128012 -> 229227 (79.07%)
helped: 3474
HURT: 2143
total max-temps in shared programs: 2138763 -> 2318430 (8.40%)
max-temps in affected programs: 318780 -> 498447 (56.36%)
helped: 588
HURT: 11997
total spills in shared programs: 3860 -> 49086 (1171.66%)
spills in affected programs: 709 -> 45935 (6378.84%)
helped: 23
HURT: 1595
total fills in shared programs: 5573 -> 55810 (901.44%)
fills in affected programs: 1067 -> 51304 (4708.25%)
helped: 23
HURT: 1595
LOST: 3
GAINED: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
We can get a generic nir_intrinsic_memory_barrier to represent a
barrier involving multiple semantics (instead of getting individual
specific barriers for each semantic). This means that we need to
consider these as potentially affecting shared memory access as well.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This doesn't have any significant impact shader-db stats and would
reduce our capacity to hide latency from the loads, so it is probably
undesirable:
total instructions in shared programs: 12663189 -> 12663186 (<.01%)
instructions in affected programs: 4222 -> 4219 (-0.07%)
helped: 9
HURT: 4
total uniforms in shared programs: 3711624 -> 3711629 (<.01%)
uniforms in affected programs: 186 -> 191 (2.69%)
helped: 0
HURT: 2
total max-temps in shared programs: 2138822 -> 2138857 (<.01%)
max-temps in affected programs: 569 -> 604 (6.15%)
helped: 1
HURT: 9
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
On Icelake and later, we can use a new 3DSTATE_BINDING_TABLE_POOL_ALLOC
command to update the location of the binder (buffer containing binding
table entries), rather than having to move Surface State Base Address
via a STATE_BASE_ADDRESS command. This has less stalling and also means
our surface addresses can remain relative to a fixed 4GB address range,
meaning we don't have to re-stream them any time the binder changes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
This workaround is needed on all Gfx12.0 parts, but doesn't appear to be
necessary on XeHP. The other drivers do not appear to be applying this
workaround on those parts. As further evidence, we accidentally added
the 3DSTATE_BINDING_TABLE_POOL_ALLOC commands after switching back to
GPGPU mode, which would be an incorrect way to implement the workaround,
and things seem to be working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
On Gfx11+, we're going to stop changing Surface State Base Address
and instead start changing the Binding Table Pool address instead.
So, rename a few things to track the last binder address, which is
what we're actually changing, regardless of how we program it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
Skylake and older use a 15:5 binding table pointer format, which means
our binder can be at most 64kB in size. Each binding table within the
binder must be aligned to 32B.
XeHP uses a new 20:5 binding table format, which allows us to increase
the binder size to 1MB while retaining the nice 32B alignment. Larger
binders mean fewer stalls as we update the base address for the binder.
Icelake and Tigerlake can either use the 15:5 format or an 18:8 format.
18:8 mode requires the base of each binding table to be aligned to 256B
instead of 32B, but it gives us a maximum binder size of 512kB.
We can store 64 binding table entries in a 256B chunk (256B / 4B = 64),
but only 8 entries in a 32B chunk (32B / 4B = 8). Assuming that most
binding tables have fewer than 64 entries, this means that with the 18:8
format, we're likely to be able to fit 2048 (512KB / 256B) tables into a
a buffer before needing to allocate a new one and stall.
Technically, the old format could also store 2048 binding tables per
buffer as well (64KB / 32B = 2048). However, tables that needed more
than 8 entries would need multiple 32B chunks. A single table would
take multiple aligned chunks, while with the larger 256B format, it
could fit in a single one.
This cuts binder resets by 6.3% on a Shadow of Mordor benchmark trace.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
this comes in two flavors:
* streamout of array<struct/block>
* partial streamout of array/struct/block
for the former:
* arrays of structs can just be blasted out in the initial var declaration (easy)
* arrays of blocks must be output to separate xfb buffers for each array block,
which requires skipping initial xfb blast-off and instead propagating the values
using tmp variables at a later point
for the latter:
* the optimal way to do this is to unwrap the struct first to figure out what's being
emitted, at which point the value can be extracted and exported
fixes the rest of spec@arb_gl_spirv@execution@xfb
...except spec@arb_gl_spirv@execution@xfb@vs_block_array, which I'm suspecting is broken
due to vtn bugs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>
this now handles inlining of stupid types (dvec3, dmatX) and complex
types (goku) as seen in cts
fixes:
KHR-Single-GL46.enhanced_layouts.xfb_explicit_location
KHR-Single-GL46.enhanced_layouts.xfb_struct_explicit_location
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>
This commit fixes the following flaws in the implementation:
* when a resource was re-allocated, the guest side storage
was also allocated
* when a source needs a readback before being written to, then
the call would go through vws->transfer_get, thereby bypassing the
staging resource, and this would fail on the host, because no
the allocated IOV was too small (just one byte)
* if the texture write would need neither flush nor readback, the
old code path would be used expecting that guest side backing stogage
for the texture.
v2: - actually do a readback to the stageing resource when it is required
- fix typo (Lepton)
v3: Don't use stageing transfers if the host can't read back the data
by rendering to an FBO or calling getTexImage, because in this case
we rely on the IOV to hold the date.
v4: Also don't use staging transfers if the format is no readback
format. Otherwise we have to deal with the resolve blit, and
this is currently not working correctly.
v5: add a new flag that indicates whether non-renderable textures can
be read back (either via glGetTexImage or GBM)
v6: Restrict the use of staging texture transfers to textures that can
be read back, and on GLES also if the they are bound to scanout and
the host uses minigbm to allocate such textures.
For that replace the flag indicating the capability to read back
non-renderable textures with a cap that indicates whether scanout
textures can be read back.
v7: update virglrenderer version in the CI
v8: update use of stageing (Chia-I)
v9: remove superflous check and assignment (Chia-I)
v10: disable stageing textures for arrays with stencil format. This is a
workaround for failures of the CI.
Fixes: cdc480585c
virgl/drm: New optimization for uploading textures
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14495>
this was being set from back before zink actually supported 64bit
natively and only 32bit was functional, but it breaks 64bit support
cc: mesa-stable
fixes (lavapipe):
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec2
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec3
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec4
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15274>
Run crosvm as a background process in order to allow intercepting
interrupt signals (INT, TERM) and properly release/cleanup any allocated
resources.
This is particularly helpful when one or more crosvm tasks hang, which
will eventually prevent subsequent instances to be started - currently
we can handle up to 128 concurrent crosvm instances per runner.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15238>
When using indirect textures, some lanes may not be active,
particularly in a loop, so as with some other areas, extracting
the correct lane is needed here. This extracts the last valid one.
KHR-GL45.texture_barrier.* on zink.
Fixes: e168d148d7 ("gallivm/nir: handle non-uniform texture offsets")
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15259>
Experiments suggest that e.g.
add.u r0.y, hr0.x, hr0.y
will result in the summed value in both the high and low words of r0.y.
This only happens with odd registers, not even ones (r0.x works fine).
Seen in the bit_count lowering (which turns out to be unnecessary, but
this is still a larger problem).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15251>
Along with fixing the grammar, this allows it to be deduplicated since
the properly worded description exists in later generations' XMLs.
Cuts 96 B from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
917613 0 0 917613 e006d meson-generated_.._intel_perf_metrics.c.o (before)
917511 0 0 917511 e0007 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14131044 365708 210004 14706756 e06844 iris_dri.so (before)
14130948 365708 210004 14706660 e067e4 iris_dri.so (after)
text data bss dec hex filename
8124321 214264 22820 8361405 7f95bd libvulkan_intel.so (before)
8124225 214264 22820 8361309 7f955d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
The compiler does a good job of deduplicating strings already, but we
can eliminate the pointers to each string by combining the strings into
a single char array and storing only an index into that array.
The longest of the char arrays is the descriptions array, which is a
little over 45 KiB, so still under MSVC's 64 KiB string literal limit
[0]. Because the string length is under 64 KiB we can use uint16_t as
the index type, which roughly doubles our savings as compared to an int.
This cuts 77 KiB from iris_dri.so (0.5%) and libvulkan_intel.so (0.9%).
text data bss dec hex filename
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (before)
924401 0 0 924401 e1af1 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 391628 210004 14792484 e1b724 iris_dri.so (before)
14137732 365708 210004 14713444 e08264 iris_dri.so (after)
text data bss dec hex filename
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (before)
8131009 214264 22820 8368093 7fafdd libvulkan_intel.so (after)
relinfo:
iris_dri.so (before): 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
[0] https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170&viewFallbackFrom=vs-2019
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
intel_perf_query_counter contains fields for things we can't or don't
want to store in our static data (like runtime-determined max values) or
oa_read_counter function pointers which are dependent on the GPU gen and
would make deduplication very ineffective.
Cuts 16 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (before)
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 408908 210004 14809764 e1faa4 iris_dri.so (before)
14190852 391628 210004 14792484 e1b724 iris_dri.so (after)
text data bss dec hex filename
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (before)
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
And specifically mark it with ATTRIBUTE_NOINLINE. Otherwise it will be
inlined and actually slightly increase code size.
Cuts 505 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
1538720 0 0 1538720 177aa0 meson-generated_.._intel_perf_metrics.c.o (before)
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14751700 365708 210004 15327412 e9e0b4 iris_dri.so (before)
14190852 408908 210004 14809764 e1faa4 iris_dri.so (after)
text data bss dec hex filename
8744913 214264 22820 8981997 890ded libvulkan_intel.so (before)
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (after)
Relocations increase because the counter initializations are moved from
code (in .text) to pointers (in .text) to .rodata, which require
relocations.
relinfo:
iris_dri.so (before): 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
No changes in resulting code (yes, seriously!). GCC constant propagates
the static const arrays into the code, yielding bit for bit identical
results. This will however enable further cleanups.
Before this patch, we emit 11916 different initializations of
intel_perf_query_counter. With this patch we emit an array of 539 and
initialize the intel_perf_query_counters in terms of those.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
Instead of allocating cpu_storage in threaded_resource_init, defer the
allocation to first use (in tc_buffer_map).
This avoids needless memory allocation if tc_buffer_disable_cpu_storage is
called before tc_buffer_map.
map_buffer_alignment is stored and serves as a "can cpu_storage be used" flag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15074>
When "dri2_wl_formats_init" fails in "dri2_initialize_wayland_swrast",
the "dri2_display_destroy" function is called for clean up. However, the
"dri2_egl_display" was not associated with the display in its
"DriverData" field yet.
The following cast in "dri2_display_destroy":
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
Expands to:
_EGL_DRIVER_TYPECAST(drvname ## _display, _EGLDisplay, obj->DriverData)
Crashing.
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13972>
When "dri2_wl_formats_init" fails in "dri2_initialize_wayland_drm", the
"dri2_display_destroy" function is called for clean up. However, the
"dri2_egl_display" was not associated with the display in its
"DriverData" field yet.
The following cast in "dri2_display_destroy":
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
Expands to:
_EGL_DRIVER_TYPECAST(drvname ## _display, _EGLDisplay, obj->DriverData)
Crashing.
Addresses-Coverity-ID: 1494541 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13972>
when migrating a recycled set here, the set was previously invalid and
in the recycled table, meaning it can be reused directly so long as
it's first invalidated
the previous code would instead pop a different set off the allocation array,
leaking this one
cc: mesa-stable
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
this got mixed up during some refactor and started indexing based on the
number of bindings instead of the number of descriptors, which means
that array descriptor bindings would have overlapping array memory
cc: mesa-stable
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
the variant data may have changed in the meanwhile, so do a pass to
ensure that the correct variant is used
cc: mesa-stable
fixes caselist:
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_two_samples.multisample_texture_4
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_two_samples.singlesample_rbo
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
Another instruction might write to the second source, and then an
INSTR_INVALID_ENC fault will be raised because the tuple will write to
and read from the register at the same time.
Fixes: 795638767d ("pan/bi: Use fused dual source blending")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
BITSET_MASK returns ~0 when given an input of zero, when we need it to
return 0 instead.
Fixes shaders with only sysvals but no UBOs when push constants are
disabled.
This breaks when 31 or 32 UBOs are used, but PAN_MAX_CONST_BUFFERS is
currently set to 16.
Fixes: c246af0dd8 ("panfrost: Only upload UBOs when needed")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
This could be made slightly more efficient by only setting the dirty
state that is needed, but eventually you reach a point where it's
cheaper to re-emit everything than work out what can or can't be kept.
Fixes rendering issues in Duckstation.
Fixes: cd2c1ef9da ("panfrost: Dirty track textures/samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
TEXC can have two destinations; the value for neither of them can be
used in the same bundle, so extend the code to check for this to
iterate over both destinations.
Fixes artefacts in the game "LIMBO".
Fixes: a303076c1a ("pan/bi: Add bi_instr_schedulable predicate")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
Trying to write to overlapping register ranges from a single
instruction is undefined behaviour, so add interference between the
nodes to avoid this.
Hit in a dual-texture instruction in LIMBO.
Fixes: 9146bafbb4 ("pan/bi: Add dual texture fusing pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
When discarding the whole resource to create a new one, if this resource
is used by a sampler view, a rebind must be done to use the new
resource.
But this must be done when setting the sampler views, because we don't
have access to those samplers before.
v2:
- Pack shader state on setting sampler views (Iago)
- Use a serial ID to know when to rebind sampler views (Juan)
v3:
- Move check to caller (Iago)
- Keep rebind sampler view on BO change (Iago)
v4:
- Rename "serial_bo" to "serial_id" (Iago)
- Add comments (Iago)
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6027
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15171>
The Gallium pipe video "frame_num" variable is internally used as a
counter of elapsed reference frames since the last IDR. The incoming
frame_num field from VA picture parameters is not equivalent; the VA
value may wrap to zero prematurely, as it is a 16-bit struct field with
a documented max value of 2^(log2_max_frame_num_minus4 + 4)-1.
This change improves "infinite GOP" single-client live streaming, where
it is reasonable for the server to desire an endless series of P-frames
without IDR. Without this change, it is difficult/impossible for an
application to encode a P- or B-frame after the VA frame_num field wraps
around to zero, depending on the backend encoder implementation.
This change has no effect on existing applications that always signal an
IDR frame and reset the VA frame_num to zero before it wraps around. For
example, the FFmpeg vaapi encoder ignores the VA documentation and sends
an un-wrapped VA frame_num, which results in identical computation of
the internal frame_num (as long as each GOP is less than 65536 frames).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5768
Reviewed-by: Thong Thai <thong.thai@amd.com>
patch revision 3: correctly avoid incrementing frame_num when the encoded
frame is not a reference, per h264 spec and ffmpeg behavior
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14332>
Only allow this in situations where we know it's safe. In particular, this
stops removal of unconditional branches like with
block_kind_continue_or_break.
Fixes dEQP-VK.graphicsfuzz.fragcoord-control-flow hang.
fossil-db (Sienna Cichlid):
Totals from 34 (0.02% of 162293) affected shaders:
Instrs: 84115 -> 84178 (+0.07%); split: -0.00%, +0.08%
CodeSize: 463372 -> 463624 (+0.05%); split: -0.00%, +0.06%
Latency: 3467316 -> 3467652 (+0.01%)
InvThroughput: 3085493 -> 3085578 (+0.00%)
Branches: 3221 -> 3284 (+1.96%); split: -0.03%, +1.99%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: f030b75b7d ("aco: relax condition to remove branches in case of few instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15214>
vecN instructions which are only used by other ALU
will now get duplicate channels removed.
i915g:
total instructions in shared programs: 396309 -> 396294 (<.01%)
instructions in affected programs: 186 -> 171 (-8.06%)
r300:
total instructions in shared programs: 1165059 -> 1164354 (-0.06%)
instructions in affected programs: 35884 -> 35179 (-1.96%)
total temps in shared programs: 165497 -> 165326 (-0.10%)
temps in affected programs: 2990 -> 2819 (-5.72%)
softpipe:
total instructions in shared programs: 2860028 -> 2859084 (-0.03%)
instructions in affected programs: 55539 -> 54595 (-1.70%)
total temps in shared programs: 516939 -> 516546 (-0.08%)
temps in affected programs: 6623 -> 6230 (-5.93%)
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12468>
This gives
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 4096
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 8192
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 12288
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 16384
MESA-VIRTIO: debug: aborting
Aborted
which should be more friendly than printing the messages forever.
On my i7-7820HQ, this aborts after roughly 4+8+16+32=60 seconds
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15200>
NIR-to-TGSI produces partial output writes contrary to the old paths
that always wrote the full outputs. Therefore if there is now a partial
output write ready to be scheduled and nothing else besides a tex
is ready, we would schedule the output write first. This was not a
problem before as usually at last some component of the full output write
depended on the tex result.
This is not optimal from the performance point of view and resulted in
~20% slowdown in the Unigine demos. The docs say:
The first OUTPUT instruction will reserve space in the output register
fifo. This space is limited, therefore issuing an OUTPUT earlier than
necessary may cause threads to stall earlier than necessary. You
should not set an ALU instruction as type OUTPUT unless it is actually
writing to an output register, or it is the last instruction of
the program.
Fix it by explicitly prefering a TEX before OUT and restore the
performance: 9.66 -> 12.12 fps (as compared to 11.83 with the old
glsl-to-TGSI path) in Unigine Sanctuary. No change in Lightsmark or
GLmark.
This is also a win from the intructions point of view as we are usually
able to schedule the partial output writes in a single pair at the end.
total instructions in shared programs: 106009 -> 105891 (-0.11%)
instructions in affected programs: 10153 -> 10035 (-1.16%)
helped: 118
HURT: 0
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5840
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
The max_score == -1 condition is already before so this
will never trigger. Its unclear what was the intention anyway. Now we
emit either:
- if we have accumulated enough tex intructions for a full block
- if we have nothing else to emit
- or if we can emit all remaining tex instructions already.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
New extensions properties/feature are being put in the `vn_physical_device`
which is not ideal from an organization point of view.
Here the `vn_physical_device_{features,properties}` are two new struct to
help the `vn_physical_device` organzation.
Signed-off-by: Igor Torrente <igor.torrente@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15170>
Something is slightly off in the integer values returned. It passes many
tests without the fixup, but the dEQP-GLES31 tests complain. The blob
ends up doing 3x gathers, and selects between them based on getinfo
results. Since we already have a per-sampler key with some spare bits,
just stick the bit-size info in there. And we can derive signedness from
the associated type info.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14670>
It can't be done. This just provides bad results. The blob had a
comparable approach where they fixed up coordinates, but that also can't
work with a separate texture definition with nearest filtering. By then,
might as well provide a unswizzled variant instead, and using native
functionality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14670>
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL 4.50 spec:
"For some blocks declared as arrays, the location can only be applied
at the block level: When a block is declared as an array where
additional locations are needed for each member for each block array
element, it is a compile-time error to specify locations on the block
members. That is, when locations would be under specified by applying
them on block members, they are not allowed on block members. For
arrayed interfaces (those generally having an extra level of
arrayness due to interface expansion), the outer array is stripped
before applying this rule"
From Section 1.2.1 (Changes from Revision 6 of GLSL Version) of the GLSL 4.50 spec:
"Private Bug 15678: Don’t allow location = on block members where
the block needs an array of locations"
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL ES 3.20 spec
"If an input is declared as an array of blocks, excluding per-vertex-arrays
as required for tessellation, it is an error to declare a member of
the block with a location qualifier"
From Section 1.1.3 (Changes from GLSL ES 3.2 revision 3) of the GLSL ES 3.20 spec:
"Arrayed blocks cannot have layout location qualifiers on members"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11522>
if one of these states change then it affects which result needs to be
used for that query, so split it up over multiple query ids to make sure
the correct result is obtained
fixes (lavapipe):
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_pause_resume
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_states
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15227>
When the AOS/linear code was added it only worked with TGSI which
meant nothing in mesa upstream was really using it.
This adds support to analyse NIR shaders, and adds aos support
to the backend.
AOS support is limited to mov,vec,fmul,tex sampling in order to
accelerate mostly compositing operations. I've tested weston uses
the fast path. gnome-shell can't use it yet as we can't optimise
the depth test paths.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15140>
For Bifrost, we model load/store segments, for example for thread local storage.
We need something similar on Valhall -- access modifiers. There are four access
modifiers on Valhall, controlling memory subsystem optimizations for the access:
none: Nothing may be assumed. Corresponds to "global".
istream: Internally streaming within the GPU. Corresponds to "pos", as it's
used for position stores.
estream: Externally streaming outside the GPU. Corresponds to "vary", as it's
used for varying stores.
force: Force access in discarded threads. Corresponds to "tl", as it's required
for correct behaviour of helper invocations that use the stack.
If these access modifiers end up being useful outside these fixed purposes, we
may need to rework this part of the IR. For now, this should suffice.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15216>
This can avoid some cases where a constant has to be loaded into a
temporary register.
v2: Update i915-g33-fails.txt.
total instructions in shared programs: 788625 -> 782376 (-0.79%)
instructions in affected programs: 166269 -> 160020 (-3.76%)
helped: 1578
HURT: 0
helped stats (abs) min: 3 max: 21 x̄: 3.96 x̃: 3
helped stats (rel) min: 1.56% max: 33.33% x̄: 4.82% x̃: 3.45%
95% mean confidence interval for instructions value: -4.06 -3.86
95% mean confidence interval for instructions %-change: -5.00% -4.64%
Instructions are helped.
LOST: 0
GAINED: 35
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
I noticed the SGE case while looking at the output of
shaders/closed/steam/trine-2/fp-3.shader_test on i915g. These are
especially bad on i915 that needs two instructions to implement SNE.
An alternative would be to duplicate the sne(sXX(a, b), 0.0) rules in an
algebraic pass that occurs after bool_to_float. Doing the work earlier
seems preferable.
i915
total instructions in shared programs: 788274 -> 788223 (<.01%)
instructions in affected programs: 666 -> 615 (-7.66%)
helped: 5
HURT: 0
helped stats (abs) min: 9 max: 12 x̄: 10.20 x̃: 9
helped stats (rel) min: 5.00% max: 11.11% x̄: 8.12% x̃: 8.16%
95% mean confidence interval for instructions value: -12.24 -8.16
95% mean confidence interval for instructions %-change: -10.81% -5.43%
Instructions are helped.
LOST: 0
GAINED: 2
The two gained shaders are assembly fragment programs in Euro Truck
Simulator 2.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
Many users of nir_vec() do so by nir_channel()-ing a new ssa defs as movs
from other vectors to put the new vector together, which then just have to
get copy-propagated into the ALU srcs and DCEed away the temporary movs.
If they instead take nir_ssa_scalar, we can avoid that extra work.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14865>
"queueFamilyIndexCount is the number of queue families having access to the image(s) of the
swapchain when imageSharingMode is VK_SHARING_MODE_CONCURRENT.
pQueueFamilyIndices is a pointer to an array of queue family indices having access to the
images(s) of the swapchain when imageSharingMode is VK_SHARING_MODE_CONCURRENT."
If the type isn't concurrent, don't attempt to access the arrays.
dEQP-VK.wsi.xlib.swapchain.create.exclusive_nonzero_queues on lavapipe.
Fixes: 5b13d74583 ("vulkan/wsi/drm: Break create_native_image in pieces")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15101>
We were just emitting the bad reloc (either an assert fail on a debug
build or for a release build likely a GPU hang from the resulting fault).
Given that the GLES 3.2 spec's robust context requirement says we should
return undefined data but not terminate for element indices outside of the
VB, ignoring the offset in this case seems like a better behavior to have
in all cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15198>
We only have a single prog_data::total_scratch for all shader variants
(SIMD 8, 16, 32). Therefore we should always max the total_scratch on
top of existing variant.
We probably haven't run into that issue before because we compile by
increasing SIMD size and higher SIMD size is more likely to spill. But
for bindless shaders with return shaders, if the last return part
doesn't spill, we completely ignore the previous parts' scratch
computation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15193>
Now that we don't sort our nodes we can arrange them so we can
easily translate between nodes and temps without a mapping table,
just applying an offset.
To do this we have a single array of nodes where twe put first the nodes
for accumulators and then the nodes for temps. With this setup we can
ensure that for any given temp T, its node is always T + ACC_COUNT.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Nodes are allocated in order to registers so initially sorting
was used to ensure that nodes with smaller life ranges would
be assigned first and therefore be more likely to get
accumulators.
However, since d81a6e5f1d now we don't rely on order to make
decisions about accumulators and instead we make policy decisions
based on actual liveness, so sorting is no longer strictly
relevant to this decision.
Furthermore, we are not re-sorting nodes after each spill either,
since that would probably require that we rebuild the interference
graph after each spill (the graph identifies nodes by their index).
Shader-db results show a significant improvement in instruction
counts, due to more optimal accumulator assignments. The reason for
this is that we use a round-robin policy for choosing the next
accumulator to assign. The idea behind this is preventing nearby
temps to be assigned to the same accumulator so that QPU scheduling
is more flexible, but if we sort our nodes, we are basically not
assigning temps in program order any more and the round-robin policy
becomes less effective:
total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
instructions in affected programs: 11791267 -> 11454036 (-2.86%)
helped: 62890
HURT: 19987
total threads in shared programs: 415874 -> 415870 (<.01%)
threads in affected programs: 20 -> 16 (-20.00%)
helped: 2
HURT: 4
total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
uniforms in affected programs: 43430 -> 43402 (-0.06%)
helped: 134
HURT: 173
total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
max-temps in affected programs: 123334 -> 117280 (-4.91%)
helped: 4112
HURT: 1195
total spills in shared programs: 3870 -> 3860 (-0.26%)
spills in affected programs: 1013 -> 1003 (-0.99%)
helped: 14
HURT: 12
total fills in shared programs: 5560 -> 5573 (0.23%)
fills in affected programs: 1765 -> 1778 (0.74%)
helped: 14
HURT: 17
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
For us they are basically uniforms too so we want to make their
lifespans short to facilitate allocating them to accumulators.
total instructions in shared programs: 13043585 -> 13015385 (-0.22%)
instructions in affected programs: 8326040 -> 8297840 (-0.34%)
helped: 24939
HURT: 19894
total threads in shared programs: 415860 -> 415858 (<.01%)
threads in affected programs: 4 -> 2 (-50.00%)
helped: 0
HURT: 1
total uniforms in shared programs: 3721953 -> 3720451 (-0.04%)
uniforms in affected programs: 96134 -> 94632 (-1.56%)
helped: 744
HURT: 435
total max-temps in shared programs: 2173431 -> 2154260 (-0.88%)
max-temps in affected programs: 264598 -> 245427 (-7.25%)
helped: 10858
HURT: 841
total spills in shared programs: 4005 -> 4010 (0.12%)
spills in affected programs: 700 -> 705 (0.71%)
helped: 5
HURT: 10
total fills in shared programs: 5801 -> 5817 (0.28%)
fills in affected programs: 1346 -> 1362 (1.19%)
helped: 6
HURT: 11
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
If we are compiling with a strategy that does not allow TMU spills
we should not allow spilling anything that is not a uniform.
Otherwise the RA cost/benefit algorithm may choose to spill a
temp that is not uniform and that will cause us to immediately
fail the strategy and fallback to the next one, even if we
could've instead chosen to spill more uniforms to compile the
program successfully with that strategy.
Some relevant shader-db stats:
total instructions in shared programs: 13040711 -> 13043585 (0.02%)
instructions in affected programs: 234238 -> 237112 (1.23%)
helped: 73
HURT: 172
total threads in shared programs: 415664 -> 415860 (0.05%)
threads in affected programs: 196 -> 392 (100.00%)
helped: 98
HURT: 0
total uniforms in shared programs: 3717266 -> 3721953 (0.13%)
uniforms in affected programs: 12831 -> 17518 (36.53%)
helped: 6
HURT: 100
total max-temps in shared programs: 2174177 -> 2173431 (-0.03%)
max-temps in affected programs: 4597 -> 3851 (-16.23%)
helped: 79
HURT: 21
total spills in shared programs: 4010 -> 4005 (-0.12%)
spills in affected programs: 55 -> 50 (-9.09%)
helped: 5
HURT: 0
total fills in shared programs: 5820 -> 5801 (-0.33%)
fills in affected programs: 186 -> 167 (-10.22%)
helped: 5
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Our cost was 5 which matches the number of instructions we have to
add for a TMU spill (a fill is 4 instructions).
Uniform spills on the other hand add an extra instruction for each
fill and remove one instruction for the spill itself. These have
a cost of 1.
Therefore, if we have a single spill+fill, we end up with +9
instructions if it is a TMU spill and +0 instructions with a uniform
spill, so making the former only 5 times more costly is probably
not a good idea, and this is without even considering the added
latency of the TMU accesses.
Relevant shader-db changes show this causes as a marginal instruction
count increase in a few shaders but better thread counts and lower
TMU spilling overall:
total instructions in shared programs: 13037315 -> 13040711 (0.03%)
instructions in affected programs: 370106 -> 373502 (0.92%)
helped: 187
HURT: 321
total threads in shared programs: 415090 -> 415664 (0.14%)
threads in affected programs: 574 -> 1148 (100.00%)
helped: 287
HURT: 0
total uniforms in shared programs: 3706674 -> 3717266 (0.29%)
uniforms in affected programs: 63075 -> 73667 (16.79%)
helped: 40
HURT: 395
total max-temps in shared programs: 2176080 -> 2174177 (-0.09%)
max-temps in affected programs: 15838 -> 13935 (-12.02%)
helped: 316
HURT: 34
total spills in shared programs: 4247 -> 4010 (-5.58%)
spills in affected programs: 2599 -> 2362 (-9.12%)
helped: 107
HURT: 14
total fills in shared programs: 6121 -> 5820 (-4.92%)
fills in affected programs: 3622 -> 3321 (-8.31%)
helped: 108
HURT: 13
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
This is for drivers that have separate store instructions for varyings,
system value outputs (such as clip distances), and transform feedback.
The flags tell the driver not to store the output to those locations.
This will be used by radeonsi initially, and then maybe by a new linker.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
NIR now fully contains pipe_stream_output_info in shader_info and IO
intrinsics if lower_io_variables is true. radeonsi will not use
pipe_stream_output_info after this, and other drivers are free to follow
that.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
This will allow compaction of transform feedback varyings because they
are no longer tied to varying slots with this information.
It will also make transform feedback info available to all NIR passes
after IO is lowered. It's meant to replace pipe_stream_output_info.
Other intrinsics are not used with transform feedback.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
These are unified in the hardware, so let's unify them in pan_shader_info.
Hoisting this logic to pan_shader.c avoids the need to duplicate this logic for
Midgard/Bifrost (RSD packing) and Valhall (SPD packing).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15204>
Instead of specifying the tiling on the texture descriptor, Valhall specifies it
on the plane descriptor. There is a new flag on the texture descriptor
specifying only whether the planes are interleaved or not.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15204>
Add support of GUEST_VRAM type of blob. These are dedicated heap memory
allocations required for vk support on hypervisors that don't support
runtime injections of host memory into guest physical address space.
The flow of usage:
1) Host VM reserves dedicated heap memory
2) Device get info about memory reservations and report it to guest
using mmio registers
3) Guest virtio-gpu driver on starts checks mmio registers for
physical address and length of reserved region. Then it reserves it
in guest.
4) On each call of vkAllocateMemory() guest driver gets chunk of
required memory and send it to host using sg list. It uses one sg
entry for 1 blob call. Heap is managed on guest using drm memory
manager (drm_mm).
Signed-off-by: Oleksandr.Gabrylchuk <Oleksandr.Gabrylchuk@opensynergy.com>
Signed-off-by: Andrii Pauk <Andrii.Pauk@opensynergy.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14536>
v2.
- Build the runner image as part of the CI for the boot2container
project, rather than as a manually step using the build instructions
in valve-trigger.dockerfile.
- Depend on a non-default kernel build hosted in the valve-infra
package repository. This does reduce the current caching feature of
local artifacts, but makes it easier to chop and change kernels on a
per-project or even per-test basis.
v3.
- Depend on a kernel built and stored in the valve-infra generic
package repo.
- Build the runner container using ci-templates as part of the CI in
valve-infra.
- Now that the runner container is built in the valve-infra CI, I
dropped the source import of client.py and message.py. They are
built in the runner container.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14660>
- Add the scripts to the prepared Mesa artifacts for use in later
runner stages.
- Add a template generator (generate_b2c.py) which reads and
validates (very lightly for now) the Gitlab job environment and then
spits out a YAML file describing the necessary test workload to be
sent to a Valve CI gateway.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14660>
nir_var_shader_out writes are only used for later TES invocations, so I
don't think there's any need for the TCS workgroup to wait for them.
fossil-db (Sienna Cichlid):
Totals from 1691 (1.04% of 162293) affected shaders:
Instrs: 710699 -> 709008 (-0.24%)
CodeSize: 3830168 -> 3823404 (-0.18%)
Latency: 3396997 -> 3007934 (-11.45%)
InvThroughput: 1212094 -> 1082823 (-10.67%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15195>
This reverts commit 382f6ccda8.
The spec requires that all color images created with the same tiling
(and a few other properties) support the same memoryTypeBits. So this
wasn't a valid change. It also wasn't necessary - we already have a
mechanism in anv_BindImageMemory2 for disabling compression if the BO
doesn't support it.
With this, XeHP passes the tests in
dEQP-VK.memory.requirements.*tiling_optimal
Fixes: 382f6ccd ("anv: Require the local heap for CCS on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
When an image configured for HIZ_CCS/HIZ_CCS_WT is bound to a BO lacking
implicit CCS, we disable any compression it may have had. Such images
are still compatible with ISL_AUX_USAGE_HIZ however. Fall back to that
aux usage to retain the performance benefit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
When an image is bound to a BO lacking implicit CCS, we disable any
compression it may have had. This is unnecessary in the cases where the
compression type doesn't depend on the BO having implicit CCS support.
Avoid this disabling for ISL_AUX_USAGE_MCS and ISL_AUX_USAGE_HIZ.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
In the bindless case, we don't have to keep any shadow descriptors and
can just reuse the IBO descriptor as a texture descriptor. Now that
we're emitting the swizzle we can just flip this on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
ncoords includes the array index, and the NIR source has the array index
as its last component, so we have to insert the extra y coordinate in
the middle in this case.
Fixes: 0bb0cac ("freedreno/ir3: handle image buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
Since these were reverse-engineered, it's become clear that IBO
descriptors are just a subset of texture descriptors, and bindless reads
of readonly images actually use isam on the IBO descriptor, further
confirming that the two are always compatible, even if not all of the
texture fields exist for IBOs. It's pointless to have a separate type
for IBOs, and just leads to things getting out-of-sync unnecessarily
which has already happened. Just remove it and use TEX_CONST insted.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
CAN_REORDER takes volatile into account, and is closer to what we
actually require to use texture instructions, which is that we can
arbitrarily reorder loads.
Fixes: aa93896 ("freedreno/ir3: adjust condition for when to use ldib")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
From Vulkan spec for VkDrmFormatModifierProperties2EXT:
"drmFormatModifierTilingFeatures is a bitmask of VkFormatFeatureFlagBits
that are supported by any image created with format and drmFormatModifier."
"The returned drmFormatModifierTilingFeatures must contain at least one bit."
"Therefore, if the returned drmFormatModifier is DRM_FORMAT_MOD_LINEAR,
then drmFormatModifierPlaneCount must equal the format planecount, and
drmFormatModifierTilingFeatures must be identical to the
VkFormatProperties2::linearTilingFeatures returned in the same pNext chain."
Relevant tests: dEQP-VK.drm_format_modifiers.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15032>
now when gl_PointSize and gl_PointSizeMESA are both present, the former
will be used for xfb with a new location and the latter will be
exported by the shader
fixes (zink):
GTF-GL46.gtf30.GL3Tests.transform_feedback.transform_feedback_builtins
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15184>
creating this at the start of the shader means it will get optimized out
when the pass is used to overwrite existing psiz values, and creating it
at the end means it will get optimized out in geometry shaders, so instead
just walk the instructions and create another store right after the existing one
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15184>
On mips64, the compiler does not allow use of non-zero argument with
__builtin_frame_address(). However, the returned frame address is only
used when PIPE_ARCH_X86 is defined. The compile error can be avoided
by making #ifdef PIPE_ARCH_X86 cover the getting of frame address too.
The argument checking of __builtin_frame_address() has been present
as a debug assert in clang 8. In clang 10, there is a proper runtime
check for the argument. This is why the build has not failed before.
Fixes: dc94a0506f ("gallium: Do not add -Wframe-address option for gcc <= 4.4.")
from Visa Hankala
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6511>
Make this build on clang architectures that don't have 64-bit atomic
instructions. Clang doesn't allow redeclaration (and therefore
redefinition) of the __sync_* builtins. Use #pragma redefine_extname
to work around that restriction. Clang also turns __sync_add_and_fetch
into __sync_fetch_and_add (and __sync_sub_and_fetch into
__sync_fetch_and_sub) in certain cases, so provide these functions as
well.
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where they're missing")
patch from Mark Kettenis
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6511>
The structure wrapped around the rb tree node was being freed, but not the node
itself, which caused a segmentation fault when accessing its parent node.
Add rb tree node remove call to fix it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15188>
Since this text was written, VirGL has become a shipping, production
quality solution. It's no longer a research project. Let's update the
text to reflect that.
While we're at it, let's drop the project from the page title, as this
is no longer the docs for the entire project.
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13836>
need to iterate over the descriptors in the binding to invalidate the whole
thing here
=================================================================
==546534==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a0000ae6c0 at pc 0x7fe20e26fd9d bp 0x7ffd92be6bc0 sp 0x7ffd92be6bb8
READ of size 8 at 0x61a0000ae6c0 thread T0
#0 0x7fe20e26fd9c in zink_descriptor_set_refs_clear ../src/gallium/drivers/zink/zink_descriptors.c:950
#1 0x7fe20e401304 in zink_destroy_surface ../src/gallium/drivers/zink/zink_surface.c:340
#2 0x7fe20e21311b in zink_surface_reference ../src/gallium/drivers/zink/zink_surface.h:106
#3 0x7fe20e21a5b9 in zink_sampler_view_destroy ../src/gallium/drivers/zink/zink_context.c:835
#4 0x7fe20c41d35f in tc_sampler_view_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:1848
#5 0x7fe20e210ff7 in pipe_sampler_view_reference ../src/gallium/auxiliary/util/u_inlines.h:216
#6 0x7fe20e22d592 in zink_set_sampler_views ../src/gallium/drivers/zink/zink_context.c:1532
#7 0x7fe20c41a3d8 in tc_call_set_sampler_views ../src/gallium/auxiliary/util/u_threaded_context.c:1393
#8 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#9 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#10 0x7fe20c42b728 in tc_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:4250
#11 0x7fe20b65176a in st_destroy_context_priv ../src/mesa/state_tracker/st_context.c:387
#12 0x7fe20b65669f in st_destroy_context ../src/mesa/state_tracker/st_context.c:1009
#13 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
#14 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
#15 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
#16 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
#17 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
#18 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
#19 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384
#20 0x41b88a in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:734
#21 0x41b8e9 in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:735
#22 0x2323aa7 in deqp::gles31::Context::destroyRenderContext() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:77
#23 0x2323969 in deqp::gles31::Context::~Context() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:55
#24 0x232278e in deqp::gles31::TestPackage::deinit() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestPackage.cpp:102
#25 0x2c866c2 in tcu::DefaultHierarchyInflater::leaveTestPackage(tcu::TestPackage*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:75
#26 0x2c87058 in tcu::TestHierarchyIterator::next() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:252
#27 0x2c365da in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:122
#28 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
#29 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
#30 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
#31 0x7fe2263e160b in __libc_start_main_impl (/lib64/libc.so.6+0x2d60b)
#32 0x413fa4 in _start (/home/zmike/src/VK-GL-CTS/build/external/openglcts/modules/glcts+0x413fa4)
0x61a0000ae6c0 is located 64 bytes inside of 1328-byte region [0x61a0000ae680,0x61a0000aebb0)
freed by thread T0 here:
#0 0x7fe226cb6627 in free (/usr/lib64/libasan.so.6+0xae627)
#1 0x7fe20aab1751 in unsafe_free ../src/util/ralloc.c:302
#2 0x7fe20aab16c8 in unsafe_free ../src/util/ralloc.c:295
#3 0x7fe20aab13c3 in ralloc_free ../src/util/ralloc.c:265
#4 0x7fe20e269234 in descriptor_pool_free ../src/gallium/drivers/zink/zink_descriptors.c:286
#5 0x7fe20e26937d in descriptor_pool_delete ../src/gallium/drivers/zink/zink_descriptors.c:296
#6 0x7fe20e26ff53 in zink_descriptor_pool_reference ../src/gallium/drivers/zink/zink_descriptors.c:967
#7 0x7fe20e270db2 in zink_descriptor_program_deinit ../src/gallium/drivers/zink/zink_descriptors.c:1071
#8 0x7fe20e3b6536 in zink_destroy_gfx_program ../src/gallium/drivers/zink/zink_program.c:695
#9 0x7fe20e1eaaf9 in zink_gfx_program_reference ../src/gallium/drivers/zink/zink_program.h:242
#10 0x7fe20e20d386 in zink_shader_free ../src/gallium/drivers/zink/zink_compiler.c:2099
#11 0x7fe20e3b9f0b in zink_delete_shader_state ../src/gallium/drivers/zink/zink_program.c:1074
#12 0x7fe20c3e29ad in util_shader_reference ../src/gallium/auxiliary/util/u_live_shader_cache.c:188
#13 0x7fe20e3ba11e in zink_delete_cached_shader_state ../src/gallium/drivers/zink/zink_program.c:1093
#14 0x7fe20c41709e in tc_call_delete_fs_state ../src/gallium/auxiliary/util/u_threaded_context.c:998
#15 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#16 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#17 0x7fe20c423683 in tc_flush ../src/gallium/auxiliary/util/u_threaded_context.c:3003
#18 0x7fe20b62d996 in st_flush ../src/mesa/state_tracker/st_cb_flush.c:60
#19 0x7fe20b62dbe3 in st_glFlush ../src/mesa/state_tracker/st_cb_flush.c:94
#20 0x7fe20ae4bded in _mesa_make_current ../src/mesa/main/context.c:1493
#21 0x7fe20ae49702 in _mesa_free_context_data ../src/mesa/main/context.c:1187
#22 0x7fe20b65668b in st_destroy_context ../src/mesa/state_tracker/st_context.c:1005
#23 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
#24 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
#25 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
#26 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
#27 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
#28 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
#29 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384
previously allocated by thread T0 here:
#0 0x7fe226cb691f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xae91f)
#1 0x7fe20aab0c81 in ralloc_size ../src/util/ralloc.c:120
#2 0x7fe20aab0e33 in rzalloc_size ../src/util/ralloc.c:153
#3 0x7fe20aab12c8 in rzalloc_array_size ../src/util/ralloc.c:233
#4 0x7fe20e26c76d in allocate_desc_set ../src/gallium/drivers/zink/zink_descriptors.c:657
#5 0x7fe20e26e9cb in zink_descriptor_set_get ../src/gallium/drivers/zink/zink_descriptors.c:840
#6 0x7fe20e2747aa in zink_descriptors_update ../src/gallium/drivers/zink/zink_descriptors.c:1424
#7 0x7fe20e36fc48 in void zink_draw<(zink_multidraw)1, (zink_dynamic_state)2, true, false>(pipe_context*, pipe_draw_info const*, unsigned int, pipe_draw_indirect_info const*, pipe_draw_start_count_bias const*, unsigned int, pipe_vertex_state*, unsigned int) ../src/gallium/drivers/zink/zink_draw.cpp:788
#8 0x7fe20e29166d in zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)2, true> ../src/gallium/drivers/zink/zink_draw.cpp:907
#9 0x7fe20c424982 in tc_call_draw_single ../src/gallium/auxiliary/util/u_threaded_context.c:3155
#10 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#11 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#12 0x7fe20c41f7a9 in tc_texture_map ../src/gallium/auxiliary/util/u_threaded_context.c:2279
#13 0x7fe20b630757 in pipe_texture_map_3d ../src/gallium/auxiliary/util/u_inlines.h:572
#14 0x7fe20b6341f6 in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:546
#15 0x7fe20b42fea7 in read_pixels ../src/mesa/main/readpix.c:1178
#16 0x7fe20b42fea7 in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1195
#17 0x7fe20b42ffc0 in _mesa_ReadPixels ../src/mesa/main/readpix.c:1210
#18 0x2a6d094 in glu::readPixels(glu::RenderContext const&, int, int, tcu::PixelBufferAccess const&) /home/zmike/src/VK-GL-CTS/framework/opengl/gluPixelTransfer.cpp:61
#19 0x29eaa06 in deqp::gls::ShaderExecUtil::FragmentOutExecutor::execute(int, void const* const*, void* const*) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:677
#20 0x25a600b in iterate /home/zmike/src/VK-GL-CTS/modules/gles31/functional/es31fOpaqueTypeIndexingTests.cpp:585
#21 0x2322b53 in deqp::gles31::TestCaseWrapper<deqp::gles31::TestPackage>::iterate(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestCaseWrapper.hpp:86
#22 0x2c376fd in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:302
#23 0x2c366e3 in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:139
#24 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
#25 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
#26 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
this ensures that swapchain images will have been acquired before potentially
accessing swapchain images bound as descriptors
fixes caselist like:
dEQP-GLES31.functional.fbo.color.texcubearray.r8ui
dEQP-GLES31.functional.primitive_bounding_box.blit_fbo.blit_default_to_fbo
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
Correct type for sysctl argument to fix the build.
../src/util/u_cpu_detect.c:631:29: error: incompatible pointer types passing 'int *' to parameter of type 'size_t *' (aka 'unsigned long *') [-Werror,-Wincompatible-pointer-types]
sysctl(mib, 2, &ncpu, &len, NULL, 0);
^~~~
Fixes: 5623c75e40 ("util: Fix setting nr_cpus on some BSD variants")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13448>
move include so va_list will be picked up via stdarg.h
In file included from ../src/util/u_printf.cpp:24:
../src/util/u_printf.h:43:41: error: unknown type name 'va_list'; did you mean '__va_list'?
size_t u_printf_length(const char *fmt, va_list untouched_args);
^~~~~~~
__va_list
/usr/include/machine/_types.h:126:27: note: '__va_list' declared here
typedef __builtin_va_list __va_list;
^
and add includes to u_printf.h as suggested by Ilia Mirkin
stdarg.h for va_list and stddef.h for size_t
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13448>
This required relaxing a core NIR assertion which I don't think is doing
any important validation.
The shader-db effects here are small, but they're important for avoiding a
regression when we start doing per-component DCE in opt_shrink_vectors
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12468)
softpipe shader-db:
total instructions in shared programs: 2859777 -> 2859454 (-0.01%)
instructions in affected programs: 18881 -> 18558 (-1.71%)
total temps in shared programs: 293994 -> 293914 (-0.03%)
temps in affected programs: 418 -> 338 (-19.14%)
i915g:
total instructions in shared programs: 407562 -> 407544 (<.01%)
instructions in affected programs: 570 -> 552 (-3.16%)
r300:
total instructions in shared programs: 1414450 -> 1414459 (<.01%)
instructions in affected programs: 44494 -> 44503 (0.02%)
total vinst in shared programs: 473782 -> 473727 (-0.01%)
vinst in affected programs: 1102 -> 1047 (-4.99%)
total sinst in shared programs: 231224 -> 231216 (<.01%)
sinst in affected programs: 432 -> 424 (-1.85%)
total temps in shared programs: 197605 -> 197607 (<.01%)
temps in affected programs: 103 -> 105 (1.94%)
crocus hsw:
total instructions in shared programs: 8158185 -> 8158134 (<.01%)
instructions in affected programs: 10927 -> 10876 (-0.47%)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15178>
Because we have to maintain two different packages of Mesa, one
specific to RADV and another one for RadeonSI and such, it's a bit
annoying to have to synchronize the drirc entries. Currently, only our
Mesa package installs 00-mesa-defaults.conf which means we have to
backport the drirc RADV changes.
This splits 00-mesa-defaults.conf in two to move the drirc RADV entries
to src/amd/vulkan/00-radv-defaults.conf. Meson will install the file
only if RADV is built.
There is still a caveat for common drirc workarounds like for WSI but
they are rare enough and we could still duplicate them if needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15152>
Task shader outputs work differently than other shaders, so they
need special consideration. Essentially, they have two kinds of
outputs:
1. Number of mesh shader workgroups to launch.
Will be still represented by a shader output.
2. Optional payload of up to (at least) 16K bytes.
These payload variables behave similarly to shared memory, but
the spec doesn't actually define them as shared memory (also, they
may be implemented differently by each backend), so we need to add
a new NIR variable mode for them.
These payload variables can't be represented by shader outputs
because the 16K bytes don't fit the 32x vec4 model that NIR uses
for its output variables.
This patch adds a new NIR variable mode: nir_var_mem_task_payload
and corresponding explicit I/O intrinsics, as well as support for
this new mode in nir_lower_io.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14930>
The problem is that the real workgroup launched on NGG HW
can be larger than the size specified by the API, and the
extra waves need to keep up with barriers in the API waves.
There are 2 different cases:
1. The whole API workgroup fits in a single wave.
We can shrink the barriers to subgroup scope and
don't need to insert any extra ones.
2. The API workgroup occupies multiple waves, but not
all. In this case, we emit code that consumes every
barrier on the extra waves.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034>
This commit adds support for Vulkan backend on a630_skqp job.
= Needed changes
- Needed to install libvulkan-dev package on system
- Refactored the way the available skqp reports are printed
tested in development builds with skia tools
Piglit expectations had to be updated in various drivers due to !14750 not
having bumped the tags when it tried to uprev.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
The Android CTS 10 version is relative old when compared with skia main
branch, which was being used before. Some modifications in the skqp
build/runner scripts were needed to make it run on CI.
- skqp versions from android-cts have already all assets inside
platform_tools folder.
- along with the assets, are the render and unit files which are
expected to pass in the Android CTS execution.
- removed custom test files from the a630 folder, to make it comply
with the CTS expectations.
- include new patches to remove Python2 dependencies and avoid the
installation of it in rootfs.
- strip binariesthe built binaries `skqp` and `list_gpu_unit_tests`, as
`is_debug = false` gn argument did not work, maybe it is not well
tested in development builds with skia tools
- use Clang instead of GCC. The GCC support is not so graceful as it is
in the skia main branch, some NEON instructions needs to be turned off
in the GCC compilation, causing different tests result. This change
does not imply a bigger rootfs, since the built skqp binary uses GCC
libc++ and other library runtimes. So clang is just a build
dependency.
= Changes in skqp results =
Some errors were found for GL backend and unit tests. GLES and VK tests are green.
All the failed tests were classified as expected to fail in the render and unit tests list.
```
gl_blur2rectsnonninepatch
gl_bug339297_as_clip
gl_bug6083
gl_dashtextcaps
```
```
SRGBReadWritePixels (../../tests/SRGBReadWritePixelsTest.cpp:214 Could not create sRGB surface context. [OpenGL])
```
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
This extension isn't wired up in Gallium, so there's just no way a
Gallium driver like Panfrost exposes it.
While there were support in i965 for this for the cancelled Broxton GPU,
thre's no such support in the Iris driver. And since Broxton has been
cancelled, it's unlikely to be wired up any time soon.
Fixes: da23a31726 ("docs/features: Update ASTC entries for Panfrost")
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15145>
When we shadow a resource, the backing BO is changed; as such,
existing references to the resource become invalid. So batches accessing the
resource need to be flushed (or otherwise have their references invalidated).
The wrong behaviour change (not flushing) was introduced when we started
tracking resources instead of BOs. The issue manifested as a severe performance
regression in glmark2's -bbuffer test, particular the subdata subtest. The issue
is magnified on slow CPUs; without the fix, the test becomes completely CPU
bound
Relevant glmark2 -bbuffer test from 43fps to 84fps.
Apparently, this causes functional issues too -- this performance-minded change
also fixes a few piglits.
Fixes: cecb889481 ("panfrost: Do tracking of resources, not BOs")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Chris Healy <cphealy@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13502>
The MI_FLUSH_DW post-sync operation uses the same encoding as the
PIPE_CONTROL one so we can use the same helper. Write PS Depth Count
is not supported, of course, as the blitter has no depth pipeline.
This means that we can write the timestamp register from the blitter.
Fixes: 604d97671b ("iris: Add support for flushing the blitter (hackily)")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15157>
To make sure it is applied in the cases we expect it to be, to avoid code
generation regressions. Functional regressions are expected to be caught by
integration-testing, so that is not focused on here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
If a message-passing instruction like LD_VAR is preloaded, it will no longer be
counted in the shader cycle counts. Add a special message preload counter that
approximates the cost of preloading, so this information doesn't get a lost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
The compiler will soon produce preloaded messages, but it should not pack them
itself, as this would require depending on GenXML or handcoding bitfields / bit
packs in the compiler. Instead, add a struct encoding the unpacked form of the
message, used as ABI between the compiler and the common driver.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
On V3D the quality of the code we generate is significantly affected by
how we decide to assign accumulators during register allocation, which
is determined by liveness, favoring short-lived temps.
There are many shaders that end up doing a whole lot of uniform loads
first, and using them later, which is very inconvenient for our register
allocation process because this increases uniform liveness and causes
us to use accumulators less efficientely, leading to significant churn.
To fix this, we move uniforms right before their first use in the same
block, but we need to do this after NIR scheduling, which means we are
doing it in non-SSA form, since the scheduler has a tendency to undo
this optimization and it is not easy to modify it to avoid it, since it
works in more abstract terms, using instruction dependencies, estimated
register pressure and instruction delay information to do its work,
which are very different concepts.
total instructions in shared programs: 13316738 -> 13033613 (-2.13%)
instructions in affected programs: 10389172 -> 10106047 (-2.73%)
helped: 55442
HURT: 16144
total threads in shared programs: 413722 -> 415048 (0.32%)
threads in affected programs: 1428 -> 2754 (92.86%)
helped: 680
HURT: 17
total loops in shared programs: 1716 -> 1690 (-1.52%)
loops in affected programs: 26 -> 0
helped: 26
HURT: 0
total uniforms in shared programs: 3704313 -> 3705181 (0.02%)
uniforms in affected programs: 687730 -> 688598 (0.13%)
helped: 2920
HURT: 7384
total max-temps in shared programs: 2364785 -> 2175190 (-8.02%)
max-temps in affected programs: 1215387 -> 1025792 (-15.60%)
helped: 49667
HURT: 1556
total spills in shared programs: 4241 -> 4248 (0.17%)
spills in affected programs: 642 -> 649 (1.09%)
helped: 11
HURT: 19
total fills in shared programs: 6115 -> 6125 (0.16%)
fills in affected programs: 1276 -> 1286 (0.78%)
helped: 11
HURT: 21
total sfu-stalls in shared programs: 34381 -> 36578 (6.39%)
sfu-stalls in affected programs: 16055 -> 18252 (13.68%)
helped: 3647
HURT: 5206
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15056>
On Bifrost and Valhall, push uniforms are loaded into Fast Access Uniform
Random Access Memory (FAU-RAM). FAU-RAM is organized as an array of 64-bit
slots. A given tuple (Bifrost) or instruction (Valhall) may access at most a
single 64-bit slot. If an instruction requires uniforms from multiple 64-bit
slots, a uniform-to-register move must be inserted to avoid the hazard. However,
if an instruction requires a pair of 32-bit uniforms from the same 64-bit slot,
no move is required.
To reduce the number of moves we emit, this commit adds an optimization pass
that reorders pushed uniforms, trying to group uniforms used by the same
instruction. The pass works by creating a graph of pushed uniforms, where edges
denote the "both 32-bit uniforms required by the same instruction" relationship.
We perform depth-first search on this graph to find the connected components,
where each connected component is a cluster of uniforms that are used together.
We then select pairs of uniforms from each connected component. The remaining
unpaired uniforms (from components of odd sizes) are paired together
arbitrarily.
In principle, we should weight the graph by number of occurences and choose
pairs that maximize the total selected edge weight. This is left for
future work, as it is nontrivial -- selecting these edges optimally appears to
be NP-hard at first blush.
Implementation note: As position and varying shaders share FAU on Bifrost, extra
care is taken with a `push_offset` shader stage info parameter that ensures
varying shaders do not reorder uniforms selected by the previous position
shader.
total instructions in shared programs: 2503343 -> 2451758 (-2.06%)
instructions in affected programs: 1553309 -> 1501724 (-3.32%)
helped: 14256
HURT: 8
helped stats (abs) min: 1.0 max: 80.0 x̄: 3.62 x̃: 3
helped stats (rel) min: 0.06% max: 36.36% x̄: 7.31% x̃: 6.67%
HURT stats (abs) min: 1.0 max: 2.0 x̄: 1.38 x̃: 1
HURT stats (rel) min: 1.30% max: 12.50% x̄: 4.99% x̃: 3.85%
95% mean confidence interval for instructions value: -3.66 -3.58
95% mean confidence interval for instructions %-change: -7.41% -7.20%
Instructions are helped.
total tuples in shared programs: 2008399 -> 1969627 (-1.93%)
tuples in affected programs: 1146344 -> 1107572 (-3.38%)
helped: 12867
HURT: 147
helped stats (abs) min: 1.0 max: 61.0 x̄: 3.03 x̃: 2
helped stats (rel) min: 0.17% max: 42.86% x̄: 6.79% x̃: 4.65%
HURT stats (abs) min: 1.0 max: 3.0 x̄: 1.20 x̃: 1
HURT stats (rel) min: 0.29% max: 20.00% x̄: 2.12% x̃: 1.19%
95% mean confidence interval for tuples value: -3.03 -2.93
95% mean confidence interval for tuples %-change: -6.82% -6.57%
Tuples are helped.
total clauses in shared programs: 408005 -> 401708 (-1.54%)
clauses in affected programs: 90760 -> 84463 (-6.94%)
helped: 6006
HURT: 164
helped stats (abs) min: 1.0 max: 9.0 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.45% max: 33.33% x̄: 12.44% x̃: 14.29%
HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.64% max: 25.00% x̄: 9.81% x̃: 5.26%
95% mean confidence interval for clauses value: -1.03 -1.01
95% mean confidence interval for clauses %-change: -12.03% -11.66%
Clauses are helped.
total cycles in shared programs: 203308.37 -> 202737.83 (-0.28%)
cycles in affected programs: 19264.71 -> 18694.17 (-2.96%)
helped: 3024
HURT: 41
helped stats (abs) min: 0.041665999999999315 max: 2.5416680000000014 x̄: 0.19 x̃: 0
helped stats (rel) min: 0.17% max: 33.33% x̄: 3.83% x̃: 2.83%
HURT stats (abs) min: 0.041665999999999315 max: 0.125 x̄: 0.06 x̃: 0
HURT stats (rel) min: 0.30% max: 5.88% x̄: 1.41% x̃: 0.93%
95% mean confidence interval for cycles value: -0.19 -0.18
95% mean confidence interval for cycles %-change: -3.89% -3.64%
Cycles are helped.
total arith in shared programs: 76265.67 -> 74669.25 (-2.09%)
arith in affected programs: 45001.50 -> 43405.08 (-3.55%)
helped: 12945
HURT: 97
helped stats (abs) min: 0.041665999999999315 max: 2.5416680000000014 x̄: 0.12 x̃: 0
helped stats (rel) min: 0.17% max: 50.00% x̄: 8.06% x̃: 4.88%
HURT stats (abs) min: 0.041665999999999315 max: 0.125 x̄: 0.05 x̃: 0
HURT stats (rel) min: 0.21% max: 33.33% x̄: 2.16% x̃: 0.96%
95% mean confidence interval for arith value: -0.12 -0.12
95% mean confidence interval for arith %-change: -8.16% -7.81%
Arith are helped.
total quadwords in shared programs: 1796563 -> 1766803 (-1.66%)
quadwords in affected programs: 948830 -> 919070 (-3.14%)
helped: 12078
HURT: 219
helped stats (abs) min: 1.0 max: 42.0 x̄: 2.49 x̃: 2
helped stats (rel) min: 0.10% max: 33.33% x̄: 5.57% x̃: 5.26%
HURT stats (abs) min: 1.0 max: 4.0 x̄: 1.21 x̃: 1
HURT stats (rel) min: 0.33% max: 6.67% x̄: 2.00% x̃: 1.14%
95% mean confidence interval for quadwords value: -2.46 -2.38
95% mean confidence interval for quadwords %-change: -5.52% -5.36%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14163>
Gives us memory back faster which is useful for pathalogical CTS
tests.
The GLSL IR was previously used after converting to NIR for things
like building the GL resource list but we have had a NIR version
for this for some time and I don't believe there are any other
use cases left for keeping the old IR hanging around this long.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15127>
These runners are configured to have a single job take up the whole
runner, which means we get to use threads to our hearts content. The pile
of cores means we don't need to spawn separate jobs to try to load-balance
across fdo's shared runner capacity. Having dedicated runners means we
won't get our MRs blocked as much waiting on non-Mesa testing happening on
fd.o.
We manage to complete all of this llvmpipe testing in about 6:15.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14962>
When handle->type is WINSYS_HANDLE_TYPE_FD, the caller wants a file descriptor
for the BO backing the resource. We previously had two paths for this:
1. If rsrc->scanout is available, we prime the GEM handle from the KMS device
(rsrc->scanout->handle) to a file descriptor via the KMS device.
2. If rsrc->scanout is not available, we prime the GEM handle from the GPU
(bo->gem_handle) to a file descriptor via the GPU device.
In both cases, the caller passes in a resource (with BO) and expects out a file
descriptor. There are no direct GEM handles in the function signature; the
caller doesn't care which GEM handle we prime to get the file descriptor. In
principle, both paths produce the same file descriptor for the same BO, since
both GEM handles represent the same underlying resource (viewed from different
devices).
On grounds of redundancy alone, it makes sense to remove the rsrc->scanout path.
Why have a path that only works sometimes, when we have another path that works
always?
In fact, the issues with the rsrc->scanout path are deeper. rsrc->scanout is
populated by renderonly_create_gpu_import_for_resource, which does the
following:
1. Get a file descriptor for the resource by resource_get_handle with
WINSYS_HANDLE_TYPE_FD
2. Prime the file descriptor to a GEM handle via the KMS device.
Here comes strike number 2: in order to get a file descriptor via the KMS
device, we had to /already/ get a file descriptor via the GPU device. If we go
down the KMS device path, we effectively round trip:
GPU handle -> fd -> KMS handle -> fd
There is no good reason to do this; if everything works, the fd is the same in
each case. If everything works. If.
The lifetimes of the GPU handle and the KMS handle are not necessarily bound. In
principle, a resource can be created with scanout (constructing a KMS handle).
Then the KMS view can be destroyed (invalidating the GEM handle for the KMS
device), even though the underlying resource is still valid. Notice the GPU
handle is still valid; its lifetime is tied to the resource itself. Then a
caller can ask for the FD for the resource; as the resource is still valid, this
is sensible. Under the scanout path, we try to get the FD by priming the GEM
handle on the KMS device... but that GEM handle is no longer valid, causing the
PRIME ioctl to fail with ENOENT. On the other hand, if we primed the GPU GEM
handle, everything works as expected.
These edge cases are not theoretical; recent versions of Xwayland trigger this
ENOENT, causing issue #5758 on all Panfrost devices. As far as I can tell, no
other kmsro driver has this 'special' kmsro path; the only part of
resource_get_handle that needs special handling for kmsro is getting a KMS
handle.
Let's remove the broken, useless path, fix Xwayland, bring us in line with other
drivers, and delete some code.
Thank you for coming to my ted talk.
Closes: #5758
Fixes: 7da251fc72 ("panfrost: Check in sources for command stream")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-and-tested-by: Jan Palus <jpalus@fastmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Tested-by: Dan Johansen <strit@manjaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15120>
We always blit from a particular level, so it's a waste to compute the LOD.
This corresponds to a simple texture instruction with implement 0 LOD, which is
the optimal texturing path on Bifrost -- it maps to TEXS_2D but does not require
helper invocations.
Functional change on Bifrost: Blit shaders no longer set .computed_lod or
shader_contains_barrier.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
There are always set to true. Don't pollute the driver code with them, make
their existence a local detail to pre-Valhall XML and that's it.
Functional change: "four components per vertex" is now set on vertex job DCDs.
This should be a no-op.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
This reduces code duplication and will ease Valhall porting. Functional changes
on v7:
* Shader contains barrier is now set (perf loss, fixed later in series)
* Shader register allocation is now set (perf win)
* Point sprite inverted, no-op for blit shaders
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
Trivially implemented by using A6XX_GRAS_SC_CNTL_SINGLE_PRIM_MODE.
This extension is useful for emulators e.g. AetherSX2 PS2 emulator and
could drastically improve performance when blending is emulated.
Relevant tests:
dEQP-VK.rasterization.rasterization_order_attachment_access.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15106>
Otherwise a shader invocation would read the value which should have
been set AFTER this shader invocation.
Fixes tests:
dEQP-VK.rasterization.rasterization_order_attachment_access.depth.samples_1.multi_draw_barriers
dEQP-VK.rasterization.rasterization_order_attachment_access.stencil.samples_1.multi_draw_barriers
Fixes: 71595a189a
("tu: Fix feedback loops in sysmem mode")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15106>
XeHP scratch space is handled differently. Commit ae18e1e707
implemented support for it, but handled it differently between render
and compute shaders: it calculates scratch_addr differently and
doesn't pin the buffer on compute. Make it work on compute shaders by
calling pin_scratch_space() from iris_compute_walker(), which fixes
both the address and the pinning.
This commit can be verified by the two-year-old-but-still-unreviewed
Piglit MR 234. You can also verify this by running a very simple
compute shader with INTEL_DEBUG=spill_fs.
References: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/234
Fixes: ae18e1e707 ("iris: Add support for scratch on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15070>
This raises our vertex input bindings limit from 28 to 31, and our
vertex input attribute limit from 28 to 29. We could theoretically
go higher, but it will take additional work.
The 3DSTATE_VERTEX_BUFFERS and 3DSTATE_VERTEX_ELEMENTS limits are 33
vertex buffers, and 34 vertex elements. But we need up to two vertex
elements for system values (FirstVertex, BaseVertex, BaseInstance,
DrawID), and we currently use two vertex bindings for those.
There is another hidden limit: our compiler backend only supports the
push model for VS inputs currently. 3DSTATE_VS only allows URB Read
Lengths between [0, 15], which is measured in pairs of inputs, which
means we can theoretically push no more than 32 vertex elements. This
is no artifical limit either, as a vec4 element takes up 4 registers
in the payload, and 32 * 4 = 128, the entire size of our register file.
Plus, the VS Thread payload needs at least g0 and g1 for other things,
so we can really only push 31.
We can theoretically support one additional binding, by combining our
two SGV bindings into a single upload. In order to support additional
vertex elements, we would need to add support to the backend compiler
for the pull model for VS inputs.
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5917
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14991>
I got confused while writing this somehow because of the null descriptor
feature, which enables drivers to consume a null descriptor, which has no
relation to a descriptor layout containing no descriptors
failing to accurately use zero descriptors can put layouts over the maximum
per-stage limits, which causes tests to crash
fixes (lavapipe):
KHR-GL46.shading_language_420pack.binding_uniform_block_array
KHR-GL46.multi_bind.dispatch_bind_buffers_base
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15067>
I chose to implement this as a global flag in the device, because
otherwise we would end up with extra draw overhead trying to avoid it in
the implicit-sync WSI case, and you're probably going to end up needing
implicit sync anyway because you used one of the BOs in any of the
submitted cmdbufs. To do better than this, we would probably want a
skip-implicit-sync flag on the BOs in the BO list, rather than global on
the submit.
Reports about venus on turnip say that this flag reduces worst-case
QueueSubmit time in a game workload from ~10ms to ~4ms.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14838>
Bifrost post-RA dead code elimination can cull the destinations of
regular ALU instructions, by weakening from a register write to a
temporary write. However, there is no way to suppress staging writes, so
culling the destinations will result in invalid code generation.
Fixes a regression in
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_static_vertex
with scoreboarding. The root cause there is the backend dead code
elimination not being sufficiently aggressive in the presence of control
flow. Usually this does not matter, since the backend optimizations are
intended to be local with global optimizations happening in NIR.
Unfortunately, our implementation of IDVS hits this hard. That will need
to be optimized (probably by specializing IDVS shaders in NIR instead of
the backend). In the mean time, let's fix the actual bug affecting
scoreboarding.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14298>
"raw" (IDXEN=0) and "structured" (IDXEN=1) do bounds checking differently.
From `si_make_buffer_descriptor`:
* - For VMEM and inst.IDXEN == 0 or STRIDE == 0, it's in byte units.
* - For VMEM and inst.IDXEN == 1 and STRIDE != 0, it's in units of STRIDE.
so there is a difference between setting vindex = i32_0 and vindex = NULL.
Instead of having the `structured` flag, we can just check if vindex is NULL.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
"raw" (IDXEN=0) and "structured" (IDXEN=1) do bounds checking differently.
From `si_make_buffer_descriptor`:
* - For VMEM and inst.IDXEN == 0 or STRIDE == 0, it's in byte units.
* - For VMEM and inst.IDXEN == 1 and STRIDE != 0, it's in units of STRIDE.
so there is a difference between setting vindex = i32_0 and vindex = NULL.
Instead of having the `structured` flag, we can just check if vindex is NULL.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
If we have a postponed spill, the temp we create at ip is no longer
the spilled temp and therefore is affected by the thrsw injection.
Fixes corruption in the additive blending animation demo from
Three.js.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15112>
Not all Vulkan implementations allows rendering to linear images, so in
order to support scanning out from these on Windows we might have to copy
through a buffer like we do in the PRIME path.
To avoid reimplementing the same, let's instead generalize the code a
bit so it doesn't have to specfy any PRIME-specific details.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12210>
Application may exit without freeing created display list, this
may leave the cache not empty.
This is triggered by Abaqus which just close X11 display without
calling any of GLX cleanup functions like glXDestroyContext. But
GLX hook to X11 display close function to free GLX screen resource.
So display list as a context resource has not been freed, but
cache as a screen resource is freed.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14926>
DRI3 window back buffer is a client resource, so it's destroyed
when context switch drawable for native window.
But some application like Abaqus may leave a dirty back buffer
and reuse it when switch back. So add a driconf option for these
kind of app to keep the entire GLX drawable for native window.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14926>
When we spill we add new temps. We should be careful not to access
liveness for these until we have re-computed it after all spills and
fill for that the spilled temp have been processed so as to avoid
out-of-bounds accesses to the c->temp_start and c->temp_end arrays.
This fixes a crash in a Three.js demo when we try to patch register
classes after a TMU spill that was caused because we would incorrectly
try to patch the same temps we had just added for the spill itself,
which is not only unnecessary but also incorrect since we these temps
would not have liveness information available yet and thus would
cause out of bounds accesses.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15107>
We need to take ownership of shader keys before we can insert them into
the hash table. Caught by ASan.
==6343==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016bc51410 at pc 0x00010498d6cc bp 0x00016bc50240 sp 0x00016bc4f9d0
READ of size 592 at 0x00016bc51410 thread T0
#0 0x10498d6c8 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)+0x208 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x196c8)
#1 0x10498da08 in wrap_memcmp+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x19a08)
#2 0x10b7f3f18 in asahi_shader_key_equal agx_state.c:867
#3 0x10a482e7c in hash_table_search hash_table.c:325
#4 0x10b7f4e94 in agx_update_shader agx_state.c:899
#5 0x10b7f0dc4 in agx_draw_vbo agx_state.c:1590
#6 0x10a7c28c4 in u_vbuf_draw_vbo u_vbuf.c:1498
#7 0x10a5db03c in cso_multi_draw cso_context.c:1639
#8 0x10aed03d0 in _mesa_validated_drawrangeelements draw.c:1812
#9 0x10aed08d4 in _mesa_DrawElements draw.c:1945
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15109>
This removes the block at the end of fragment processing and allow
multiple scenes to be queued to the rasterizer up to MAX_SCENES.
Running heaven with this shows up to 39 scenes been allocated in the
first few renders, but it also removes all stalls in rendering until
the present stalls for completion. It reduces frame rendering times
(not enough to make it > 1 fps) but looks to be about 10% increase.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
When llvmpipe is allowed execute fragment shaders overlapping
with other stuff, we have to start using the pipe fences.
With presentation, the acquire path needs to signal a semaphore
that can be waited on by the user, so add support for passing
signal/wait semaphores for non-timeline in, and just put a
fence pointer in the semaphore for that case.
This fixes rendering once we allow overlapping rasterization.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
Currently neither llvmpipe or softpipe ever leave any drawing in
the pipeline, but I'd like to change that for llvmpipe.
This makes drisw block for completed rendering before sending
data to the X server.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
If the luminance formats aren't renderable, they back out to R*
formats, but those will end up with a 1 in alpha rather than 0 when
textured. So instead make them explicitly renderable, which will cause
the correct texture format swizzle to be applied.
Fixes query-rgba-signed-components and probably others.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15097>
Until now we have lived without a refcount mechanism in the driver
because in Vulkan the user is responsible for handling the life
span of memory allocations for all Vulkan objects, however,
imported BOs are tricky because the kernel doesn't refcount
so user-space needs to make sure that:
1. When importing a BO into the same device used to create it
(self-importing) it does not double free the same BO.
2. Frees imported BOs that were not allocated through the same
device.
Our initial implementation always freed BOs when requested,
so we handled 2) correctly but not 1) on drm and we would
double-free self-imported BOs because kernel doesn't return
a unique gem_handle on each import.
Beside this the submit ioctl checks for duplicates in the
BO list and returns an error if there is one.
This fixes the problem for good by adding refcounts to BOs
so that self-imported BOs have a refcnt > 1 and are only freed
when all references are freed.
KGSL on the other hand does not have the same problems,
at least not with ION buffers which are used for exportable
BOs on pre 5.10 android kernels.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5936
Fixes CTS tests: dEQP-VK.drm_format_modifiers.export_import.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15031>
It is good to know that we run out of ring space and have to wait. This
happens easily with fossilize-replay because encoding a
vkCreateGraphicsPipeline takes microseconds while executing it can take
milliseconds, >100ms sometimes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14966>
This reverts commit 29d319c767.
Now that we use nir_lower_bool_to_bitsize, we don't see 1-bit booleans
anymore, so the issue this fixed doesn't apply. Actually, that issue was
(in part) why I started looking into boolean handling in the first
place.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
This lets us avoid generating SWZ instructions. Those instructions could
be constant folded but that complicates the replication analysis
introduced in the next commit.
Almost no shader-db changes.
quadwords HURT: shaders/glmark/1-22.shader_test MESA_SHADER_FRAGMENT: 718 -> 722 (0.56%)
total quadwords in shared programs: 68169 -> 68173 (<.01%)
quadwords in affected programs: 718 -> 722 (0.56%)
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
We've got lots of VK coverage on 618, so take some of the load off (but
leave a little bit of testing just to make sure we don't totally break
630). This should help with our Marge times since we've added some other
coverage to 630 that's started overloading the runners.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15085>
The old calculation for mat3 was clever, but it turns out that a
straightforward application of subdeterminants similar to how mat4 is
handled is more efficient: on a scalar architecture with some sort of
combined multiply+add instruction with a negate modifier (both fairly
common), the new determinant is 9 instructions vs. 15 for the old one,
and without the multiply-add it's 14 instructions vs. 18 for the old
one. When used as a routine for inverse() the savings are compounded,
because we now use the same method as used to compute the adjucate
matrix and so CSE can combine most of the calculations with the adjucate
matrix ones.
Once mat3 and mat4 use the same method for computing determinants, we
can combine them into a single recursive function. I also pulled up the
mat_subdet() function because it was doing basically what we need, so
it's now shared between determinant and inverse. This shrinks the
implementation significantly, as can be seen from the diffstat.
The real reason I want to change this, though, is that it fixes
dEQP-VK.glsl.builtin.precision_fp16_storage16b.inverse.compute.mat3 with
turnip. Qualcomm uses round-to-zero for 16-bit frcp, which combined with
some inaccuracy in the old method of calculating the determinant led us
to fail. Qualcomm's driver uses something like the new method to
calculate the determinant in the inverse. We could argue that Mesa's
method should be allowed, because round-to-zero for floating-point
division is within spec and there are no precision guarantees given for
determinant() or inverse(). However we might as well use the more
efficient method.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14652>
It won't work if the blob is fixed-size and we overrun the size, which
will be the case with the Vulkan pipeline cache.
This gets a bit tricky for the repeated-header optimization, because we
can't read the header from the blob. Instead we have to store the header
itself.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15028>
Label IDVS variants as being MESA_SHADER_{POSITION, VARYING} stages;
reserve the MESA_SHADER_VERTEX label for non-IDVS shaders. This reduces
confusion where a single shader compiles to two MESA_SHADER_VERTEX
shaders with different stats.
While we're at it, de-vendor the blend shader stage name; these stats
are internal anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15086>
C++ requires explicit casts from integers to enums. Fixes errors like
the following when trying to use Asahi GenXML from a GTest unit test.
src/asahi/lib/agx_pack.h:554:23: error: assigning to 'enum agx_channels' from incompatible type 'uint64_t' (aka 'unsigned long long')
values->channels = __gen_unpack_uint(cl, 0, 6);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
The texture descriptor we construct for reloading needs to respect the
surface's texture/layer selection. Fix exactly the same bug as
b8c31ac06d ("lima: fix glCopyTexSubImage2D").
Fixes:
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgba
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
We can only use linear for 2D images, not even 2D arrays. Even for 2D
images, we only want to use linear if:
* We are required to use linear due to window system requirements.
* The texture is streaming.
Otherwise, we want to use tiled textures. (Or better, compressed, but we
don't support that yet.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
nir_alu_instr_is_comparison needs to consider all comparison opcodes regardless
of size. Otherwise, they will be missed by nir_opt_move/sink.
Without this change, lowering booleans to integers regresses register
pressure (and spills/fills) significantly in certain shaders on Panfrost,
like android/com.miHoYo.GenshinImpact/1420.shader_test.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15073>
The Vulkan 1.3 spec says:
"The implementation-dependent limit bufferImageGranularity specifies
a page-like granularity at which linear and non-linear resources
must be placed in adjacent memory locations to avoid aliasing. Two
resources which do not satisfy this granularity requirement are said
to alias. bufferImageGranularity is specified in bytes, and must be
a power of two. Implementations which do not impose a granularity
restriction may report a bufferImageGranularity value of one.
Note: Despite its name, bufferImageGranularity is really a
granularity between "linear" and "non-linear" resources."
We set this limit to 64 bytes (a cacheline) at the dawn of time, without
any real rationale attached. There shouldn't be any restrictions here.
Our tile sizes are typically 4K, and tiled resource addresses are
aligned to the tile size, and the extent is also a multiple of the tile
sized. So if a linear resource occurs before a tiled one, there will
naturally be some space due to the alignment of the tiled resource's
starting address. If a linear resource occurs after a tiled one, the
tiled resource's ending address is already 4K aligned, which is already
guaranteeing that they won't share a cacheline.
So I think it should be fine to reduce this to 1. The other Vulkan
driver for our hardware seems to advertise 1 here as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15066>
Instead, we only recompute liveness and we add new nodes and
interferences to the graph manually (we also need to patch
register classes in some cases).
To assist in this process, we also add an ip counter to our
instructions that we also recompute after each spill, which we use
to identify registers that cross thrsw boundries introduced with
TMU spills and fills and adjust their register classes accordingly
(removing their capacity to use accumulators).
This significantly reduces the CPU cost of spills. Using
shaders/closed/gputest/piano/7.shader_test as reference:
Compile time up to the first successful compile strategy in main is
~24s and with this change it is ~11s. With this speed up, we can now
try all 2-thread compile strategies (including the fallback scheduler)
in only ~15s.
A full shader-db run results in:
Total CPU time (seconds): 9904.67 -> 9087.98 (-8.25%)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
We may be pipelining TMU writes and reads, in which case we can
see both TMUWT and LDTMU at the end of a TMU sequence, so we should
not assume that a TMUWT always terminates a sequence.
Also, we had a bug where we were using inst instead of scan_inst
to check if we find another TMUWT after the curent instruction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Instead of whether they are allowed to spill or not. This is more flexible.
Also, while we are not currently enabling spilling on any 4-thread strategies,
should we do that in the future, always prefer a 4-thread compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Until now we would only allow spilling as a last resort in the
last 2 strategies, however, it is possible that in some cases
earlier strategies may produce less spills if we allowed spilling
on them.
Likewise, the fallback scheduler can sometimes produce less spills
than 2 threads with optimizations disabled.
With this change, we start allowing all our 2-thread strategies to
spill, and instead of choosing the first strategy that is successful,
we choose the one that doesn't spill or the one with the least amount
of spilling.
It should be noted that this may incur in a significant increase
of compile times. We will address this in a follow-up patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Commit 38800b38 changed nir_opcodes.py, but that doesn't seem to have
triggered nir_opt_algebraic.py. The change in 75ef5991 depends on
opt_algebraic lowering 16-bit versions of slt, but if opt_algebraic is
not rebuilt, this may not happen. This resulted in some people seeing
assertion failures in, for example,
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_3.step,
due to the backend seeing nir_op_slt that it didn't know how to handle.
v2: Add nir_opcodes.py to nir_algebraic_py so that all the per-driver
algebraic passes pick up the dependency too. Rename it to
nir_algebraic_depends. Suggested by Emma.
Closes: #6047
Fixes: d1992255bb ("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15050>
memcpy is divided into chunks that are vec4 sized max. The problem
here happens with a structure of 24 bytes :
struct {
float3 a;
float3 b;
}
If you memcpy that struct, the lowering will emit 2 load/store, one of
sized 8, next one sized 16. But both end up located at offset 0, so we
effectively drop 2 floats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a3177cca99 ("nir: Add a lowering pass to lower memcpy")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15049>
The mechanism currently used to pass data from the dEQP child process
executed in a crosvm guest environment towards the deqp-runner wrapper
script that starts the crosvm instance is based on creating, writing
and reading regular files.
In addition to the main drawback of using the storage, this approach
is potentially unreliable because the data cannot be transferred in
real-time and there is no control on ending the transmission. It also
requires a forced sleep for syncing the content, while the minimum
amount of time necessary to wait cannot be easily and safely
determined.
Replace this with an IPC based on the virtio transport for virtual
sockets (virtio-vsock).
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14995>
The img->deferred_info will out-live vn_CreateImage, so we need a deep
copy of the VkImageFormatListCreateInfo struct.
This change also avoids tracking VkImageFormatListCreateInfo struct with
a zero viewFormatCount.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15017>
lava_job_submitter.py unit tests are written in pytest and uses
freezegun in order to simulate timeouts in some tests scenarios. So,
this commit adds the packages `python3-pytest` and `python3-freezegun`
to fulfill this dependencies.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
When the lava_job_submitter.py retry loop finishes normally (without
falling through break-loop) it means that the submitter has exceeded the
retry count limit. However, when it happens the script
finishes normally. This patch adds a treatment to this case, warning the
user what happened and forcing the job to fail.
Moreover, this commit will make retry configurations configurable by
CI job, as it can take the default value from the following variables:
- LAVA_DEVICE_HANGING_TIMEOUT_SEC
- LAVA_WAIT_FOR_DEVICE_POLLING_TIME_SEC
- LAVA_LOG_POLLING_TIME_SEC
- LAVA_NUMBER_OF_RETRIES_TIMEOUT_DETECTION
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
If a secondary command buffer is used and the client provides a
framebuffer and that framebuffer has a stencil-only attchment, we would
try to get the aux usage for the depth component of that attachment and
crash. Check the aspects of the image before looking at aux usage.
This fixes at least the following SkQP tests on my Tigerlake:
- vk_circular-clips
- vk_filterfastbounds
- vk_innershapes_bw
- vk_lineclosepath
- vk_multipicturedraw_rrectclip_simple
- vk_pathinvfill
- vk_quadclosepath
- vk_rrect_clip_bw
- vk_windowrectangles
Fixes: 0d8b9c529c ("anv: Allow PMA optimization to be enabled in secondary command buffers")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15048>
Valhall has a new twist on Mali's task splitting voodoo, plus compute offset
support.
On Bifrost + Vulkan, compute offsets needed lowering on Bifrost (gl_GlobalID).
Valhall saves a few instructions here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15047>
I was doing some RE on freedreno and we had some questions about when the
hardware might need non-uniform or non-constant array access for various
descriptor types, so let's leave some notes for the next person.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13621>
shader-db results (note that we expect many more loops unrolled, so instr
count is probably understating the win):
nv30:
total instructions in shared programs: 16535069 -> 14299105 (-13.52%)
instructions in affected programs: 16377286 -> 14141322 (-13.65%)
total gpr in shared programs: 81255 -> 67268 (-17.21%)
gpr in affected programs: 56714 -> 42727 (-24.66%)
LOST: 0
GAINED: 824
nv40:
total instructions in shared programs: 20907673 -> 18428749 (-11.86%)
instructions in affected programs: 20755510 -> 18276586 (-11.94%)
total gpr in shared programs: 104200 -> 82831 (-20.51%)
gpr in affected programs: 80278 -> 58909 (-26.62%)
14130
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14130>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on resources can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: 7688b8ae98 ("st/mesa: eliminate all atomic ops when setting vertex buffers")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on sampler views can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: ef5d427413 ("st/mesa: add a mechanism to bypass atomics when binding sampler views")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
I had missed a int -> enum conversion in one recently added function and
it's probably nice to also dump the target value also in
trace_dump_resource_template() so let's do just that.
Signed-off-by: Matti Hamalainen <ccr@tnsp.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14980>
Once we simplified a phi node, we never updated the definition it points
to, which meant that it could become out of date if that definition were
also simplified, and we didn't check that when rewriting sources. That
could happen when there are multiple nested loops with phi nodes at the
header.
Fix it by updating the phi's pointer. Since we always update sources
after visiting the definition it points to, when we go to rewrite a
source, if that source points to a simplified phi, the phi's pointer
can't be pointing to a simplified phi because we already visited the phi
earlier in the pass and updated it, or else it's been simplified in the
meantime and this isn't the last pass. This way we don't need to
keep recursing when rewriting sources.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15035>
This fixes artifacts seen in games when using ASTC transcoding,
we need to use DXT5 for proper alpha channel support.
Number of components is a block specific property, there is no easy
way to see if we will require >1bit alpha support or not, so simply
use DXT5 to have support in place.
Fixes: 91cbe8d855 ("gallium: Add a transcode_astc driconf option")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15029>
This introduces inotify support in RADV to handle changes from the
RADV_FORCE_VRS_CONFIG_FILE. This is similar to MangoHUD. I'm personally
not sure it's the best solution but let's try this way and change it
later if we have issues (or if we have a lightweight solution).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
When I originally added vk_image_view, I was overly clever when it came
to the format field. I decided to make it only contain the bits of the
format contained in the selected aspects. However, this is confusing
(not generally a good thing) and it's also not always what you want.
The Vulkan 1.3.204 spec says:
"When using an image view of a depth/stencil image to populate a
descriptor set (e.g. for sampling in the shader, or for use as an
input attachment), the aspectMask must only include one bit, which
selects whether the image view is used for depth reads (i.e. using a
floating-point sampler or input attachment in the shader) or stencil
reads (i.e. using an unsigned integer sampler or input attachment in
the shader). When an image view of a depth/stencil image is used as
a depth/stencil framebuffer attachment, the aspectMask is ignored
and both depth and stencil image subresources are used."
So, while the restricted format makes sense for texturing, it doesn't
for when the image is being used as an attachment. What we probably
actually want is both versions of the format. We'll call the one given
by the VkImageViewCreateInfo vk_image_view::format and the restricted
one vk_image_view::view_format.
This is just the first commit which switches format to view_format so
the compiler will make sure we get them all. The next commit will
re-add vk_image_view::format but this time unmodified.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15007>
I suspect this double-check and comment was due to originally using ir.nir
as the condition, which might be uninitialized if !IR_NIR. You could only
take the branch if IR_NIR was set, and you should always not take if it
!IR_NIR, so it worked out in the end, but it would cause spurious valgrind
warnings if you hadn't zeroed out your TGSI shader's struct.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14896>
There were two paths in this code: One was transform_loops, which would
try to detect loops with an iteration count it understood and unroll that
many times. The other was emulate_loops, which would just figure out how
many instructions the program could have and still compile (hopefully),
and unroll this loop however many times would fit in that.
The transform_loops had no analysis as good as GLSL or NIR loop unrolling
have, so it shouldn't be missed -- any opportunity it found would only be
due to bugs in the unrolling code.
The emulate_loops path had an issue with computing the number of times it
should try to unroll -- if you had more instrs than ALUs available
already, you'd overflow and unroll approximately infinitely many times,
OOMing the system. But, also, it's better to throw a compiler error about
unsupported loops than to run the loop an incorrect number of times and
call it a success.
Fixes: #5883, #6018
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15004>
this otherwise treates begin/end/begin the same as begin/pause/resume
cc: mesa-stable
fixes:
KHR-GL46.texture_view.view_classes
KHR-GL46.transform_feedback.capture_geometry_separate_test
KHR-GL46.transform_feedback.capture_vertex_separate_test
KHR-GL46.transform_feedback.query_geometry_separate_test
KHR-GL46.transform_feedback.query_vertex_separate_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15020>
virglrenderer makes invalid shaders when faced with vector 64-bit
instructions, which GLSL-to-TGSI never produced. While this doesn't fix
everything, it does get more tests running, and virgl probably the primary
consumer of 64-bit TGSI. virgl may be deprecating its host 64-bit
support, at which point we can drop this workaround.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
Prior to the noted MR, virglrenderer encoded the tex operands in a
limited-size temp buffer which we could easily overflow with TGSI
immediates. GLSL-to-TGSI always emitted an extra MOV, so keep that
behavior when nir-to-tgsi feeds us more optimized TGSI.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
GLSL-to-TGSI would emit some MOVs that made things OK, but NTT
successfully copy-propagates the inputs and sysvals more, triggering bugs
in virglrenderer when the signedness of the input is different from that
of the ALU op.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
Various workaround paths in virglrenderer assume that outputs are written
with a full writemask, which is not required by TGSI. To work around
that, store affected outputs in a temp and do a full write after each
writemasked write.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
We really need offsets to be in dwords, not in vec4s.
The bug manifests as random failure of func.mesh.clipdistance.5 crucible
test, where stores to gl_MeshVerticesNV[x].gl_ClipDistance[4+n] actually write to
gl_MeshVerticesNV[x].gl_ClipDistance[1+n].
Fixes: 1f438eb033 ("intel/compiler: Implement Mesh Output")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14997>
Contains stuff needed for layered rendering. Unfortunately, there's no more
provoking vertex per draw -- ugh! That's fine for Vulkan (just don't set
provokingVertexModePerPipeline), but requires inserting extra flushes on desktop
OpenGL.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15003>
For some games that take like 400 MiB of shader binaries, the
number of shader arenas ends up going >1500. Cut that down a bit
by using larger arenas.
8 MiB should still be decent with small BAR and should still cut
things down from ~1500 to ~50 buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14591>
Some functions in ppir iterate the ppir_op_info slots arrays looking
for the PPIR_INSTR_SLOT_END token. The dummy/undef internal ops may
appear in the scheduling code and their slots arrays did not contain
that token, which could result in invalid array reads.
Reported by gcc -fsanitize=address.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
Reported by gcc -fsanitize=address, sometimes gpir regalloc attempts to
handle an uninitialized node->value_reg (containing the value -1), which
results in an invalid array access.
Avoid it for now to prevent crashes, but more investigation may be
required later on.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
Static analysis complains that spill_costs might be accessed in
non-initialized positions.
It does not seem to be an issue with the current code which initializes
it for every relevant register index, but we can also just initialize it
to not have to worry about that.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
This was needed, so that in case of active helper lanes,
these contain the correct value. It is now handled implicitly.
Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 47498 -> 47273 (-0.47%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
Platforms that don't have flow control also don't have anything that
could be written that has a side effect. It should be safe to implement
these condition writes as
foo = csel(condition, bar, foo);
This should eliminate the last thing in the GLSL compiler that can
create new conditions on assignments. Everything else that can store
something in ir_assignment::condition derives it from a pre-existing
condition.
v2: Fix bad rebase.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
When there are four or fewer elements left in the array partition, the
strategy changes from a binary search of nested flow control to sequence
of conditional assignments like
(assign, dest, src[constant_i+0], index == constant_i+0)
(assign, dest, src[constant_i+1], index == constant_i+1)
(assign, dest, src[constant_i+2], index == constant_i+2)
(assign, dest, src[constant_i+3], index == constant_i+3)
or
(assign, dest[constant_i+0], src, index == constant_i+0)
(assign, dest[constant_i+1], src, index == constant_i+1)
(assign, dest[constant_i+2], src, index == constant_i+2)
(assign, dest[constant_i+3], src, index == constant_i+3)
Realistically, the first case should use ir_triop_csel instead.
The second case will either get turned back into flow control like
if (index == constant_i+0)
(assign, dest[constant_i+0], src)
if (index == constant_i+1)
(assign, dest[constant_i+1], src)
if (index == constant_i+2)
(assign, dest[constant_i+2], src)
if (index == constant_i+3)
(assign, dest[constant_i+3], src)
or a sequence of conditional selects like
(assign, dest[constant_i+0], (csel, index == constant_i+0, src, dest[constant_i+0]))
(assign, dest[constant_i+1], (csel, index == constant_i+1, src, dest[constant_i+1]))
(assign, dest[constant_i+2], (csel, index == constant_i+2, src, dest[constant_i+2]))
(assign, dest[constant_i+3], (csel, index == constant_i+3, src, dest[constant_i+3]))
The former case should continue to use the binary search. The later
case could be generated from the binary search by other lowering passes.
At the end of the day, conditional assignments don't really help
anything here, so stop using them.
Radeon R430
total instructions in shared programs: 2398683 -> 2398419 (-0.01%)
instructions in affected programs: 5143 -> 4879 (-5.13%)
helped: 9
HURT: 8
total vinst in shared programs: 616292 -> 616010 (-0.05%)
vinst in affected programs: 4467 -> 4185 (-6.31%)
helped: 9
HURT: 8
total sinst in shared programs: 315417 -> 315667 (0.08%)
sinst in affected programs: 2568 -> 2818 (9.74%)
helped: 2
HURT: 15
total flowcontrol in shared programs: 1049 -> 1048 (-0.10%)
flowcontrol in affected programs: 7 -> 6 (-14.29%)
helped: 1
HURT: 0
total presub in shared programs: 47027 -> 47027 (0.00%)
presub in affected programs: 127 -> 127 (0.00%)
helped: 1
HURT: 1
total omod in shared programs: 3618 -> 3615 (-0.08%)
omod in affected programs: 8 -> 5 (-37.50%)
helped: 3
HURT: 0
total temps in shared programs: 450757 -> 451312 (0.12%)
temps in affected programs: 837 -> 1392 (66.31%)
helped: 8
HURT: 6
total consts in shared programs: 1031928 -> 1031920 (<.01%)
consts in affected programs: 1211 -> 1203 (-0.66%)
helped: 6
HURT: 7
The shaders that were hurt for temps... are all lies. None of those
shaders should have compiled as all 6 had more than 32 temps to begin
with.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
Use if-statements instead. Any hardware that supports this sort of
tessellation has flow control, so it will probably emit the conditional
assignment using an if-statement anyway. This is definitely what
st_glsl_to_nir does.
v2: Fix copy-and-paste bug in the ir_type_swizzle handling. This bug
caused segfaults in tests/spec/arb_tessellation_shader/execution/variable-indexing/tcs-patch-vec4-swiz-index-wr.shader_test.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
The nir approximation is a bit more precise so there is one more
instruction for the scalar version but if the shader actually uses
vector one or some other stuff like sin(x) followed by cos(x) we
save more.
This nir approximation importantly seems to have better precision
so this should also fix some piglits/dEQPs.
With my shader-db and faked R300:
total instructions in shared programs: 67751 -> 65978 (-2.62%)
instructions in affected programs: 8637 -> 6864 (-20.53%)
total temps in shared programs: 9191 -> 9137 (-0.59%)
temps in affected programs: 486 -> 432 (-11.11%)
total consts in shared programs: 45427 -> 45412 (-0.03%)
consts in affected programs: 856 -> 841 (-1.75%)
total lits in shared programs: 2317 -> 2346 (1.25%)
lits in affected programs: 69 -> 98 (42.03%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14957>
This patch moves the shrinking of store sources into
a separate pass.
The reasoning behind this is that this pass usually only
needs to be called once while nir_shrink_vectors might
better be called several times. This allows to move
the pass(es) out of the optimization loops.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14480>
This wasn't much of a problem before because vk_command_buffer_finish()
doesn't do much on an empty command buffer. However, it's about to be
responsible for managing the pool's list of command buffers so it will
be critical to get this right.
Fixes: c9189f4813 ("anv: Use a common vk_command_buffer structure")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14917>
Now that we have no overlapping memory addresses for different
contexts we can go ahead and also share the same VM for every context.
This should make VM binding much easier to implement once we have
Kernel support for it.
v2: We now have engines_ctx, set it there too.
v3: Fix indenting (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
Have a single border color pool per bufmgr instead of per context.
We want to have a single VM shared among every context and the border
color pool is the last feature preventing us from having that.
Previously we had 1024 colors per context but once the buffer was full
we just waited for the buffer to be unused and restarted it. After
this patch we have 4096 colors for every single context and we can't
just flush buffers if they are full, so we simply return black.
There are many strategies we could try to implement to help alleviate
this new 4096 limit, none of which are implemented by this patch:
- We could just expand the buffer to the full 16MB we can use,
allowing 262144 colors.
- We could use multiple buffers and make the contexts refcount them,
so eventually older buffers would reach zero references and be
recycled, moving us to a working set maximum from a lifetime
maximum.
- We could also make the border color pool be a standard memzone and
then give smaller buffers to each context when they need, so the
limit would be in the number of contexts that can use border color
pools. This was my first implementation but Ken suggested I switch
to the one provided by this patch, which is simpler.
Keep it like this since border colors don't seem to be used very much
and other Mesa drivers such as radeonsi also seem to employ the
"return black once we reach the limit" strategy.
As a last note, we could also move the contents of iris_border_color.c
to iris_bufmgr.c in order to avoid breaking some abstractions we have
in Iris, like we do with iris_bufmgr_get_border_color_pool(). I can do
this in case we want it.
v2: Switch from standard memzone to a per-screen thing (see above).
v3: Actually make it per bufmgr. Just making it per screen is not
enough, since screens can share the same VM, an example being the
gputest benchmark suite.
v4: Rebase.
v5: Remove dead code, lock around hash table lookup (Ken).
v6: Simple rebase.
v7: Another rebase (for_each_batch).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
We're moving towards a path where all contexts share the same virtual
memory - because this will make implementing vm_bind much easier - ,
and to achieve that we need to rework the binder memzone. As it is,
different contexts will choose overlapping addresses. So in this patch
we adjust the Binder to be 1GB - per Ken's suggestion - and use a real
vma_heap for it. As a bonus the code gets simpler since it just reuses
the same pattern we already have for the other memzones.
Credits to Kenneth Granunke for helping me with this change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
this codepath needs to update literally one spirv word to get the right shader
(why can't this be a spec constant?)
so just update that word instead of running all the nir passes and doing a
full recompile
fixes:
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_MaxPatchVertices_Position_PointSize
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14976>
by initializing this on context creation, we can ensure that the correct
value is always here
cc: mesa-stable
fixes:
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_only
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14974>
There's already a single command queue for the screen, meaning that
all commands are being serialized implicitly into that queue. There's
no need to have separate fences for parallel contexts when those
fences would all share the same underlying timeline.
This adds an explicit lock to expand the scope of the implicit screen
command queue ordering to include fence signals.
Each context still gets its own submit sequence, which is used for 1
purpose right now: A uniqueness check in the state manager to see
if states are coming from separate command lists, to apply promotion
and decay logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
The no-wait condition was wrong before. If the query was used in the
current batch (query->fence_value == context->fence_value), we'd
continue on with the operation instead of returning false. Then the
buffer map would see that the bo is referenced in the current batch,
and would flush and wait, even though we were asked not to wait.
This fixes the condition by simply using the (correct) buffer map
logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
Properly handling NaN adversely affects several hundred shaders in
shader-db (lots of Skia and a few others from various synthetic
benchmarks) and fossil-db (mostly Talos and some Doom 2016). Only apply
the NaN handling work-around when the shader demands it.
v2: Add comment explaining the 1.0*y_over_x. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 2098ae16c8 ("nir/builder: Move nir_atan and nir_atan2 from SPIR-V translator")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
frexp_sig of ±0, ±Inf, or NaN should just return the input unmodified.
frexp_exp of ±Inf or NaN is undefined, and frexp_exp of ±0 should return
the input unmodified. This seems to already work.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 23d30f4099 ("spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
NOTE: This commit needs "nir: All set-on-comparison opcodes can take all
float types" or regressions will occur in other Vulkan SPIR-V tests.
No shader-db changes on any Intel platform.
NOTE: This commit depends on "nir: All set-on-comparison opcodes can
take all float types".
v2: Fix handling 16-bit (and presumably 64-bit) values.
About 280 shaders in Talos are hurt by a few instructions, and a couple
shaders in Doom 2016 are hurt by a few instructions.
Tiger Lake
Instructions in all programs: 159893290 -> 159895026 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019260087 -> 7019254134 (-0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624235 -> 143625691 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440083238 -> 8440090702 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185495 -> 134186618 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366923 -> 8222365826 (-0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 1feeee9cf4 ("nir/spirv: Add initial support for GLSL 4.50 builtins")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
This (sort of) matches the behavior of nir_opt_algebraic. This ensures
that subnormal values are properly flushed to zero.
With the aid of "nir/search: Float sources of texture instructions are
float users" and "nir/search: Transitively apply is_only_used_as_float",
there would have been no shader-db regressions on Intel platforms.
However, those caused a significant increase in compile time. Since the
instruction regressions were so small, I just dropped those commits
rather than improve them.
All Haswell and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20125042 -> 20125094 (<.01%)
instructions in affected programs: 7184 -> 7236 (0.72%)
helped: 0
HURT: 32
HURT stats (abs) min: 1 max: 4 x̄: 1.62 x̃: 2
HURT stats (rel) min: 0.11% max: 1.49% x̄: 0.85% x̃: 0.78%
95% mean confidence interval for instructions value: 1.39 1.86
95% mean confidence interval for instructions %-change: 0.74% 0.96%
Instructions are HURT.
total cycles in shared programs: 862745586 -> 862746551 (<.01%)
cycles in affected programs: 109872 -> 110837 (0.88%)
helped: 12
HURT: 23
helped stats (abs) min: 2 max: 774 x̄: 90.83 x̃: 19
helped stats (rel) min: 0.07% max: 25.23% x̄: 3.06% x̃: 0.40%
HURT stats (abs) min: 2 max: 1106 x̄: 89.35 x̃: 12
HURT stats (rel) min: 0.08% max: 45.40% x̄: 3.01% x̃: 0.47%
95% mean confidence interval for cycles value: -60.09 115.23
95% mean confidence interval for cycles %-change: -2.21% 4.07%
Inconclusive result (value mean confidence interval includes 0).
All of the shaders hurt are in either UE4 shooter-game or shooter_demo.
Tiger Lake
Instructions in all programs: 159893213 -> 159893290 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019259514 -> 7019260087 (+0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624164 -> 143624235 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440082767 -> 8440083238 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185424 -> 134185495 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366529 -> 8222366923 (+0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: f5dd6dfe01 ("anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
Extend 4195a9450b so that the next poor fool doesn't come along and
say, "sge does the right thing for 16-bit sources, but slt gives a NIR
validation failure. What the deuce?"
NOTE: This commit is necessary to prevent regressions in GLSLstd450Step
tests of 16-bit sources at "spriv: Produce correct result for
GLSLstd450Step with NaN".
Fixes: 4195a9450b ("nir: sge operation is defined for floating-point types")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
When slots is 64 only the first bit was being set, instead of setting
all 64 bits of the variable, so for that case the function
get_variable_io_mask() always returned 0.
This behaviour caused variables that are being used both on producer and
consumer to be considered unused and thus being removed on
nir_remove_unused_io_vars().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14955>
Expose most of the counters exposed by blob. By faking the value of
counters returned from kgsl I found the exact underlying counters and
constant coefficients being used.
Note, coefficients for counters that depend on time are NOT verified.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14323>
Although a resource may support CCS with its original format, a texture
view of that resource may have a format that doesn't support
compression. Don't create CCS surface states for such texture views.
This change affects iris' behavior when running piglit's
arb_texture_view-rendering-formats_gles3 test on SKL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
Fast clear with the resource format instead. This is safe to do because
can_fast_clear_color ensures that the clear color generates the same
pixel with either the view format or the resource format.
On SKL, this prevents us from using an invalid surface state. This platform
doesn't support CCS_E with sRGB formats, but prior to this patch we allowed
fast-clearing with this combination. Piglit's fcc-write-after-clear test
can trigger this.
Fixes: 230952c210 ("iris: Don't support sRGB + Y_TILED_CCS on gen9")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
the idx param for LLVMBuildInsertElement is zero-indexed based on the
value of 'vector_length' (always 4), and the vector length is (obviously)
sized to 'vector_length', so this should be the member of the vec that is being
inserted, not the invocation index
cc: mesa-stable
fixes (zink, but only on my one machine):
KHR-GL46.tessellation_shader.single.max_patch_vertices
KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls
dEQP-GLES31.functional.tessellation.shader_input_output.barrier
dEQP-GLES31.functional.tessellation.shader_input_output.patch_vertices_5_in_10_out
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_point_mode_geometry_output_triangles
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_point_mode_geometry_output_lines
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_point_mode_geometry_output_lines
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14949>
All of the opcodes in nir_opt_algebraic_late are the unsized (1-bit)
versions. If the lowering to int32 happens first, many of the
optimizations and lowerings won't happen.
Of particular importance is the lowering of fisfinite. If a shader
happens to contain fisfinite of an fp16 value, it will assert later
during compliation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 78b4e417d4 ("gallivm: handle fisfinite/fisnormal")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14942>
Allocating NIR registers ends up being required for drivers like r600 and
nv30, which don't do their own allocation (except in some cases on r600
where sb is used).
Rather than add a NIR register liveness impl
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14158), switch
from NIR-based liveness to just doing the same channel-based liveness
logic that the NIR registers needed at the TGSI level. The actual
liveness code here basically comes straight out of
brw_vec4_live_variables.cpp.
Since we do the liveness in TGSI now, it also means we don't need to be
careful about not reading SSA values from later TGSI instructions (which
may be useful for doing some greedy instruction selection in generating
TGSI instructions).
i915g:
total instructions in shared programs: 400719 -> 380730 (-4.99%)
instructions in affected programs: 284760 -> 264771 (-7.02%)
total tex_indirect in shared programs: 12289 -> 12290 (<.01%)
tex_indirect in affected programs: 4 -> 5 (25.00%)
total temps in shared programs: 32172 -> 22086 (-31.35%)
temps in affected programs: 30647 -> 20561 (-32.91%)
LOST: 0
GAINED: 148
r300:
total instructions in shared programs: 1472463 -> 1459286 (-0.89%)
instructions in affected programs: 507009 -> 493832 (-2.60%)
total temps in shared programs: 212143 -> 201678 (-4.93%)
temps in affected programs: 78007 -> 67542 (-13.42%)
softpipe:
total temps in shared programs: 517071 -> 294387 (-43.07%)
temps in affected programs: 509324 -> 286640 (-43.72%)
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
To do register allocation well, we want to have a point before
ureg_insn_emit() to look at the liveness of the values and allocate them
to TGSI temporaries. In order to do that, we have to switch from
ureg_OPCODE() emitting TGSI tokens directly to a new ntt_OPCODE() that
stores the ureg args in a block structure.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
PIPE_CAP_FORCE_PERSAMPLE_INTERP is broken for the no-attachment case, so
this is the only option
fixes (lavapipe):
KHR-GL46.sample_shading.render*
dEQP-GLES31.functional.sample_shading.min_sample_shading*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14852>
This cap bit only affects DRI_PRIME setups. Since iris now uses the
blitter to perform dGPU -> iGPU copies asynchronously, it's better to
always use at least two backbuffers so the 3D engine can start rendering
the next frame during the copy.
See commit d17e752857 where this change
was made for radeonsi.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13877>
In a hybrid graphics setup, Mesa allocates two buffers for the window
surface. The first is what the discrete card renders to; it lives in
VRAM and is usually tiled and possibly compressed. The second is a
shadow copy that lives in system memory (readable by the integrated
card with the displays); it's usually linear and uncompressed.
Mesa's window system code schedules blits to update the shadow copy
when needed, typically at the end of a frame. These can be fairly
costly when running a full-screen application at high resolutions.
We'd like to use the blitter for these copies, as it lets us perform
the copy asynchronously, letting the 3D engine race ahead and start
rendering the next frame. If we used the 3D engine, the next frame
could not start rendering until the PRIME blit finishes, giving us
less time to draw the frame. Fortunately, Tigerlake introduced new
blitter commands which can operate at full memory bandwidth.
DRI PRIME blits happen via the Gallium blit() hook. We can detect that
case by looking for the PIPE_BIND_PRIME_BLIT_DST flag on the destination
resource. This patch detects that case and calls iris_copy_region() on
IRIS_BATCH_BLITTER to handle it. We know a priori that the blitter can
handle this operation (it's not a scaled blit, the formats match and
should not be 96bpp, there's no combined depth stencil, or other weird
edge cases). blorp_copy() will also assert that edge cases don't occur.
Together with the next patch, this improves performance on DG1 Hybrid
scenarios by about 5-6%.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13877>
Use drmSyncobjSignal to signal out_syncobjs when a GPU job submission
ends in the simulator. With this, we can enable multisync support in the
simulator and keep the multisync approach to process fence by submitting
a serialized no-op job that adds the fence to the array of out syncobjs,
i.e. syncobjs to be signaled in the kernel when a job completes (job
post deps).
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14768>
The existing logic would drop the low bit. Instead, let's drop the high
bit, do the conversion, and then add the fixed constant back in if the
value had the high bit set originally.
Fixes KHR-GL45.direct_state_access.vertex_arrays_attribute_format on
drivers that use this module to handle the format conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emma Anholt <emma@anholt.net>
Tested-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14922>
Saves instructions if the same fabs value is used multiple times.
i915g:
total instructions in shared programs: 397005 -> 396525 (-0.12%)
instructions in affected programs: 11061 -> 10581 (-4.34%)
LOST: 0
GAINED: 22
r300 (not r500):
total instructions in shared programs: 180286 -> 179767 (-0.29%)
instructions in affected programs: 27102 -> 26583 (-1.91%)
total temps in shared programs: 29692 -> 29638 (-0.18%)
temps in affected programs: 356 -> 302 (-15.17%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14938>
Given that our fcsels are on float-bools, we can emit the LRP directly and
save the backend having to emit a SLT to turn the CMP src[0] into a bool.
This required passing a codegen flags struct for nir-to-tgsi. I think
this is a good way forward for it, as the alternative I think has mostly
been adding flags to nir_shader_compiler_options (since adding
PIPE_SHADER_CAPs is an unreasonable amount of pain).
r300 shader-db:
total instructions in shared programs: 1484320 -> 1472463 (-0.80%)
instructions in affected programs: 243588 -> 231731 (-4.87%)
total temps in shared programs: 212485 -> 212143 (-0.16%)
temps in affected programs: 3845 -> 3503 (-8.89%)
Acked-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14886>
The main thing is VK 1.3 testing, but also includes test bugfixes. The
1.3 CTS required an uprev of deqp-runner to handle a new style of test
output, and that deqp-runner brings in some neat new features, too (piglit
in your deqp-runner suite, and extension list checking).
A bunch of VK tests got renamed, so I replaced panvk's custom test list
with simple include filters on the main test list.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (panvk)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14920>
Block compressed formats like ETC2 are now indicated in the plane descriptor,
rather than the pixel format descriptor. Various other minor formats were
removed in Valhall; remove them from the XML so we don't accidentally try to use
them.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
Remove a few that no longer exist, and rename IDVS helper to Malloc Vertex. The
distinction between Malloc Vertex jobs and regular Indexed Vertex jobs is that
the hardware allocates varying buffers dynamically for Malloc Vertex jobs.
Regular IDVS and even legacy tiler jobs are also supported where desired.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
Merged with the Buffer descriptor, hence why it shares a type nibble. However,
Bifrost uses a dedicated tiler heap descriptor, and I see no benefit to merging.
So pretending it's a dedicated descriptor on Valhall too allows us to reuse the
Bifrost code with no modifications.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
The Entrypoint class already has utilities for gettingt he parameter
list as either declarations or as comma-separated argument names for a
call. Use that instead of hand-rolling it. The only modification we
need to make is to add the ability to start the list somewhere other
than at the beginning.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14919>
This fixed a problem for Zink where uniform buffer alignment varies by
GPU, e.g. 64 bytes for an RTX 2070 SUPER but 256 bytes for a GTX 1070
Ti.
Tested running Superposition on Windows 10 with Nvidia 1070 Ti with
496.13 driver. Without the fix Superposition soft locks on its splash
screen. With the fix Superposition runs through its benchmark.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14674>
The bit fields in zink_framebuffer_state cause a false positive with
MSVC's run-time checks enabled. setting state.num_attachments in
zink_get_framebuffer_imageless(). Writing some bits of num_attachments
involves reading bits from layers and samples that haven't been
initialized.
Fixed by assigning to num_attachments earlier in the function. Not
quite sure why that makes a difference but at a guess there's a
heuristic that considers assignment close to declaration as
initialization.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14566>
In the future we'll want to reuse this intrinsic to deal with ray
queries. Ray queries will use a different global pointer and
programmatically change the control/level arguments of the trace send
instruction.
v2: Comment on barrier after sync trace instruction (Caio)
Generalize lsc helper (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We'll want different types of IDs based on topology. Let's make this
more flexible and also move the bit shifting code a layer above where
it's easier to do bitshifting operations, especially if you need to
stash things into temporary registers.
v2: Keep previous comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
spec requires that the number of timeline waits/signals matches the
base number of waits/signals if there are any timeline semaphores
being processed by the submit, so asserting here is in line with what
validation will yield
failure to match these will also hang every driver I've tested, so asserting
here potentially saves some people their desktop session
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14741>
these regions might not have the coords in the correct order, which will
cause them to fail intersection tests, resulting in clears that are never
applied
cc: mesa-stable
fixes:
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_all_buffer_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_depth_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_stencil_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_linear_filter_color_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_magnifying_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_minifying_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_missing_buffers_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_nearest_filter_color_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_dimensions_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_height_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_width_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_scissor_blit
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14867>
e9c3dbd046 added PIPE_BIND_DRI_PRIME but it was only set when
importing a prime buffer.
This commit adds handling of this flag in the other codepath = the
one where the prime buffer is allocated by the render GPU.
With this change PIPE_BIND_DRI_PRIME is still only set for the
render GPU - the display GPU will never see this flag; a future
commit will rename it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14615>
To flush the blitter, we need to use MI_FLUSH_DW rather than the usual
PIPE_CONTROL we use on the 3D engine. Most of our code is set up to
suggest flushes via PIPE_CONTROL commands, however, so we hackily just
emit MI_FLUSH_DW when they ask for any kind of PIPE_CONTROL flush.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14912>
Graphics Flight Recorder is:
"The Graphics Flight Recorder (GFR) is a Vulkan layer to help
trackdown and identify the cause of GPU hangs and crashes.
It works by instrumenting command buffers with completion tags."
This is a nice little tool which could help quickly identify the call
which hanged. Or if command buffer is executed for too long.
The tiling nature of our GPU shouldn't be a big issue aside from
lower performance.
For non-segfault case, if:
- Hang happens at the same place in cmdbuf and draw/dispatch is not
finished at that point - it is likely that there is an infinite
loop in some of the shaders in this draw.
- Hang happens always in different place - likely there is nothing
wrong and command buffer just takes too long to execute and you
should try increasing hangcheck_period_ms. If it doesn't help
it is likely a synchronization issue.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13553>
If the surface already has a current backbuffer - say through a
buffer_age query - we do not want to replace it in get_buffers, because
it means the result we'd previously returned them is stale.
If we already have a backbuffer set on the surface, keep it locked in no
matter what until we hit SwapBuffers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14873>
A buffer age of 0 means that the buffer is uninitialised or has unknown
content. We rely on the buffer age initially being 0 through zalloc when
the surface is first created; when they are first used for a swap, we
set their age to 1, and then we increment the age of every buffer in the
chain with a non-zero age when we swap.
Now that we can release buffers, both through dmabuf-feedback as well as
detecting when we're using a deeper swapchain than the compositor needs,
make sure to reset their age as they are released. Without doing this,
the age will stay as it was before it was released and be incremented,
returning the wrong age to the user the first time a previously-released
buffer slot has been reused.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5977
Fixes: 22d796feb8 ("egl/wayland: break double/tripple buffering feedback loops")
Fixes: b5848b2dac ("egl/wayland: use surface dma-buf feedback to allocate surface buffers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14873>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
Based on ANV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14621>
If we have batch a + b, and writing to batch b, causes batch a
to flush, all the bo->index get reset, and we try to submit a -1
to the kernel.
Look the bo index up when creating relocations.
Fixes crash seen in KHR-GL46.compute_shader.pipeline-post-fs
and a trace from Wasteland 3
Fixes: f3630548f1 ("crocus: initial gallium driver for Intel gfx 4-7")
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14905>
Like explicit LODs, biases must be 16-bit, so add a lowering rule for
this. With the LOD mode selection updated for txb, we can then ingest
biases like explicit LODs and allowlist txb. Passes:
dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2d_bias
dEQP-GLES2.functional.texture.mipmap.2d.bias.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14899>
In cases where the to-be-allocated node size with padding exceeds BLOCK_SIZE
but without padding doesn't, a new block is not created and no padding is done
to the previous instruction, causing a misaligned pointer to be returned.
v2: Per Ilia Mirkin's suggestion, remove the extra condition in the first
if statement, let it unconditionally pad the last instruction if needed.
The updated currentPos will then be taken into account in the
block size checking.
This fixes crash seen with lightsmark and Optuma apitraces
Fixes: 05605d7f53 (' mesa: remove display list OPCODE_NOP')
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Tested-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14871>
When new context was created, shared_mem_size was getting overwritten.
This fixes glretrace failure seen with manhattan, aztec and BASS2_intro
apitraces
Fixes: 247c61f2d0 ('svga: Add support for compute shader, shader buffers and image views')
Tested with glretrace, piglit
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit dd6793ec9218782b1b716a87582d7219bae4e75f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14870>
The limit here is from the RENDER_SURFACE_STATE height/width/depth
fields - it's 2^30 for ISL_FORMAT_RAW buffers, and 2^27 otherwise.
anv_isl_format_for_descriptor_type() uses ISL_FORMAT_R32G32B32A32_FLOAT
for uniform buffers when compiler->indirect_ubos_use_sampler is set
(Icelake and earlier), but ISL_FORMAT_RAW when it isn't (Tigerlake+).
So we can increase the limit on Tigerlake and later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14892>
We are updating the deadcode state while walking the program backwards.
When encountering ENDLOOP, we scan the loop, mark everything in the loop
as used and than continue as usuall. We were previously trying to be
smart with the breaks. This was however not working as expected.
Instead, save the most pesimistic deadcode state from the ENDLOOP and
just restore it anytime we see a break.
This keeps the code simple and more importantly does not touch the flat
and IF(-ELSE)-ENDIF paths at all so reduces the chances of regression.
No changes with my shader-db.
Fixes piglits on RV530:
shaders/ssa/fs-if-def-else-break.shader_test
spec/glsl-1.10/execution/vs-loop-array-index-unroll.shader_test
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5832
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14661>
Compute shaders might do vec3(Xbits) loads which are translated
to LD.<next_pow2(3 * X)> by the midgard compiler. This might cause
out-of-bound accesses potentially leading to pagefaults if the
access is at the end of a BO. One solution to avoid that (maybe not
the best) is to split non-log2 loads to make sure we only read what's
requested.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
Swizzling happens in 2 steps on Midgard:
1. Vector expansion/shuffling
2. Swizzling at the instruction-size granularity, but defined using
the source size. Those size are different if the source is expanded.
So, when we print 64 bit swizzles on an expanded source, we first need
to apply an offset if the high part of the 32bit vector was selected,
and then divide the result by 2 to account for vector expansion.
To sum-up, swizzling on midgard is complicated, and I'm not sure I got
it right, but it seems to print what I expect on the few compute
shaders using 64bit arithmetic I debugged.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
Even though 8-bit ALUs are not supported, we can have [un]pack_32_4x8
instructions which translate to IMOVs, and those operate on 8-bit
vectors. The problem is, the swizzling granularity is 16 bit, which
means we don't support
MOV.i8 R0.xyzw, TMP0.xxxx, R1.zyxw
and the compiler doesn't even complain, it just applies 8 bit
swizzling directly, which obviously doesn't work.
This is probably not the right way to fix that, but I thought I'd
raised the issue with a hack to fix, so we can get the discussion
started.
(Found while debugging FB store lowering on Midgard).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
The register allocator assumes instructions are executed sequentially
and allows one instruction to overwrite a portion of a register written
by a previous instruction if this portion is never used. But scalar and
vector ALUs might be executed in parallel if they are part of the same
bundle, and when such instructions write to the same portion of the
register file, the result is undefined.
Let's add intra-bundle interferences to avoid this situation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
We didn't remove desc set from the pool's list if pool was
host_memory_base. On the other hand in there is no point in removing
desc set from the list in DestroyDescriptorPool/ResetDescriptorPool.
Fixes: da7a4751
("turnip: Drop references to layout of all sets on pool reset/destruction")
Fixes cts tests:
dEQP-VK.api.buffer_marker.graphics.default_mem.bottom_of_pipe.memory_dep.draw
dEQP-VK.api.buffer_marker.graphics.default_mem.bottom_of_pipe.memory_dep.dispatch
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14855>
Turn the context state and shader key into a bitmask. When lowering
the depth invert into the shader, scan for writes to viewport index.
If found, move position to the end of the function (or current vertex)
and check if the current viewport needs the depth invert before applying.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14881>
This pass (lowering discard_if to control flow and unconditional
discard) was originally written for Zink, but is useful for hardware
that lacks conditional discard instructions like AGX. In theory AGX
could implement a conditional discard with CSEL, but the vendor
compiler uses a lowering like this one. Since I like not writing code,
I'd like to use the pass that's already in tree.
v2: Don't preserve dominance (Jason). Assert we don't see demotes or
terminates (Jason). Add Mike's ack.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14217>
The bug here is that the DRI context "flags" are intended to alias the
GLX context flag values, and they don't, DRI's no-error flag is GLX's
reset-isolation flag. GLX (and EGL!) treat no-error as a context
attribute, and reset isolation predates Mesa's no-error implementation
by several years. The GL_KHR_no_error spec does describe it as a
"context flag", though, so maybe that's why we do it as a (DRI) context
flag.
In order to unalias these we need a new contract with the loader. We
remove the old __DRI_NO_ERROR extension, and add a new
__DRI_RENDERER_HAS_CONTEXT_NO_ERROR value to query. Loaders can key on
that to know to pass no-error-ness through as a context attribute,
matching the GLX/EGL calling convention. We go ahead and define
__DRI_CTX_FLAG_RESET_ISOLATION as well, and update the drivers to refuse
it since we don't support it yet.
This means mismatched drivers/loaders will not be able to create
no-error contexts. Too bad. If you want performance that badly you can
build both things at once.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12474>
Meson devenv is a feature added in meson 0.58 (thus the features is
version guarded) that allows creating a shell environment with
environment variables automatically setup for running the project inside
the build dir. Some variables (such as LD_LIBRARY_PATH and PATH) are set
automatically, others must be added by the project.
For vulkan is is relativley simple, we create a new, uninstalled, icd
file for each driver and set the VK_ICD_FILENAMES variable
appropriately. This can be used with:
```sh
meson devenv -C $builddir
```
then, vulkan applications will automatically use the uninstall vulkan
driver, no need to install.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14826>
The syntax we're using doesn't work when included into C++ sources. So
let's make it C++ compabible.
It turns out, nobody needs this extra definition which is what's causing
issues. Let's just make the initializer trivial without casting the
struct.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14850>
We're about to need including this header from a C++ source, so let's
add some explicit casts for C++ compatibility.
In one case we can make things a bit cleaner by moving the
char-pointer-ism to the place that needs it, so let's clean that up
while we're at it.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14850>
Moved nir_lower_compute_system_values to lower
load_local_invocation_index which could be emitted by
nir_zero_initialize_shared_memory.
Relevant CTS tests:
dEQP-VK.compute.zero_initialize_workgroup_memory.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14829>
IRIS_BATCH_BLITTER isn't supported prior to Tigerlake; in general,
batches may not be supported on all hardware. In most cases, querying
them is harmless (if useless): they reference nothing, have no commands
to flush, and so on. However, the fence code does need to know that
certain batches don't exist, so it can avoid adding inter-batch fences
involving them.
This patch introduces a new iris_foreach_batch() iterator macro that
walks over all batches that are actually supported on the platform,
while skipping the others. It provides a central place to update should
we add or reorder more batches in the future.
Fixes various tests in the piglit.spec.ext_external_objects.* category.
Thanks to Tapani Pälli for catching this.
Fixes: a90a1f15 ("iris: Create an IRIS_BATCH_BLITTER for using the BLT command streamer")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14834>
viewport_index won't be >= 16
layer should be < 2048
view_index should be < 2048
this leaves the last 64-bits as padding, which something
expects, but not having to write to it means we have to write
less memory every triangle.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14846>
the bboxpos = bbox copy was visible overhead on perf traces,
but we don't need to really do it.
Extract all the things from bbox needed early, then don't copy
it just overwrite it.
use boolean to fix power build.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14846>
The simple-varyings piglit test attempts to use GL_MAX_VARYING_FLOATS
varyings, PLUS one additional vector for position (which is not used
as input to the PS). "Reserve" that additional position vector by
removing it from the max varyings and max PS inputs.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14837>
When dealing with a vertex input that takes multiple rows, the value of
nir_intrinsic_base points to a driver-location-based index, but we need
to emit a location-based index (or more specifically, an index that
increments once per input, not once per register). Add a mapping to
the module of base -> ID.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14837>
This brings in some interesting new vulkan tests and fixes for the
spurious KHR-GL TF failures. Also, reduces the runtime of
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36 so that it
should stop timing out.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
It triggered on uprevving VK-GL-CTS, and @airlied says it's tripped
apparently spuriously before. There seems to be some interesting logic
behind it, so leave the big comment for whoever can revisit the issue some
day.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
This ran into OOM-kills with the CTS uprev. Looking at caselists at the
time of fail, some had 500MB of system memory used by the CTS (mostly
spirv string codegen), plus whatever BOs were allocated, and the lazors
are only 4GB it looks like.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
Between adding features and increased test coverage, we're hitting the
1-hour job limit. !13441 tried to increase the full run timeout for LAVA,
but by having not bumped the gitlab-ci timeout value it ended up just
letting the job keep running in LAVA after gitlab had given up on it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
In the autotune merge, we added another 1/15th run of a configuration
knob, thinking that was small enough to be in the noise. But actually the
main run is only 1/9th, so another 1/15th took us from nearly hitting the
job runtime target, to totally missing it. Crank things back down to keep
MRs flowing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
Interleave mode specified per-plane on Valhall. The texture descriptor proper
merely has a flag specifying whether planes are somehow interleaved
(u-interleaved, AFBC, or block compressed formats) or whether they are all
linear (and uncompressed).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
Flesh out the ZS/CRC XML, adding fields required for AFBC. Valhall allows AFBC
compressing stencil buffers independent of depth buffers, which is a new feature
since Bifrost. That results in a shuffling of the descriptor.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
This looks superficially like the Bifrost "Surface" descriptor, but it
additionally specifies the in-memory representation of blocks (clumps). If I
understand correctly, decompression is controlled by the plane descriptor,
rather than the texture descriptor level. This is a bit more flexible than
Bifrost.
Once the new fields here are wired up to Mesa, my
dEQP-GLES2.functional.texture.* failures should go away... I hope!
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
After starting to use a new version of the simulator, it got
outdated.
We made some initial effort to update it, but it was not
working. Taking into account that no one is using it, it is better to
just remove it.
We keep the noop drm drivers, as they could have some value for
developers that doesn't have access to the v3dv3 simulator.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14682>
Without this change LLVM 12 hits this error:
"""
LLVM ERROR: Error while trying to spill SGPR0_SGPR1 from class SReg_64:
Cannot scavenge register without an emergency spill slot!
"""
when running glcts KHR-GL46.arrays_of_arrays_gl.AtomicUsage test.
Fixes: 9ff086052a ("radeonsi: unroll loops of up to 128 iterations")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14848>
Instructions like V2S8_TO_V2S16 need a special 4-bit special selecting any two
bytes. The definition is the same as Bifrost. Let's call this a half-swizzle
since we need a name, and it is indeed half a swizzle...
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14833>
The lanes are at bit 28 and bit 26 respectively. This matches the 16-bit "swizzle" encoding. In general the handling of widens/swizzles/lane/lanes on Valhall is rather confused but... one problem at a time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14833>
Those surfaces are used as attachment to rendering passes and describe
the rate of coarse pixel shading for the pass.
v2: Move CPB_BIT tile filtering to isl_gfx125_filter_tiling() (Nanley)
v3: Drop unused macro (Nanley)
s/isl_to_gen/isl_encode/ (Nanley)
Remove pitch alignment 128B constraint already covered by tiling (Nanley)
Move some asserts together (Nanley)
v4: Disable miptail for now (Nanley)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>
We already had them disabled for hwtcl, but in the swtcl case gallivm's
param query would return (nir) support even though nir-to-tgsi couldn't
handle it because TGSI doesn't do fp16/int16.
Fixes: 7d2ea9b0ed ("r300: Request NIR shaders from mesa/st and use NIR-to-TGSI.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14809>
We need to check the source of the copy, not the destination. That means
this we need to move this check inside the ifs, where we know that the
source is a copy.
Fixes: 590efd180b ("Prevent propagating shared regs out of loops")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>
The old code was both too aggressive and not aggressive enough in
inserting (jp), and it wasn't based at all on a solid understanding of
how the hardware operates. It inserted an extra unnecessary (jp) at the
beginning of an if statement right after the conditional branch, which
is unnecessary. On the other hand, the only case where it didn't
insert a (jp) was a block with one predecessor that has only one
successor. We usually don't emit these kinds of blocks, or if we do then
one of the blocks is empty, but there is a case where we *do* and the
difference actually matters: for something like
while (true) {
if (...) {
// do stuff
...
break;
}
}
The instructions inside the if could be moved below the loop, except
that they are supposed to execute before control flow is merged if the
loop trip count is nonuniform. The subgroup reduce/scan macro does
exactly this, and relies on the control flow being correct. We have to
insert a (jp) after the loop, which this code wasn't doing, breaking the
scan macro.
Since we're now using the physical edges in a non-trivial way, we have
to preserve them better when we modify control flow in ir3_legalize.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>
This option only applies to relative shuffles (up/down/xor), and in a
moment we're going to add an option to lower normal shuffles, so rename
it.
While we're here, rename lower_shuffle() to lower_to_shuffle() for
similar reasons.
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>
We recently had a bug of forgeting to add the buf->bo_offset. Just make
the easiest field to get be the bo->iova + buf->bo_offset already. Plus,
a little less work at emit time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14816>
_mesa_marshal_GetIntegerv expects those states to be tracked. I am not
sure if this covers all states that _mesa_marshal_GetIntegerv needs, but
this fixes Civ5 for virgl at least.
Fixes: e48f676835 ("glthread: don't sync for more glGetIntegerv enums for glretrace")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14659>
Previously, no buffers were ever marked as EXEC_OBJECT_ASYNC so the
Kernel would ensure dependency tracking for us. After we implemented
explicit busy tracking in commit 89a34cb845, only the external
objects kept relying on the Kernel's implicit tracking and Iris did
inter-batch busy tracking, meaning we lost inter-screen and
inter-context synchronization. This seemed fine to me since, as far as
I understdood, it is the duty of the application to synchronize itself
against multiple screens and contexts.
The problem here is that applications were actually relying on the old
behavior where the Kernel guarantees synchronization, so 89a34cb845
can be seen as a regression. This commit addresses the inter-context
synchronization case.
v2: Rebase after the changes from a90a1f15a7. This new version is
significantly different.
Cc: mesa-stable (see the backport in MR !14783)
References: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14783
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5731
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5812
Fixes: 89a34cb845 ("iris: switch to explicit busy tracking")
References: a90a1f15a7 ("iris: Create an IRIS_BATCH_BLITTER for using the BLT command streamer")
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14505>
Save some iris_syncobj_reference() calls at at update_bo_syncobjs() by
refactoring the function, moving the unnecessary calls out of the for
loop. Also improve the documentation and drive-by fix a white space
issue.
Commit a90a1f15a7 incremented IRIS_BATCH_COUNT and adjusted the code
involving it. But the way it was done at update_bo_syncobjs() means
we'll now call iris_syncobj_reference() more than the amount of times
we need. This isn't a problem since in the second call we're moving
the reference from something to the same thing, but still we can get
away with just not doing it, so why not?
There is also the question of whether we should really iterate over
IRIS_BATCH_COUNT or something else (to save time on platforms that
don't have as many batches as IRIS_BATCH_COUNT), but this is a matter
for a separate patch (see MR !14747). In this patch we reorder the
operations a little bit so when the time comes there will be just one
loop to adjust.
I also changed the order so first we look at's in bo_deps and only
later we write to bo_deps. This is more future-proof and will make the
next patch easier to land and understand.
Reference: a90a1f15a7 ("iris: Create an IRIS_BATCH_BLITTER for using the BLT command streamer")
Reference: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14747
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14505>
Before this patch, when querying the clear color plane's stride, iris
would return the aux surface stride. This was okay because the clear
color plane wasn't really used for anything.
This doesn't work on XeHP however. On that platform, the aux surface
stride is zero (because it doesn't have an ISL surface for the CCS).
This is a problem because mesa's implementation of eglCreateImage
rejects strides of zero (see dri2_check_dma_buf_attribs) and this value
may be queried from GBM and passed into EGL.
When the DG2 clear color modifier is enabled, this avoids EGL_BAD_ACCESS
errors when running the piglit test,
ext_image_dma_buf_import-intel_modifiers.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14759>
Pick the clear color BO if the clear color plane is being queried. This
avoids picking a NULL aux BO on XeHP.
When creating shared resources, we place the gallium-visible planes in
the same buffer object. However, when importing them, we aren't very
strict about each plane sharing the same BO. So, instead of just using
res->bo, we use a couple ternaries to figure out the right one.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14759>
Semaphores info was stored as an info of event_wait cpu jobs and this
leads to mem leak when the same event_wait job in the same cmd buffer
batch was submitted more than once. As a result,
`dEQP-VK.api.command_buffers.record_simul_use_primary` fails due to a
double-free of sems_info.
In this patch, we no longer use v3dv_event_wait_cpu_job_info to store
semaphores from a submit info, since semaphores is related to a queue
submission and not to the event_wait job type. If we spawn a wait_thread,
we copy semaphores to an auxiliary struct (v3dv_wait_thread_info) that
will be used in wait_thread to get job and semaphores information. When
the spawned thread finishes, it releases the related
v3dv_wait_thread_info and the semaphores copy as well.
Fixes: d5bd18fb ("v3dv: store wait semaphores in event_wait_cpu_job_info")
Suggested-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14736>
DXIL doesn't support a single varying having components belonging
to multiple streams. For geometry shader outputs, whenever we find
this to be the case, break up that variable into multiple sub-
variables, and update stores to that variable to write to the new
sub-variables instead.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14787>
The previous logic was a bit bolted-on, using a linked list of queries
to implement cases where a GL query needed maybe multiple D3D queries.
Once the sub-query was created, it was always begun/ended along with
the parent query, but that's not sufficient or correct.
Instead, we need to be able to have mutually-exclusive subqueries
underneath a parent query. This will let us handle using different
counters for the same GL query in different pipeline configs, and then
accumulating the results together.
Fixes the arb_transform_feedback2-pause-counting piglit test.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14787>
If an indirect arg buffer is being produced by a compute shader, then when
we go to consume it as an SSBO in a compute transform pass, we need to insert
a UAV barrier to prevent the two dispatches from overlapping. For app dispatches,
this is the app's responsibility via explicit barrier APIs, and if they don't,
then they're allowed to overlap.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14787>
Workgroup is only allowed in compute shaders, and Device should be more
in line with the intended use here
the alternative would be to keep using Workgroup for compute and use Device
otherwise, but this would effectively make atomic ops non-atomic, which seems
like it isn't desirable
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14690>
according to spec, these must match the texel pointer type
cc: mesa-stable
fixes (nvidia):
dEQP-GLES31.functional.image_load_store.2d.atomic.exchange_r32f_return_value
dEQP-GLES31.functional.image_load_store.2d_array.atomic.exchange_r32f_return_value
dEQP-GLES31.functional.image_load_store.cube.atomic.exchange_r32f_return_value
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14690>
glsl/nir automatically swizzle the value if a vecN is being used, but spirv
requires a single scalar, so this has to be detected and unswizzled to avoid
violating spec and crashing shader compilers that are less permissive than mesa's
fixes (nvidia):
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract*
KHR-GL46.shader_bitfield_operation.bitfieldExtract*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14690>
Vulkan bindings take only one slot per variable, but d3d12 ones take one
slot per entry when the variable is an array. This forces us to pass
an explicit vulkan -> d3d12 mapping table when dealing with vulkan
shaders.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14765>
The unconditional Y-flip is needed for Vulkan 1.0 since D3D12 and
Vulkan coordinate systems differ. Conditional YZ-flip is needed if
we want to support negative viewport height/depth.
Prepare spirv_to_dxil() to support that, and while at it, prepare
things for multi-viewport: the Y/Z flips are per-viewport and encoded
in a 32bit bitmask, with the upper 16bits reserved for Z flips, and the
lower 16bits reserved for Y flips.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14765>
This guarantees that we get an image that's created with exactly the
swapchain image creation parameters instead of trying to emulate it
inside the driver. Ideally, we'd use the fancy new bind helper too but
our magic ANV_IMAGE_MEMORY_BINDING_PRIVATE gets in the way and we really
do want to re-bind ourself.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12031>
With Vulkan 1.1, we have a VkImageSwapchainCreateInfoKHR struct which
lets you create a new VkImage which aliases a swapchain image. However,
there is no corresponding swapchain create flag so we have to set
VK_IMAGE_CREATE_ALIAS_BIT all the time.
We need to do a bit of work in ANV to prevent it from asserting the
moment it sees one of these. Fortunately, they're already safe because
WSI images go through a different bind path for
VkBindImageMemorySwapchainInfoKHR.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12031>
This is similar to the previous two commits that we did for DRM native
images. It breaks it into configure/create/bind/finish and calls
wsi_create_image to walk through the process. The primary difference is
that prime images need fifth step in the process to set up the blit
command buffer.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12031>
Instead of making create_native_image one monolithic function, break it
into a configure stage and a create stage. The configure stage is
further broken up, first into a common piece that constructs a simple
VkImageCreateInfo and a couple chain-ins. The second adds the extra
stuff for create_native_image. This is to prepare for eventually
storing those structs in the swapchain itself.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12031>
The input attachment lowering pass turns input attachment loads into
texel fetch operation, and insert an image -> texture deref cast along
the way. In this situation, we can end up with a texture deref chain
pointing to an image variable, which is not a combined sampler+texture
object. Bail out when an image type is found, like we do for bare
textures.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13993>
After autotuner introduction most CTS tests are running in
sysmem mode. Now we have to force gmem run and add a small
forced sysmem run since it's not guaranteed that autotuner
would select gmem in future.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12128>
The implementation is separate from Freedreno due to multithreading
support.
In Vulkan application may fill command buffer from many threads
and expect no locking to occur. We do introduce the possibility of
locking on renderpass end, however assuming that application
doesn't have a huge amount of slightly different renderpasses,
there would be minimal to none contention.
Other assumptions are:
- Application does submit command buffers soon after their creation.
Breaking the above may lead to some decrease in performance or
autotuner turning itself off.
The heuristic is too simplistic at the moment, to find a proper
one - we should run a bunch of traces with sysmem and gmem, and
build better heuristic from gathered data.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12128>
- the current batch id is always 0
- there is always a current batch
- a batch id can only be set at the time of submit
thus when passing 0 to wait on the current batch, the submit must complete
so that there is a batch id, and this must occur before the timeline wait
path or else the timeline wait does nothing
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14693>
If the relevant state for a resource has not been dirtied between the
last and the current draw, we don't need to mark the resource as read
or written, as they are guaranteed to be marked already by the last draw.
This saves quite a bit of hashset operations for the resource tracking
in draw heavy workloads.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14791>
Assert the lock is held when we need it to be held. Also hold the lock
during bufmgr destruction to keep the asserts happy.
Now the only place where we're not holding the lock while manipulating
bufmgr data structures is iris_bufmgr_create(), but this shouldn't
trigger any of the new assertions.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13486>
On these functions, for vm_bind I want to add another function call
that can fail and needs to be handled. As a preparation for that, add
real error checks around vma_alloc() and try to fix some of the other
error-related issues in these functions. This way, it will be much
simpler to just drop the new vm_bind-related function calls and error
handling code.
My fear is that having vma-related errors (and in the future
vm_bind-related errors) go unannounced may create some hard-to-debug
bugs.
v2: Unlock only after bo_free() (Marcin Ślusarz).
v3: Prefer goto over early returns (Marcin Ślusarz).
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13486>
This pass was originally written for d3d12, but is useful for hardware
that lacks sample compare support like some etnaviv GPU models.
Also rename the lowering pass and some surrounding code to
nir_lower_tex_shadow as suggested by Emma.
I'd like to use the pass that's already in tree.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14308>
as long as this isn't multisampled, it should be okay since it could be used
with texture_subdata
fixes (anv):
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.conversion_gpu
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.enabled
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.multiple_textures
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.skipped
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.texel_fetch
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.toggled
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.using_sampler
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8_linear
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8_linear_mipmap_linear
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8_linear_mipmap_nearest
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8_nearest
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8_nearest_mipmap_linear
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8_nearest_mipmap_nearest
dEQP-GLES31.functional.texture.format.sized.cube_array.srgb_r8_npot
dEQP-GLES31.functional.texture.format.sized.cube_array.srgb_r8_pot
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14719>
Just as with all other TLB operations, we can only use the TLB if the render
area is aligned to tile boundaries. If it is not, then the operation would
overwrite pixels outside the render area, which is not allowed.
In this case, we can't even emit a previous TLB load to fix this because the
TLB has the multisampled attachment, not the resolve attachment, which is
just a destination buffer for the tile store.
Because the condition for tile alignment has to be determined for each
subpass, we handle this by storing this information in the attachment
state of the command buffer with the start of each subpass. We store
whether the attachment is to be resolved and whether it can use the
TLB (considering tile alignment restrictions).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14752>
The name of the bit field is CompressionFormat. The format subsections
of the field specify the alternate names of RenderCompressionFormat or
MediaCompressionFormat depending on the compression type.
We're going to start programming this field for media compression, so
we'd like to use either the bit field name or a new
MediaCompressionFormat field. Either option seems fine, so we go with
the first.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14355>
iris_is_format_supported has been returning false for YUV pipe formats.
We're going to update isl_format_for_pipe_format to map some YUV pipe
formats, but we don't want iris_is_format_supported to start returning
true for them.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14355>
The automatic VK coverage we care about is happening on a618, which is the
HW we're shipping. Having the old 630 runners make sure we don't leak
memory is a great use for them. Still, keep one default A630 VK job to
make sure we don't totally trash it.
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14235>
On XeHP, CCS doesn't require VMA on XeHP. The HW provides anything
allocated in LMEM a mapping to a CCS memory range for free. So, we:
1) use the implicit CCS framework to avoid adding an image memory
binding for the CCS surface.
2) leave each BO sized as-is instead of adding on space for the CCS.
Thankfully the framework only adds on space if an aux-map is present.
XeHP has no aux-map, so this patch doesn't explicitly do anything for
this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14431>
The fallback is incompatible with allocations that use CCS on XeHP. On
that platform, compression can't be used in SMEM.
Apps should be okay with this change. They're able to manage local and
system memory heaps directly (see VK_EXT_memory_budget).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14431>
The common WSI advertises VK_KHR_swapchain v70. We must handle
VkBindImageMemorySwapchainInfoKHR.
Fixes dEQP-VK.wsi.*.image_swapchain_create_info.
v2: try to match vn_wsi_create_image (Yiwei) and the common WSI
v3: match modifier as well (Yiwei)
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14550>
Perfetto requires time in clock snaphots to be monotonic, otherwise
the clock would be excluded.
GPU timestamps start from zero after every suspend-resume cycle
which makes them non-monotonic.
As a solution on msm we check whether GPU was just resumed and
remember previous highest timestamp to then add it to the next
timestamps.
If the functionality to get whether gpu is resumed is unavailable
or doesn't work - we fallback to a check for a discontinuity
in timestamps. For kgsl we always use fallback.
Fixes renderstage timeline disappearing in AGI.
Or you could avoid the issue altogether by preventing GPU from going to
sleep by increasing auto suspend delay e.g.:
echo 5000 > /sys/devices/platform/soc\@0/3d00000.gpu/power/autosuspend_delay_ms
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14391>
Fixes the case when last cmd buffer in submission doesn't have
tracepoints leading to flush data not being freed.
Added a few comments, renamed things, refactored allocations - now
the data flow should be a bit more clean.
Extracted submission data creation into tu_u_trace_submission_data_create
which would be later used in in tu_kgsl.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14391>
crosvm-runner.sh was using `export -p` to create an environment script
for the virtualized system, but this command will dump every declared
environment variable in the system, which includes Gitlab's CI variables
with sensitive data, such as passwords and auth tokens.
Replacing `export -p` to `generate-env.sh`, which only exports the
necessary variables for Mesa CI jobs.
Extra changes:
* Stop changing ${PWD} variable programmatically in scripts. ${PWD} is a
variable used by most prolific coreutils and bash commands, such as `cd`
and `pwd`, besides it is set by subshells [1]; changing this variable
may lead to complex situations.
As drop-in replacement for ${PWD}, use ${DEQP_BIN_DIR} to flag that
there is a special folder where dEQP should be run.
* Double quote path and array variables. See: https://github.com/koalaman/shellcheck/wiki/SC2086
* Do not export variables directly from commands output. See: https://github.com/koalaman/shellcheck/wiki/SC2155
[1]
```
$ cd /tmp
$ export PWD=test; bash -c 'echo $PWD'
/tmp
```
v2:
- Revert $DEQP_BIN_DIR quoting in crosvm-runner.sh and crosvm-init.sh
- Log all the passed variables to stdout, to help with debugging when
new variable are needed to be put in `generate-env.sh`
v3:
- Revert $DEQP_BIN_DIR quoting leftovers
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14626>
Now that RADV is exposing Vulkan 1.3 by default, VKCTS is getting
confused by it and fails 2 tests:
- dEQP-VK.api.version_check.version: This version of CTS does not
support Vulkan device version 1.3.204 (Fail)
- dEQP-VK.info.device_properties: deviceProperties apiVersion not
valid (Fail)
Mark both of these failures as expected, while we wait for VKCTS 1.3
to be released.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14729>
The new runner reduces the runtime by about 1/3 thanks to using rust
instead of python, and includes automatic flake handling so you don't just
have to skip flaky tests. The wrapper script also includes IRC flake
reporting (so one can track and update the flakes list to improve CI
reliability), always uploading results to CI for review (so you can
diagnose flakes and look at timings), has a prettier regressions report
and a helpful timing report, and is the same as what's used by all the HW
runners as well.
The downside is that by dropping the massive list of skips, you no longer
get flagged if Mesa refactors end up accidentally disabling extensions and
thus making tests skip. For that, I've started on
https://gitlab.freedesktop.org/anholt/deqp-runner/-/merge_requests/33 so
that hardware drivers get extension checking coverage too.
Thanks to the perf improvement, we get to drop one of the jobs for
llvmpipe.
xfail lists were mostly sed-jobs from the prior expectations lists. The
exceptions to that you'll find in the form of whitespace around the
affected test group (usually changes of capitalization or
special-characters), or an explanation for the more interesting changes
(which thankfully we can now record in the xfails lists!).
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14604>
Fixes:
In file included from ../mesa-freedreno-22.0.0_pre/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c:45:
In file included from ../mesa-freedreno-22.0.0_pre/src/gallium/drivers/freedreno/ir3/ir3_gallium.h:34:
In file included from ../mesa-freedreno-22.0.0_pre/src/gallium/drivers/freedreno/freedreno_util.h:31:
../mesa-freedreno-22.0.0_pre/src/freedreno/drm/freedreno_ringbuffer.h:35:10: fatal error: 'adreno_common.xml.h' file not found
#include "adreno_common.xml.h"
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14735>
Commit e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
accidentally disabled CCS_E on TGL+ because it checked for
image->vk.mip_levels > 0 instead of image->vk.mip_levels > 1.
Instead of reverting it, we remove the code which disables CCS_E for
mipmapped or arrayed images now that we've sufficiently handled the
clear color issue in other ways.
Fixes: e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14723>
On TGL, if a block of fragment shader outputs match the surface's clear
color, the HW may convert them to fast-clears (see HSD 14010672564).
This can lead to rendering corruptions if not handled properly. We
restrict the clear color to zero to avoid issues that can occur with:
- Texture view rendering (including blorp_copy calls)
- Images with multiple levels or array layers
Fixes: e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14723>
CCS_E is currently disabled on TGL+, but we'll enable it soon. We choose
to explicitly disable it for certain copy operations to avoid CTS
failures in the following groups:
- dEQP-VK.drm_format_modifiers.export_import.*
- dEQP-VK.synchronization*
Fixes: e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14723>
GitLab Pages has added a feature to do proper HTTP redirects, which
are genreally better than the HTML redirects we currently use.
Unfortunately, it doesn't support redirecting to other domains, all
paths must start with a slash. So there's sadly *one* redirect this
doesn't work for. So let's leave that one using a HTML redirect, and
use HTTP redirects when we can.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14369>
We were executing 1 non-helper invocation and 3 helpers per CS tgsi_exec
machine, which was a total waste of the CPU when we could trivially have
all 4 invocations do real work (at least in the common case of a
gl_WorkGroupSize.x >= 4).
This didn't have the effect on dEQP that I was hoping for, as it turns out
that its shaders are almost all 1x1x1 workgroups. However, it does reduce
the runtime of piglit arb_compute_shader-local-id from 2:10 to 47 seconds
on my system.
Part of #4097
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14728>
The shared var store overflow checks left a lot of overflowing
opportunities available, while the buffer storage path did proper
checking. But, more importantly for this branch, it always used the first
invocation's offset for each invocation in the quad (which only worked so
far because softpipe only dispatched a single non-helper invocation
per quad).
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14728>
this isn't the minimum allowed by the driver, but zink doesn't return
the minimum allowed by the driver anyway and hasn't in a very long time
instead, it suballocates using a minimum alignment of 256 bytes, so use
that instead
fixes (in caselists):
KHR-GL46.map_buffer_alignment.functional
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14720>
The reference buffer address is used as the indication in h264 DPB
Tier2, when reference buffer was reallocated, h264 DPB would lose
track of that reference picture. Adding a pointer obsolete_buf in
vlVaSurface data structure for tracking this released buffer, also
in h264_picture_desc adding a private field, which contains
past_ref[16] for tracking previously released buffer vs current
buffer for reference frames.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5868
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14646>
drawing unpopulated xfb data is legal(?) and tested in cts, and the correct
operation is to just drop the draw, so do that here
fixes (nvidia):
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_api
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14694>
Includes a fake framebuffer allocation that's necessary for the code we
still use from the regular render passes.
v3: (Lionel)
- Reuse the attachment count from the faux render pass, remove now
unused function
- Add a cmd_buffer_end_rendering function to match begin_rendering,
making use of the split stuff from end_subpass
v4: (Lionel)
- Don't bother with mark_images_writen or resolves on suspend case
- Remove flush at the end of end_rendering, it's not needed
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13980>
There's two of them because they can be created from three points in the
code that provide different details and this is the least ugly way I
could think of for now.
v2: Avoid allocations (Lionel)
v3: Move definition closer to its usage (Lionel)
v4: (Lionel)
- Simplify anv_dynamic_pass_init_full
- Zero out pass/subpass to avoid stall pointers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13980>
This was made way more complicated than it needs to be for a Midgard-only pass.
The only caller doesn't care about the class, only if it's native or not.
Simplify it appropriately.
It really isn't that hard.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14724>
These "quirks" are common for Midgard, yet are only consumed by
pan_lower_framebuffer -- a Midgard-only pass. So the quirks should be removed
and inlined into their users. Thid removes MIDGARD_QUIRKS altogether.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14724>
When multiple variables are packed into the same location, we need
to re-construct variables that read/write the same components of that
register so that the DXIL signature is correct. We could try to
merge these variables, but getting the types right sounds harder than
just preserving the multiple individual variables.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14399>
GLSL puts a bunch of tessellation info in the eval shaders, because
passthrough control shaders can exist. D3D12 puts it in the control
(hull) shader instead. So, when specializing, copy info from domain
to hull. For initial compiles (no domain shader), just make something
up.
D3D12 also requires the domain and hull shaders to have identical
patch constant signatures. Use the existing infrastructure and extend
it to also propagate patch constants. Notably, patch constant locations
are outside of the 64-bit range value so they require a separate pass
to avoid shifts larger than 64.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14399>
In order to get the semantics right, we need to know how many of the clip/
cull fields are designated for which purpose. In the case of a shader that
can receive these fields as both input and output, the shader_info property
is reserved to store the output info. We could add a dedicated input field
to shader_info, but since it'd probably only be useful for us, just send
it through a side channel during shader linking.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14399>
Note that this requires the shader info "tess" data to be correct.
For GLSL tess control shaders, only the output primitive count is
automatically available. The rest will need to be either guessed
or filled in from a matching tess eval (domain) shader. This is handled
by the d3d12 driver in a later patch.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14399>
The resource declarations are module-wide, but the resource handles
are function-local. A future change will add multi-function support,
but requires these handles to be potentially emitted multiple times.
The alloca used for scratch is also function-local.
This is the same pattern that the DXBC to DXIL converter uses.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14399>
On platforms where we don't support 64 bit instructions we shouldn't
pass such instructions for the code generator to lower into supported
instructions, because this makes their execution pipeline
unpredictable to the scoreboard lowering pass on XeHP+ platforms.
We really should be reducing all these 64 bit instructions before code
generation, so here we add an assert to help us catch and fix these
cases more easily.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[ Francisco Jerez: Also allow has_integer_dword_mul. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This fixes a bug in the CLUSTER_BROADCAST code generation that causes
the original IR region to be ignored, this will be a problem when we
start lowering 64-bit CLUSTER_BROADCAST instructions at the IR level,
since it will lead to instructions with non-trivial regioning.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
One of the two SHUFFLE implementations wasn't taking into account the
destination stride at all, and the other (more commonly used) one was
taking it into account incorrectly since brw_reg::hstride represents
the stride logarithmically, so we need to use a left-shift operator
instead of product. Found by inspection.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This fixes a bug in the handcrafted SIMD lowering done by the SHUFFLE
code generation, which wasn't taking into account the source and
destination region strides while deciding whether it needs to split an
instruction.
v2: Use new element_sz() helper instead of left shift. (Lionel)
Fixes: 90c9f29518 ("i965/fs: Add support for nir_intrinsic_shuffle")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This adds some generic infrastructure that allows splitting any
instruction into a number of instructions of a smaller legal execution
type. This is meant to replace several instances of handcrafted 64bit
type lowering done manually in the code generator, which is rather
error-prone, prevents scheduling of the lowered instructions, and
makes them invisible to the SWSB pass on Gfx12+ platforms, which will
become especially problematic on Gfx12.5+ since the EUs introduce
multiple asynchronous execution pipelines which the SWSB pass needs to
be able to synchronize to one another, so it's critical for the real
execution type of the instruction to be visible to the SWSB pass.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Right now the execution type lowering functionality of this pass
assumes that an integer type of the original bit size is always
acceptable, however we'll want more complex behavior than that in
order to leverage this pass to automate the lowering of unsupported
64-bit operations into multiple 32-bit operations.
In order to do that calculate the closest legal execution type from a
new helper function, and take advantage of that function from the
has_invalid_exec_type() helper, along the lines of other
lower_regioning() helpers structured as a pair of has_invalid_foo() +
required_foo() functions.
This shouldn't have any functional changes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Previously the software scoreboard structure would drop previous
dependencies for a given register and replace them with the most
recent one for the same register when a new instruction (or set of
instructions) is processed. This worked correctly on the Gfx12LP
platforms this code was originally designed for, because a repeated
dependency on the same register would either require the second
instruction to synchronize against the first (so the first dependency
could be disregarded from that point on) *or* require the dependency
to be RaR and in-order, which allows the synchronization to be
optimized out (the first dependency could still be disregarded as
well, since the pipeline is in-order). However the latter assumption
will break on upcoming Gfx12HP platforms, because they have multiple
asynchronous FPU pipelines, so whenever we hit a RaR dependency we
need to propagate forward both dependencies, since the order in which
both reads will complete is not guaranteed by the hardware in cases
where they occur from different asynchronous pipelines.
Note that this dependency propagation change requires us to change the
definition of dependency::done as well, since that constant is defined
to discard any previous dependency information when used as argument
for shadow().
This has been reported to fix the following conformance failures on DG2:
KHR-GL46.shaders.uniform_block.random.all_per_block_buffers.19
dEQP-GLES3.functional.shaders.derivate.fwidth.*
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5670
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
As explained at the header of the lowering:
"Once this pass is done, the color write will either have one
component (for single sample) with packed argb8888, or 4 components
with the per-sample argb8888 result."
So in several cases the lowering was updating the number of
components, so we need to update the writemask too.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14708>
We reserved space for chaining by subtracting 4 words from max_dw, but
then the new alignment code in radv_amdgpu_cs_finalize ended up running
all over that. That resulted in going over buffer size when chaining.
When lucky you'd get a crash, and when unlucky other stuff might happen.
This always adds the 4 words at the end, but initializes with NOP by
default. That way we still adhere to the alignment rules.
Fixes: 1f36f6b83f ("radv/winsys: use same IBs padding as the kernel")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14644>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that operate
on descriptor sets allocated using that layout, and those descriptor
sets must not be updated with vkUpdateDescriptorSets after the descriptor
set layout has been destroyed. Otherwise, a VkDescriptorSetLayout object
passed as a parameter to create another object is not further accessed
by that object after the duration of the command it is passed into."
Copied mostly from ANV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14622>
Since ldunif is a 32-bit instruction we need to demote these to
UBO loads, like we do for indirect indexing, with the exception
of scalar 16bit uniforms with an offset that is 32-bit aligned.
For the exception where we can use lfdunif we read a 32-bit slot
from memory where the uniform data is in the lower 16-bit and we
will read garbage in the upper 16-bit which we won't use anyway.
It should be noted that by using ldunif, we are consuming
32-bit from the uniform stream, but this is fine because
if there is valid uniform data in the upper 16-bit (i.e.
we had a ivec2 uniform aligned to a 32-bit address), since
we scalarize 16-bit loads, we would see another load uniform
with an unaligned offset for the second component, which we
will demote to UBO.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
These are required by VK_KHR_16bit_storage. Our hardware, however,
doesn't provide any mechanism to decide on the rounding mode of
the conversion and it seems to be using RTE, so we implement
RTZ in software.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
The vectorization pass can inject 32_2x16 (un)packing opcodes
upon successful vectorization of 16-bit operations into 32-bit
counterparts, so make sure we lower these to something our
backend can handle.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This allows us to implement 16-bit access on uniform and
storage buffers.
Notice that V3D hardware can only do general access on scalar
16-bit elements, which we currently enforce by running a lowering
pass during shader compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
V3D hardware doesn't support vector access for general TMU load/store
operations like the ones we use for UBO and SSBO, so we need to split
these to scalar operations.
It should be noted that we also have a vectorization pass (which runs
later, during optimization), that may reconstruct some of these into
32-bit operations when possible (i.e. when the resulting operation
is 32-bit aligned).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
I noticed the inefficiency in NIR-to-TGSI output while trying to debug a
failure handling some arrays in r600. While this makes reading CTS
shaders easier, the effect in the real world is pretty limited. From
softpipe shader-db:
total instructions in shared programs: 2929840 -> 2929836 (<.01%)
instructions in affected programs: 118 -> 114 (-3.39%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14321>
This allows the exported fds to be mapped for writing. My use case is
for virtio-gpu blob resources where the fds are mapped rw and mappings
are added to the guests using KVM_SET_USER_MEMORY_REGION.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14699>
With upstream mesa PIPE_CAP_IMAGE_STORE_FORMATTED needs to be set to enable
ARB_shader_image_load_store extension. This will reenable GL43 support for svga GL43 capable
device
Fixes: 3b81d2d30d ('mesa/st: do not expose ARB_shader_image_load_store if not fully implemented')
Tested with glretrace
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14688>
This splits iris_blorp_exec() into separate functions for executing on
the render command streamer and the blitter command streamer. A future
patch could add a separate iris_blorp_exec_compute() path that skips a
bunch of render-specific work.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This makes blits, copies, and (non-fast) clears set the appropriate
BLORP_BATCH_USE_{COMPUTE,BLITTER} flag if their batch is either
IRIS_BATCH_COMPUTE or IRIS_BATCH_BLITTER. We ignore the other
operations for now as those don't support compute or blit yet.
Of course, there is no code to attempt to launch BLORP operations on
either the compute or blitter batches yet, but that will come in time.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
We removed all the hardware blitter support from i965 years ago because
the blitter was not worth using (limited functionality, bad performance,
extra synchronization, and worse). However, on Tigerlake there are new
blitter commands that are actually fast and allow us to do proper
asynchronous copies while 3D is busy doing other work.
So, reintroduce the blitter. We'll want to use it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This introduces a new blorp_copy() path using the new XY_BLOCK_COPY_BLT
blitter command introduced on Tigerlake. Unlike the blitter commands of
old, this one is actually fast and worth using. Although it doesn't use
shaders like the rest of BLORP, we still can use some surface-munging
code from there, and BLORP also provides a nice place to put this which
is shared among the drivers.
To use the new path, set BLORP_BATCH_USE_BLITTER (much like Jordan's
recent BLORP_BATCH_USE_COMPUTE bit) and target the batch at the copy
engine.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This will be used as a performance hint for XY_BLOCK_COPY_BLT to
indicate whether the source/destination surfaces are (likely) in
device-local memory or system memory. We don't need to be precise
here - it's okay to set the fields to LOCAL even if a buffer has
been evicted out to system memory.
We should set this from Vulkan too, but I haven't yet. There isn't
a convenient anv_bo field like there is in iris...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This lets us support indirect access to UBOs easily. The existing
constant special case disappears too, since the peephole optimizer can
inline the constant later. (note: this is too conservative since we can
go up to 16-bit immediates...)
Unfortunately, nir_opt_algebraic can't seem to optimize expressions like
"((a << 3) + 4) >> 2" to "(a << 1) + 1" which would be necessary for
reasonable perf out of this...
Fixes:
dEQP-GLES2.functional.shaders.indexing.uniform_array.float_dynamic_loop_read_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14581>
This change introduces the anv_descriptor_size_for_mutable_type and
anv_descriptor_data_for_mutable_type helpers to compute the size and
data flags respectively for mutable descriptor types.
In order to make handling these types easier we now store a precomputed
descriptor stride for all types and use in the in appropriate places.
We also need to adjust the compiler to take into account this descriptor
stride. To that extent, we now pack the stride into the upper 16 bits
alongside the index and the dynamic offset index and use it later to
compute the correct offset.
Closes: #4250
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14633>
Without this the simulator wrapper will abort upon seeing this
query, rendering the driver unusable in that context.
Also, it seems the simulator environment doesn't quite work with
multisync at present, so do not enable it until we figure out what
the issue is.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14678>
iris_resource_bo() is convenient when we only have a pipe_resource *
variable, and don't need to do a lot with it other than get at the BO.
When we need to do more with a resource, we usually cast it to
iris_resource *, at which point we can just use res->bo instead.
This patch updates iris_copy_region to use src_res->bo, dst_res->bo,
rather than iris_resource_bo(src) and iris_resource_bo(dst), since we
already have those cast versions on hand.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14667>
If the application binds a fragment shading rate attachment to a
subpass and also clears the depth stencil attachment, the VRS rates
would have been reinitialized to 1x1 with fast clears. It makes more
sense to clear and then copy instead of the opposite.
Found by inspection.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14518>
Implements a workaround for HSDES#14014414195. Note that this change
deviates heavily from the workaround suggested in the HSDES, since all
of the suggestions are either costly at runtime or outright
non-compliant, so they would require us to apply the workaround
selectively for affected applications.
Instead of adding hacks to the compiler that manually implement the
LOD computation of 3D texturing operations in the shader, initialize
an extra sampler state structure in the driver that has anisotropic
filtering forced off, and use it instead of the normal sampler state
structure whenever a 3D texture is bound to the same sampler unit.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14489>
For non-discard writes, we need to make sure that the resource has valid contents
so they can be *updated*, not overwritten.
We have to skip this when doing asynchronous maps, but that should be okay, because
the threading context should only do asynchronous map-write when the resource is
known to be idle/empty.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14624>
In the case of a multi-stream GS that is attempting to output wide
points to stream 0, we can support this by lowering stream 0 to
triangles and then removing the other streams. This is only valid
to do if the other streams are not being written to stream output,
either if they're not present in the SO info or no buffer is bound.
Fixes the arb_gpu_shader5/arb_gpu_shader5-emitstreamvertex_nodraw
piglit test which does this.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14624>
I couldn't find this in a spec but the builtin-gl-sample-mask piglit
seems to expect writing to the output sample mask to do nothing when
max num samples == 0.
The ForcedSampleCount property should make everything appear as if
MSAA is disabled. However, it's undefined behavior if depth is
bound, so in that case, we can at least use a lowering pass to
make things *look* like MSAA is off, unless you use atomics to
count invocations.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14624>
This *almost* matches what GLSL wants, except for the handling of
large widths. You can see this in the lowering algorithm:
(('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
('bcsel', ('ult', 31, 'bits'), 'insert',
('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
'options->lower_bitfield_insert'),
DXIL's 'bfi' instruction is an inseparable pairing of NIR's 'bfi(bfm(...), ...)',
so we just apply the additional bcsel in the backend.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14624>
Note that GL requires input coverage for sample execution mode to
be only the single bit corresponding to the executing sample, so
rearrange things to understand during shader emitting if we're
executing per-sample to emit the right coverage value.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14624>
The first container stage ("build") is for dependencies of the build.
These are infrequently-changing things like Visual Studio, LLVM, git,
and also meson. The second container stage ("test") currently depends
on the first, and adds test dependencies like piglit.
This lets us rev piglit without having to rebuild LLVM.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14637>
Cheatsheet:
_mesa_has_ARB_depth_texture() becomes (true && ctx->Extensions.Version
>= _mesa_extension_table[...].version[ctx->API]). The last value is 0
when ctx->API is API_OPENGL_COMPAT and ~0 otherwise. The whole function
effectively becomes (ctx->API == API_OPENGL_COMPAT).
_mesa_has_OES_depth_texture() becomes (true && ctx->Extensions.Version
>= _mesa_extension_table[...].version[ctx->API]). The last value is 0
when ctx->API is API_OPENGLES2 and ~0 otherwise. The whole function
effectively becomes (ctx->API == API_OPENGLES2).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14213>
The current msm_kgsl.h header in the tree isn't sanitized and the kernel
specific macros will confuse a compiler. Copy in the sanitized version of
the header from the 4.19 kernel tree which also adds a few new API bits
that are currently unused but may be useful some day.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14651>
This was copied from the blob before we understood what it did, and it
has questionable utility: there's nothing in the GL, Vulkan, or D3D11
specs that require the result be clamped to the underlying range to
account for imprecision. And it doesn't make sense at all for cubic
filtering, because the result can legitimately be outside the range in
some scenarios. Just remove it.
This fixes a bunch of tests added in vulkan CTS 1.2.8 to test blitting
from compressed textures, which use random inputs and therefore are more
likely to hit the out-of-range condition. For example,
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.2d.etc2_r8g8b8a8_unorm_block.r8g8b8a8_snorm.general_general_cubic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14613>
This appears to do the same thing as CLAMPENABLE on a3xx. That is, it
clamps the result to [0, 1] for unorm formats and [-1, 1] for snorm
formats *after* filtering. In particular it's now more easily observable
with cubic filtering, because cubic filtering can produce values outside
the original range. Presumably this only matters with linear filtering
due to rounding errors when computing the weighted average.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14613>
Here are two facts, each mildly unpleasant, quite nasty taken together:
- xcb_wait_for_special_event retries its poll() if the fd woke up but
no matching event arrived, without verifying that the special event
queue is still registered.
- Present gives no in-band notification of window destruction.
Now if the window is destroyed before the swapchain we're in trouble.
Our WSI thread might be stuck in xcb_wait_for_special_event as we're
awaiting a completion that won't come (the pixmap was being presented as
the window, and then the window was destroyed, so no more events can
happen on that window).
The solution is to use xcb_poll_for_special_event, which is
non-blocking, and handle the appropriate edge cases. If we've run the
event queue but we still don't have an image to acquire, we poke the X
server with a request that gently verifies that the window exists,
allowing the thread to exit gracefully in the above case. We detect
when we're busy-looping, and poll on the X connection for up to 1ms in
response to avoid burning the CPU.
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13564>
EXT_vertex_input_dynamic_state
When both EXT_vertex_input_dynamic_state and EXT_extended_dynamic_state
are not available stride is never set and nothing is rasterized as all
triangles are degenerate.
This fix copies stride into the VkVertexInputBindingDescription array
when the graphics pipeline is created when those extensions aren't
available.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14549>
For example the previous code generates the following sequence of
SPIR-V instructions ending with a redundant cast to uint:
%2018 = OpExtInst %uint %1 PackHalf2x16 %2017
%2019 = OpBitcast %uint %2018
The new code generates:
%2018 = OpExtInst %uint %1 PackHalf2x16 %2017
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14568>
Fixes InconsistentSpirv validation errors reporting that PackHalf2x16
outputs uint rather than float.
For example the previous code generates the following SPIR-V with
PackHalf2x16 output as a float:
%2018 = OpExtInst %float %1 PackHalf2x16 %2017
%2019 = OpBitcast %uint %2018
The new code generates:
%2018 = OpExtInst %uint %1 PackHalf2x16 %2017
%2019 = OpBitcast %uint %2018
cc: mesa-stable
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14568>
When we create a image view with D24S8 format we made a reformatting
to RGBA8UI if the aspect selected is just STENCIL. But when
configuring the stores we select the aspects based on the attachment
format. Quoting from cmd_buffer_render_pass_emit_stores:
/* From the Vulkan spec, VkImageSubresourceRange:
*
* "When an image view of a depth/stencil image is used as a
* depth/stencil framebuffer attachment, the aspectMask is ignored
* and both depth and stencil image subresources are used."
*
* So we ignore the aspects from the subresource range of the image
* view for the depth/stencil attachment, but we still need to restrict
* the to aspects compatible with the render pass and the image.
*/
const VkImageAspectFlags aspects =
vk_format_aspects(ds_attachment->desc.format);
So we could ending trying to store on a Z+Stencil buffer, using a
RGBA8UI format.
So far this only affected some tests when using the simulator
(assert). Those were working on the real hw, but probably would fail
on other situations, so lets use the original image format on that
case.
v2 (Iago)
* Improve comment grammar
* Do the same on load too (not just store)
v3 (Iago)
* Re-word comments.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14635>
With multiple semaphores support, we can use a GPU job to handle
multiple signal semaphores in the end of a cmd buffer batch. It
means, the last job in the last cmd buffer will be in change of
signalling semaphores as long as it meets some conditions:
1 - A GPU-job signals semaphores whenever we only have submitted
jobs for the same queue (there is no syncobj created for any
other type). Otherwise, we emit a noop job that waits on the
completion of all jobs submitted and then signals semaphores.
2 - A CPU-job is never in charge of signalling semaphores. We
process it first and emit a noop job that depends on all jobs
previously submitted to signal semaphores.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With multiple semaphore support, we can improve the way we handle
wait semaphores considering different job types and cmd buffer
batch scenarios, that means:
- A GPU job depends on wait semaphores whenever it is the first job
submitted to a queue in this command buffer batch (the `first` flag
for the job's queue type is set).
- For the first CPU job, if there are wait semaphores, we should
wait for the CPU and GPU being idle to process the job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
The order in which a GPU job is scheduled is guaranteed within the
same queue type (CL, TFU, CSD), but the order of completion of jobs
from different queues cannot be guaranteed. Since we have multiple
semaphores support now, we can track the completion of the last job
submitted to each queue and therefore better determine when gpu is
idle. We do it using an array of syncobj (last_job_syncs) for each
GPU queue (CL, TFU, CSD). With this, job serialization also become
more accurate. We also keep tracking the very last job submitted
(last_job_sync became an element of the last_job_syncs array as
V3DV_QUEUE_ANY) for the case we don't have multisync support.
To help in handling wait semaphores, we set a flag per queue to
indicate we are starting a new cmd buffer batch and a job submitted
to this queue will be the first.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
In addition to keep a copy of wait semaphores, extend
v3dv_submit_info_semaphores to hold a copy of signal semaphores too.
With a copy of wait and signal semaphores, we can enable GPU jobs to
handle more than one wait and signal semaphores.
By now, we don't change the way as we signalling semaphores when all
jobs complete, i.e., we still use the master thread to signal
semaphores. In this context, no GPU job is actually in charge of
signalling, but the support for multiple signal semaphores is done
here.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Whenever v3d kernel-driver supports multisync extension, use it to
enable more than one semaphores in cl submission. In CL, we have two
kind of job (bin and render), therefore, we need also to determine
the stage to sync, that means to add job dependencies/wait
semaphores.
Also, as we currently process all signal semaphores of a cmd buffer
batch together in the submit master thread (when the last wait
thread completes), there isn't now a situation in which GPU jobs
need to handle signal VkSemaphores.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Extends command submission ioctls to support multiple semaphores through
generic ioctl extension design. In this approach, a multisync extension
subclasses a generic ioctl extension struct (base) and enables more than
one wait and signal semaphores. Multisync extension also uses v3d_queue
to specify the wait_stage, i.e. which job should sync before start (wait
semaphores).
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of a boolean (sem_wait) in v3dv_event_wait_cpu_job_info,
that is used to determine wait condition for jobs put in a wait
thread, copy the wait semaphores data and store it as struct
v3dv_submit_info_semaphores. In the following patches we enable
multiple semaphores in GPU jobs, and therefore we need this data
to add wait semaphores as job dependencies for pending jobs
submitted from a wait thread.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of pass pSubmit to queue_submit_cmd_buffer, create a struct
v3dv_submit_info_semaphores to wrap semaphores data from VkSubmitInfo.
In the next commit, this struct will help to handle wait condition for
jobs submitted in a wait event context, since we need to hold this
data when handle wait events and pass it to queue_submit_job() called
from wait threads. The main goal is to allow multiple wait semaphores
in a job submission. Later, this struct will be extended to include a
copy of signal semaphores too.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Fixes the following valgrind warnings:
Conditional jump or move depends on uninitialised value(s)
at 0x5D1298C: draw_pt_so_emit_prepare (draw_pt_so_emit.c:95)
by 0x5D89FD7: llvm_middle_end_prepare (draw_pt_fetch_shade_pipeline_llvm.c:319)
by 0x5D13BFF: vsplit_prepare (draw_pt_vsplit.c:229)
by 0x5D0D5D8: draw_pt_arrays.isra.0 (draw_pt.c:124)
by 0x5D0DA08: draw_instances (draw_pt.c:484)
by 0x5D0DEB9: draw_vbo (draw_pt.c:610)
by 0x5E847D6: r300_swtcl_draw_vbo (r300_render.c:901)
by 0x5D67125: u_vbuf_draw_vbo (u_vbuf.c:1470)
by 0x5CFCAD3: cso_multi_draw (cso_context.c:1639)
by 0x58FEC37: st_draw_gallium (st_draw.c:186)
by 0x5A486BA: _mesa_draw_arrays.part.0 (draw.c:1323)
by 0x48B4E27: stub_glDrawArrays (piglit-dispatch-gen.c:12421)
Uninitialised value was created by a stack allocation
at 0x5E90D3F: r300_draw_init_vertex_shader (r300_vs_draw.c:313)
Conditional jump or move depends on uninitialised value(s)
at 0x5D175D9: draw_create_vertex_shader (draw_vs.c:70)
by 0x5E90E7A: r300_draw_init_vertex_shader (r300_vs_draw.c:366)
by 0x5E8751F: r300_create_vs_state (r300_state.c:1950)
by 0x594453B: st_create_common_variant (st_program.c:888)
by 0x594720C: st_get_common_variant (st_program.c:973)
by 0x59484B6: st_precompile_shader_variant (st_program.c:1994)
by 0x59484B6: st_finalize_program (st_program.c:2056)
by 0x58F0571: st_program_string_notify (st_cb_program.c:128)
by 0x5928C9B: st_link_tgsi (st_glsl_to_tgsi.cpp:7514)
by 0x5904B53: st_link_shader (st_glsl_to_ir.cpp:178)
by 0x58D3E08: _mesa_glsl_link_shader (link_program.cpp:91)
by 0x58865C7: link_program (shaderapi.c:1363)
by 0x58865C7: link_program_error.part.0 (shaderapi.c:1474)
by 0x48DBC5A: stub_glLinkProgram (piglit-dispatch-gen.c:34426)
Uninitialised value was created by a stack allocation
at 0x5E90D4F: r300_draw_init_vertex_shader (r300_vs_draw.c:313)
Based on Filip Gawin's
r300: set new_vs.type to PIPE_SHADER_IR_TGSI
CC: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14620>
This was identified by Coverity. 4bb9c0a28a added uses of a third
address register, but the arrays for tracking address registers only
have two slots.
Add back a version of the assertion from before 4bb9c0a28a to help
prevent future problems. I don't think any drivers that would hit
this path use NIR-to-TGSI yet, so it may be moot.
Reviewed-by: Matt Turner <mattst88@gmail.com>
CID: 1496942
CID: 1496944
Fixes: 4bb9c0a28a ("nir_to_tgsi: Use the same address reg mappings as GLSL-to-TGSI did.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14487>
It's not obvious why the (gl_FrontFacing ? -1.0 : 1.0) case was handled
different for Gfx12+ than for previous generations, and it's not
correct. It tries to negate the result as an integer, and it does this
before the mask operation that clears the other bits in the value.
When we eventually support dual-SIMD8 dispatch, the other front-facing
bit is in g1.6 at bit 15, so similar code should be possible there.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: c92fb60007 ("intel/fs/gen12: Implement gl_FrontFacing on gen12+.")
Closes: #5876
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14625>
drivers/hardware lacking VK_EXT_image_drm_format_modifier can still use dmabuf,
but that setup has to do the old copy to linear scanout instead of copy to
modifier scanout
this requires a couple extra checks to be added to handle the case
Fixes: 619438bf7ce ("zink: check EXT_image_drm_format_modifier for dmabuf support")
fixes#5836
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14597>
this one's a bit tricky since vulkan doesn't support vec5, the return from
the instructions is a struct, and I don't want to add temp var support to zink
now instead the process for these ops is:
* rewrite the is_sparse_texels_resident instruction to read the first vec member of the texop
* (temporarily) decrement num_components for sparse texop's dest to get real result size
* wrap texop's return type in spirv-required struct(uint, result)
* unwrap struct, store result normally + store residency info to separate array
* for is_sparse_texels_resident, ignore the mov alu for src[0] and instead use the ssa index
from the parent instr since this is the original texop that was used to store the residency result
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14381>
this would allocate a new stream uploader for every map if the offset was
large (e.g., all sparse buffer usage), which almost immediately consumes all vram
cc: mesa-stable
fixes KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14381>
If we don't do this, and we are statically linking LLVM
(-Dshared-llvm=disabled) while using a version of LLVM compiled for a
general-purpose distribution, then the link fails with undefined
references to the functions called by LLVMInitializeAllTargets().
Using a version of LLVM that was built specifically for Mesa will
sometimes mask this: if the only backends built into LLVM are backends
that we specifically link anyway, then the link will succeed.
According to commit 80817b6e "meson: Adjust Clover's required LLVM
modules", all-targets is not available when using CMake to locate LLVM,
so make it optional rather than mandatory.
This partially reverts commit 80817b6e34.
Resolves: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3962
Resolves: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5609
Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13720>
For nir_to_tgsi, I want to be able to fold into the base from a vector
load_const, which the ad-hoc scalar chasing couldn't handle.
r300:
total instructions in shared programs: 1278731 -> 1256502 (-1.74%)
instructions in affected programs: 457909 -> 435680 (-4.85%)
total flowcontrol in shared programs: 8316 -> 8313 (-0.04%)
flowcontrol in affected programs: 5 -> 2 (-60.00%)
total temps in shared programs: 213687 -> 213774 (0.04%)
temps in affected programs: 13140 -> 13227 (0.66%)
total consts in shared programs: 952850 -> 949929 (-0.31%)
consts in affected programs: 386352 -> 383431 (-0.76%)
Fixes: #5781
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
This helps non-native-integers hardware where relative addressing of UBOs
has a constant offset field, and having addressing math (particularly for
D3D9) emitted as ALU ops ends up running us out of constants. For
native-integers drivers (such as softpipe), the possible-overflow check
typically triggers and we end up not folding.
r300:
total instructions in shared programs: 1279167 -> 1278731 (-0.03%)
instructions in affected programs: 50834 -> 50398 (-0.86%)
total temps in shared programs: 213736 -> 213687 (-0.02%)
temps in affected programs: 598 -> 549 (-8.19%)
total consts in shared programs: 952973 -> 952850 (-0.01%)
consts in affected programs: 26776 -> 26653 (-0.46%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
This lets nir-to-tgsi fold the constant offset of addressing calculations
into the CONST[] reference, which is important for D3D9-era compatibility:
HW of that age has limited uniform space, and if we do the addressing math
as math in the shader for dynamic indexing, the nir_load_consts end up
taking up uniforms we don't have available.
r300:
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total temps in shared programs: 213912 -> 213736 (-0.08%)
temps in affected programs: 2166 -> 1990 (-8.13%)
total consts in shared programs: 953237 -> 952973 (-0.03%)
consts in affected programs: 45980 -> 45716 (-0.57%)
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
Commit 9da15aa3aa ("llvmpipe: enable EXT_memory_object(_fd)") enabled
the extension, but left this unimplemented.
Leaving this unimplemented causes segfaults for anyone trying to retrieve
the UUIDs, as the calling code in the state tracker does not check if the
function is implemented. This affects e.g. current Wine versions.
Set the UUID to all zeros. Although this slightly violates the vulkan
specification (since 1.2.146), the UUIDs have to match the ones from
lavapipe (lvp_get_physical_device_properties_1_1).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5865
Fixes: 9da15aa3aa ("llvmpipe: enable EXT_memory_object(_fd)")
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Reviewed-by: Dave Airlie airlied@redhat.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14558>
Having only magic constants instead of human-readable strings in
traces not only hinders readability, but also may affect trace
comparision of old and new traces if new enums have been added
or modified (thus possibly changing the values of existing ones.)
So we implement printing of enum names as strings instead.
In order to have those strings, we need to add some new helper
functions, which we will automatically generate from header file
src/gallium/include/pipe/p_defines.h via a new Python script
enums2names.py.
We also bolt this all into the Meson build system.
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4609
Signed-off-by: Matti Hamalainen <ccr@tnsp.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14398>
When doing copies of descriptors from one set to another, that contain
either a UNIFORM_BUFFER or STORAGE_BUFFER, both the buffer view &
surface state are allocated from the source descriptor. Therefore we
need to copy their content otherwise we could run into lifecycle
issues when the source descriptor is destroyed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14585>
run_command will change the default for the check arg to true
in the future. If it is true then meson will exit if the command
fails. It must be false here as we check the return code to
provide a meaningful error message.
With meson 0.61 we get the following warning:
WARNING: You should add the boolean check kwarg to the run_command call.
It currently defaults to false,
but it will default to true in future releases of meson.
See also: https://github.com/mesonbuild/meson/issues/9300
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14602>
many times it will be the case that an allocation for a block of data
needs to be done in one alloc() call such that the members of a struct as well
as some extra trailing data are all in the same allocation like
```
struct Test {
unsigned a[4];
unsigned c;
};
unsigned *b; //ptr to uint[8]
```
should be allocated as a single block of (13 * sizeof(unsigned)) memory using
C pointer offsets to allocate the memory as
```
| Test | b |
```
with something like
```
struct Test *t = malloc(sizeof(struct Test) + (8 * sizeof(unsigned)));
```
and then set `b` with
```
t->b = ((uint8_t*)t) + sizeof(struct Test);
```
this is annoying, awful to read, and (at least for dum-dums like me) prone to errors,
however, so having some utility functions which can deliver the same
functionality with better readability helps out this case by transforming it to
```
unsigned *b;
void **ptrs[] = {(void*)&b};
size_t sizes[] = {8 * sizeof(unsigned));
struct Test *t = ptralloc(sizeof(struct Test), 1, sizes, ptrs);
```
where `b` is now set to the appropriate offset in memory
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13678>
Broadwell introduced new fields in 3DSTATE_SBE which allow us to ask
the hardware to override Primitive ID for us, rather than requiring us
to turn on attribute swizzling and specify per-attribute overrides in
3DSTATE_SBE_SWIZ. We unconditionally enable attribute swizzling today,
but this is a step toward letting us think about disabling it in the
future.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14210>
VARYING_SLOT_{VIEWPORT,LAYER,PSIZ} all live in the same VUE header slot,
and the FS is already set up to read the x/y/z/w component of that vec4.
However, we were setting up the SBE to pass each of those items as a
separate FS input, so hypothetically if a shader read all three, we
would burn 3 FS inputs with redundant data. Not only was this passing
extra data to the FS, but it would count as extra input slots for the
"Do we have 16 or fewer attributes?" check for using SBE swizzling to
rearrange them in a convenient manner.
Now we make them share a single FS attribute and only count them once.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14210>
This patch handles shader translation for compute, image views and shader
buffers and updates the corresponding shader compile keys.
It also includes support of using shader raw buffer for shader buffer used
as constant buffer.
This patch is squash of numerous in house patches.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
v2: As pointed out by Thomas, fix revert of 64292c0f caused by this patch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14270>
This fixes a regression in tests that pass unclamped point size
values between stages.
This keeps xfb broken since the real way it should work is to have
the hw clamp after xfb, but this seems the least evil path.
Fixes: 3077d96856 ("crocus: Clamp VS point sizes to the HW limits as required.")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14359>
In some cases the file paths passed to crosvm for execution do not point
to dEQP binaries, but can be wrapper scripts, like deqp-runner.sh.
Detect such cases and skip changing the working directory.
Additionally, use the POSIX compliant command substitution syntax
instead of the obsolete variant based on backquotes.
Fixes: 81f25d8f27 ("virgl/ci: Run each dEQP instance in its own VM")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14413>
In some corner cases like the kernel oops, we do not get the relevant
log messages from crosvm process to help with debugging.
Note there is currently a double redirection of its stdout stream, but
the content eventually ends up in /dev/null.
Let's fix this by redirecting both stdout and stderr streams to a
dedicated file, to avoid clobbering the output from the script/program
running inside crosvm. This is particularly required for the scenario
that involves deqp-runner starting crosvm via a *.toml suite.
Additionally, drop the unnecessary usage of 'stdbuf' and set the 'quiet'
kernel command-line parameter to get rid of the noise generated during
crosvm boot process.
Although not directly related, do some cleanup by removing the
temporary folder on script exit.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14413>
Ensure virglrenderer library is built before crosvm in order to allow
dynamic linking. This is needed for the scenarios where a different
virglrenderer library must be provided before launching crosvm, e.g.:
the upcoming Virgl CI solution that shares Mesa CI containers.
Additionally, this provides the virgl_test_server binary which is
required by piglit-runner.sh and deqp-runner.sh scripts when using
the virpipe Gallium driver.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14413>
Add support for building and installing deqp-runner from a git source
repository to facilitate further development & testing of new features
or simply making use of the latest upstream changes which were not
already tagged as a new package version.
The git revision to be built must be specified by setting
'DEQP_RUNNER_GIT_REV' env variable. To specify a git tag name instead,
'DEQP_RUNNER_GIT_TAG' variable must be used. It is also possible to
indicate a custom git repository via 'DEQP_RUNNER_GIT_URL'.
If neither a git revision or tag name has been set, deqp-runner is
installed from the rust package registry (the default behavior).
v2: Make use of '--git' and '--rev' cargo args to automate the git
checkout operation (Rohan).
v3: Allow Git URL override by using the hardcoded URL only when
DEQP_RUNNER_GIT_URL is not already set.
v4: Override 'EXTRA_CARGO_ARGS' to optimize the script and avoid a
second call to 'cargo install'. Additionally, add support for
indicating git tags (Rohan).
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14413>
When invoking TLB clear, the color to clear could require
swapping and clamping.
Do the changes in a copy and leave the original color untouched, as it
is passed as constant.
This fixes local outside scope issues with Coverity.
Fixes: 54cba7d2 ("v3d: clamp clear color")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14534>
If an application calls eglMakeCurrent with the same context and the same
draw and read surfaces as the current ones, there is no need to go
through all the work of unbinding/flushing the old context and binding
the new one.
While the EGL spec is a bit ambiguous here, it seems that the implicit
flush of the outgoing context only needs to be done when the context is
actually changed.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14379>
The flush of the outgoing GL context, as required by the EGL spec for
eglMakeCurrent and extended by KHR_context_flush_control, is already
performed in the unbindContext call. There is no need to pierce through
the layers and unconditionally call glFlush() here.
Getting the out-fence FD for explicit fencing needs to move behind the
unbindContext, to make sure we are getting the fence for the most
recently flushed commands.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14379>
Doing this for ir3 required adding a struct for limits of how much base to
fold in (which NTT wants as well for its case of shared vars), otherwise
the later work to lower to the 1<<9 word limit would emit more
instructions.
The shader-db results are that sometimes the reduction in NIR instruction
count results in the fewer sampler prefetches due to the shader being
estimated to be shorter (dota2, nexuiz):
total instructions in shared programs: 8996651 -> 8996776 (<.01%)
total cat5 in shared programs: 86561 -> 86577 (0.02%)
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>
Lower to `sample_mask = 0`. Actually that implements a demote... doing
discard correctly probably requires rewriting the shader control flow to
insert a return where necessary...
Also, possibly we should be lowering this in NIR to play nice with
gl_SampleMask writes but that's a problem for when we understand the
hardware better.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14219>
According to RFC 4189 CSV files should be encoded using CRLF newlines,
not LF. This helps compatibility with tools, like python's csv module,
who always uses CRLF.
While we're at it, normalize the one CSV that was CRLF in-repo to LF,
and let git do the newline-normalization when needed instead.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12405>
By software ABI, a blend shader is permitted to clobber registers
R0-R15. The scheduler needs to be aware of this, to avoid moving a write
to one of these registers past the BLEND itself. Otherwise the schedule
is invalid.
This bug affects GLES3.0, but is rare enough in practice that we had
missed it. It requires a fragment shader to write to multiple render
targets with attached blend shaders, and have temporaries register
allocated to R0-R15 that are not read by the blend shader, but are sunk
past the BLEND instruction by the scheduler. Prevents a regression when
switching boolean representations on:
dEQP-GLES31.functional.shaders.builtin_functions.integer.uaddcarry.uvec4_lowp_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14577>
Although Bifrost clause packing and register assignment is tricky, the
relevant code is by now extensively tested, and there's no remaining
reverse-engineering here. So disassembling verbosely just adds tons of
noise to pandecode without increasing the useful information.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14543>
It's a pointer to a thread storage descriptor, not a framebuffer
descriptor. Unlike Midgard, these don't have to alias. The FBD pointer
was unused anyway, so remove it to reduce noise in pandecode dumps.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14543>
Rather than emulating page tables, poorly, with a hash table, use a
red-black tree and store the intervals directly. This is deterministic
instead of probabilistic, attaining O(log n) performance for n mapped
intervals which is good enough. Unlike the hash table approach, this
allows us to iterate intervals easily.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14543>
The switch statement in anv_descriptor_data_for_type() shows that this
field isn't used on SKL+.
On XeHP, this avoids assert failures by preventing
isl_surf_fill_image_param() from being called. That function doesn't
expect Tile4 surfaces.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14546>
As indicated by
VkPhysicalDeviceFragmentShadingRatePropertiesKHR::fragmentShadingRateWithShaderSampleMask
our implementation will clamp to 1x1 when reading samplemask or
writing to samplemask.
This fixes vkd3d-proton tests test_sample_mask_dxbc & test_sample_mask_dxil
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b6332fc4a8 ("intel/compiler: handle coarse pixel in render target writes descriptors")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14553>
Double buffer mode splits the tile buffer size in half so we can
start processing the next tile while the current one is being
stored to memory. This mode is available only if MSAA is not enabled
and can, in theory, improve performance by reducing tile store
overhead, however, it comes at the cost of reducing the tile size,
which also causes some overhead of its own.
Testing shows that this helps some cases (i.e the Vulkan Quake
ports) but hurts others (i.e. Unreal Engine 4), so for the time
being we don't enable this by default but we allow to enable it
selectively by using V3D_DEBUG.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14551>
Instead of a unreachable.
This would avoid an assert on debug builds that uses vkfoo_to_str to
print structure types. This will become more common as some tests will
start to use VK_STRUCTURE_TYPE_MAX_ENUM to mark structures from
unsupported extensions more often.
v2 (Jason):
* Include enum name on the default message
* Handle MAX_ENUM as a special case
v3 (Jason):
* vk_ObjectType_to_ObjectName don't need to use ${enum.name}
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14525>
Because these formats are introduced trough an extension, their
enum values are exceedingly large and we cannot use them to index
directly into the format table we had for core formats. Instead,
we put these in a separate table and we always use the
VK_ENUM_OFFSET helper to index into these tables.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14533>
vk_icdNegotiateLoaderICDInterfaceVersion now correctly identifies the
driver as supporting v4. Before, the driver did support the
functionality but didn't report supporting it through the negotiate
function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14299>
Setting the screen ro member before we checked gpu id means the error
case leads to a double-free because screen->ro is set and allocated
by parent who hanbdles de-alloc a second time after we destroyed the
screen we just created because ro was set.
Cc: mesa-stable
Signed-off-by: Carsten Haitzler <carsten.haitzler@foss.arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14516>
Arm Komeda (Mali) display units do exist on a few SoC's. This is the
first time that I've seen Mesa brought up "in anger" on one of these,
thus it has no such knowledge of these DPUs. This adds that to the
kmsro set like a lot of other fairly standard split GPU/DPU devices.
Signed-off-by: Carsten Haitzler <carsten.haitzler@foss.arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14515>
the original change here attempted to fix calculating the maximum bound for the
mapped readback buffer by adding the maximum attribute size to the final element
used by readback
the calculation was erroneous, however, because it instead calculated the maximum
offset instead of the size, which would cause a different kind of overrun
Fixes: 3c5b7dca30 ("util/vbuf: fix buffer overrun in attribute conversions")
fixes#5846
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14479>
It's allowed to reinterpret compressed formats as one of a few
non-compressed formats with the same pixel size as the blocksize of the
compressed format, and vice-versa. If we did this we'd wind up with an
incorrect width/height. Fix that.
Fixes dEQP-VK.image.sample_texture.*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14520>
MAX_GEOMETRY_IMAGE_UNIFORMS are supported if geometry shaders and either
ARB_shader_image_load_store or GLES 3.1 are supported.
v2:
- MAX_GEOMETRY_IMAGE_UNIFORMS shouldn't be supported for GL 3.2 if
ARB_shader_image_load_store is not supported (Ilia).
- MAX_TESS_{CONTROL,EVALUATION}_IMAGE_UNIFORMS requires tessellation
shader support (Anholt).
v3:
- Use _mesa_is_gles31() function (Ilia).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14288>
So far we were checking ARB_shader_image_load_store is supported as
requirement to expose GLES 3.1.
But when this extension functionality was added in GLES 3.1 spec, it
was relaxed, and one of its requirements, the support for formatless
writing, was not included.
So this means that a driver that support all the extension
functionality except formatless writing, could expose GLES 3.1, but it
couldn't expose the extension itself (nor GL 4.2, which requires fully
implementation of the extension).
v2:
- Add the same exposure treatment to ARB_shader_image_size (Ilia).
v3:
- Remove dependency for OES_texture_buffer (Ilia).
- Check image resources for GLES 3.1 (Ilia).
v4:
- Check for MaxImageUniforms in compute shader (Ilia).
- Check for max combined image uniforms/ssbo (Ilia).
v5:
- Remove ARB_shader_image_load_store from check (Ilia).
- ARB_shader_image_store and ARB_shader_image required for
ARB_ES3_1_compatibility (Ilia).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14288>
Define struct tu_timeline_sync for emulated timeline support in common
implementation that is on top of drm syncobj as a binary sync.
Also implement init/finish/reset/wait_many methods for the struct.
v1. Does not set MSM_SUBMIT_SYNCOBJ_RESET for waiting syncobjs since
it's being managed in the common implementation already.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14105>
This patch ports to common code for VkSemaphore, VkFence and relevant
APIs like vkCreate(Destroy)Semaphore/Fence, vkGetSemaphoreFdKHR, etc.
Accordingly, starts using common vkQueueSubmit with implementing
driver-specific hook.
Also remove all timeline semaphore codes so that we could use common
code in the following patches. This way we could easily see what's
modified in the following patch.
Note that kgsl is not ported in this patch.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14105>
This optimizations turns
loop {
...
if (cond1) {
if (cond2) {
do_work_1();
break;
} else {
do_work_2();
}
do_work_3();
break;
} else {
...
}
}
into:
loop {
...
if (cond1) {
if (cond2) {
do_work_1();
} else {
do_work_2();
do_work_3();
}
break;
} else {
...
}
}
As this optimizations moves code into the NIF statement,
it re-iterates on the branch legs in case of success.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>
Excerpt from ARB_occlusion_query.txt:
An implementation can either set QUERY_COUNTER_BITS_ARB to the
value 0, or to some number greater than or equal to n. If an
implementation returns 0 for QUERY_COUNTER_BITS_ARB, then the
occlusion queries will always return that zero samples passed the
occlusion test, and so an application should not use occlusion queries
on that implementation.
This looks more sane for drivers wanting desktop gl 1.5 without real
hw support then just faking it.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14361>
By disabling h264 enxtension flag to let vaapi application manage the
dpb buffers. The calculation of the non_exist_flags for h264 reference
frames needs to consider both frame number and POC in the reference
picture list, set this flag only if both of the frame number and POC
are not existed in the valid reference lists; otherwise, that reference
frame is considered valid.
Also enabled drm buffer in dynamic dpb Tier2.
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14484>
this appeared to fix some sort of bug related to preserving qbo data,
but really there shouldn't have been any sort of bug anyway since the qbos
all get read back, and thus the data is already preserved
instead, it just preserved the query id, which overloaded the pools and crashed
This reverts commit 79790e276f.
fixes#5669
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14397>
This patch is a rewrite of nir_opt_move.
Differently from the previous version, each instruction is checked
if it can be moved downwards and then inserted before the first user
of the definition. The advantage is that less insert operations are
performed, the original order is kept if two movable instructions have
the same first user, and instructions without user in the same block
are moved towards the end.
v2: Only return true if an instruction really changed the position.
Don't care for discards, this will be handled by another MR.
v3: fix self-referring phis and update according to nir_can_move_instr().
v4: use nir_can_move_instr() and nir_instr_ssa_def()
v5: deduplicate some code
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3657>
With a scenario like:
BeginRP(DS + VRS att)
Draw()
EndRP()
BeginRP(same DS)
Draw()
EndRP()
The second draw shouldn't use VRS but it did because the VRS bit
is always set during DS surface initialization if a surface can use VRS.
So, it would have been using the previous copied VRS rates.
Found by inspection.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14443>
Anv allows non-8x4-aligned depth buffer clears, but it has multisampled
HiZ disabled for BDW. iris allows multisampled HiZ on BDW, but disallows
non-8x4-aligned depth buffer clears.
Drop the unused optimization for non-8x4-aligned clears of multisampled
surfaces on BDW and use this opportunity to use some PRM text in the
code comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14091>
Binding table pool runs out of capacity quickly on modern games,
requiring new Surface Base Address instructions to be sent. That
is costly due to flushes and stalls. Increasing BT pool capacity
to 64KB improves performance several workloads.
Fallout4 +4%
Shadow of the Tomb Raider +4%
Borderlands3 +3%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14483>
"most" times it isn't necessary to insert any pipeline barriers when binding
descriptors, as GL requires explicit barrier usage which comes through a different
codepath
the exception here is when the following scenario occurs:
* have buffer A
* buffer_subdata is called on A
* discard path is taken || A is not host-visible
* stream uploader is used for host write
* CmdCopyBuffer is used to copy the data back to A
buffer A now has a pending TRANSFER write that must complete before the buffer is
used in a shader, so synchronization is required any time TRANSFER usage is detected
in a bind
there's also going to be more exceptions going forward as more internal usage is added,
so just remove the whole fake-barrier mechanism since it'll become more problematic
going forward
Cc: 21.3 mesa-stable
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14496>
ensure that vertex key data is always zeroed when changing last stage since it will
be updated before draw anyway and can only cause problems if left alone here
fixes the following caselist:
dEQP-GLES31.functional.shaders.builtin_constants.tessellation_shader.max_tess_evaluation_texture_image_units
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_geometry_output_points
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.25
cc: mesa-stable
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14482>
The idea is to offer the driver a way to execute on a different queue
than the one the app is using for Present.
For instance, this could be used to make the DRI_PRIME blit asynchronous,
by using a transfer queue.
So instead of creating a command buffer to be executed on present using
the supplied queue, this commit uses an internal transfer queue to perform
the blit.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13959>
Fixes a leak:
==47470== 60 bytes in 1 blocks are definitely lost in loss record 1,790 of 1,904
==47470== at 0x484186F: malloc (vg_replace_malloc.c:381)
==47470== by 0x58EBA6A: compile_vertex_list (vbo_save_api.c:535)
==47470== by 0x58EDABF: wrap_buffers (vbo_save_api.c:1021)
==47470== by 0x58EDF97: upgrade_vertex (vbo_save_api.c:1134)
==47470== by 0x58EE52F: fixup_vertex (vbo_save_api.c:1251)
==47470== by 0x58EFE9E: _save_Normal3f (vbo_attrib_tmp.h:315)
Fixes: 69615d92a0 ("vbo/dlist: realloc prims array instead of free/malloc")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14474>
Fixes a leak:
==46154== 48 bytes in 1 blocks are definitely lost in loss record 1,571 of 1,905
==46154== at 0x48466AF: realloc (vg_replace_malloc.c:1437)
==46154== by 0x5FC98EC: util_idalloc_resize (u_idalloc.c:43)
==46154== by 0x5FC9C16: util_idalloc_alloc_range (u_idalloc.c:125)
==46154== by 0x56FDB9F: _mesa_EndList (dlist.c:13681)
Fixes: b703d7c15f ("dlist: store all dlist in a continuous memory block")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14474>
The coarser 32x32 cross-slice hashing mode seems to lead to better L1
and L2 utilization due to the improved execution locality, however it
can also lead to a bottleneck in a single slice, especially in
workloads that concentrate heavy rendering in small areas of the
screen (e.g. SynMark2 OglGeomPoint, OglTerrain*) -- This effect is
mitigated here by performing a permutation of the pixel pipe hashing
tables that ensures that adjacent rows map to pixel pipes as far away
as possible in the caching hierarchy.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
Unlike the Gen11 code, this requires us to allocate a pipe_resource
for the pixel pipe hashing tables and hold a reference to it from the
context, since we need to add it to the validation list of every
batch, the tables may be accessed by the hardware at any time after
they're specified via 3DSTATE_SLICE_TABLE_STATE_POINTERS.
Note that this has an effect even for unfused native die platforms,
since the pixel pipe hashing tables we intend to program aren't
equivalent to the hardware's defaults on such configs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
This starts off with the simplest possible pixel hashing table
calculation that just assigns consecutive indices (modulo N) to
adjacent entries of the table, along the lines of the existing
intel_compute_pixel_hash_table(). The same function will be improved
in a future commit with a more optimal calculation.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
It's now an array with 7 tables, each table is intended to specify the
pixel pipe hashing behavior for every possible slice count between 2
and 8, however that doesn't actually work, among other reasons due to
hardware bugs that will cause the GPU to erroneously access the table
at the wrong index in some cases, so in practice all 7 tables need to
be initialized to the same value.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
The state cache invalidation shouldn't be necessary on recent
platforms. On ICL it *seems* to be required to get the hardware to
pick up an updated indirect clear color, so this change is only
applied to TGL platforms and later for the moment.
On some DG2 configs this seems to improve SynMark2/OglDrvRes by 16.0%
±0.1%, n=8.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
The current packed dispatch assumptions for fragment shaders seem to
be the reason that the fs-readFirstInvocation-uint-loop Piglit
test-case for the ARB_shader_ballot extension fails on DG2 in
combination with the patches in this series that enable pixel pipe
hashing (thanks Jordan for reporting the regression). I've confirmed
that the brw_fs_test_dispatch_packing() test fails on DG2 hardware for
fragment shaders, while it succeeds for other shader stages,
indicating that the PSD hardware no longer guarantees packed dispatch.
Disable it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
We don't key off of it to try to figure out if we need to produce
a new shader variant, so there's no need to set it when changing
properties that feed into variants. If we do have a new shader or
variant at draw time, we'll produce a new PSO without this.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14367>
Actually, no, there's no need to do anything, just update some
comments for the record. An earlier revision of this change that
implemented the workaround text to the letter required no less than 8
new PIPE_CONTROLs throughout the tree. However Felix Degrood noticed
that the cost of some of the PIPE_CONTROLs was showing up in workloads
like Shadow of the Tomb Raider. The Windows driver wasn't emitting
many of those pipe controls, contrary to the W/A instructions, so we
engaged in a back and forth with the hardware team, who concluded that
the original suggested workaround was unnecessarily strict, and the
Windows driver's behavior acceptable. It turns out that Wa_1408224581
we had already implemented for TGL is roughly equivalent to the
Windows behavior, so no need to do anything new after all.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14278>
XeHP platforms require the invalidation of the instruction cache after
a STATE_BASE_ADDRESS change due to a hardware bug potentially leading
to instruction cache pollution. Note that the workaround text says
it's applicable "DG2 128/256/512-A/B", however it's also marked as
permanent and not confirmed to be fixed in any specific steping, so we
apply it to all Gfx12HP platforms.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14278>
Different parts of our codebase disagree on whether spatial
coordinates/dimensions are given in pixels or blocks, which differ by a
constant factor for block-compressed formats. This disagreement
manifests as incorrect results accessing block-compressed formats.
To resolve this, define the public tiling routines to take their
coordinates in pixels, and align the relevant code in Panfrost
accordingly.
Fixes rendering glitches in Factorio, as well as a pile of piglits on
Panfrost. It should also fix glTexSubImage() with ETC1 on Lima, but
there are no tests for this in dEQP/Piglit.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> [dEQP/Lima]
Tested-by: Erico Nunes <nunes.erico@gmail.com> [Piglit/Lima]
Reported-by: Icecream95 <ixn@disroot.org>
Closes: #5560
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14370>
There is a lot of unit confusion in Gallium due to pixels versus blocks
matching only with uncompressed textures. Add a helper to do a common
pixels->blocks unit conversion required in multiple drivers.
v2: Rename dst->blocks, src->pixels to avoid confusion about the units
to casual readers (Mike).
Note to mesa-stable maintainers: this is marked as Cc: mesa-stable so
the next patch (a set of bug fixes for Lima and Panfrost) can be
backported. It's not a bug fix in its own right, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14370>
This updates many places where 0 is used as NULL pointer.
There are a few warnings left when I build the default
configuration but they either relate to code
outside of mesa or where "None" is used instead.
Found with static analysis (smatch)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12174>
This makes the compiler less predictable and should only have a very small
effect on performance.
fossil-db (Vega):
Totals from 2410 (1.79% of 134756) affected shaders:
CodeSize: 6911568 -> 6942840 (+0.45%)
Fixes Horizon Zero Dawn artifacts.
If a shader has:
a = pack_half_2x16(a, 0) //rtne
store(pack_half_2x16(0, b) | a) //rtne
a = unpack_2x16(a).x
It will become:
store(pack_half_2x16(a, b)) //rtz
a = unpack_2x16(pack_half_2x16(a, 0)).x //rtne
So a later shader with "unpack_2x16(load()).x" will use "a" rounded to
zero, while the previous shader will use "a" rounded to the nearest even.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 2f125908b3 ("radv,aco: lower_pack_half_2x16")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14475>
Brings in these changes:
af1785f31 occlusion_query_conform: skip GetQueryCounterBits test if needed
dad078717 occlusion_query_conform: convert to pilgit subtests
b52c1c761 glsl-1.30: test nested preprocessor concat
6c4da153b texture-storage: Fix subtest result handling of skips.
4343f19db fbo-integer: Remove the invalid DrawPixels test.
e3842f2fe arb_dsa: exclude stencil8 textures from test sets.
ce8649be7 spec/ext_external_objects: Fix build on Debian systems
4e553838f glsl: add basic tests for desktop GLSL invariant qualifier linking
7e61e5199 Tests for variable in and out of loop scope
f855ad1c8 fbo-mrt-alphatest: Only require GLSL 1.20
9be2fe999 glx: add glx-multi-display-single-pbuffer test
bfe290725 glx: add glx-swap-pbuffer test
efa64335e framework: Fix build on Windows when using waffle
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14468>
When destroying a BO with a userspace managed address and thus freeing
the VMA space, we need to make sure that the BO isn't in use by any
active submit anymore, as the kernel will rightfully reject the next
submit that re-uses the still active VMA. Keep the BO alive as long
as it isn't fully idle to prevent the VMA being reused prematurely.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14159>
If a BO is removed from a cache bucket list via a lookup, we must
handle it in the same way as if a allocation from the cache happened:
tell valgrind that the buffer is active again and take a reference
to the etna_device, which the BO had given up while being in the
cache.
Cc: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14159>
The intended limit for command stream size is 64KB, as this is what old
kernels can reliably do and what allows for maximum number of queued
streams on newer kernels. However, due to unit confusion with the size
member, which is in dwords, the submitted streams could grow up to
~128KB. Fix this by using the proper limit in dwords.
Flushing due to some limits being exceeded is not an issue, but is
expected with certain workloads, so lower the severity of the message
being emitted in this case to debug level.
Cc: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14425>
Currently we dispose any unneeded color buffers immediately if we detect that
there are more unlocked buffers than we need. This can lead to feedback loops
between the compositor and the application causing rapid toggling between
double and tripple buffering.
Scenario: 2 buffers already queued to the compositor, egl/wayland allocates a
new back buffer to avoid throttling, slowing down the frame. This allows the
compositor to catch up and unlock both buffers. EGL detects that there are
more buffers than currently needed, freeing the buffer, restarting the loop
shortly after.
To avoid wasting CPU time on rapidly freeing and reallocating color buffers
break those feedback loops by letting the unneeded buffers sit around for a
short while before disposing them.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14451>
- gen4 - has dp4acc and dp2acc, dp4acc is used to implement
4x8 dot product.
- gen3 - has dp2acc, in OpenCL blob uses dp2acc for dot product
on both get3 and gen4.
- gen2 - unknown, lower everything.
- gen1 - no dp2acc, lower everything. OpenCL blob doesn't advertise
cl_qcom_dot_product8 but still generates code for it.
The assembly is more verbose and uses yet to be documented
mad32.u16 instruction.
Passes:
dEQP-VK.spirv_assembly.instruction.compute.opsdotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsdotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotaccsatkhr.*
Only packed 4x8 unsigned and mixed versions are accelerated.
However in theory we should be able to do better for signed version
than current NIR lowering.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
* shrm - (src2 >> src1) & src3
* shlm - (src2 << src1) & src3
* shrg - (src2 >> src1) | src3
* shlg - (src2 << src1) | src3
* andg - (src2 & src1) | src3
* dp2acc - dot product of two {i,u}8vec2 packed into
SRC1 and SRC2, added to 32b SRC3
* dp4acc - dot product of two {i,u}8vec4 packed into
SRC1 and SRC2, added to 32b SRC3
* wmm - vec4(x_1, x_2, x_3, x_4) * (y_1 + y_2 + y_3 + y_4), which is
duplicated (1 << (SRC3 / 32)) times starting from DST register
* wmm.accu - same as wmm but result is added to DST registers, however
the first reg in each vec4 result is overwritten instead of
accumulating.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
This allows the wavesize to be controlled per-shader. This will be used
by VK_EXT_subgroup_size_control, and freedreno will also need it if
legacy ARB_shader_ballot is to be supported (since it forces a wavesize
of 64 or less).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13960>
This fixes a test from the vkd3d-proton test_dual_source_blending_dxbc
test which asserts in the backend with :
brw_fs_visitor.cpp:716: void fs_visitor::emit_fb_writes(): Assertion `!prog_data->dual_src_blend || key->nr_color_regions == 1' failed.
This is because there is 2 color attachments provided by the
renderpass so we initially set nr_color_regions = 2. But once we've
parsed the shader, we can see it's only using one output (with dual
source color blending).
This change looks at the output variables to update the valid output
variables.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14417>
Fixes memory leak with dependencies array:
==5224== 104 (96 direct, 8 indirect) bytes in 3 blocks are definitely lost in loss record 1,954 of 2,035
==5224== at 0x484178A: malloc (vg_replace_malloc.c:380)
==5224== by 0x484670B: realloc (vg_replace_malloc.c:1437)
==5224== by 0x14DBAB9B: update_bo_syncobjs (iris_batch.c:819)
==5224== by 0x14DBADB8: update_batch_syncobjs (iris_batch.c:898)
==5224== by 0x14DBB3D5: _iris_batch_flush (iris_batch.c:1031)
==5224== by 0x14DB77D0: iris_transfer_map (iris_resource.c:2348)
==5224== by 0x157786FD: u_transfer_helper_transfer_map (u_transfer_helper.c:243)
==5224== by 0x14C479E7: tc_buffer_map (u_threaded_context.c:2252)
==5224== by 0x1434F3F8: pipe_buffer_map_range (u_inlines.h:393)
==5224== by 0x1435094A: _mesa_bufferobj_map_range (bufferobj.c:491)
==5224== by 0x143586D9: map_buffer_range (bufferobj.c:3737)
==5224== by 0x14358DA3: _mesa_MapBuffer (bufferobj.c:3947)
==5224== 240 (192 direct, 48 indirect) bytes in 6 blocks are definitely lost in loss record 1,984 of 2,035
==5224== at 0x484178A: malloc (vg_replace_malloc.c:380)
==5224== by 0x484670B: realloc (vg_replace_malloc.c:1437)
==5224== by 0x14DBAB9B: update_bo_syncobjs (iris_batch.c:819)
==5224== by 0x14DBADB8: update_batch_syncobjs (iris_batch.c:898)
==5224== by 0x14DBB3D5: _iris_batch_flush (iris_batch.c:1031)
==5224== by 0x14FF72CC: iris_get_query_result (iris_query.c:631)
==5224== by 0x14C4396A: tc_get_query_result (u_threaded_context.c:880)
==5224== by 0x1458F4F7: get_query_result (st_cb_queryobj.c:273)
==5224== by 0x1458F7EB: st_WaitQuery (st_cb_queryobj.c:352)
==5224== by 0x144EFF66: get_query_object (queryobj.c:742)
==5224== by 0x144F01AE: _mesa_GetQueryObjectuiv (queryobj.c:811)
And leak with syncobjs:
==13644== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1,846
==13644== at 0x484186F: malloc (vg_replace_malloc.c:381)
==13644== by 0x639789B: iris_create_syncobj (iris_fence.c:69)
==13644== by 0x63B213A: iris_batch_reset (iris_batch.c:512)
==13644== by 0x63B3637: _iris_batch_flush (iris_batch.c:1056)
==13644== by 0x65EF2BC: iris_get_query_result (iris_query.c:631)
==13644== by 0x623B970: tc_get_query_result (u_threaded_context.c:880)
==13644== by 0x5B874F7: get_query_result (st_cb_queryobj.c:273)
==13644== by 0x5B877EB: st_WaitQuery (st_cb_queryobj.c:352)
==13644== by 0x5AE7F66: get_query_object (queryobj.c:742)
==13644== by 0x5AE8150: _mesa_GetQueryObjectiv (queryobj.c:801)
Fixes: ce2e2296ab ("iris: Suballocate BO using the Gallium pb_slab mechanism")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14387>
If we have a compute shader that has a big workgroup, a barrier, and
a branchstack which limits max_waves - this may result in a situation
when we cannot run concurrently all waves of the workgroup, which
would lead to a hang.
Blob just explodes in such case.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14110>
By putting vertex store and indices all in one buffer the larger part
of the shared buffer might actually only be vertex data we are not
interested in. Hence only map the part of the buffer that contains the
index data for the currently active draw command.
This helps drivers where a mapping operation is expensive, like e.g. virgl.
v2: - add comment about ranged buffer mapping (Pierre-Eric)
- keep passing direct_draws[i].start to direct_draw_func, it looks
like the "start" parameter is properly set in
util_prim_restart_convert_to_direct
v3: Fix ws error (Mike)
Related: #5825
Fixes: f9d12bf50e
vbo/dlist: use a single buffer object
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
The blob uses *both* nops and (ss). It turns out that in some rare cases
the hardware does take more than 6 cycles, at least for movmsk, but
adding nops is unnecessary. I believe the extra nops are only there due
to the immaturity of the blob's implementation of subgroup ops, so we
don't have to copy them - just handle shared reg producers the same as
SFU instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
This now covers e.g. cat6 instructions as well, and ss will cover
instructions writing shared regs as well. This is split out from the
previous change to avoid too much churn and shouldn't cause any
functional changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
Add new centralized functions which will replace the various places we
hardcode 10 for the number of (ss) nops, add numbers for soft (sy) nops
based on similar computerator experiments with ldc, sam, and ldib (the
most common (sy) producers), and add a "systall" metric which is
analogous to sstall. This also fixes some cases where we'd erroniously
count ldl* as (sy) producers instead of (ss) producers when calculating
sstall.
This only switches over the metric reporting to the new functions, so
there is no behavior change. The following commit will switch over
the rest of the compiler.
While we're at it, remove max_sun as it's never set.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
After some experimentation with computerator, it seems on a618 that
writing a full register and then reading half of it as a half register
requires a delay of 6, the same as the delay for cat5/cat6 sources. The
other direction only has a delay of 5, but just bump it unconditionally
out of an abundance of caution.
Fixes: 890de1a436 ("ir3/delay: Fix full->half and half->full delay")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
If we're allocating a source then we force is_killed to false, not to
true. Fixes a regression in
dEQP-GLES31.functional.synchronization.in_invocation.image_atomic_write_read
later.
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
Was getting ASAN errors in CI when trying to add ANV to the
debian-testing job:
==10993==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4194304 byte(s) in 64 object(s) allocated from:
#0 0x7f763c1bda3c in __interceptor_posix_memalign ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:226
#1 0x55f43d28627f in os_malloc_aligned ../src/util/os_memory_aligned.h:58
#2 0x55f43d28627f in _util_sparse_array_node_alloc ../src/util/sparse_array.c:107
#3 0x55f43d28627f in util_sparse_array_get ../src/util/sparse_array.c:143
#4 0x55f43d1fdaba in anv_device_lookup_bo ../src/intel/vulkan/anv_private.h:1335
#5 0x55f43d1fdaba in anv_device_import_bo_from_host_ptr ../src/intel/vulkan/anv_allocator.c:1843
#6 0x55f43d1ff571 in anv_block_pool_expand_range ../src/intel/vulkan/anv_allocator.c:534
#7 0x55f43d1ffcb5 in anv_block_pool_init ../src/intel/vulkan/anv_allocator.c:417
#8 0x55f43d18f082 in run_test ../src/intel/vulkan/tests/block_pool_no_free.c:123
#9 0x55f43d1862b6 in main ../src/intel/vulkan/tests/block_pool_no_free.c:152
#10 0x7f763b942d09 in __libc_start_main ../csu/libc-start.c:308
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14121>
When we lower SPIR-V to NIR for textures in vtn_handle_texture, we only
bump the number of coordinate components when the op is not a lod query.
Update the assert to take this into account.
This fixes:
- dEQP-VK.robustness.robustness2.bind.template.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
Fixes: 231337a1 ("intel/fs/xehp: Assert that the compiler is sending all 3 coords for cubemaps.")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13925>
That's how the TGSI math opcodes work.
This lets lower_vec_to_regs coalesce the DP output into the .yzw channels,
giving an impressive shader-db win on softpipe:
total instructions in shared programs: 2929840 -> 2794036 (-4.64%)
instructions in affected programs: 1651438 -> 1515634 (-8.22%)
total temps in shared programs: 372730 -> 332744 (-10.73%)
temps in affected programs: 118151 -> 78165 (-33.84%)
and a minor one on r300:
total instructions in shared programs: 51238 -> 51149 (-0.17%)
instructions in affected programs: 2621 -> 2532 (-3.40%)
total vinst in shared programs: 15655 -> 15618 (-0.24%)
vinst in affected programs: 468 -> 431 (-7.91%)
total temps in shared programs: 9838 -> 9828 (-0.10%)
temps in affected programs: 59 -> 49 (-16.95%)
and a bigger one on i915g:
total instructions in shared programs: 398064 -> 395901 (-0.54%)
instructions in affected programs: 29271 -> 27108 (-7.39%)
total tex_indirect in shared programs: 12261 -> 12233 (-0.23%)
tex_indirect in affected programs: 98 -> 70 (-28.57%)
LOST: 0
GAINED: 5
The r300 change is less impressive because it does some backend copy-prop,
but also because intermediate storage of DPs now takes a vec4 instead of a
scalar.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14200>
Note that this causes a geometry slice to be disabled if any DSS is
fused off within that slice, which may seem stricter than the BSpec
quotation implies, but testing shows that pixel pipes with any faulted
DSS don't work at all, and that using a slice with any faulted pixel
pipe leads to serious graphics corruption.
It would be better to query this geometry topology information from
the hardware instead of trying to reconstruct it here, but the kernel
interface for that is not available yet.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14436>
An if that looks like:
if (x) { } else { }
That has no phis following it is dead. Currently these are only
removed by peephole select, but that means that 'x' is considered
used until that pass is run, which can make it difficult to apply
sane lowering in the case where loading 'x' requires complex or
expensive transformations, but 'x' is *really* unused.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14400>
Note that currently there's no emulation if the D3D12 driver doesn't
support the "UAV typed load" feature for all of the GL required formats.
This is not a required D3D12 feature, so this support won't light up
on all D3D12 hardware.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14342>
This is handled in 2 ways:
* For casts in the same "class," we can just do the cast. There's also
limited support for 4x8 => 1x32 and 2x16 => 1x32.
* For casts that are just by size, use a lowering pass. This reads the
data in the appropriate UINT format (except R11G11B10), retrieves
the original bits, re-packs it into the dest, and returns it to the
app for loads. For stores, the process is done in reverse - bits
for the final format are computed and re-expanded to the format that
should be used for the store.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14342>
This is a bit fragile. The algorithm is essentially:
- Let the driver track state for non-dual-bound resources.
- For resources that are dual-bound as SSBO/image and a second
bind point, assume they're being used as UAVs.
- When a MemoryBarrier is issued, dirty all destination bind points
so they re-assert their state, and if the destination is not UAV,
temporarily suppress UAV state re-assertion.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14342>
There are currently 3 different "environments" supported by this backend,
and they have slightly different semantics for how resources are accessed,
which is only going to get a little weirder when GL images start getting
supported. To try to make things less confusing, use an explicit enum
with heavy documentation on what's different between them.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14342>
This commit makes `kernel+rootfs_arm64` job build and install skqp on
ARM64 devices rootfs.
Skia repository has a tool to prepare skqp models located at
`tools/skqp/cut-release`, which get files from [Skia
Gold](https://skia.org/docs/dev/testing/skiagold/), generate
files.checksum, rendertests.txt and unittests.txt. One gives a range of
commits to let `cut-release` find the right resources to prepare skqp
for the user. However, it is failing, since it fails when trying to get
image packages from a range of commits via HTTPS from the host
https://public-gold.skia.org but it responds with error 404 every time.
I tried a range a thousand of commits, yet it still does not give
results. The workaround employed was to recover the most recent
`files.checksum` and `rendertests.txt` files from the git history and
generate `unittests.txt` from `list_gpu_unit_tests` binary.
`skqp` runs two lists of tests, `rendertests.txt` and `unittests.txt`.
Both must be located inside the `skqp` assets folder. The first list
uses GL and GLES to test rendering scenarios. The second runs some unit
tests that do not render an image per se.
In order to make the first `a630_skqp` to be green, the crashing tests
were removed from the test lists and the expectations of the failing
ones were updated.
It is worth noting that `rendertests.txt` can bring some detail about
each test expectation, so each test can have a max pixel error count, to
tell `skqp` that it is OK to have at most that number of errors for that
test. See also:
https://github.com/google/skia/blob/main/tools/skqp/README_ALGORITHM.md
As each render backend has a different error count, two different
`rendertests.txt` files were created,
`src/freedreno/ci/freedreno-a630-skqp-gl_rendertests.txt`,
`src/freedreno/ci/freedreno-a630-skqp-gles_rendertests.txt` and
, which one refers to GL and GLES tests respectfully.
The unit tests file for a630 is located at
`src/freedreno/ci/freedreno-a630-skqp_unittests.txt`
```
aaclip
domain
formats
highcontrastfilter
rectangle_texture
yuv_make_color_space
```
```
ProcessorOptimizationValidationTest
VkProtectedContext_CreateNonprotectedContext
VkYCbcrSampler_DrawImageWithYcbcrSampler
VkYCbcrSampler_NoYcbcrSurface
```
Each test was updated with the max_error count equal to the first run result.
```
analytic_antialias_inverse
async_rescale_and_read_dog_down
async_rescale_and_read_dog_up
async_rescale_and_read_rose
async_rescale_and_read_text_down
async_rescale_and_read_text_up
async_rescale_and_read_text_up_large
async_rescale_and_read_yuv420_rose
complexclip2_path_bw
encode-platform
imageblur_large
lcdtextsize
onebadarc
onefailarc
scale-pixels
surfaceprops
textfilter_color
textfilter_image
```
Considering all the following tests results as wrong.
```
async_rescale_and_read_no_bleed
backdrop_imagefilter_croprect_persp
complexclip2
imageblurrepeatmode
mixerCF
overdrawcolorfilter
patch_alpha
patch_primitive
rrect_clip_bw
scaledemoji_rendering
yuv_splitter
```
v2:
a) add link to HTML report on job log
b) remove extraneous spaces diff
c) remove unnecessary conditions from build-skqp.sh
d) use fixed skqp source commit SHA
v3:
a) Use only main skia repository to fetch models and build skqp
b) Use list_gpu_unit_tests binary to create a base unittests.txt file
c) Remove crashing tests
d) Set failing tests expectations for the first skqp run
v4:
a) Remove clang dependency
b) Separate each skqp backend result into its folder
c) Regroup a630_skqp in one job
v5:
a) Separate tests files per driver
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5580
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14146>
It seems that at least some GC400 come out of reset with random vertex
attributes enabled and also don't disable them on the write to the first
config register as normal. Enabling all attributes seems to provide the
GPU with the required edge to actually disable the unused attributes on
the next draw.
Cc: mesa-stable
Reported-by: Steven Walter
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14285>
In dEQP-GLES2.functional.shaders.operator.geometric.reflect.highp_vec2_fragment
and friends this pass would turn:
0: DP3 temp[1].x, input[1].yx0_, input[0].wy0_;
1: MUL temp[2].xy, temp[1].xx__, const[0].xx__;
into
0: DP3 temp[2].x * 2, input[1].yx0_, input[0].wy0_;
1: MUL temp[3].xy, temp[2].xy__, input[1].yx__;
Note the attempt to use .y of temp[2]. Just bail when we more dst
channels than src channels, since the rewrite can't generate more channels
for us. Fixes this subset of tests (which I hadn't included in the xfails
until now since results hadn't quite been stable).
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14405>
It would seems msvc and mingw dont pack across disparate types so zink_bind_rasterizer_state is too big for int32
string.h:202:10: warning: ‘__builtin___memcpy_chk’ forming offset [4, 7] is out of the bounds [0, 4] of object ‘rast_bits’ with type ‘uint32_t’ {aka ‘unsigned int’} [-Warray-bounds]
202 | return __builtin___memcpy_chk(__dst, __src, __n, __mingw_bos(__dst, 0));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/gallium/drivers/zink/zink_state.c: In function ‘zink_bind_rasterizer_state’:
../src/gallium/drivers/zink/zink_state.c:586:16: note: ‘rast_bits’ declared here
Fixes: 9c5a2ab6
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12609>
The samplemask VGPR that we had to pass to the epilog increased VGPR usage
by 1 for all shaders. Do it in the main function by using the mono key
structure, which causes on-demand compilation and stall, but we'll save
the VGPR.
57794 shaders in 35145 tests
Totals:
SGPRS: 2715856 -> 2716272 (0.02 %)
VGPRS: 1776168 -> 1718432 (-3.25 %)
Spilled SGPRs: 3704 -> 3630 (-2.00 %)
Spilled VGPRs: 1727 -> 1733 (0.35 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 2008 -> 2016 (0.40 %) dwords per thread
Code Size: 61429584 -> 61393288 (-0.06 %) bytes
Max Waves: 838645 -> 840484 (0.22 %)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266>
Until now we have lived without a refcount mechanism in the driver
because in Vulkan the user is responsible for handling the life
span of memory allocations for all Vulkan objects, however,
imported BOs are tricky because the kernel doesn't refcount
so user-space needs to make sure that:
1. When importing a BO into the same device used to create it
(self-importing) it does not double free the same BO.
2. Frees imported BOs that were not allocated through the same
device.
Our initial implementation always freed BOs when requested,
so we handled 2) correctly but not 1) and we would double-free
self-imported BOs. We tried to fix that in commit d809d9f3
but that broke 2) and we started to leak BOs for some imports.
This fixes the problem for good by adding refcounts to BOs
so that self-imported BOs have a refcnt > 1 and are only freed
when all references are freed.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5769
Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14392>
This ended up being turned on in gallivm, but since we use nir_to_tgsi on
the VS and TGSI doesn't have FP16, we can't let that happen.
Fixes: f814a2449e ("llvmpipe: enable FP16 and update CL + traces piglit results.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14403>
Previously the code was using a hack to change the token type from
INDETIFIER -> OTHER in order to avoid getting in an infinite loop
expanding the tokens. This worked ok until we got to a paste where
the replacement parameters had already had their type changed
to OTHER because the newly created paste token would then
inherit the OTHER type and never get expanded inself.
For example with the follow code:
#define STEP_ONE() \
out_Color = vec4(0.0,1.0,0.0,1.0)
#define GLUE(x,y) x ## _ ## y
#define EVALUATE(x,y) GLUE(x,y)
#define STEP(stepname) EVALUATE(STEP, stepname)()
#define PERFORM_RAYCASTING_STEP STEP(ONE)
This would get all the way to expanding PERFORM_RAYCASTING_STEP to
STEP_ONE() but because it was created via the paste `x ## _ ## y`
it would never get any further.
To fix this we remove the OTHER hack and instead just track if the
token has already been handled via a bool value `explanding`.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5724
Fixes: 28842c2331 ("glcpp: Implement token pasting for non-function-like macros")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14101>
Softpipe was the only driver still using this feature. I had enabled it
in ba22f014f9 ("softpipe: Enable PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS;") for
an instr count win, but it's really not important to that driver and it's
not worth keeping the knob around just for that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14360>
It turns out r600 has a bunch of expectations about the Dimension being in
ADDR[1].x, and sampler or atomic indirects being in ADDR[2].x. It's
simpler to just use this static assignment than our dynamic one, anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14360>
For historical reasons, we ingest 1-bit booleans in NIR but expand them
to 16/32-bit booleans in the backend IR. We need to handle this case
when loading boolean constants, extending from 1-bit to 16/32-bit as
required. This issue is masked by effective constant folding for
booleans, but is visible in a shader from Firefox WebRender.
Fixes: 646e03c451 ("pan/bi: Temporarily switch back to 0/~0 bools")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95
Closes: #5797
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14371>
Instead of emitting a pile of moves to fixed registers at codegen time
and hoping everything works out, add a second staging source to the
BLEND instruction in the intermediate representation containing the dual
source colour, and modify register allocation appropriately. This better
models the operation of blending render target #0 with two sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>
Extend store_combined_output_pan to take a dual source blend input in
addition to colour, depth, and stencil inputs. Use the last source for
this, and represent the type with the DEST_TYPE index. This is a hack
but there is no SRC2_TYPE and NIR doesn't seem to mind as long as we
know what we mean. This allows the backend to emit a combined "blend
render target #0" instruction taking two sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>
This make things all fall back to the common synchronization functions
which will switch things to the new submission path and to use
vk_fence and vk_semaphore.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
radv: Use common AcquireNextImage2KHR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
radv: Use common functions for Metro Exodus layer.
Needs to be squashed with the big "switch the world" deletion patch
but kept it separate for review.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13974>
We basically use 2 types for amdgpu:
1) syncobj. This supports both binary & timeline and autodetects
all the features.
2) emulated timelines if (1) doesn't detect timeline syncobj support.
Note that one has to put these in order of preference so that the
common CreateFence/CreateSemaphore functions decide on the right
type.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13974>
For getting the fd from the device.
This uses the new function introduced with libdrm 2.4.109. This has
already been updated in CI and is already required for mesa common
bits so requiring it for libdrm_amdgpu is no big deal.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13974>
We don't plan to support NV_mesh_shader officially on RADV,
because it performs poorly on AMD hardware. However, we are
implementing this extension to get some experience with mesh
shader technology.
Users should not rely on this support because we are going
to remove it if/when a potential cross-vendor extension appears.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13580>
Generic per-primitive outputs:
They work similarly to other NGG outputs.
In the ISA they are param export instructions that are executed
on the primitive threads. These per-primitive params must be
sorted last among both mesh shader outputs and pixel shader inputs.
PS can read these inputs using the same old VINTRP instructions.
They use the same amount of LDS space as per-vertex PS inputs.
Special per-primitive outputs:
The VRS rate x, y, viewport and layer are special per-primitive
outputs which must go to the second channel of the primitive
export instruction, which is enabled by EN_PRIM_PAYLOAD.
If the PS wants to read these, they must also be exported as
a generic param.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13580>
Lower mesh shader outputs to shared memory.
At the end of the shader, read the outputs from shared memory
and export their values as NGG expects.
We allocate separate shared memory (LDS) areas for per-vertex,
per-primitive outputs, primitive indices, primitive count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13580>
Use firstMipIdInTail directly from addrlib which calculated this
in a different way:
Original way: either dimension size of mipmap should be less than
the tile size.
Addrlib way: all dimesion size of the mipmap should be less than
the tile size and at lest one dimension size should be less than
half of the tile size, so that all following mip levels can fit
in one tile and any commit for level in the mip tail also commit
for all levels in mip tail.
Theoretically either way is OK but addrlib way needs less care
about the mip tail commit and better align with the true memory
layout given by itself.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14223>
Fix defects reported by Coverity Scan.
uninit_member: Non-static class member m_omod is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member m_pred_sel is not initialized in this constructor nor in any functions that it calls.
Suggested-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12294>
There exists hardware intel gen4 specifically that has only 6 clip planes
but supports GLSL 1.30. This enhances the CAP so that the current values
of 0,1 remain the same, but giving it a larger number will override the
max.
This allows the gen4 intel to set this to 6 and leave everyone else alone.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14344>
The theory here is that a staging resource will need to be transferred
into or out of a non-staging resource. If the staging resource is too
big for 1/2 aperture threshold, then it the corresponding non-stage
resource will be similiarly sized. This means there's no hope of fitting
both of them in.
This will cause the st blit transfer path to fall back and allow
max-texture-size to pass.
Fixes piglit max-texture-size
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14347>
Older gens have a limit of 2GB on surfaces, this results in
isl_surf_init_s failing if the surface exceeds that, in this
case this should fail all the way back up the stack.
This fixes some cases of max-texture-size on crocus
Fixes: f3630548f1 ("crocus: initial gallium driver for Intel gfx 4-7")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14347>
these tests are now passing:
dEQP-GLES2.functional.shaders.operator.exponential.pow.highp_vec2_vertex,Fail
dEQP-GLES2.functional.shaders.operator.exponential.pow.highp_vec3_vertex,Fail
dEQP-GLES2.functional.shaders.operator.exponential.pow.highp_vec4_vertex,Fail
dEQP-GLES2.functional.shaders.operator.exponential.pow.mediump_vec2_vertex,Fail
dEQP-GLES2.functional.shaders.operator.exponential.pow.mediump_vec3_vertex,Fail
dEQP-GLES2.functional.shaders.operator.exponential.pow.mediump_vec4_vertex,Fail
Fixes: 1c2c4ddbd1 ("r300g: copy the compiler from r300c")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14282>
This fixes building on macOS:
Disable ZINK_USE_DMABUF on macOS, it is unsupported.
Import vk_mvk_moltenvk.h for MVK_VERSION in zink_resource.c.
Add additional build arguments (see meson.build) to build properly.
To build on macOS, you will probably need to run:
brew install bison llvm cmake libxext xquartz ninja xorgproto meson
And you also need to setup meson with something like:
-Dmoltenvk-dir=/Users/<Username>/VulkanSDK/<VersionNumber>/MoltenVK
-Dc_std=c11
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14255>
A couple things happen as a result of this:
1. This function is more consistent. It aims to clamp clear color ranges
to format-dependent values, but it didn't clamp the upper ranges of
10-bit and 11-bit floats (64512 and 65024 respectively). Those are
handled now.
2. The clear colors are quantized. The quantization method seems to
differ from what the HW uses for slow (and obviously fast) clears.
Due to this change in quantization method, iris starts failing
dEQP-EGL.functional.image.modify.renderbuffer_rgba4_renderbuffer_clear_color.
That's okay however. That test was removed from the mustpass list
because of threshold issues (see egl-test-issues.txt).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7740>
Consider the following innocuous-looking code:
pan_merge(packed, vtx->attributes[i], ATTRIBUTE);
Under the current implementation, this code is completely broken. Why?
The current implemention is a macro which hardcodes the loop index i,
which shadows the i used to index attributes. Pull out a helper method
so we do the right thing without resulting to further preprocessor abuse
(__COUNTER__).
While making things more robust, assert the crucial pan_merge
invariant that the total size is a multiple of 4; if this fails, the
routine risks memory corruption.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14119>
Previously, the lowering was for 8/16/64-bit values, or 8/16-component
vectors. Now, it also handles write masks on 32-bit 1/2/3/4-component
vectors.
DXIL looks like it supports putting an interesting write mask in the
buffer store intrinsic, but DXC never generates stores with write
masks, and multiple drivers completely ignore the write mask.
Also, set the write mask properly on the output intrinsic.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14294>
Fix defects reported by Coverity Scan.
Resource leak (RESOURCE_LEAK)
leaked_storage: Variable signal_semaphore_infos going out of scope leaks the storage it points to
leaked_storage: Variable wait_semaphore_infos going out of scope leaks the storage it points to.
Fixes: 3da7d10d9b ("radv: implement vkQueueSubmit2KHR()")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14260>
The index is a uint8_t but can be assigned a negative 1
value in panvk_pipeline_builder_parse_color_blend()
The comparison to ~0 thus makes sense but clang will complain:
"result of comparison of constant -1 with expression of type
'const uint8_t' (aka 'const unsigned char') is always true
[-Wtautological-constant-out-of-range-compare]"
Fix this by casting to a uint8_t before comparison.
Fixes a warning with clang
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14289>
Fix defect reported by Coverity Scan.
Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression 2 << W
with type int (32 bits, signed) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression
of type uint64_t (64 bits, unsigned).
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14258>
Acknowledgements to android-rpi team and lineage-rpi maintainer (KonstaT)
for creating/testing initial vulkan support. Their experience was used as
a baseline for this work.
Most of the code is a copy of turnip and anv.
Improved by cleaning dEQP failures:
- Improved gralloc support (use allocation time stride, size, modifier).
- Fixed some dEQP crashes due to memory allocation issues.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14016>
We tried to step over the instruction we just generated, except we didn't
always just generate one. In the sequence_vertex tests, that meant we
skipped processing the next BGNLOOP and then underflowed our stack.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14271>
This adds a missing CFG edge that represents a possible physical
control flow path the EU might take under some conditions which isn't
part of the logical CFG of the program. This possibility shouldn't
have led to problems on platforms prior to Gfx12, since the missing
control flow edge cannot possibly influence liveness intervals.
However on Gfx12+ it becomes the compiler's responsibility to resolve
data dependencies across instructions, and the missing physical
control flow paths may lead to a WaR data hazard currently not visible
to the software scoreboard pass, which could lead to data corruption.
Worse, the possibility for this path to be taken by the EU increases
on Gfx12+ due to a hardware bug affecting EU fusion -- However the
same physical path can be potentially taken on earlier platforms as
well, so this patch extends the CFG on all platforms for consistency,
even though the lack of this edge shouldn't lead to any functional
issues on platforms earlier than Gfx12. There are no shader-db
changes on earlier platforms, so there seems to be no disadvantage
from using the same CFG representation as on later platforms.
This issue has ben reported on TGL with the following conformance
test, thanks to Ian for bringing the FULSIM dependency check warning
to my attention:
dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store
Fixes: 4d1959e693 ("intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4940
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Reported-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14248>
Nothing uses it, and i965 was the last thing to. Even if I enable it for
softpipe or crocus, it quickly causes NIR validation failures in shader-db
from swizzles outside the bounds of vectors. Retire it in favor of
nir_opt_vectorize().
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14249>
Without this, a cloned instruction that takes full regs will trigger an
ir3_validate assert. This can happen, for ex, if an instruction that
writes p0.x and has a relative src gets cloned in ir3_sched.
Fixes an assert in Genshin Impact with a debug build.
Fixes: 9af795d9b9 ("ir3: Make ir3_instruction::address a normal register")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14231>
Preload exactly what the shader needs, based on the compiler's mask of
uninitialized registers, rather than trying to sync pan_shader.h with
the behaviour of code gen. Would've saved me some debugging over the
years...
As a bonus this avoids preloading unnecessary registers, particularly in
compute shaders. In theory this should reduce power consumption.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14154>
On XeHP+, Binding Table Pointers are an offset relative to the Surface
State Base Address anymore. Instead, they are relative to the State
Binding Table Pool Address, which is set by the command above.
We emit that command (pointing to the same address as the Surface
State Base Addresss), and everything should stay working as before.
Reworks:
* Jordan: Add iris
* Jordan: Drop i965
* Ken: Set MOCS to avoid a major perf impact. (Found by Felix DeGrood.)
* Jordan: Shrink size from 2MiB to actual iris, anv usage
* Lionel: Add BINDING_TABLE_POOL_BLOCK_SIZE
Ref: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4995
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: Add Iris, adjust sizes]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13992>
Before meson 0.48 the cpu_family() would return 'ppc64le' on little
endian power8. In newer versions it returns 'ppc64' and endianness
should be checked with endian()
We now require meson >= 0.53 so we can drop the compatability with
older versions.
The old behavior was added in e430a034b9
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14240>
While our LIFO scheduling mode attempts to optimize for register
pressure, it's often hard for a scheduling algorithm to do better than
the instruction order provided by the shader author. Shader authors
often do perfectly reasonable things like using texture results
immediately after fetching them or constructing texture coordinates
immediately before the texture op. When we throw all the instruction
ordering information away, we loose any help the author may have given
us. By attempting NONE before we fall back to the worst case LIFO mode.
And, yes, I tried this with NONE both before and after LIFO and doing
NONE before LIFO is substantially better, according to shader-db.
total instructions in shared programs: 19673152 -> 19665202 (-0.04%)
instructions in affected programs: 33669 -> 25719 (-23.61%)
helped: 20
HURT: 0
helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107
helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12%
95% mean confidence interval for instructions value: -867.61 72.61
95% mean confidence interval for instructions %-change: -21.74% -7.46%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 935562500 -> 935020920 (-0.06%)
cycles in affected programs: 18620349 -> 18078769 (-2.91%)
helped: 104
HURT: 48
helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680
helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87%
HURT stats (abs) min: 10 max: 54724 x̄: 6118.62 x̃: 1530
HURT stats (rel) min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46%
95% mean confidence interval for cycles value: -5724.34 -1401.71
95% mean confidence interval for cycles %-change: -9.86% -4.10%
Cycles are helped.
total spills in shared programs: 12158 -> 10327 (-15.06%)
spills in affected programs: 1831 -> 0
helped: 20
HURT: 0
total fills in shared programs: 14749 -> 12635 (-14.33%)
fills in affected programs: 2114 -> 0
helped: 20
HURT: 0
LOST: 8
GAINED: 649
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
The way the current scheduler loop is implemented, each scheduling pass
starts with what the previous pass had. This means that, if PRE screwed
everything up majorly, PRE_NON_LIFO would have to try to fix it. It
also meant that tiny changes to one pass would affect every later pass.
Instead, reset the order of the instructions before each scheduling
pass. This makes the passes entirely independent of each other.
Shader-db results on Ice Lake:
total instructions in shared programs: 19670486 -> 19670648 (<.01%)
instructions in affected programs: 25317 -> 25479 (0.64%)
helped: 2
HURT: 7
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
HURT stats (abs) min: 8 max: 70 x̄: 24.29 x̃: 12
HURT stats (rel) min: 0.41% max: 4.95% x̄: 1.47% x̃: 0.87%
95% mean confidence interval for instructions value: -1.28 37.28
95% mean confidence interval for instructions %-change: -0.04% 2.30%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 935535948 -> 935490243 (<.01%)
cycles in affected programs: 421994824 -> 421949119 (-0.01%)
helped: 1269
HURT: 879
helped stats (abs) min: 1 max: 12008 x̄: 259.38 x̃: 52
helped stats (rel) min: <.01% max: 28.02% x̄: 1.12% x̃: 0.14%
HURT stats (abs) min: 1 max: 29931 x̄: 322.46 x̃: 20
HURT stats (rel) min: <.01% max: 32.17% x̄: 1.74% x̃: 0.22%
95% mean confidence interval for cycles value: -71.37 28.81
95% mean confidence interval for cycles %-change: -0.11% 0.21%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12403 -> 12430 (0.22%)
spills in affected programs: 1355 -> 1382 (1.99%)
helped: 2
HURT: 7
total fills in shared programs: 15128 -> 15182 (0.36%)
fills in affected programs: 3294 -> 3348 (1.64%)
helped: 2
HURT: 7
LOST: 21
GAINED: 28
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
This reverts commit ba2fa1ceaf. Doing
optimizations after scheduling but before RA means doing them in the
middle of the scheduling loop which introduces additional dependencies
between one scheduling iteration and the next. That won't work if we
want to make the scheduling modes independent, at least not unless we
have some way of fully cloning the IR.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
brw_find_next_block_end() scans through the instructions to find the end
of the block. We were calling it for every instruction in the program
which is, if you have a single basic block, makes the whole mess a nice
clean O(n^2) when it really doesn't need to be. Instead, only call
brw_find_next_block_end() as-needed. This brings it back to O(n) like
it should have been.
This cuts the runtime of the following Vulkan CTS on my SKL box by 5%
from 1:51 to 1:45: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro. This way, we get
nice INTEL_DEBUG=optimizer dumps for it. While we're here, fix the
header comment which is massively out-of-date.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Instead of modifying every single instruction, keep track of which VGRFs
are actually split in a bit-set, and only modify the instructions that
actually touch split regs.
This cuts the runtime of the following Vulkan CTS on my SKL box by 45%
from 3:21 to 1:51: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
SPV_EXT_demote_to_helper_invocation added OpDemoteToHelperInvocation
operation to turn an invocation into a helper invocation, but the
value of HelperInvocation (a builtin from Input storage class)
couldn't be modified dynamically without breaking compatibility.
For the extension the operation OpIsHelperInvocation was added to get
the dynamic value.
For SPIR-V 1.6, the demote operation was promoted, but now to get the
dynamic value the shader must issue a load to HelperInvocation with
Volatile memory access semantics.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14209>
These can't be encoded on GFX6/7, and combining these additions causes
CTS failures on GFX10.3.
I think the low 2 MSBs are ignored before the addition, not after, so
load(a + 3, 0) becomes load(a, 3), which is the same as load(a, 0).
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13755>
Putting it in the pipeline is a bit of a lie. We no longer need it for
nir_lower_wpos_center. The only other user is pipeline_has_coarse_pixel
and that is used to build the shader key which we construct before we've
processed any NIR so we don't have accurate information at that time
anyway. Instead, look at ms_info->sampleShadingEnable directly in
pipeline_has_coarse_pixel and trust the back-end to deal with disabling
coarse when we need per-sample dispatch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
This rolls compute_sample_position into emit_samplepos_setup, its only
caller, by using a loop instead of calling it twice. We also
early-return for the !persample_dispatch case instead of doing it as
part of the sample calculation. This means that we don't call
fetch_payload_reg() to get sample_pos_reg unless we're actually going to
use it so the function is safe to call even if we haven't set up
sample_pos_reg. This will be important for the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
This function is called on load/store_input/output. It makes no sense
for it to get a SYSTEM_VALUE enum. This only doesn't explode because
SYSTEM_VALUE_PRIMITIVE_ID happens to be below VARYING_SLOT_VAR0 so it
doesn't interact with any actual varyings. The next commit is going to
add another system value which will push SYSTEM_VALUE_PRIMITIVE_ID up by
one so it will equal VARYING_SLOT_VAR0 and then the first FS input will
always get smashed to flat which isn't what we want.
Fixes: b59bb9c07a ("radeonsi: force flat for PrimID early in si_nir_scan_shader")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
If signalled on the same queue it is totally useless, so only wait
if we have a syncobj that is explicitly being waited on, which can
be from potentially another queue/ctx. (Ideally we'd check but there
is no way to do so currently. Might revisit when we integrate the
common sync framework)
Fixes: 7675d066ca ("radv/amdgpu: Add support for submitting 0 commandbuffers.")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14214>
This is by far the more common style in Mesa. It also gives a cue that
e.g. num_dependency_ids is a fixed definition rather than some kind of
local variable maintaining a count.
While hre, we also rename the enums to have full prefixes to prepare for
a future where we use them in multiple files for future backend work.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14182>
This commit fixes apps using the following sequence:
1. XCreateWindow(dpy) -> win
2. glXCreateContextAttribsARB(dpy, ...) -> ctx
3. glXMakeCurrent(dpy, win, ctx)
4. glXQueryDrawable(dpy, win, GLX_FBCONFIG_ID, ...)
glXQueryDrawable returned 0 (while correctly returning a valid
GLXFCONFIG_ID for other types of drawables).
This commit adds the same dance as driInferDrawableConfig to get
the GLX visual from the Window, and then the GLXFBCONFIG_ID of
this visual.
This fixes:
* piglit: glx-query-drawable --attr=GLX_FBCONFIG_ID --type=WINDOW
* Maya which uses the config ID from step 4 as an input to
glXChooseFBConfig.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14174>
This reverts commit 6eb3fe2d4f. The root cause was
a bug in Meson when using the new gtest protocol and a test failed before producing
the XML file expected by it. This was fixed in later versions of Meson, so
we've bumped the required meson version to use that feature. The failure should
now be properly identified, so re-enabling the NIR test.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14204>
Cheatsheet:
_mesa_has_ARB_texture_cube_map() becomes (true &&
ctx->Extensions.Version >=
_mesa_extension_table[...].version[ctx->API]). The last value is 0 when
ctx->API is API_OPENGL_COMPAT and ~0 otherwise. The whole function
effectively becomes (ctx->API == API_OPENGL_COMPAT).
_mesa_has_OES_texture_cube_map() becomes (true &&
ctx->Extensions.Version >=
_mesa_extension_table[...].version[ctx->API]). The last value is 0 when
ctx->API is API_OPENGLES and ~0 otherwise. The whole function
effectively becomes (ctx->API == API_OPENGLES).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14203>
Having an integer destination type instead of a float destination type
confuses the SWSB code. This causes problems on some Intel GPUs. Fix
this by using the correct type in the destination of the F32TOF16
opcode.
Gfx7 doesn't have the HF type, so continue to emit W on that platform.
The assertions in brw_F32TO16 (brw_eu_emit.c) are updated to reflect
this. In scalar mode, UD is never emitted as a destination type for
this opcode, so remove it from the allowed types in the assertion.
I also condidered doing something like de55fd358f ("intel/fs/xehp:
Teach SWSB pass about the exec pipeline of
FS_OPCODE_PACK_HALF_2x16_SPLIT."), but Curro recommended that just using
the correct types is a better fix. I agree.
v2: Add missing changes to fs_generator::generate_pack_half_2x16_split.
I'm not sure how I (and the Intel CI) missed that the first time. :(
v3: Fix copy-and-paste issue in the v2 fix. Noticed by Tapani.
Reviewed-by: Francisco Jerez <currojerez@riseup.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14181>
Shmems are allocated internally and are only for CPU access. They can
be easily cached.
Venus have 4 sources of shmem allocations
- the ring buffer
- the reply stream
- the indirection submission upload cs
- one cs for each vn_command_buffer
The first one is allocated only once. The other three reallocate
occasionally. The frequencies depend on the workloads.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14179>
It suballocates from a shmem pool owned by vn_instance. The goals are
to speed up shmem allocations for VkCommandBuffer and to reduce the
number of BOs. Both are crucial when shmems are HOST3D BOs, because
they require roundtrips to the renderer to allocate and they take up KVM
memslots.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14179>
They are being used on integer to integer stores. From Vulkan sec,
final paragraph of 16.4.4 "Texel Output Format Conversion":
"Each component is converted based on its type and size (as
defined in the Format Definition section for each
VkFormat). ... Integer outputs are converted such that their value
is preserved. The converted value of any integer that cannot be
represented in the target format is undefined."
I didn't find a equivalent quote for OpenGL as all conversion entries
are forcused on float to integer, fixed-point to integer, etc, and not
on integer to integer. Didn't find any test failure with this change.
We didn't get any shader-db stats change with shaderdb (even
overriding to OpenGL 4.4 to get more shaders built), so as a reference
Vulkan shader-db stats with the pattern
dEQP-VK.image.*.with_format.*.*
total instructions in shared programs: 37534 -> 36522 (-2.70%)
instructions in affected programs: 12080 -> 11068 (-8.38%)
helped: 241
HURT: 0
Instructions are helped.
total uniforms in shared programs: 9100 -> 8550 (-6.04%)
uniforms in affected programs: 3004 -> 2454 (-18.31%)
helped: 229
HURT: 0
total max-temps in shared programs: 6110 -> 6014 (-1.57%)
max-temps in affected programs: 402 -> 306 (-23.88%)
helped: 43
HURT: 0
Max-temps are helped.
total nops in shared programs: 1523 -> 1526 (0.20%)
nops in affected programs: 21 -> 24 (14.29%)
helped: 3
HURT: 6
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14194>
When the BO is NULL, AMDGPU will reset the PTE VA range to the initial
state. Otherwise, it will first unmap all existing VA that overlap the
requested range and then map. This seems better than using MAP/UNMAP.
This reduces stuttering in Forza Horizon 5.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14116>
This shouldn't be necessary because applications have to manage
resources and memory themselves.
This also prevented memory to be freed if an application doesn't unbind
a sparse memory object and free it, which is legal as long as the
resource isn't used afterwards.
This was introduced to unmap the sparse mappings when destroying
a virtual BO, but now that the driver uses OP_CLEAR it's no longer
needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14116>
When we get into create_beginend_table, ctx->Exec only contains nops
set by _mesa_alloc_dispatch_table. This function calls
_mesa_alloc_dispatch_table too, so table and ctx->Exec are identical,
and then it copies identical entries from one table to the other.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14000>
Function pointers were first set in GLvertexformat, and then
GLvertexformat was copied to the dispatch.
This just sets the function pointers in the dispatch directly,
skipping the intermediate GLvertexformat structure.
The code with SET_* calls is autogenerated by api_vtxfmt_init_h.py.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14000>
We could remove them from other header files now.
This purposefully omits "_exec" in _mesa_exec such as _mesa_exec_Begin
to make it pretty. Later commits will remove _exec from names, e.g. it
will become _mesa_Begin. The only other variants are really just
save_Begin (dlist) and _save_Begin (vbo).
The autogenerated file looks like this:
void GLAPIENTRY _mesa_NewList(GLuint list, GLenum mode);
void GLAPIENTRY _mesa_EndList(void);
void GLAPIENTRY _mesa_CallList(GLuint list);
void GLAPIENTRY _mesa_CallLists(GLsizei n, GLenum type, const GLvoid * lists);
void GLAPIENTRY _mesa_DeleteLists(GLuint list, GLsizei range);
GLuint GLAPIENTRY _mesa_GenLists(GLsizei range);
void GLAPIENTRY _mesa_ListBase(GLuint base);
void GLAPIENTRY _mesa_Begin(GLenum mode);
void GLAPIENTRY _mesa_Bitmap(GLsizei width, GLsizei height, GLfloat xorig, GLfloat yorig, GLfloat xmove, GLfloat ymove, const GLubyte * bitmap);
void GLAPIENTRY _mesa_Color3b(GLbyte red, GLbyte green, GLbyte blue);
void GLAPIENTRY _mesa_Color3bv(const GLbyte * v);
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14000>
To make sure that apps actually get something when the HW doesn't
support ETC2. To do that we decompress after every copy operation.
Includes a quite complicated decode shader. It is not bit-to-bit
equivalent to AMD APUs that support ETC2, but close enough to
pass CTS. Likely missing bits are related to the R11 and R11G11
formats where we decode to 16 bits but likely do the extension
differently.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14071>
This was workaround for the users of gbm_bo_create_with_modifiers(),
which were unable to specify the buffer usage (GPU / GPU+DISPLAY).
But after the commit [1] this become possible. And forcing usage to
GBM_BO_USE_SCANOUT migrated directly into gbm_bo_create_with_modifiers
[2], allowing us to remove such workarounds from the drivers.
This makes possible to allocate the buffers in VRAM using
{gbm_bo_create_with_modifiers2 | gbm_bo_create} and providing correct
use flag thus saving CMA memory.
This should also enable tiling for such buffers.
[1]: 268e12c605 ("gbm: add gbm_{bo,surface}_create_with_modifiers2")
[2]: ad50b47a14 ("gbm: assume USE_SCANOUT in create_with_modifiers")
Signed-off-by: Roman Stratiienko <roman.o.stratiienko@globallogic.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14151>
Instead of stopping the merge process when we find an instruction
with an incompatible signal (such as an small immediate), keep
going and see if we can merge the thrsw in a previous instruction
that is compatible.
total instructions in shared programs: 13409835 -> 13356648 (-0.40%)
instructions in affected programs: 3556860 -> 3503673 (-1.50%)
helped: 17457
HURT: 18
Instructions are helped.
total max-temps in shared programs: 2353971 -> 2352956 (-0.04%)
max-temps in affected programs: 13960 -> 12945 (-7.27%)
helped: 703
HURT: 0
Max-temps are helped.
total spills in shared programs: 12301 -> 12301 (0.00%)
total sfu-stalls in shared programs: 32596 -> 32499 (-0.30%)
sfu-stalls in affected programs: 225 -> 128 (-43.11%)
helped: 79
HURT: 3
Sfu-stalls are helped.
total nops in shared programs: 347204 -> 325234 (-6.33%)
nops in affected programs: 99834 -> 77864 (-22.01%)
helped: 11515
HURT: 158
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14172>
Since this graph is actually not oriented, its adjacency matrix can be
represented using less than half bits required by full adjacency matrix.
It reduces memory consumption and number of cache misses. It also simplifies
logic of growing this matrix - no need to touch adjacency bits for previously
allocated number of nodes.
Move adjacency bits from nodes to graph to reduce the number of allocations.
No changes to shader-db.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kostiantyn Lazukin <kostiantyn.lazukin@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14189>
Add a helper that maps a heap to the related cache bucket information.
This avoids complicating existing ternaries when new cache buckets are
added.
Rework:
* Jordan: Add default and set pointers in default branch of
bucket_info_for_heap to prevent "may be used uninitialized" warning
in release builds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14012>
Based on Rafael's:
* "nir/lower_tex: Add option to lower offset for tg4 too."
* "intel/compiler: Lower offsets for tg4 on gen9+."
* "WIP: Do not lower basic offsets."
* "WIP: intel/compiler: Enable lowering offsets restriction."
But, with these changes:
* Fixed range checking to be signed 4 bits
* Converted to filter
* Apply only to gfx12.5+
* Use nir_src_is_const / nir_src_comp_as_int (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>
From the Vulkan spec:
"If the pNext chain of VkCommandBufferInheritanceInfo includes a
VkCommandBufferInheritanceRenderingInfoKHR structure, then that
structure controls parameters of dynamic render pass instances
that the VkCommandBuffer can be executed within. If
VkCommandBufferInheritanceInfo::renderPass is not VK_NULL_HANDLE,
or VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT is not
specified in VkCommandBufferBeginInfo::flags, parameters of this
structure are ignored."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14109>
Insert a flush after a depth decompression pass if the texture
was fast cleared.
This fixes a corruption which seems to only affect gfx10.3 chips.
Ideally we should also clear tex->need_flush_after_depth_decompression
after a flush but there's no easy way for this so this commit will
introduce extra flushes.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14089>
Currently V3DV_HAS_SURFACE is always defined.
There is no WSI for Android in mesa3d, therefore WSI related extensions
should not be exposed.
1. Define V3DV_HAS_SURFACE only for platforms which has WSI implemented.
2. Rename V3DV_HAS_SURFACE -> V3DV_USE_WSI_PLATFORM to align naming
with other platforms.
Fixes dEQP-VK.wsi.android.surface#query_protected_capabilities
Fixes: 79e4451430 ("v3dv: move extensions table to v3dv_device")
Signed-off-by: Roman Stratiienko <roman.o.stratiienko@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14144>
debian-vulkan but not any other CI pipeline consistently fails with:
FileNotFoundError: [Errno 2] No such file or directory: 'nir_tests.xml'
I have to assume that either debian-vulkan is broken, or the NIR test
infrastructure is broken. That's not all. I got the same failure when
I wanted to add a new test, which means the CI is preventing us from adding
new NIR tests, which is a very serious problem with the CI or NIR tests.
The python error doesn't imply that it's a test failure, so something else
is broken. If you don't want such commits to happen again, print better
error messages.
See also the discussion in the MR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>
When a resource is multisampled, we usually submit a multisampling
resolving blit before we present it or use it in some other way, but
currently we don't always flush the cmd buffer before flushing the
frontbuffer, this commit fixes that.
Fixes piglit's glx/glx-copy-sub-buffer MSAA cases on vtest, in
conjunction with other commits of this series.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11714>
We were setting anv_pipeline::sample_shading_enable based on
sampleShadingEnable without looking at minSampleShading. We would then
pass this value into nir_lower_wpos_center which would add sample_pos to
frag_coord. Then the back-end compiler picks up on the existence of
sample_pos and forces persample dispatch. This leads to doing
per-sample dispatch whenever sampleShadingEnable = VK_TRUE regardless of
the value of minSampleShading. This is almost certainly costing us
perf somewhere.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14022>
We can't map the CCS on this platform to initialize it into the
PASS_THROUGH state. This can cause issues with optimizations in the
driver that rely on this state.
For example, after rendering to a surface with AUX_NONE, we can then
render to it with AUX_CCS_E without an ambiguate in between (if the CCS
in the PASS_THROUGH state). If that state was incorrect and the aux was
actually compressed, there can be rendering corruption because the
contents may be misinterpreted on the second render.
Use a more accurate initial aux state to avoid these issues.
One notable change in behavior here is that aux surfaces can be created
with fast-cleared blocks even though the caller may specify a modifier
that doesn't support fast clears. This should be fine, so long as all HW
units that can access these surfaces can handle that bit-pattern. We
haven't seen an applicable restriction yet.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
The assert was introduced in a function that allocated an auxiliary
surface BO, iris_resource_alloc_aux. After refactors, the function it's
in now, iris_resource_configure_aux, no longer does this allocation.
Drop the assert because its purpose is unclear and it's no longer
relevant for CCS on XeHP.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
The only depth/stencil aux usage that can actually use the BO is
ISL_AUX_USAGE_HIZ_CCS_WT. Even with that aux usage, iris may disable
sampling depending on the surface configuration.
Allocate the clear color BO when it'd be usable, not just when the
auxiliary surface size is non-zero on ICL+. This prepares for CCS on
XeHP, which won't have an auxiliary surface.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
isl_dev.ss.clear_color_state_size is already zero on BDW and SKL. Drop
the redundant platform check and return the field directly.
We're going to have this function return zero more often and it will do
so uniformly using if-statements. We choose to remove the redundant
expression instead of adding a redundant if-statement.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
Have iris_resource_init_aux_buf compute the clear color state size
(with an iris_screen struct) instead of passing it in directly.
We're going to move the function call soon. This keeps us from having to
move a passed in variable along with it.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
According to Bspec 2424, "Render Target Resolve":
The Resolve Rectangle size is same as Clear Rectangle size from SKL+.
Use get_fast_clear_rect in blorp_ccs_resolve for SKL+.
Note that the Bspec differs from Vol7 of the Sky Lake PRM, which only
specifies aligning by the scaledown factors.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
While setting up DITHER_MODE allows alpha blending to work properly
together with dithering on new GPUs (those with PE_DITHER_FIX), older
cores still change the render target. As dithering is optional and
implementation defined we can simply disable it on the affected GPUs,
when alpha blending is enabled to work around this bug.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13396>
We weren't doing much motion in nir-to-tgsi because we considered all our
lowered load-ubos as unmovable.
softpipe shader-db:
total temps in shared programs: 563942 -> 563136 (-0.14%)
temps in affected programs: 9833 -> 9027 (-8.20%)
r300 shader-db:
instructions in affected programs: 22858 -> 23575 (3.14%)
temps in affected programs: 2039 -> 1813 (-11.08%)
(NIR had given r300 -19% instrs for +40% temps, so this feels like a
worthwhile trade back).
Reivewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14138>
Feature to print out EGL returned configs for debug purposes.
'eglChooseConfig' and 'eglGetConfigs' debug information printout is
enabled when the log level equals '_EGL_DEBUG'. The configs are
printed, and if any of them are "chosen" they are marked with their
index in the chosen configs array.
v2:
a) re-factor the code in line with review comments
b) rename function _snprintfStrcat, split it out and put into the
src/util/u_string.h, make it a separate patch.
v3:
remove unnecessary 'const' qualifiers from the function prototype
v4:
a) re-factor the code in line with review comments
b) move util_strnappend from utils back to eglconfigdebug.c. In my
opinion this function is the best way of achieving the desired
result, so it still used but made private to eglconfigdebug.c.
v5:
a) drop unused parameter from function signature
b) more const annotations
c) directly access config parameters instead of going
through _eglGetConfigKey
Signed-off-by: Silvestrs Timofejevs <silvestrs.timofejevs@imgtec.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13705>
- remove uninteresting lines of output
- remove binary offset before instructions, for easier diffing
- replace generated labels with block numbers
- add encoded instructions at the end of lines, like LLVM dissaembly
- print constant data instead of trying to disassemble it
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14042>
This moves some ops down to when they're needed, generally reducing the
number of temps in use. It's not always a win -- sometimes you can end up
moving a generator of a component used by a nir_op_vec down, which means
that op's sources stay live while the vec (whose register likely gets
coalesced with the ops creating it) is also live. But it's generally
good.
softpipe results:
temps in affected programs: 18115 -> 18026 (-0.49%)
imm in affected programs: 19 -> 22 (15.79%)
r300 results:
instructions in affected programs: 174 -> 178 (2.30%)
vinst in affected programs: 156 -> 160 (2.56%)
sinst in affected programs: 54 -> 50 (-7.41%)
temps in affected programs: 2634 -> 2169 (-17.65%)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14096>
This brings us into parity on state tracker paths with most other
supported drivers, and a lot of additional optimization on our shaders.
Results on a subset of shader-db that doesn't crash:
instructions in affected programs: 59502 -> 47991 (-19.35%)
vinst in affected programs: 17633 -> 15197 (-13.82%)
sinst in affected programs: 9296 -> 7319 (-21.27%)
flowcontrol in affected programs: 627 -> 310 (-50.56%)
presub in affected programs: 4220 -> 1554 (-63.18%)
temps in affected programs: 5775 -> 8570 (48.40%)
lits in affected programs: 215 -> 37 (-82.79%)
The temps (register usage) increase is unfortunate, but it seems that
instruction counts tend to be our limit before reg counts are.
Fixes: #3354
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14096>
Newer version of XServer supporting GLX_DRAWABLE_TYPE query also
support query with raw X11 window ID besides GLXWindow ID. So we
should not limit the suppported type to GLXPbuffer when query
success.
Otherwise can't start GLX application on newer XServer with:
libGL error: GLX drawable type is not supported
libGL error: GLX drawable type is not supported
X Error of failed request: GLXBadContext
Major opcode of failed request: 149 (GLX)
Minor opcode of failed request: 5 (X_GLXMakeCurrent)
Serial number of failed request: 35
Current serial number in output stream: 35
Fixes: 6625c960c5 ("glx: check drawable type before create drawble")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14120>
Previously a predicated BREAK that appeared immediately before the WHILE
would get merged into the WHILE. This doesn't work if other flow
control (e.g., a CONT) can transfer directly to the WHILE.
On Intel platforms, this fixes the CTS test
dEQP-VK.graphicsfuzz.stable-binarysearch-tree-nested-if-and-conditional.
No shader-db changes on any Intel platform.
When this commit was first created (over a month before it is going to
land), there were some regressions that were prevented by other commits
in MR !13095. That does not appear to be the case now, so I don't know
what changed. Basically, the treatment of discard as a combination of
demote and terminate causes additional continues in some loops, and
those continues trigger this bug. The other commits from that MR
prevent those continues from being generated in the first place.
All Intel platforms had simlar fossil-db results. (Ice Lake shown)
Instructions in all programs: 144419989 -> 144419995 (+0.0%)
SENDs in all programs: 6947332 -> 6947332 (+0.0%)
Loops in all programs: 38277 -> 38277 (+0.0%)
Spills in all programs: 204075 -> 204075 (+0.0%)
Fills in all programs: 319480 -> 319480 (+0.0%)
A few shaders in Doom 2016 were hurt by one instruction each. It seems
likely that these shaders would have experienced at least some
mis-rendering.
Closes: #4213
Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14128>
When mesa3d is built without VK_USE_PLATFORM_DISPLAY_KHR definition,
dEQP test fails:
dEQP : Test case 'dEQP-VK.info.instance_extensions'..
dEQP : Fail (Extension VK_KHR_get_display_properties2 is missing
dependency: VK_KHR_display)
dEQP : DONE!
Enable KHR_get_display_properties2 only if VK_USE_PLATFORM_DISPLAY_KHR
is enabled.
Fixes: f884c2e3be ("v3dv: implement VK_KHR_get_display_properties2")
Signed-off-by: Roman Stratiienko <roman.o.stratiienko@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14047>
1. We can create resource with size of "1" on drm, because size
is not passed to the renderer.
2. We can't create resource with size of "1" on vtest, because shmem
is created based on that.
3. If renderer supports copy_transfer_from_host, then use staging
buffer for transfer in both ways to and from host.
This will allow to reduce memory consumption in the guest.
v2:
- add inline function for checking if we can use this optimization
- add check in readback path. If renderer doesn't support
copy transfer from host, then we need to go with previous
path in readback (through transfer_get ioctl)
v3:
- fix logic for readback
v4:
- refactor the implementation to integrate it more to
existing code base
v5:
- reuse COPY_TRANSFER3D in both directions
v6:
- encode direction in COPY_TRANSFER3D if host supports it
v7:
- renamed cap bit
- introduced COPY_TRANSFER3D_SIZE_LEGACY define
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13689>
ppc64le TSD disabled for now since I am insufficiently familiar with ppc
asm. x86 pthread stubs deleted because adding USE_ELF_TLS to the #if is
isn't worth the effort since it only saves a single function call on
initial entry (TSD stubs are not used for read-only text now). also
potentially fix non-pthread TSD builds (_glapi_Dispatch was undefined),
but untested (could still be broken).
Tested-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Jan Beich <jbeich@freebsd.org>
Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13935>
The GLX code had to special case these for uninteresting reasons, but we
don't support them anymore in Mesa so all this would do is keep them
sorta-working over GLX protocol. Given that Mesa hasn't supported them
on the renderer side since ~2011 let's stop pretending they're real. If
we get around to modernizing the indirect GLX code (hah) we can revisit
these then.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14085>
Previously, the number of previously encoded frames the encoder handled
was 1 - the encoder now supports many more encoded pictures, so the
encoder now has to keep track of multiple reconstructed pictures.
v2: Add a check to make sure an array index is not negative (Boyuan)
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13915>
Sets the not_referenced parameter to be the same as the previously
hardcoded frontends/va value (false) to ensure UVD/VCE encoding
functionality remains unaffected by the change in frontends/va code.
This commit will eventually be reverted once more testing is completed.
Fixes: a90802ef644 ("frontends/va/enc: allow for frames to be marked as (not) referenced")
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13915>
Base the number of reconstructed pictures the encoder allocates based on
the number of reference pictures to be used for encoding. Also move the
calculation and allocation of reconstructed pictures to VCN 1, from VCN
2.
v2: Add back the accidentally deleted
'two_pass_search_center_map_offset' (Boyuan)
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13915>
This virtual instruction is translated into multiple half float
physical instructions, even though its destination is typically of
integer type, which prevents the software scoreboard pass from
inferring the correct execution pipeline for the virtual instruction
on XeHP+ platforms. Teach the SWSB lowering pass about this
inconsistency between the IR and physical instruction types.
Fixes among other tests:
dEQP-GLES31.functional.shaders.builtin_functions.pack_unpack.packhalf2x16_compute
Fixes: d4537770bb ("intel/fs: Add helper functions inferring sync and exec pipeline of an instruction.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5685
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14002>
These formats are used when creating surfaces with the
I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS modifier.
Makes iris pass the out-of-tree piglit test,
ext_image_dma_buf_import-intel-modifiers.
Fixes: 1433fe7860 ("intel/isl: Unify fmt checks in isl_surf_supports_ccs")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14082>
Inferring from blob's cmdstream the size of shader instruction
cache for:
- a630 is 64
- a650 is 128
- a660 is 128
On a650 and a660 gpu could hang if we exceed the limit. Though
it is not reproducible with computerator or a single amber
test. Also while blob limits the size to 128 - Turnip still
hangs with it but does not hang with the limit of 127.
On a630 there seem to be no hang when limit is exceeded.
Fixes the hang of compute shader in Alien Isolation on a650/a660.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14044>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each variable assuming
variables mostly don't have conditions or the conditions are mostly
reused). Reduces libvulkan_intel.so size by 49kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each expression
(assuming expressions mostly don't have conditions or the conditions are
mostly reused). Reduces libvulkan_intel.so size by 8.7kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
Even with packing all 3 types into a 40-byte union (nir_search_constant
being 24 bytes and nir_search_expression having formerly been 32), and
having a single array of them, this cuts 1.7MB from each of
libvulkan_intel.so and libgallium_dri.so.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
Now that the classic drivers that were mixing the use of these asm
and glsl shader fields are gone we can finally use a union here.
This basically reverts commit 9d99dc4bc1 but also moves a
read of IsPositionInvariant inside an arb asm only code block
for safety.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14059>
On XeHP, the XY_BLOCK_COPY_BLT command has a number of fields that
describe the layout of the surface, much like SURFACE_STATE does.
Several of them are encoded in such a similar manner that we really
would like to reuse the isl helpers for emitting those. This commit
moves them into a new isl_genX_helpers.h file which I can include
from the BLORP code. (The alternative would be to add XY_BLOCK_COPY_BLT
filling commands to isl, but that...seems more like a BLORP feature.)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14094>
This may not be a complete set, as I haven't been able to run dEQP-GLES2
to completion (GPU hangs at some point, no particular test seems to be
guilty). But this will help me assess NIR-to-TGSI for the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14092>
Instead of using Python hex() to print the result, print the result in
the same format as the disassembler for easy visual comparison. This
means we don't need to reprint the expectation. This gives output like:
7c 7d 11 33 04 80 66 00 LD_ATTR_IMM.v4.f16.slot0 @r0:r1, `r60, `r61, index:0x1
7c 7d 10 33 04 80 66 00 Incorrect assembly
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14065>
Valhall v9 introduces a number of new data structures since Bifrost v7,
and removes a number of traditional data structures. Add decode routines
for the new Valhall data structures, and condition the old routines on
(PAN_ARCH <= 7) to remain buildable and warning-free.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14063>
Fork the latest canonical XML (Bifrost v7) and adapt to the data
structures found in the earliest Valhall GPU I could get my hands on
(Valhall v9). This should minimize the churn needed for the port by
keeping the Valhall model close to the Bifrost we already supported.
It is not known what happened to v8. It appears to have been yeeted from
existence.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14063>
This was useful for emulating GL 3.2 in virgl on a GLES3 host renderer,
before GL_EXT_depth_clamp introduced the ability for hardware drivers to
expose the feature on GLES. Now that we have that, the desktop-GL-capable
HW that virgl cares about can expose desktop GL even on its GLES renderer
on the host without this emulation. I don't think anyone particularly
cares about hitting higher GL versions on actually-core-GLES hosts with
virgl.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13729>
These kernels aren't tested (and are probably broken) elsewhere. Don't
waste cycles trying to compile for other architectures. This reduces the
amount of code that needs to be ported to a new architecture.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14064>
this currently makes the dEQP-GLES31.functional.image_load_store.buffer.image_size.readonly_12
test fail when used simultaneously with other tests that lead to hitting the cache.
For instance the combination of:
dEQP-GLES31.functional.image_load_store.buffer.atomic.or_r32i_result
dEQP-GLES31.functional.image_load_store.buffer.atomic.or_r32i_return_value
dEQP-GLES31.functional.image_load_store.buffer.image_size.readonly_12
results in a failure of the readonly_12 test.
Deflag dEQP-GLES31.functional.image_load_store.buffer.image_size.{read,write}only_12 as flakes.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14045>
ASAN found a leak:
```
Direct leak of 1440 byte(s) in 10 object(s) allocated from:
#0 0x4a9a92 in calloc (build-Monado-CMake/src/xrt/targets/service/monado-service+0x4a9a92)
#1 0x7fdf82afed06 in drmDeviceAlloc build-drm/../drm/xf86drm.c:3933:14
#2 0x7fdf82b00203 in drmProcessPciDevice build-drm/../drm/xf86drm.c:3965:11
#3 0x7fdf82b00203 in process_device build-drm/../drm/xf86drm.c:4359:16
#4 0x7fdf82b0485e in drmGetDevice2 build-drm/../drm/xf86drm.c:4528:15
#5 0x7fdf70751113 in device_select_find_xcb_pci_default ../src/vulkan/device-select-layer/device_select_x11.c:95:13
#6 0x7fdf70751113 in get_default_device ../src/vulkan/device-select-layer/device_select_layer.c:395:21
#7 0x7fdf70751113 in device_select_EnumeratePhysicalDevices ../src/vulkan/device-select-layer/device_select_layer.c:456:33
```
Cc: mesa-stable
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14068>
It's a *long* time since SGI was the copyright holder for the OpenGL
trademark. And we implement more APIs by now, so let's update the
disclaimer to instead redirect to the Khronos licensing page for
details.
While we're at it, soften the language on legal status as a formal
implementation, as we currently have conformant drivers for most of the
APIs by now. But refer to the Khronos website for details, as
conformance status for drivers are subject to change.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13833>
Straightforward, just converting to a renderpass as well. Note that
we now own the renderpass so I also added a bool to check if we own
it so we can destroy it after recording.
Doing the destruction at destroy & reset time, as reset can be called
during recording, and destroy all the time.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13721>
This is just the naive implementation that create a new renderpass
and then destroys it at the end.
I do it this way because in meta operations we are still creating
temporary subpasses for a renderpass for e.g. the resolve.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13721>
Just remove queries that are never used or proceeded with. The latter
case leading to undefined values.
v2: Don't use nir_shader_instructions_pass() to find variables (Caio)
Simplify replacement (Caio)
v3: Don't track all the queries intrinsic effects (Caio)
Rename things to represent only read queries (Caio)
Use set instead of hash_table (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
bo allocations for the below cases are after device memory allocation:
1. direct non-external mappable memory allocation
2. pool grow
3. exportable memory allocation
For (1) and (2), the bo is for mapping, which is a pure kernel operation
to happen later. So roundtrip waiting can be deferred until free memory.
For (3), the bo is for either fd export or mapping, which are both pure
kernel operations. So roundtrip waiting can also be deferred until free
memory.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13874>
The behavior we stick to is that the base_bo is always created for pool
memory so that to keep it alive during suballocates and frees.
This CL does the below:
1. rename pool_alloc to pool_suballocate and align the api interface
2. rename simple_alloc to pool_grow_alloc to make it pool specific
3. refactor pool_free and simple_free into a pair of pool_ref and
pool_unref to simplify that vkFreeMemory is only called after the
pool bo gets destroyed.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13874>
Use the same URB access helpers that were added for Task Output. The
Arrayed I/O (per-primitive and per-vertex) is handled by applying the
pitch from the MUE layout into the NIR intrinsics and including the
non-arrayed offset on top of it. After that, the index src can be
used directly for lowering.
Because we keep around the non-arrayed offset AND the pitch is
aligned, we can identify cases where the access is indirect but
guaranteed to be aligned, and dispatch a single message. Added a TODO
to explore that later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
Implement the output written by the task *workgroup* and available to
all the mesh *workgroups* dispatched from that task. We currently
ignore any layout annotations (since they are not really testable) and
produce a (packed) layout ourselves.
The URB messages are only SIMD8, so for larger SIMDs, the functions
will produce multiple messages. Making this lowering here instead of
the generic lower_simd_width() since it is not just a matter of
zip/unzip, e.g. the offset must be adjusted.
Indirect writes/reads are implemented by handling one component at a
time and using the PER_SLOT variant of the messages.
Note that VK_NV_mesh_shader allows reading outputs, so add support for
that as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
Task/Mesh stages are CS-like stages, and include many
builtins (e.g. workgroup ID/index) and intrinsics (e.g. workgroup
memory primitives) originally present only in CS.
This commit add two new stages (task and mesh) that 'inherit' from CS
by embedding a brw_cs_prog_data in their own prog_data structure, so
that CS functionality can be easily reused. They also currently use
the same helpers to select the SIMD variant to use -- that was
recently added for CS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
In Fragment Shader, regular inputs are laid out in the thread payload
in a one dword per each half-GRF, that gives room for having the two
delta dwords needed for interpolation.
Per-primitive inputs are laid out before the regular inputs, and since
there's no need to have delta information, they are packed. So
half-GRF will be fully filled with 4 dwords of input.
When num_per_primitive_inputs is zero (the default case), behavior
should be the same as before.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
This extension is controlled by the ESSL feature level. Bump it up since
all parts of OES_gpu_shader5 should be supported.
This also avoids lowering all of the "advanced" functions (which should
probably not be lowered in the first place since they're part of ES
3.1...)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14035>
Widens the given bit mask by a multiplier, meaning that it will
replicate each bit by that amount.
For example: 0b101 widened by 2 will become: 0b110011
This is typically used in shader I/O to transform a 64-bit
writemask to a 32-bit writemask.
Moving this function here because it is used in multiple places.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14005>
Even for drivers not supporting layers, we need to include the layer
coordinate (zero in this case) when using array textures in the
download/upload FS.
Otherwise we are missing a component to get the texture, which ends up
in a malformed NIR shader.
v5 (Ilia):
- Do not emit needless layer code.
v6 (Ilia):
- Add comment clarifying the code logic behind.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
In the V3D driver there is a NIR lowering step for `image_store`
intrinsic, where the image store format is required for doing the proper
lowering.
Thus, let's define it for the download FS instead of
keeping it as NONE.
v2 (Illia)
- Use format only for drivers not supporting format-less writing.
v4 (Illia):
- Use PIPE_CAP_IMAGE_STORE_FORMATTED to reduce combinations.
v5 (Ilia):
- Use indirect array for download FS in not formatless-store support
drivers.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
On clearing a color buffer, clamp the passed color values to the allowed
ones.
Hardware do clamping for TLB values, but not for clear values.
v2 (Iago)
- Add comment about hardware-based clamping on clear values.
v3 (Iago):
- Use format utils to simplify clamping
- Move clamp color function as utility
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
If PBO require to use a GS, this is created with TGSI.
For drivers preferring NIR shaders, they need to translate it from TGSI
to NIR. But unfortunately TGSI to NIR for GS is not implemented, which
makes it crash.
So let's use a GS only for drivers preferring TGSI.
v3:
- Add comment (Iago and Alejandro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
So far we were relying on the supported formats filtering when
creating images from the user API.
But for some other (internal) uses, some of the formats that pass the
filter need to be restricted when binding them as shader images, as they
are not supported for this case.
v3 (Iago):
- Change commit message.
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
From the GL_OES_texture_buffer spec:
"If no buffer object is bound to the buffer texture, the results
of the texel access are undefined."
this can be interpreted as allowing any result to come back, but not
terminate the program.
The current solution is not entirely complete, as it could still try to
get a wrong address for the shader state address.
This can be checked with piglit test
arb_texture_buffer_object-render-no-bo; the test is skip because it
requires OpenGL 3.1, but if overriding the version then it will crash.
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
This commit handles the support for texture buffer objects. In general
it is mostly about using the buffer info from the pipe_image_view
instead of the texture info.
v2:
- Rework some assertions (Iago)
- Remove needless comment (Alejandro)
- Fix comment typos (Iago)
v3:
- Fix typos (Iago)
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
Move all the NIR related debug environmental variables in a single
NIR_DEBUG one.
Use NIR_DEBUG=help to print all the available options.
v2:
- Use a macro to simplify (Marcin, Jason)
- Remove wrong changes (Marcin)
v3 (Marcin):
- Remove rendundant NIR mentioning in option descriptions.
- Unwrap option descriptions.
- Ensure the constant is unsigned.
- Use extern array to remove switch.
v4:
- Add missing kernel shader (Jason).
- Add unlikely() (Marcin).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13840>
The early-Z test uses Z values produced from FEP, so when
we write Z from a shader we need to disable EZ. However, there
are some instances where want to write the FEP-Z from the shader,
in which case we would not need to disable EZ.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>
From VK_KHR_synchronization2:
"Image memory barriers that do not perform an image layout
transition can be specified by setting oldLayout equal to
newLayout.
E.g. the old and new layout can both be set to
VK_IMAGE_LAYOUT_UNDEFINED, without discarding data in the
image."
v2: make assert more readable (Lionel Landwerlin)
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14008>
As described in "intel: Add intel_gem_create_context_engines", this
should make it easier to support an I915_ENGINE_CLASS_FOO engine in
the future. For example, maybe something like:
98c3bbd5b5
Reworks:
* Tweak engine counting logic (s-b Ken)
* Tweak init of engine_classes in iris_init_engines_context (s-b Ken)
* Add STATIC_ASSERT on engine_classes (Jordan)
* Paulo: Call iris_hw_context_set_unrecoverable() for engines context
* Rename to has_engines_context (s-b Paulo)
* Jordan: Handle creating a new engines context when the context needs
to be replaced.
* Ken: Tweak context destroy code paths.
* Call iris_lost_context_state on every batch. (s-b Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12692>
Engines based contexts operate somewhat different for executing
batches. Previously, we would specify a bitmask value such as
I915_EXEC_RENDER to specify to run the batch on the render ring.
With engines contexts, instead this becomes an array of "engines", and
when the context is created we specify the class and instance of the
engine.
Each index in the array has a separate hardware-context. Previously we
had to create separate kernel level contexts to create multiple
hardware contexts, but now a single kernel context can own multiple
hardware contexts.
Another forward looking advantage to using the engines based contexts
is that the kernel does not plan to add new supported I915_EXEC_FOO
masks, whereas they instead plan to add new I915_ENGINE_CLASS_FOO
engine classes. Therefore some rings may only be usable with an engine
based class.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12692>
For every CI job, put JWT content into a file and unset CI_JOB_JWT
environment var
=======
* virgl jobs:
- Share JWT token file to crosvm instance
- Keep using `export -p` due to high complexity in the scripts
of these jobs. At least, the CI_JOB_JWT will not be leaked,
since it is being unset at the `before_script` phase of each
Mesa CI job.
* iris jobs: Update lava_job_submitter to take token file as argument
- generate-env with CI_JOB_JWT_TOKEN_FILE
- create token file during baremetal init stage
* baremetal jobs: Copy token file to bare-metal NFS
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14004>
RGP expects shaders to be contiguous in memory, otherwise it explodes
because we have to generate huge captures with lot of holes.
This reduces capture sizes of Cyberpunk 2077 from ~3.5GiB to ~180MiB.
This should also help for future pipeline libraries.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13690>
However they have to be called via _glapi_get_dispatch/context. This
would be safe to do on any platform, but the extra indirection is only
necessary on Windows since TLS vars can't be exported from a DLL.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13634>
Saves per-batch allocations, avoids reallocation for various vertex
counts, and avoids needing the indirect tess addrs constobj so that we
could emit the relocs to the tess BO after we'd emitted all the draws.
Also apparently it fixes one of our CTS fails.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13851>
I aimed for "things that look like big switch statements, or cases where
the compiler is unlikely to be able to constant-propagate an argument into
something useful."
Saves another 80kb on disk. No perf difference on iris shader-db, n=23.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>
Drop the call to iris_resource_disable_aux in
iris_resource_configure_aux. With the previous patches, we no longer
create CCS surfaces and pick the AUX_NONE usage. As a result, if the aux
usage is NONE, all iris_resource fields already indicate that aux is
disabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12398>
Allow CCS_E on these formats on TGL+ for a couple reasons:
1) TGL doesn't have the option to fall back to CCS_D/fast-clears like
prior platforms do.
2) The CCS compression scheme on TGL improves to encode more than 3
levels of compression. This should help floating point formats.
In my measurements, enabling this on TGL results in a minor performance
improvement on Paraview (+0.06%) rather than a major regression like on
prior platforms. The improvement was measured by taking the average of 3
runs of: waveletvolume.py -d 256 -f 600.
Also, the Intel performance CI reports a 3.81% ±0.12% FPS improvement in
Bioshock Infinite.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12398>
On TGL+, require that the surface format supports CCS_E in order to
support CCS. This aligns with the ISL code that pads the primary
surface for CCS on this platform.
Pre-TGL, require support for either CCS_D or CCS_E.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12398>
This fixes the problem of large indirect draws, and at the same time avoids
allocating too large buffers for tessellation.
Reworked by @anholt to use a separate tess factor BO so we can skip the
WFIs to set the TESSFACTOR_ADDR.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6089>
We used to request vec4 alignment for everything on the nir codepath,
but this triggers an assertion failure since a0b82c24b6, which prohibits
vec4 alignment on scalars. Since requiring vec4 alignment on scalars is a
little silly anyway, this patch relaxes the alignment to naturally aligned
for scalars.
Fixes about 27 crashing tests in piglit and deqp on kepler, including eg
piglit/tests/spec/glsl-1.30/execution/fs-large-local-array.shader_test
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13883>
When using the Vulkan API, command buffers can be recorded way before
perfetto is enabled. This can be problematic if you want already
recorded command buffers to produce traces.
This new environment variable makes perfetto enabled internally so
that command buffers are recorded with timestamps, even though no
perfetto recording happens.
v2: rename to GPU_TRACE_INSTRUMENT (Rob)
v3: Move instrumentation check to generated headers (Danylo)
Decouple instrumentation enabling from tracing (Danylo)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13911>
Found while running valgrind :
==3583454== Invalid read of size 4
==3583454== at 0xF48336: glsl_get_struct_field_offset (nir_types.cpp:84)
==3583454== by 0xC7CD0D: opt_replace_struct_wrapper_cast (nir_deref.c:1068)
==3583454== by 0xC7CDD9: opt_deref_cast (nir_deref.c:1087)
==3583454== by 0xC7DD8E: nir_opt_deref_impl (nir_deref.c:1369)
==3583454== by 0xC7DF4E: nir_opt_deref (nir_deref.c:1428)
==3583454== by 0xA63F3C: brw_kernel_from_spirv (brw_kernel.c:325)
==3583454== by 0xA3BC2C: main (intel_clc.c:481)
==3583454== Address 0xe4f7e88 is 24 bytes after a block of size 48 in arena "client"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13952>
libLLVM for Android is built without RTTI, but after commit ad86267
mesa inherits meson default RTTI enabled state.
cpp_rtti=false is added to meson options in android/mesa3d_cross.mk
(v2) Add Fixes tag and use spaces instead of tabs for aligning the trailing \
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Fixes: ad862674 ("meson: Don't override built-in cpp_rtti option, error if it's invalid")
Cc: "21.3" "21.2" mesa-stable
Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13901>
These flags control special CRC handling for empty tiles using the CRC
clear colour field added on Bifrost. Their use depends on CRC being
used. We missed these flags earlier; let's add them since they are used
by the Valhall DDK but are not new to Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13982>
ISL recently started allowing linear ASTC surfaces to be created. With
that in place, iris can perform GPU-based uploads to ASTC textures in
the same way it does so with other compressed surfaces.
We're not aware of any reason to continue special-casing ASTC texture
uploads, so we get rid of the code which does so.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13881>
The sampler can only decode ASTC surfaces that are Y-tiled. ISL has
been asserting this restriction at surface creation time.
However, some drivers want to create a surface that is only used for
copying compressed data. And during the copy, the surface won't have a
compressed format.
To enable this behavior, we choose to move the tiling assertion to the
moment a surface state is created for the sampler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13881>
We don't support typed image writes for multisampling, so we can't
handle multisampled destinations. We also usually handle MSAA by
running the fragment shader per-sample, which we aren't accounting
for in our compute shaders, so we can't handle MSAA sources either.
We could do both of these things if we really wanted to, but we don't.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13524>
When we're doing a stencil blit via a fragment shader, we can avoid
W-tiling shenanigans by using the stencil write hardware on Skylake
and later.
Of course, the compute engine doesn't have stencil fragment writes,
so it can't do that. Just fall back to the detiling shenanigans.
Caught by Piglit's arb_copy_image-formats when forcing iris to use
BLOCS for resource_copy_region on Icelake.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13524>
When dispatching compute shaders to do a blit, our destination rectangle
may not line up perfectly with the workgroup size. For example, we may
round the left x0 coordinate down to a multiple of the workgroup width,
and the right x1 coordinate up to the next multiple of the workgroup
width. Similarly for y0/y1 and workgroup height. This means that we
may dispatch additional invocations which should not actually do any
blitting. We need to set key->uses_kill to bounds check and drop those.
Caught by Piglit's arb_copy_image-simple when forcing iris to perform
resource_copy_region via BLOCS and running with INTEL_DEBUG=norbc on
Icelake.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13524>
It turns out that BLEND_MIN and BLEND_MAX in Utgard take blend factors
into account. My guess is that actual equation looks like:
OP(As * S + Ad * D, Ad) for alpha, and
OP(Cs * S + Cd * D, Cd) for color.
So we have to set S factor to 1 and D factor to 0 to be compliant with
GL spec.
Fixes following piglit tests:
spec@!opengl 1.4@blendminmax
spec@arb_blend_func_extended@arb_blend_func_extended-fbo-extended-blend
(with patch my for ES2_compatibility and EXT_blend_func_extended)
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13873>
It was a bit trickier to RE, since blob doesn't expose this
functionality at all, however we had a clue from the very beginning:
lima_blend_factor is 3 bits, i.e. 8 values, but only 5 of them were
used, it just waited till someone tried what 3 unused values do.
Interestingly enough, it turns out "5" works just as "0" (which is
PIPE_BLENDFACTOR_*SRC_*), but only if output register for gl_FragColor
is $0, So it looks suspiciously similar with PIPE_BLENDFACTOR_*SRC1_*
behavior, and looks like secondary output is taken from $0.
Since output regs for all other outputs are configured via RSW, there
must be a field in RSW for output register for secondary color, it's
likely 4 bits and it's currently set to 0 for reg $0.
Then it was just a matter of brute-forcing various consecutive 4 bits
in RSW - and indeed, setting top 4 bits of rsw->aux0 to the index of
gl_FragColor output register fixes blending tests when we use "5"
blend factor instead of "0".
So it must be a register number for gl_SecondaryFragColor. Unlike
gl_FragColor, the field is only repeated once in RSW.
Wire it up in compiler, and piglit arb_blend_func_extended now passes.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13873>
The shader locations are now directly stored in radv_shader_args which
makes sense because they are tied to the arguments. The locations are
then copied to radv_shader_info but they will be moved into a new
radv_shader_binary_info with upcoming changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13542>
Detected by UBSAN.
../src/amd/vulkan/radv_private.h:2939:1: runtime error: member access
within null pointer of type 'struct radv_device_memory'
../src/amd/vulkan/radv_private.h:2926:1: runtime error: member access
within null pointer of type 'struct radv_buffer'
../src/amd/vulkan/radv_private.h:2945:1: runtime error: member access
within null pointer of type 'struct radv_image'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13965>
We expose the compact array cap, which means that we get compact
clipdist arrays. Indicate this to the lowering pass so that it works for
gl_ClipDistance from fs, among others.
Fixes, among others, on a420,
tests/spec/glsl-1.30/execution/clipping/fs-clip-distance-interpolated.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13891>
As explained in "egl/wayland: add initial dma-buf feedback support", we
still don't use the per-surface dma-buf feedback. In this patch we start
to use it.
If per-surface dma-buf feedback is advertised, use it to allocate
surface buffers. Also, the dma-buf protocol states that the feedback is
resent only when the client is using a suboptimal format/modifier pair.
So listen for new per-surface feedback events and reallocate the surface
buffers based on them. We can't change the format of a buffer, but we
can pick a new modifier.
This patch is based on previous work of Scott Anderson (@ascent).
Signed-off-by: Scott Anderson <scott.anderson@collabora.com>
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248>
In get_back_bo() we have two calls to loader_dri_create_image() and some
overhead. As in the next commit we add another call to this same
function and more overhead to get_back_bo(), it starts to lose
legibility.
So move loader_dri_create_image() calls to separate functions, allowing
us to have an easier to read get_back_bo(). It also adds some minor
style changes.
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248>
This bumps the supported dma-buf version up to 4 and adds the initial
dma-buf feedback implementation. It follows the changes in the dma-buf
protocol extension [1] to include the dma-buf feedback interface, which
should be incorporated by most Wayland compositors in the future.
From version 4 onwards, the dma-buf modifier events are not sent by the
compositor anymore, so we use the default feedback to pick the set of
formats/modifiers supported by the compositor. Also, we try to avoid the
wl_drm device event and instead use the dma-buf feedback main device. We
only fallback to wl_drm when the compositor advertises a device that
does not have a render node associated.
In this initial dma-buf feedback implementation we still don't do
anything with the per-surface dma-buf feedback, but in the next commits
we add proper support.
It's important to mention that this also bumps the minimal supported
version of wayland-protocols to 1.24, in order to include [1].
[1] https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/8
This patch is based on previous work of Scott Anderson (@ascent).
Signed-off-by: Scott Anderson <scott.anderson@collabora.com>
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248>
Most Wayland compositors have already implemented the dma-buf protocol
extension. So we can stop relying on the wl_drm events and start to
depend only on the dma-buf interface to receive format/modifier pairs
and create wl_buffer's. So we can deprecate drm_handle_format() and
drm_handle_capabilities().
Note that we still use the wl_drm interface to find out the DRM device
that the compositor is using, so we can't deprecate it fully for now. In
the future (when the dma-buf feedback interface is added to the dma-buf
protocol extension [1] and most compositors incorporate it) we may be
able to fully deprecate wl_drm.
[1] https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/8
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248>
Currently we have a weird design. We have a hardcoded array (named
dri2_wl_visuals) with all the formats that we support, which are 9.
And we also have EGL_DRI2_MAX_FORMATS, which is a constant set to 10. In
patches in which people added new formats to dri2_wl_visuals, this
constant had its value increased. This is confusing, as its name gives
the idea that we can't support more formats.
This constant is only used to define the bitset size of
dri2_egl_display::formats. And it should work just fine if we created
this bitset with the number of formats supported.
To make things clearer, replace EGL_DRI2_MAX_FORMATS by
EGL_DRI2_NUM_FORMATS, which must be equal to ARRAY_SIZE(dri2_wl_visuals)
(i.e. the number of supported formats).
In the next commits we get rid of this constant completely, as it is
prone to errors.
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248>
If we want to add a variable with a type name that is too long, we have
to realign all the variables within these structs. The other option is
to ignore and don't realign, and then we end up with a very ugly code.
So get rid of these unnecessary spaces, as they don't bring anything
useful. Instead, they are annoying.
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248>
Rather than relying on distro packages, build libwayland and
wayland-protocols from known versions everywhere we need it.
The only place we do not do so but rely on distro packages is the LAVA
rootfs, for which it does not matter right now since the version is
sufficiently new, but this could/should be cleaned up later.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248>
These macros expand to a mov in an if statement which breaks address
assumption that instruction which produces address and consumes it
are in the same block.
Fixes test:
dEQP-VK.subgroups.ballot_broadcast.framebuffer.subgroupbroadcast_intvertex
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13931>
I was trying to be clever here, skipping ahead to the newly-created
block and processing the remaining instructions after the split in the
same loop. But if the last instruction in a block was lowered, the saved
next instruction would be the head of the block before the split, not
the new block, and we would compare it to the new block so we wouldn't
stop like we were supposed to. Stop being so clever, and just restart
processing with the new block after lowering an instruction.
Because we're wrapping the actual transform in yet another loop, and the
restarting logic is a bit tricky, refactor the actual lowering into a
separate lower_instr function. Otherwise we'd be mixing the two and
indenting the actual logic even more.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13928>
This was plain broken. The solution is to not require any framebuffer
changes. Silently skipping results in broken behavior. e.g:
(secondary cmdbuffer with no framebuffer)
ClearAttachment 2
translated into
bind attachment 2 as attachment 0 (skipped)
clear attachment 0
restore original bindings (skipped)
which results in clearing attachment 0, not what we wanted. It is
a small wonder CTS doesn't find it until VK_KHR_dynamic_rendering.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13699>
They might not be available in secondary cmdbuffers with inheritance.
To avoid binding anything we need to create pipelines per attachment
index. I've excluded these from the "compile on device creation" set
because I think almost nobody will need them.
Alternative solution would be to reuse the same shader but muck with
a bunch of registers to shift them for the attachment index. That is
however a lot of complexity and has to execute on every pipeline
change, which is probably more expensive in overhead and definitely
in complexity.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13699>
If we have an inherited subpass in a cmdbuffer we can't really
emit any framebuffer data if it isn't provided.
Note we still do it for resolve clears, but we can't have that
in a secondary cmdbuffer with inherited renderpass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13699>
When cloning a chunk of tracepoints, we cannot just copy the elements
of the traces[] array. We also need the payloads associated with
those.
This change introduces a new u_trace_payloaf_buf object that is
refcounted so that we can easily import traces[] elements and their
payloads from one utrace to another.
v2: use u_vector (Danylo)
v3: Delete outdate comment (Danylo)
Fix assert (Danylo)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0565c993f9 ("u_trace: helpers for tracing tiling GPUs and re-usable VK cmdbuffers")
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13899>
spirv_to_nir sometimes wraps derefs in vec2 or mov instructions as part of
its texture handling. These get in the way of
nir_rematerialize_derefs_in_use_blocks_impl. Running copy propagation
should get rid of the extra move instructions and get us back to intact
deref chains for everything except variable pointer use-cases.
fossil-db (Sienna Cichlid):
Totals from 6 (0.00% of 134572) affected shaders:
CodeSize: 92656 -> 93088 (+0.47%)
Instrs: 17060 -> 17138 (+0.46%)
Latency: 224408 -> 227539 (+1.40%)
InvThroughput: 37402 -> 37924 (+1.40%)
VClause: 408 -> 402 (-1.47%)
Copies: 1065 -> 1107 (+3.94%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5668
Fixes: 14a12b771d ("spirv: Rework our handling of images and samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13924>
Even although the option is called shaderdb, it is not really used by
shaderdb (for V3D shaderdb uses the debug option "precompile"). And in
fact, right now the output format is not compatible with shaderdb.
This commit tries to fix that, and as we are here, also try to make
the option more useful for the Vulkan case, as that debug option also
works with v3dv.
We can't really fully imitate shaderdb use with OpenGL (run with a set
of glsl shader tests), but we can at least assign a unique name (the
pipeline sha1 in text format) so we can compare executions of the same
vulkan application. For that remember to disable the on-disk cache.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13938>
Mali4x0 supports writing depth and stencil from fragment shader
and we've been using it quite a while for depth/stencil buffer reload.
The missing part was specifying output register for depth/stencil.
To figure it out, I changed reload shader to use register $4 as output
and poked RSW bits (or rather consecutive 4 bit groups) until tests
that rely on reload started to pass again.
It turns out that register number for gl_FragDepth/gl_FragStencil is in
rsw->depth_test and register number for gl_FragColor is in
rsw->multi_sample and it's repeated 4 times for some reason (likely for
MSAA?)
With this knowledge we now can modify ppir compiler to support multiple
store_output intrinsics.
To do that just add destination SSA for store_output to the registers
list for regalloc and mark them explicitly as output. Since it's never
read in shader we have to take care about it in liveness analysis -
basically just mark it alive from the time when it's written to the end
of the block. If it's live only in the last instruction, mark it as
live_internal, so regalloc doesn't clobber it.
Then just let regalloc do its job, and then copy register number to the
shader state and program it in RSW.
The tricky part is gl_FragStencil, since it resides in the same register
as gl_FragDepth and with the current design of the compiler it's hard to
merge them. However gl_FragStencil doesn't seem to be part of GL2
or GLES2, so we can just leave it not implemented.
Also we need to take care of stop bit for instructions - now we can't
just set it in every instruction that stores output, since there may be
several outputs. So if there's any store_output instructions in the
block just mark that block has a stop, and set stop bit in the last
instruction in the block. The only exception is discard - we always need
to set stop bit in discard instruction.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13830>
We can't insert mul node into add node instruction if it's a virtual dep
(sequence or write_or_read dep), so use ppir_node_has_single_src_succ
in addition to ppir_node_has_single_succ.
We can't use ppir_node_has_single_src_succ alone, since node may have
a virtual dependency in addition to source dependency, and we can't
insert it either in this case.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13830>
The function need_temp_reg_initialization looks suspecious.
It will only ever return true if we get past this if:
if (!(emit->info.indirect_files && (1u << TGSI_FILE_TEMPORARY)) ...
Using the logical && means the intended initialization done
based on the result of this check is not performed.
This code was both introduced and altered in MR 5317.
ccb4ea5a introduces the function.
ba37d408 is a collection of performance improvements and misc
fixes. This altered the if from using bitwise to logical and.
This commit changes it back to bitwise.
Spotted from a compile warning.
Fixes: ba37d408da ("svga: Performance fixes")
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12157>
We don't advertise bufferDeviceAddressCaptureReplay capability and
neither does blob, because at the moment there is no way to allocate
bo with predefined iova.
There is no support of any arithmetic with addresses since shaderInt64
is not enabled. However, we could enable int64 support whenever we want.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
Separating atomic opcodes makes possible to express a6xx global
atomics which take iova in SRC1. They would be needed by
VK_KHR_buffer_device_address.
The change also makes easier to distiguish atomics in conditions.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
LLVM has all the required intrinsics available on IBM Z, so use them for
rounding operations (they will be implemented as a single instruction).
This change makes the test case lp_test_arit pass, because it avoids
using the buggy generic code.
v2: update .gitlab-ci/cross-xfail-s390x to reflect passing lp_test_arit
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13927>
As preparation for changing the behavior of LLVMpipe on IBM Z, add a
flag to detect that platform. As it is always known at compile-time, we
do not add it to the struct for cpu flags to avoid inflating that
struct's size.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13927>
Add a new command associated to glLinkProgram. With this we should be
able to compile and link shaders when requested by the user, thus
avoiding that to happen in the middle of a frame.
Together with the command we pass an array of shader handles attached to
the program, where each position of the array corresponds to a pipe
shader type.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13674>
If we did, we would have the instruction coming right after ldvary write
to the same implicit destination as ldvary at the same time. We prevent
this when merging instructions, but we should make sure we prevent this
when we move ldvary around for pipelining too.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13921>
In the following scenario:
CmdBindPipeline()
CmdBindVertexBuffers()
CmdSetVertexInput()
CmdDraw()
CmdBindVertexBuffers()
CmdSetVertexInput()
CmdDraw()
The VBO won't be updated for the second draw because the state is
cleared when the dynamic state is emitted and the pipeline isn't dirty.
Found by inspection.
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13855>
For SIMD8 half float payload, each component takes a full register, so
we can use existing LOAD_PAYLOAD infrastruture for required padding by
alternating plain 8-wide half float vector and null vector.
Also this patch removes an unwanted assertion from
opt_copy_propagation_local for LOAD_PAYLOAD.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
To support SIMD8 half float payloads, each component takes one full
32bit wide register in both SIMD8H and SIMD16H mode. So we can make use
of existing LOAD_PAYLOAD infrastructure alternating a half float vector
and a null vector, in order to handle required padding.
v2: (Francisco)
- Skip header sources
- Fix comparision units
- Don't allocate VGRF for padded source
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
This function is big and I don't think it will won't get meaningfully
constant-propagated during inlining without LTO. Move it to a .c file so
we just have one copy, saving 2.8MB from libnir.a on an amd64 release
build.
text data bss total filename
before:
18953406 7768312 687260 27408978 build-release/driver-symlinks/iris_dri.so
9734366 5542453 481692 15758511 build-release/lib/libvulkan_intel.so
28687772 13310765 1168952 43167489 (TOTALS)
after:
15478350 7767864 687260 23933474 build-release/driver-symlinks/iris_dri.so
6810366 5541685 481692 12833743 build-release/lib/libvulkan_intel.so
22288716 13309549 1168952 36767217 (TOTALS)
No statistically significant performance difference on iris shader-db, n=8.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>
According to the spec the hardware locks the scoreboard on the first
or last thread switch (selected via shader state) and any TLB accesses
executed before this are not synchronized by hardware.
This change updates the logic to ensure we respect this requirement
and that we don't assume that the lock is acquired automatically
on the first TLB access, which is not valid at least since V3D 4.1+.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13910>
For this each driver must :
- report its clock_id (if no particular clock just default to cpu
boottime one)
- be able to sample its clock (gpu_timestamp())
The PPSDataSource will then emit timestamp correlation events in the
trace ensuring perfetto is able to display GPU & CPU events
appropriately on its timeline.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13831>
This solves a case where a NIR geometry shader was storing the output in
a non-constant:
vec4 32 ssa_1 = load_const (0xc0800000 /* -4.000000 */, 0xc1100000 /* -9.000000 */, 0x40400000 /* 3.000000 */, 0x40e00000 /* 7.000000 */)
vec1 32 ssa_7 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_8 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_9 = iadd ssa_7, ssa_8
vec1 32 ssa_19 = mov ssa_1.x
intrinsic store_output (ssa_19, ssa_9) (1, 1, 0, 160, 288) /* base=1 */ /* wrmask=x */ /* component=0 */ /* src_type=float32 */ /* location=32 slots=2 gs_streams(x=0 y=0 z=0 w=0) */
When lowering the VPM output we check if the destination (ssa_9 in this
case) is a constant to add to the VPM offset. We run a constant folding
optimization in an earlier VS lowering, and we should do the same for
GS.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
While fragment and geometry shader were handling structs as inputs, they
weren't doing for it arrays of structures.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures and
assertions.
v2:
- Fix style (Iago).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
Now that we removed the intel intrinsic and just use the generic one,
we can skip it in the intel call lowering pass and just deal with it
in the intel rt intrinsic lowering.
v2: rewrite with nir_shader_instructions_pass() (Jason)
v3: handle everything in switch (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 423c47de99 ("nir: drop the btd_resume_intel intrinsic")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12113>
TLSDESC speeds up access to dynamic TLS. This is especially important
for non-glibc targets, but is also helpful for non-initial-exec TLS
variables.
The entry asm does not support TLSDESC, but it only accesses
initial-exec symbols, so it is not necessary to handle that separately.
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12722>
Unlike Linux dma-bufs, D3D12 resources are strongly typed, and
can't necessarily just reinterpret the memory arbitrarily.
Allow importing resources with no description coming from the frontend,
and populate the resource desc from the driver instead. If there was
a template, make sure that it matches the incoming resource.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13054>
When dealing with swrast, there's two possibilities: If you have LLVM, you get
llvmpipe, which is pretty fast. If you don't, you get softpipe, which is slow,
but does have a couple nice qualities, like being smaller and not needing
executable memory for JIT.
If you're building a driver that requires LLVM like radeonsi then you need the
LLVM stub for the build to find LLVM. But for swrast, since it can mean either
softpipe/llvmpipe, you don't strictly need LLVM. So this just makes the
Android build files flexible like the Meson build files (where you can specify
-Dllvm=disabled even if LLVM is findable).
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13532>
The fallback path we averages unorm textures, but if we don't have ubwc on
either then we can just cast them to uint which then just takes sample 0.
The proper UBWC format I think ends up averaging, though.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13867>
If the source and destination were within the same full register, like
hr90.x and hr90.y (which both map to r45.x), then we'd perform the
swap/copy with the wrong register. This broke
dEQP-VK.ssbo.phys.layout.random.16bit.scalar.35 once BDA is enabled.
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>
This is required by
dEQP-VK.ssbo.phys.layout.random.all_shared_buffer.47, where we need to
spill a lot of pointers due to NIR CSE being a little too aggressive and
creating a large register pressure across basic blocks, too large to fit
within the boundaries of ldp/stp offsets.
Note that this will be a lot more difficult with support for "real
functions" because the base register will become unknown at compile
time. However this hack gets things working for the time being.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>
This would've caught the previous issue earlier. We checked that the
physreg made sense when inserting via ra_file_insert() but not
ra_push_interval() which is used for live-range splitting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>
This might be safe to relax to all Windows compilers, but I didn't
test Clang or MinGW, so scoping to MSVC for now. For MSVC, this is
safe to mismatch, because the vftables are emitted into all objects
with "pick largest," and the definition with RTTI is larger than the
one without. This is different than the Itanium ABI, which only emits
one copy of the typeinfo in the object which defines the key method.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13064>
v3dv, radv, and turnip are using several C&P format helpers (most of
them wrappers over util_format_description based helpers). methods.
This commit moves the common helpers to the already existing common
vk_format.h. For the case of v3dv we were able to remove the vk_format
header. For turnip and radv, a local vk_format.h header remains, with
methods that are only used for those drivers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13858>
Fix a bug in BIND_PIPELINE XML reported by Dougall, which cleans up
a bit of both decoder and driver.
Instead of...
* 17 bytes BIND_PIPELINE (17)
* An unused 8 byte record (25)
* A set of N 8 byte records (25 + 8 * N)
* Oops, 1 byte too many! One just disappeared (24 + 8 * N)
It seems to instead be
* 24 bytes BIND_PIPELINE (24)
* A set of N 8 byte records (24 + 8 * N)
without the sentinel record. These means the 8 byte records themselves
are shuffled, with the high byte of the pointers split from the low
word, but that's less gross than an off-by-one.
It's still not clear what the last 8 bytes of the BIND_VERTEX_PIPELINE
structure mean, or the last 4 byte of the BIND_FRAGMENT_PIPELINE
structure which seems to be a bit shorter.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13784>
Dougall Johnson observed these structures make more sense with indices[]
first in the entries and indices[] absent from the header. Then the
sentinel entry disappears, nr_entries makes more sense, and a few magic
numbers pop out. Many thanks to Dougall's astute eyes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13784>
sometimes a driver might want to always reclaim all bo objects in the course
of allocating a new bo. this is useful when it's known that a given memory
heap is very small and will likely need to keep its usage minimized
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13850>
Some apps may try to use a viewport adjusted by 0.5 pixels (among other
things) to emulate d3d9 pixel center, and in this case we would end up
with incorrect "fake scissor" box (shifted by 1 pixel), hence pixels
being incorrectly scissored away when permit_linear_rasterizer is set
(this happens even if the linear rasterizer is not used in the end).
So adjust the offset so that the half-way points get rounded down instead
of up.
(This is all a bit iffy I think since we don't use fractional
boxes (with 8 subpixel bits) anywhere yet, but at least without msaa
it should work out.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13794>
Currently we run deqp-runner inside a single VM, which makes very poor
use of the available CPUs because Virgl has a bottleneck in the VMM that
serializes everything.
With this change, we can run several Crosvm instances in a runner and
make full use of the CPUs. Getting the same coverage with 3 runners
instead of 6.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12828>
this reworks PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER into an
enum as PIPE_CAP_TEXTURE_TRANSFER_MODES, enabling drivers to choose
a (sometimes) faster, compute-based download mechanism based on a new
pipe_screen hook
compute pbo download is implemented using shaders with a prolog to convert
the input format to generic rgb float values, then an epilog to convert
to the output value. the prolog and epilog are determined based on a vec4
of packed ubo data which is dynamically updated based on the API usage
currently, the only known limitations are:
* GL_ARB_texture_cube_map_array is broken somehow (and disabled)
* AMD hardware somehow can't do depth readback?
otherwise it should work for every possible case
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11984>
When early fragment tests are mandated by the shader, we must use
the Z value produced by the FEP even if there are elements that
would typically require late fragment tests (such as discards,
sample to coverage, etc).
This change means we also need to be a bit more careful when
we promote shaders to use early fragment tests so we don't
promote anything with discards for example.
Fixes:
dEQP-VK.fragment_operations.early_fragment.discard_early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.discard_early_fragment_tests_stencil
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13837>
ANV currently smashes off the TIMELINE bit depending on whether or not
the i915 interface supports them, triggering assert(!type->get_value).
Instead of requiring ANV to smash off function pointers, let the extra
function pointers through and then assert on the feature bits before the
function pointers get used. This should give us roughly the same amount
of assert protection while side-stepping the feature disabling problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13839>
Analogous to the pre-RA scheduler. Unfortunately this time it's a bit
more involved because we have to correctly handle (rptN), which is
already relevant for swz. This means we need the index of the
destination register that conflicts with the source register, to handle
swz, and we need to expose that part of ir3_delay. But once that's done,
we can delete ir3_delay_calc_postra.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>
We have a situation in some skia shaders like:
add.f r0.x, ...
(rpt2)nop
mul.f ..., r0.x
sam (xyzw) r0.x, ...
rcp ..., r0.x
Notice that rcp uses the result of the sam instruction, not the add.f,
but we didn't keep track of which instructions kill the sources in
ir3_delay, so we'd add an extra nop, resulting in a disagreement betwen
ir3_delay and the scheduling graph. Since postsched is correct, fix
ir3_delay. This only results in some very slight shader-db changes but
keeps the next commit from changing things.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>
We want to model soft dependencies, but because of how there's only a
single bit to wait on all of them, there may be unnecessary delays
inserted when a (sy)-consumer follows an unrelated (sy)-producer.
Previously there was some code to try to work around this, but we can
just model it directly using the sfu_delay and tex_delay cycle counts
that we have to maintain anyway and delete it.
This also gets rid of the calls to ir3_delay_postra with soft=true which
would be more complicated to handle in the next commit.
There is a functional change here: the idea of preferring less nop's
over critical path length (max_delay) up to 3 nops is kept (and we
delete the TODO which is already sort-of resolved by it), but delays due
to (ss)/(sy) and nops are now treated equally, rather than always
preferring nops over syncs. So if our estimate indicates that scheduling
an (ss) consumer will result in a wait of one cycle and there's another
instruction that will require one nop, we will treat them otherwise
equal and choose based on max_delay instead. This results in more
sstall, but the decrease in nops is much greater.
total nops in shared programs: 376613 -> 345482 (-8.27%)
nops in affected programs: 275483 -> 244352 (-11.30%)
helped: 3226
HURT: 110
helped stats (abs) min: 1 max: 78 x̄: 9.73 x̃: 7
helped stats (rel) min: 0.19% max: 100.00% x̄: 19.48% x̃: 13.68%
HURT stats (abs) min: 1 max: 16 x̄: 2.43 x̃: 2
HURT stats (rel) min: 0.00% max: 150.00% x̄: 13.34% x̃: 4.36%
95% mean confidence interval for nops value: -9.61 -9.06
95% mean confidence interval for nops %-change: -19.01% -17.78%
Nops are helped.
total sstall in shared programs: 126195 -> 133806 (6.03%)
sstall in affected programs: 79440 -> 87051 (9.58%)
helped: 300
HURT: 1922
helped stats (abs) min: 1 max: 15 x̄: 4.72 x̃: 4
helped stats (rel) min: 1.05% max: 100.00% x̄: 17.15% x̃: 14.55%
HURT stats (abs) min: 1 max: 29 x̄: 4.70 x̃: 4
HURT stats (rel) min: 0.00% max: 900.00% x̄: 25.38% x̃: 10.53%
95% mean confidence interval for sstall value: 3.22 3.63
95% mean confidence interval for sstall %-change: 17.50% 21.78%
Sstall are HURT.
total (ss) in shared programs: 35190 -> 35472 (0.80%)
(ss) in affected programs: 6433 -> 6715 (4.38%)
helped: 163
HURT: 401
helped stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1
helped stats (rel) min: 1.92% max: 33.33% x̄: 11.53% x̃: 10.00%
HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel) min: 1.56% max: 100.00% x̄: 15.33% x̃: 12.50%
95% mean confidence interval for (ss) value: 0.41 0.59
95% mean confidence interval for (ss) %-change: 6.22% 8.93%
(ss) are HURT.
total (sy) in shared programs: 13476 -> 13521 (0.33%)
(sy) in affected programs: 669 -> 714 (6.73%)
helped: 30
HURT: 78
helped stats (abs) min: 1 max: 2 x̄: 1.13 x̃: 1
helped stats (rel) min: 4.00% max: 50.00% x̄: 21.22% x̃: 21.11%
HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel) min: 3.45% max: 100.00% x̄: 31.93% x̃: 25.00%
95% mean confidence interval for (sy) value: 0.23 0.60
95% mean confidence interval for (sy) %-change: 11.19% 23.15%
(sy) are HURT.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>
The old code walked the instructions between each ready instruction and
each of its parents for every instruction, which can quickly become
accidentally quadratic. Instead we keep track of the current
"instruction pointer" of the to-be-scheduled instruction, and for each
ready instruction calculate an "earliest possible IP" which is the IP
that needs to be reached before we can schedule it. Because this stays
constant as soon as an instruction becomes ready, we never have to
recompute it and each call to ir3_delay_calc_prera() becomes a simple
comparison and subtract. We only need to iterate over the children and
update their earliest_ip when scheduling an instruction, and we already
do that in util_day_prune_head() so it should be cheap.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>
We need to perform cross-batch flushing if any batch writes to a BO
while others refer to it. We checked this case when recording a new
BO in the list which we'd never seen before. However, we neglected to
handle the case when we already read from a BO, but then began writing
to it. That new write may provoke a conflict between existing reads
in other batches, so we need to re-check the cross-batch flushing.
Caught by Piglit's copyteximage when forcing blits and copies to use
a new IRIS_BATCH_BLITTER that isn't upstream yet. But this bug could
be provoked by render/compute work today...we just hadn't noticed it.
Fixes: b21e916a62 ("iris: Combine iris_use_pinned_bo and add_exec_bo")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13808>
We are using the same definitions for both OpenGL and Vulkan, so let's
move it to common.
As we are here we are also adding versioning on the TFU register
definition. Those are basically register bit places, so really likely
to change between versions.
Adding 33 as it is the first version they got defined.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13832>
llvmpipe's fragment shaders are always run sequentially and
in API order for a single tile, so it's impossible to have
out of order render target writes requiring fetch barriers.
Issues fixed in previous commits were actually breaking most
piglit/deqp tests for coherent extension variant.
Signed-off-by: Pavel Asyutchenko <sventeam@yandex.ru>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13252>
Using 1 bit per wrap mode looked very suspicious and after some
experiments it turns out it's 3-bit enum.
Border color is also here, it sits right after depth field. For
some reason it uses 16 bit per channel just like for clear color in RSW
GL_CLAMP mode is broken for nearest filter just as on Midgard, so add
the same workaround - use GL_CLAMP_TO_EDGE for nearest filter.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13213>
It looks like MBS format used by blob doesn't distinguish sampler2D from
sampler3D, so load texture instruction is the same for 2D and 3D
textures.
So all we need to RE is texture descriptor for 3D textures, but blob
doesn't implement it, so we need to do some guesswork:
- unknown_3_1 looks like a depth since it sits after height/width and
always set to 1
- unknown_2_2 is exactly 3 bits and it follows wrap_t, so it must be
wrap_r
- missing part is texture type for 3D textures. By trial and error it
seems to be 4. First bit is only set for cubemap, so it's likely a
separate flag, and rest 2 bits look like number of tex dimensions akin
to midgard and later (thanks, panfrost!) with 0 for 1D, 1 for 2D
and 2 for 3D.
Put it all together and we have working 3D textures on lima!
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13213>
I'd assumed that since insert didn't take a deleter, it was
find-or-insert, not insert-or-replace. This caused a bo reference
leak if the same bo was used more than once in a batch.
Fixes: fde36d7992 ("d3d12: Don't wait for GPU reads to do CPU reads")
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13819>
Currently lima uses generic TXP lowering that results in downgrading
coords precision to FP16 since we have to do some calculations with
coords instead of loading them directly from varying.
Mali4x0 has native TXP support, however coords and projector have to
come from a single source.
Add NIR lowering pass that combines coords and projector into a single
backend-specific source and use it instead of generic lowering.
Unfortunately this change regresses one test, but it also fails in blob and
disassembly is now identical.
shader-db diff:
total instructions in shared programs: 15623 -> 15603 (-0.13%)
instructions in affected programs: 877 -> 857 (-2.28%)
helped: 7
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.87% max: 10.53% x̄: 4.93% x̃: 1.85%
95% mean confidence interval for instructions value: -4.95 -0.76
95% mean confidence interval for instructions %-change: -9.31% -0.55%
Instructions are helped.
total loops in shared programs: 3 -> 3 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 136 -> 137 (0.74%)
spills in affected programs: 0 -> 1
helped: 0
HURT: 1
total fills in shared programs: 598 -> 602 (0.67%)
fills in affected programs: 0 -> 4
helped: 0
HURT: 1
Tested-by: Denis Pauk <pauk.denis@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13111>
Instead of having a bunch of const vk_sync_type for each permutation of
vk_drm_syncobj capabilities, have a vk_drm_syncobj_get_type helper which
auto-detects features. If a driver can't support a feature for some
reason (i915 got timeline support very late, for instance), they can
always mask off feature bits they don't want.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This is, unfortunately, a large flag-day mega-commit. However, any
other approach would likely be fragile and involve a lot more churn as
we try to plumb the new vk_fence and vk_semaphore primitives into ANV's
submit code before we delete it all. Instead, we do it all in one go
and accept the consequences.
While this should be mostly functionally equivalent to the previous
code, there is one potential perf-affecting change. The command buffer
chaining optimization no longer works across VkSubmitInfo structs.
Within a single VkSubmitInfo, we will attempt to chain all the command
buffers together but we no longer try to chain across a VkSubmitInfo
boundary. Hopefully, this isn't a significant perf problem. If it ever
is, we'll have to teach the core runtime code how to combine two or more
VkSubmitInfos into a single vk_queue_submit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
Get rid of most of the guts of the base class and just leave it as a
vtable. We can also drop some of wsi_display_fence. One functional
change here is that we're now using VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE
which is more correct anyway because, thanks to the funky reference
counting we do with destroyed and event_received, its lifetime is tied
to the physical device, at best.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This adds a new vk_queue_submit object which contains a list of command
buffers as well as wait and signal operations along with a driver hook
which takes a vk_queue and a vk_queue_submit and does the actual submit.
The common code then handles spawning a submit thread if needed, waiting
for timeline points to materialize, dealing with timeline semaphore
emulation via vk_timeline, etc. All the driver sees are vk_queue.submit
calls with fully materialized vk_sync objects which it can wait on
unconditionally.
This implementation takes a page from RADV's book and only ever spawns
the submit thread if it sees a timeline wait on a time point that has
not yet materialized. If this never happens, it calls vk_queue.submit
directly from vkQueueSubmit() and the thread is never spawned.
One other nicety of the new framework is that there is no longer a
distinction, from the driver's PoV, between fences and semaphores. The
fence, if any, is included as just one more signal operation on the
final vk_queue_submit in the batch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This is built on the new vk_sync primitives. In the vk_physical_device,
the driver provides a null-terminated array of vk_sync_type pointers in
priority order. The semaphore implementation then selects the first
type that meets the necessary criterion. In particular, semaphores may
or may not be timelines depending on the VkSemaphoreType. It also
auto-selects the semaphore type based on the external handle types
provided and can down-grade as needed to support a particular external
handle.
The implementation itself is mostly copy+pasted from ANV. The primary
difference is the fact that anv_semaphore_impl has been replaced with
vk_sync. The permanent vk_sync is still embedded (like ANV) but the
temporary one is a pointer. This makes stealing the temporary state as
part of VkQueueSubmit a bit easier.
All of the interesting stuff around waits, signals, etc. is implemented
by the vk_sync interface. All this code does is wrap it all in the
annoyingly detailed VkFence rules so we can provide the correct Vulkan
entrypoint behavior.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This is built on the new vk_sync primitives. In the vk_physical_device,
the driver provides a null-terminated array of vk_sync_type pointers in
priority order. The fence implementation then selects the first type
that meets the necessary criterion. In particular, fences can't be
timelines and need to support reset and CPU wait. It also auto-selects
the fence type based on the external handle types provided and can
down-grade as needed to support a particular external handle.
The implementation itself is mostly copy+pasted from ANV. The primary
difference is the fact that anv_fence_impl has been replaced with
vk_sync. The permanent vk_sync is still embedded (like ANV) but the
temporary one is a pointer. This makes stealing the temporary state as
part of VkQueueSubmit a bit easier.
All of the interesting stuff around waits, resets, etc. is implemented
by the vk_sync interface. All this code does is wrap it all in the
annoyingly detailed VkFence rules so we can provide the correct Vulkan
entrypoint behavior.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This doesn't map directly to any particular Vulkan object but is,
instead, a base class for the internal implementations of both VkFence
and VkSemaphore. Its utility will become evident in later patches.
The design of vk_sync will look familiar to anyone with significant
experience in DRM. The base object itself is just a pointer to a vfunc
table with function pointers providing the implementation of the various
operations. Depending on how the vk_sync will be used some of of those
vfuncs are optional. If it's only going to be used for VkSemaphore, for
instance, there's no need for reset().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
It's a bit on the over-complicated side but the objective is to make the
debug log messages show up in the same thread as the first
VK_ERROR_DEVICE_LOST so we don't massively confuse the app. It's
unknown if this is actually ever a problem but, with submit happening
off on its own thread, logging errors from threads the client doesn't
know about doesn't seem like a massively great plan.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This effectively partially reverts 13fe43714c ("anv: Add helpers in
anv_allocator for mapping BOs") where we both added helpers and reworked
memory mapping to stash the maps on the BO. The problem comes with
external memory. Due to GEM rules, if a memory object is exported and
then imported or imported twice, we have to deduplicate the anv_bo
struct but, according to Vulkan rules, they are separate VkDeviceMemory
objects. This means we either need to always map whole objects and
reference-count the map or we need to handle maps separately for
separate VkDeviceMemory objects. For now, take the later path.
Fixes: 13fe43714c ("anv: Add helpers in anv_allocator for mapping BOs")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5612
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13795>
We prefix names with an underscore to make them "safe" C identifiers
when necessary. For example, a value of "32x32" would become "_32x32".
However, when specifying something like
<field ... prefix="BLOCK_SIZE">
<value name="32x32" value="0"/>
</field>
we already have a prefix that makes the field name safe. We'd rather
generate a name with a single underscore, i.e.
#define BLOCK_SIZE_32x32 0
rather than
#define BLOCK_SIZE__32x32 0
This also fixes up affected defines in crocus.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13809>
When a <field> tag has multiple <value> children, listing symbolic names
for possible field values, we generate #defines for each value, with an
optional prefix. I don't know why, but this code was checking whether
self.default is None. We want to generate the same list of #defines,
with a prefix, regardless of whether the field has a default value
specified or not.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13809>
To avoid a scenario like this:
* One blit needed the four components => XYZW filled up with 4 values
* Following blit needing two components => ZW uses the previous values
We detected this using the v3d driver with the
arb_framebuffer_srgb-blit test, specifically:
./bin/arb_framebuffer_srgb-blit texture linear_to_srgb msaa enabled render -auto -fbo
The main linear to srgb with msaa (not doing the resolve yet) blit
requires the four components.
At the end (after a resolve copy), the test uses glReadPixels, and
internally it uses the blitter with two components, but the shader
still uses lod on the texel fetch, so it gets the one used for the
main blit, when it should be zero.
Right now v3d works fine even with that wrong value, and I assume that
any other driver too. But we can't ensure that would keep happening on
the future, so let's use correct values.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13753>
We're currently only calling it after creating the screen and the
bufmgr. There are a few cases where Iris checks for the DEBUG_BUFMGR
bit before we call brw_process_intel_debug_variable(), which means
intel_debug is 0 and so we don't run the debug code. Today, these are
all related to the creation of the workaround bo and its mmap.
I found this in a custom branch after I converted to INTEL_DEBUG an
environment variable that I had.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13780>
Mali4x0 PP doesn't have a swizzle for load_input, so use POT-aligned
varyings to avoid unnecessary movs for vec3 and precision downgrade
in case if this vec3 is coordinates for a sampler
shader-db:
total instructions in shared programs: 15707 -> 15623 (-0.53%)
instructions in affected programs: 3906 -> 3822 (-2.15%)
helped: 47
HURT: 18
helped stats (abs) min: 1 max: 9 x̄: 3.09 x̃: 2
helped stats (rel) min: 1.49% max: 23.53% x̄: 8.20% x̃: 6.45%
HURT stats (abs) min: 1 max: 7 x̄: 3.39 x̃: 3
HURT stats (rel) min: 0.78% max: 20.59% x̄: 10.45% x̃: 10.97%
95% mean confidence interval for instructions value: -2.18 -0.41
95% mean confidence interval for instructions %-change: -5.70% -0.38%
Instructions are helped.
total spills in shared programs: 146 -> 136 (-6.85%)
spills in affected programs: 39 -> 29 (-25.64%)
helped: 6
HURT: 0
total fills in shared programs: 617 -> 598 (-3.08%)
fills in affected programs: 125 -> 106 (-15.20%)
helped: 6
HURT: 0
HURT shaders are vertex shaders where we may need more instructions
for non-packed vec3s. It's acceptable trade-off since we don't get
precision downgrade if this varying is coordinates for a sampler.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13151>
We no longer have reg defs for the HI fields, so all we can access from
lua is the low 32 bits. LUA has only double-precision floats for numbers,
so we can't fix that. However, the high bits are almost always the same,
so it's not that big of a deal to be ignoring them for this script.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13733>
No matter the format, this should return true if the instruction has an
exec operand.
Otherwise, eliminate_useless_exec_writes_in_block() could remove an exec
write in a block if it's successor begins with:
s2: %3737:s[8-9] = p_parallelcopy %0:exec
s2: %0:exec, s1: %3738:scc = s_wqm_b64 %3737:s[8-9]
Totals from 3 (0.00% of 150170) affected shaders (GFX10.3):
CodeSize: 23184 -> 23204 (+0.09%)
Instrs: 4143 -> 4148 (+0.12%)
Latency: 98379 -> 98382 (+0.00%)
Copies: 172 -> 175 (+1.74%)
Branches: 95 -> 97 (+2.11%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: bc13049747 ("aco: Eliminate useless exec writes in jump threading.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5620
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13776>
for now only for texture dest and if there is no rounding mode required.
Totals from 2 (0.00% of 150170) affected shaders: (GFX10.3)
CodeSize: 7980 -> 7948 (-0.40%)
Instrs: 1441 -> 1422 (-1.32%)
Latency: 7703 -> 7626 (-1.00%)
InvThroughput: 2336 -> 2302 (-1.46%)
VClause: 34 -> 36 (+5.88%)
Copies: 57 -> 58 (+1.75%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13592>
VK CTS just added some new tests to write to a compressed image
from a compute shader, which was overrunning memory.
The image width/height need to be sized according to the block
sizes to avoid overwriting memory.
dEQP-VK.image.sample_texture.*bit_compressed*
Cc: mesa-stable
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13618>
Bifrost supports a special "dual texture" instruction, sampling from two
textures at once at the same coordinate. Each subinstruction is highly
restricted (a subset of TEXS_2D); together, they are represented by TEXC
with a special dual texture operation descriptor. Add an optimization
pass to fuse these instructions.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13723>
Pattern match the point coord sysval and support lowering it as well.
This is required to handle flipped framebuffers on Bifrost. However,
what this pass normalizes to is the opposite of the hardware mode we
used on Bifrost before, so we need to swap modes at the same time to
prevent regressions.
Fixes Piglit glsl-fs-pointcoord and glsl-fs-pointcoord_gles2
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13073>
fma32 only round once so has 0.5UP accuracy. mad32 round twice so
has 1UP accuracy. This accuracy difference sometimes make the result
different at the last bit.
Applications like META need more accuracy for display right result.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13686>
This makes a4xx look more like a3xx for these settings. Most importantly
it adds the workaround for allowing the hw to decide between min and mag
filtering. This fixes a number of dEQP texture filtering tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13763>
We need to guarantee that when vkQueueSubmit() returns the application
can actually wait on a signaled semaphore/syncobj.
When using a thread to do the submission to i915, this gets a bit
tricky in the following case :
A syncobj is used both as a wait & signal semaphore and has been
signaled once already. It contains a fence before entering
vkQueueSubmit().
This means we need to reset the syncobj to ensure when we return
from vkQueueSubmit(), the syncobj contains no stale fence.
Currently in the Anv, the submission thread is in charge of putting
the new fence in the syncobj and also picks up the wait fence
directly from the syncobj. This means we can't reset the syncobj
from vkQueueSubmit().
The solution to this has been pointed by Bas & Jason :
In vkQueueSubmit(), clone the wait syncobj fence into a new
temporary syncobj that will be destroy after submission and use
this temporary syncobj as a wait fence for i915. This allows us to
reset the original syncobj in vkQueueSubmit().
For this to work with wait_before_signal behavior, we also need to
do a wait-on-materialize on binary semaphores from vkQueueSubmit().
Otherwise the application thread calling vkQueueSubmit() could race
the submission thread and pick up the wrong fence when cloing.
v2: Use copy semantic for clone_syncobj_dma_fence() (Jason)
Do the cloning prior to adding the syncobj to anv_queue_submit so
that if the cloning fails don't have an invalid syncobj in
anv_queue_submit (Jason)
v3: Fix another syncobj leak (Jason)
v4: Fix invalid argument order (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4945
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11474>
From VUID-VkAttachmentReference2-attachment-04755:
"If attachment is not VK_ATTACHMENT_UNUSED, and the format of the
referenced attachment is a depth/stencil format which includes both
depth and stencil aspects, and layout is
VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL or
VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL, the pNext chain must include
a VkAttachmentReferenceStencilLayout structure."
We did not check that there even could be a stencil layout
before fetching it.
Fixes a few tests from:
dEQP-VK.image.depth_stencil_descriptor.depth_read_only_optimal.*
Fixes: 979ea394e5 "vulkan/util: Move
helper functions for depth/stencil images to vk_iamge"
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13740>
GLES2.0 spec allows parts of wide lines and points to be drawn even if
their center is outside the viewport.
Therefore 0x2000 in PLBU_CMD_PRIMITIVE_SETUP has to be set for points.
This is already our default setting as it seems to have no negative
effect when this bit is always set. Points work as expected but lines
don't. It's hard to RE it, because the affected deqp tests also fail
with the blob.
To respect this behaviour for lines and solve another 2 tests, we need
to do a workaround and temporarily extend the viewport by half of the
line width. The scissor rectangle is still equal with the initial
viewport.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12971>
In order to implement GL_PRIMITIVES_GENERATED, v3d allocates a small
resource and adds a command to the job to store the prim counts to it.
However it was only doing this when TF was enabled which meant that if
the query was used with a geometry shader but no TF then the query would
always be zero. This patch makes the driver keep track of how many
PRIMITIVES_GENERATED queries are in flight and then enable writing the
prim count if its more than zero.
Fix dEQP-GLES31.functional.geometry_shading.query.primitives_generated_*
v2: Update CI expectations and references to fixed tests in commit log.
v3: - Add comment that GL_PRIMITIVES_GENERATED query is included because
OES_geometry_shader, but it is not part of OpenGL ES 3.1. (Iago)
- Update Fixes to commit introducing geometry shaders. (Iago)
Fixes: a1b7c084 ("v3d: fix primitive queries for geometry shaders")
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13712>
../src/microsoft/spirv_to_dxil/dxil_validation.cpp: In function ‘bool validate_dxil(dxil_spirv_object*)’:
../src/microsoft/spirv_to_dxil/dxil_validation.cpp:129:12: error: ‘stderr’ was not declared in this scope
129 | fprintf(stderr, "DXIL validation only available in Windows.\n");
| ^~~~~~
Fixes: 37c366e283 ("microsoft/spirv_to_dxil: Add DXIL validation to spirv2dxil")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13736>
This deletes a whole lot of code, but there's a modest drawoverhead perf
loss:
drawoverhead 1-image change -6.48856% +/- 4.28269% (n=50)
drawoverhead 8-image change -5.29195% +/- 2.62549% (n=90)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13443>
Having checked the deltas between fdl6_view and what we did before, switch
over to fdl6_view now.
No statistically significant difference on no-hw drawoverhead 8-texture
change (n=50) with the texture cache disabled from this and the previous
commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13443>
turnip was playing fast and loose with the name, using the R8_G8B8 format
name but actually setting the descriptors up to read G8_B8R8 like Vulkan
(sensibly) wants. This caused trouble when trying to make freedreno and
turnip share code. By having both orderings as format names, we can share
the descriptor code and also confuse readers less.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13443>
We can see from the dynamically_uniform (compiler doesn't know if you're
uniform or not) vs uniform (compiler can see it's uniform) case in the
blob which is which. Now that we have the right names, also use the
nonunif flag for encoding the actual non-uniform mode (previously, we were
always setting it always in a way that meant uniform).
I verified this behavior back to a418 with samplers. The a3xx blob I have
only does GLES3, so we don't have the opaque_type_indexing tests to see.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13601>
We reference the scratch BO using a bindless index in the command
streamer instructions, but we forgot to add them to the BO list.
v2: Make use of pipeline reloc list (Jason)
v3: Don't add NULL BOs to the reloc list (Lionel)
v4: Don't add BOs twice to reloc list when dealing with addresses
(Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: eeeea5cb87 ("anv: Add support for scratch on XeHP")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13544>
There's only really one format that makes sense here, since there's no
sRGB equivalent for 10-bit packed formats.
It's possible that this results in flipped colors, if 10-bit visuals
exist that have the channels in the opposite order. (They don't seem to,
on my end)
Perhaps a more principled solution would also compare the exact r/g/b
bitmasks. But for now, this works.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3537
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9450>
We had been storing pointers to a driver owned swizzle table
rather than storing the actual swizzle value in various shader
and pipeline keys on both GL and Vulkan drivers.
This doesn't look very robust, particularly since we also
compute sha1 hashes from these values and we may store these
hashes to disk (for the disk cache).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13738>
this was a little spaghetti-ish: the module hash was sometimes being applied
during module update, sometimes in draw during program create, and then also
it was removed when a shader unbind would cause the program to no longer be reachable
now things are more consistent:
* keep removing module hash when program becomes unreachable
* only apply module hash in draw during updates there
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13727>
some drivers won't create zs textures in any shape but 2D. this can be
handled instead by using 2D textures and then performing shader rewrites to
convert shadow samplers for 1D and 1DArray types to 2D/array
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13583>
Some filesystems don't fill in d_type in dirent. Resulting in
DT_UNKNOWN. Pass this entry through to the next step and use stat on the
full filepath to determine if it is a file.
sshfs is known to not fill d_type.
This resolves an issue where driconf living on an sshfs path wasn't
working.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13697>
Needed for RADV.
The mirror situation is kinda messy since the library is not
maintained and the original website is offline. Put a mirror
in that seemed to be used by some non-fdo CIs already, but
if reliability is still a concern we can discuss more mirrors.
There is an alternative implementation that is maintained in
elfutils, but that doesn't build on Android:
1) Doesn't build with clang (resolved in git, so next release probably)
2) Needs argp_parse with is a glibc specific feature.
There is a version of elfutils in AOSP but instead of fixing upstream
they just made an Android.bp that avoids building most stuff, which
isn't really usable here.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13164>
If we ever want to stop depending on the EXEC_OBJECT_PINNED to detect
when something is pinned (like for VM_BIND), having a helper will reduce
the code churn. This also gives us the opportunity to make it compile
away to true/false when we can figure it out just based on compile-time
GFX_VERx10.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13610>
Starting with 3b363d5b55 ("anv: Assume syncobj support"), we assume
syncobj support and no longer use the execbuf sync_file API directly so
there's no point in checking for it. For the one physical device check
this deletes, we can assume has_exec_fence is always true because every
kernel with syncobj support also has sync_file.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13610>
Soft-pin is but one possible mechanism for pinning buffers. We're
working on another called VM_BIND. Most of the time, the real question
we're asking isn't "are we using soft-pin?" but rather "are we using
relocations?" because it's relocations, and not soft-pin, that cause us
all the extra pain we have to write code to handle. This commit flips
the majority of those checks around. The new helper is currently just
the exact inverse of the old use_softpin helper.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13610>
When the client calls vkMapMemory(), we have to align the requested
offset down to the nearest page or else the map will fail. On platforms
where we have DRM_IOCTL_I915_GEM_MMAP_OFFSET, we always map the whole
buffer. In either case, the original map may start before the requested
offset and we need to take that into account when we clflush.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13610>
This is left-over from the days where we didn't use BO pointers and had
them embedded and we didn't have a nice allocation API. Now that block
pools get their memory via anv_device_alloc_bo(), they really should be
using anv_device_release_bo() to tear it down. This is equivalent in
this case because anv_device_release_bo() does the following:
1. Decrements refcount. The pool holds the only reference so it will
proceed onto the clean-up steps
2. Unmaps it, if needed. This is the same as the tear-down code today
except anv_device_release_bo() will avoid doing an unmap if it's
been created via userptr. The current code probably unmaps the
userptr which is wrong but pretty harmless since it happens on a
tear-down path.
3. Unmaps the CCS range from the AUX-TT, if any. There is none, so
this is a no-op.
4. Frees the VA range if it's not pinned and doesn't have a fixed VA
assignment. These BOs always either have a fixed VA range (softpin)
or aren't pinned (relocations).
5. Closes the GEM handle. Same as the current code.
In short, anything created using the BO API should probably be destroyed
that way. We were getting away with hand-rolling it because this is a
simple case. Why did we do it this way? It dates back to before the
more formlized BO cache and BO API in ANV.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13610>
We have to mark the root as non-spillable in case the interval is the
child of some other interval, but we can't know whether it's the child
of some other interval until it's been inserted. Move the setting of
cant_spill below the insertion. This prevents us from using a bogus
parent value.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13650>
We don't current enable post sync operations, but it is probably
better to set them to "internal" MOCS than to remove the non-zero
checking for this genxml field.
Reworks:
* Fix COMPUTE_WALKER in cmd_buffer_trace_rays (s-b Jason)
Fixes: 7b78b2fcac ("intel/genxml: Assert that all MOCS fields are non-zero on Gfx7+")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13624>
On GFX version 12.5+ with COMPUTE_WALKER, this is the limit based on the
size of the HW packet. On older HW, we can technically go a bit bigger
but there's not much point. Technically, some hardware can support a
scalar workgroup size up to 2048 but most apps don't go any bigger than
1024.
As discussed on the merge request page, the current limit assumes
SIMD32, but it is unclear if we want to encourage applications to use
SIMD32 if it may lead to additional register spilling in shader
programs. Many applications have likely tuned for a limit of 1024
based on the OpenGL minimum limit, so it might not gain much by
advertising more than 1024.
Reworks:
* Jordan: Use MIN2 and limit total invocations as well.
* Jordan: Add second paragraph to commit message based on merge
request discussion.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13538>
having this be super granular was a neat idea, but really I don't care
even a little bit about a driver that's weirdly implementing *only*
dynamic vertex input or *only* dynamic state2
this massively cuts down the combinatorics and provides a more accurate
gauge of driver feature levels, since this is the general level of support
that they're likely to have
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13715>
Modern games may use more than 16 sampler views, so get what the host
actually supports, and default to 16 on old hosts that don't pass the
value.
Since the possible maximal value of PIPE_MAX_SHADER_SAMPLER_VIEWS doesn't
fit into an uint32_t remove the binding flags, they were only used for
releasing the sampler views, and this can be achieved differently.
v2: Fix compilation error
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: John Bates <jbates@chromium.org> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13646>
This will allow have tests that use gtest (which is C++) for.
Note the reordering of designated initializers in structs is
because C++ requires them to follow the order that they were
defined in the struct.
The table `bifrost_reg_ctrl_lut` is not used by the tests at the
moment, so bailed out trying to replace the C syntax designated
initializer for arrays (that doesn't have an equivalent in C++), and
simply #ifdef it out when including from C++ code.
Also, use not_mod instead of not. not is a keyword in C++ and can't be
used. Luckily, the naming is not determined by the XML, so we can freely
rename.
[Alyssa: squash in not_mod]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13684>
There are no similar macros in gtest. They do recommend pulling
another library (gmock) and use that, but having our own let us
control the output more precisely. Extracted the code from a similar
existing macro in blob_test.cpp, making changes to the output.
A check like
ASSERT_U8_ARRAY_EQUAL(expected_u8, result_u8, 32);
that fails will output
```
Expected 32 values to be equal but found 1 that differ:
expected_u8 values are:
[000] 29 44 00 00 44 60 cb 80 93 65 07 c0 08 00 40 29
[016] 03 81 00 00 5c a0 21 87 *b0 31 00 00 00 00 00 60
result_u8 values are:
[000] 29 44 00 00 44 60 cb 80 93 65 07 c0 08 00 40 29
[016] 03 81 00 00 5c a0 21 87 *af 31 00 00 00 00 00 60
```
and a check like
ASSERT_U64_ARRAY_EQUAL(expected_u64, result_u64, 4);
```
Expected 4 values to be equal but found 1 that differ:
expected_u64 values are:
[000] 80cb604400004429 29400008c0076593
[002] 8721a05c00008103 *60000000000031b0
result_u64 values are:
[000] 80cb604400004429 29400008c0076593
[002] 8721a05c00008103 *60000000000031af
```
Note the asterisk indicating wrong values.
The new header for extra macros is in src/gtest/include/ so it
doesn't get in the way when updating the real gtest headers that are
in src/gtest/include/gtest/.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13672>
If you gonna view context of function parse_atomic_op,
then you gonna know that index for array (data_src)
can be unitialized. Imho this approach is cleaner
than doing stuff inside parse_atomic_op.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12995>
Rather than having 2 paths to set the slice/subslice/eu masks, reuse
the other internal functions. This simplifies finding bugs within this
code :
* If we have i915 query topology support, update_from_topology() is
called.
* If we don't have query topology support but we have getparam for
slice/subslice/EU, we generate a topology data and call
update_from_topology()
* If we have no kernel support to query any kind of topology, we
generate the values return by the kernel for slice/subslice/EU and
call update_from_masks() which in turns calls
update_from_topology()
v2: Fixup typo (Adam)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10015>
10c75ae4 moved handling of this state to the functions that
depend on ctx->_ImageTransferState.
So we can't depend on _NEW_PIXEL being set to call this function,
since it'll be always clear earlier by _mesa_update_state_locked.
Example sequence that would trigger the issue:
glPixelTransferi(...)
glClear(...)
glTexSubImage2D(...) <-- won't use the new value set by
glPixelTransferi because glClear caused
_NEW_PIXEL to be cleared.
_NEW_PIXEL itself is kept because st_update_pixel_transfer depends
on it.
Fixes: 10c75ae4 ("mesa: move _mesa_update_pixel out of _mesa_update_state")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5273
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13596>
this ended up being a little trickier than I thought; lazy
descriptors don't use dynamic ubo types for the push set,
which means drivers that (correctly) assert dynamic offset existence
explode because the descriptor template will never work with the
push set
the better, though slightly more annoying, option here is to use the
lazy manager's faster descriptor allocation and lesser complexity to
quickly grab a push set, then tweak the existing cached codepath slightly
in order to update a raw vkdescriptorset
Fixes: 417477f60e ("zink: always use lazy (non-push) updating for fbfetch descriptors")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13677>
If an app re-issues a timestamp query a lot, but doesn't ever ask
for the results, we could end up running off the end of our query
heap. But we don't actually need to advance/accumulate, so just
use a single entry in the heap.
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12920>
The slice->size0 temp is being used as both the array stride (incorrectly)
and as the size of the slice (for this assert). This assert doesn't seem
to be in the right place to me, if you want to check that offset+slice
size is < bo size, you could just do that at the end of layout setup.
This caused troubles when fixing the temp to be the actual array stride
for filling out the HW state, since then rendering to nonzero levels would
think that the rendering overflowed the BO when it doesn't.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13668>
This should be exactly equivalent code, except for the is_3d "level <= 1"
which doesn't bring over 6c19d37331 ("freedreno/a6xx: fix 3d tex
layout") due to it failing our unit tests where we compare to the blob's
behavior. The layer_stride setup is pulling in what freedreno_resource.c
was doing after the layout setup, so we match fd6 and so that it could
potentially be checked in unit testing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13668>
These changes implement vkRegisterDeviceEventEXT and detection of
monitor hotplug. Wsi launches a thread that listens to udev events and
signals the appropriate device fences when hotplug hapens.
v2: use wsi fences instead of syncobj api (Jason Ekstrand)
v3: refactor + cleanups, create thread on demand (Samuel Pitoiset)
v4: bring back syncobj support from initial version for radv
v5: make libudev dependency optional, check for poll errors (Simon Ser)
v6: change matching mechanism to use udev device node instead of path
v7: remove the matching mechanism
v8: fix a race with thread creation + use single mutex + other cleanups
(Jason Ekstrand)
Fixes:
dEQP-VK.wsi.display_control.register_device_event
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12305>
vertex shader stages that can produce xfb must have
their input size clamped to the compiler define MAX_VARYING
to successfully be able to export an xfb output for each input
fixes KHR-GL46.geometry_shader.limits.max_input_components
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13632>
Some applications may request an indirect context but this feature is
disabled by default on Xorg and thus context creation will fail.
This commit adds a drirc setting to force the creation of direct glx
context, regardless of what the app is requesting.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13246>
si_update_ps_colorbuf0_slot used blitter_running as a way to detect
recursive calls.
Unfortunately this catch too many cases; for instance a backtrace
like:
#0 si_update_ps_colorbuf0_slot
#1 si_set_framebuffer_state
#2 do_blits
[...]
#5 si_blit
#6 si_copy_region_with_blit
Would end-up not updating ps_uses_fbfetch; so if the new fb_state is
something like:
cbufs = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, zsbuf = 0x55b8987545e0}
We can have ps_uses_fbfetch=true but cbufs[0] = NULL, which causes a
crash later in si_ps_key_update_framebuffer.
This commit fixes intermittent crashes in KHR-GL46.stencil_texturing.functional.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13550>
Assumes cmd_buffer != NULL for this path to eliminate the checks in the generic code.
Benchmarks:
I made a microbenchmark based on Bas' vulkan microbench suite.
https://gitlab.freedesktop.org/bnieuwenhuizen/vulkan_microbench/-/merge_requests/1
In this benchmark, this improves vkDescriptorTemplateUpdate performance consistently by 36%.
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
DescriptorTemplateUpdate 81.2 ns 81.2 ns 8573169
->
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
DescriptorTemplateUpdate 52.9 ns 52.9 ns 13306065
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13342>
We have to set 8c01 to say "leave these channels alone" when
clearing/storing just Z or S of z24s8. Fixes the bypass path for
KHR-GLES3.packed_depth_stencil.verify_read_pixels.depth24_stencil8.
Cc: mesa-stable
Fixes: #5592
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13659>
According to Jonathan Marek:
Only one immediate can be decoded in a cat2 instruction (if both srcs
are immediates, they will use the value of the either the first or
second one, I don't remember which) - using 2 immediates in a cat2
instruction is only "correct" if they are both equal.
The (i,j) in the second src of flat.b is not unused, but behaves as 0
for any (small) integer because it is a float src. The hack I
suggested is to set the second src equal to (immediate) first src,
which seems to work.
This allows us to remove a couple of mov instructions or a bit of extra
constfile usage.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13558>
the context flag is immutable, and it's set by drivers that always
require the shader to export a psiz value for rasterization
"always" is all cases except GL_VERTEX_PROGRAM_POINT_SIZE being
enabled, in which case it should be assumed that the shader already
writes it, and no changes should be made
thus the shader key value can be changed to reflect that, when set,
it requires the shader to export a psiz value if it doesn't already
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13588>
We rely on mesa/main to tell us whether to update vertex elements.
This decreases overhead for obvious reasons.
The select/feedback code path doesn't use this, which is why you see
unconditional "ALL" in a few codepaths.
This sequence of GL calls doesn't update vertex elements if only
the pointer and stride vary:
glVertexPointer()
glDrawElements()
glVertexPointer()
glDrawElements()
glVertexPointer()
glDrawElements()
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13512>
This splits the per-VAO NewArrays flag into NewVertexBuffers and
NewVertexElements, and adds a global NewVertexElements flag to be used
by drivers.
This allows gallium to skip updating vertex elements when only vertex
buffers need to be updated.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13512>
There are three approaches for problem:
- use dummy shader (current solution)
- use software rendering
- stub
First option always gonna give bad results, second one
gonna be slideshow (r300/r400 at best are running with
old 2 cores cpu), third one can sometimes give graphical
gliches.
IMHO third one is least annoying option.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13642>
sparse binds have to be processed synchronously with cmdbuf recording to
avoid resource object desync in the vk driver, which means they have to be
done in the driver thread instead of the flush thread. this necessitates
adding locking for the queue since there is now a case when submissions occur
in a different thread
fixes illegal multithread usage in KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13597>
vulkan requires that vertex attribute access be aligned to the size of
a component for the attribute, but GL has no such requirements
the existing alignment caps are unnecessarily restrictive for applying
this limitation, so this cap now pre-calculates the masks for elements
and vertex buffers in vbuf to enable rewriting misaligned buffers
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13556>
The single-plane descriptor emit helper doesn't strictly need the UBWC
reloc, since imageBuffer can't be UBWC, but it means the function is ready
to be used for non-buffer image descriptors later.
no-hw drawoverhead 1-imageBuffer change throughput 1.95457% +/- 1.44325%
(n=127).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13635>
The gmem store stores both depth and stencil for z24s8. So, if we're
doing a write (clear or draw) to one or the other of the channels, we need
the other one restored as well.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13649>
Instead of checking for MESA_SHADER_COMPUTE (and KERNEL). Where
appropriate, also use gl_shader_stage_is_compute().
This allows most of the workgroup-related lowering to be applied to
Task and Mesh shaders. These will be added later and "inherit" from
cs_prog_data structure.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13629>
The code is correct, but compiler can't see it. Initialize the value
to NULL and assert on it if the function succeeds. It both helps the
compiler and make the code slightly more robust.
```
../src/intel/vulkan/anv_queue.c: In function ‘anv_QueueSubmit2KHR’:
../src/intel/vulkan/anv_queue.c:932:16: warning: ‘impl’ may be used uninitialized in this function [-Wmaybe-uninitialized]
932 | result = anv_queue_submit_add_timeline_wait(queue, submit,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
933 | &impl->timeline,
| ~~~~~~~~~~~~~~~~
934 | value);
| ~~~~~~
../src/intel/vulkan/anv_queue.c:899:31: note: ‘impl’ was declared here
899 | struct anv_semaphore_impl *impl;
| ^~~~
```
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13638>
Fix defect reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member error is not initialized
in this constructor nor in any functions that it calls.
Fixes: 7558340ebb ("intel/compiler: Add helpers to select SIMD for compute shaders")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13608>
brw_simd_select return type is int.
Fix defect reported by Coverity Scan.
Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. selected_simd < 0U.
Fixes: 7dda0cf2b8 ("intel/compiler: Use SIMD selection helpers for CS")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13606>
After running the (renamed) dxil_nir_split_typed_samplers pass, the
shader will have either:
* Textures, which map to D3D SRVs
* Bare samplers, which map to D3D bare samplers
* Images, which map to D3D UAVs
There shouldn't be any remaining samplers with type information
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13390>
Typically, optimization passes go through all the blocks in a shader
and make adjustments on the fly, so we always want them to update
the current block or the current block pointer will become outdated.
Also, we don't need to keep track of the previous current block
pointer to restore it, since optimization passes run after we have
completed conversion to VIR, and therefore, anything that comes after
that should always set the current block before emitting code.
Fixes debug assert crashes when running shader-db:
vir.c:1888: try_opt_ldunif: Assertion `found || &c->cur_block->instructions == c->cursor.link' failed
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13625>
In the menu of CS:GO R8_SRGB textures are uploaded and read back, and
since R8_SRGB can't be read back on GLES, because it is not a rendertarget
format and glGetTexImage and siblings don't exists, we can't default to
enabling reading back this format. This leads to an emulation of the
glGetTexImage calls issued by CS:GO, and this slows down the menus a lot
(below 1 fps on Intel XE hosts).
So add this driconf tweak and enable it for CS:GO to work around the issue.
It can be done safely, because in this case we actually can use the data
that is stored on the host in the backing IOV.
This tweak lets the CS:GO menu run at around 60 FPS when run with virgl
on a Intel XE host when it would run with less than 1 FPS without the tweak.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: John Bates <jbates@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13572>
This shouldn't fix any real world bugs, except it will now clear more
stale syncobjs than it was previously doing, and actually do what the
comment says it does.
I could not find a real workload where this change would be relevant,
although I didn't try too much. I wrote my own little egl program to
test this.
I spotted this while reading the code when investigating a Piglit
failure [0]. It turns out this part the code was not relvant for the
failure.
[0]: ext_external_objects-vk-image-display-overwrite
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13536>
* Update Kconfig for x86_64 and ARM64. Follow the dependency tree of the
kernel modules to make sure that the intended configurations are being
set. Check scripts/merge_config.sh output as well to see if there is
a requested Kconfig not being considered.
For a630 devices:
* Use kernel version with a6xx workaround for frequency scaling
* Enable CONFIG_QCOM_LMH targeting a630 slowness on new kernel
---- Out of tree patches used ----
For a360 device:
* Revert a commit which remove slpi_region from msm8996:
8b0031f8bda2 ("Revert "arm64: msm8996: fix memory region overlap"")
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13089>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In unsized = unsized =
glsl_array_type(glsl_uintN_t_type(bit_size), 0U, bit_size / 8U),
unsized is written twice with the same value.
Fixes: f79a25653b ("zink: move all shader bo/sharedmem access to compiler passes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13607>
Lavapipe was hitting asserts in this area due to incorrect bits being
specified.
Set the handle type depending on the sw flag, and set a correct handle
type for the memory host ptrs.
v2: add image export struct to image creation (Jason)
Fixes: 895d3399f7 ("lavapipe: add support for KHR_external_memory_fd")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13615>
This fixes a crash if a data slice is submitted before the decoder
is initialized. A well-behaved application shouldn't encounter this
but returning an error is still better than crashing the entire
process and the rest of the code is similarly defensive.
Signed-off-by: Lorenz Brun <lorenz@brun.one>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11792>
Fixes dEQP-VK.reconvergence.*nesting* tests.
There are cases when cmod is set to an instruction that cannot have
conditional modifier. E.g. following:
find_live_channel(32) vgrf166:UD, NoMask
cmp.z.f0.0(32) null:D, vgrf166+0.0<0>:D, 0d
is optimized to:
find_live_channel.z.f0.0(32) vgrf166:UD, NoMask
v2:
- Add unit test to check cmod is not set to 'find_live_channel' (Matt Turner)
- Update flag_subreg when conditonal_mod is updated (Ian Romanick)
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5431
Fixes: 32b7ba66b0 ("intel/compiler: fix cmod propagation optimisations")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13268>
We can't rely on regid to be unique, shaders can have multiple varyings
with the same output value. Normally shader linking deduplicates these,
but we still need to handle the case for xfb. So use slot instead as
the unique identifier.
Fixes KHR-GLES31.core.gpu_shader5.fma_precision_*
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13605>
When looking at the output of some CTS tests, I realized that the
barriers vtn currently inserts to emulate coherent memory accesses
were being turned into fences, even though we never needed to do
anything special for coherent accesses before so presumably accesses are
already cache-coherent by default. Ignore make_visible/make_available
semantics to get us back to parity with the old path.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13599>
This just adds the structure with a name and its update function.
strlen and others will be added in the following commits. The idea is to
parse and analyze the name in advance to make glGetUniformLocation faster.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13507>
This decreases overhead because there are fewer nodes to parse.
There are 2 changes done here:
- If the next node offset is (offset % 8) == 4, pad the last node instead
of inserting NOP. This makes sure that the node offset is aligned.
- The vertex list node will add 4 bytes to the header to make the payload
aligned, so the payload will be at &n[2].
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13506>
We've noticed issues with these tests when uprevving Mesa in Chrome OS.
This CI catches some existing failures, and some debug-build assertion
failures as well.
To do this, uprev deqp-runner for its new gtest-runner command. This
runner is not as efficient as I would hope, due to some expensive code in
gtest. I've reported the issue to gtest and it should be easily fixable,
but for now it at least means we get to use the same baseline/skip/flake
handling we have from deqp and piglit runners.
I also fixed build-libdrm for our rootfses to not throw away libdrm's
share directory, which was causing a bunch of test-time spam from radeon's
libdrm when trying to look up its marketing name tables (not that big of a
deal for deqp-runner, but really noisy for piglit and libva-utils which
make gallium screens approximatly per-test).
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13419>
This fixes flakes in dEQP-VK.pipeline.stencil.nocolor.format.* when run
after ycbcr tests. Apparently LRZ needs to know if there's a media
format enabled even if there are no color attachments, so we need to
write something here. Presumably any "normal" format would work but 0
seems like a good neutral choice.
Fixes: 9c895e13 ("tu: Emit GRAS_LRZ_MRT_BUF_INFO_0")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13578>
This is a new blitter command introduced on Tigerlake and expanded
substantially on XeHP. XY_BLOCK_COPY_BLT is actually fast, unlike
the legacy blitter commands. iris will use this in the future, and
anv hopefully could use it for a transfer queue someday as well.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13520>
We don't use bindless surface or sampler states today, and are unlikely
to ever implement that in i965, but we can set a MOCS value regardless
to avoid asserts in upcoming patches that assert MOCS isn't zero.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is disabled stream output targets, MOCS shouldn't matter, as
there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for these null stream
output buffers; we can just assume a MOCS for generic internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We set MOCS on Ivybridge/Baytrail, but not Haswell, and not Skylake
and later. We shoud set it everywhere. While we're at it, we also
set it for null constant buffers, so that we aren't programming a 0
MOCS, to allow us to add some safeguards against that.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
isl now uses info->mocs regardless of whether there's any actual
depth/stencil/HiZ buffers involved, so pass it a legitimate one,
rather than zero. When we have entirely NULL surfaces, we just
default to the MOCS value for an internal buffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
isl now uses info->mocs regardless of whether there's any actual
depth/stencil/HiZ buffers involved, so pass it a legitimate one,
rather than zero. When we have entirely NULL surfaces, we just
default to the MOCS value for an internal buffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is disabled stream output targets, MOCS shouldn't matter, as
there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for these null stream
output buffers; we can just assume a MOCS for generic internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
This reorganizes the code so that we set fields in a tidy order:
1. Set the base addresses
2. Set either buffer sizes (Gfx8) or upper bound values (Gfx4-7)
(These are logically the same thing, but expressed differently.)
3. Set MOCS (Gfx6+)
I find this easier to follow.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is disabled stream output targets, MOCS shouldn't matter, as
there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for these null stream
output buffers; we can just assume a MOCS for generic internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is 3DSTATE_VERTEX_BUFFERS where we set NullVertexBuffer.
It shouldn't matter here, as there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for null vertex buffers.
We can assume an internal buffer and request isl's vertex buffer MOCS.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
isl now uses info->mocs regardless of whether there's any actual
depth/stencil/HiZ buffers involved, so pass it a legitimate one,
rather than zero. When we have entirely NULL surfaces, we just
default to isl's MOCS value for an internal depth buffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is disabled stream output targets, MOCS shouldn't matter, as
there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for these null stream
output buffers; we can just assume a MOCS for generic internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is 3DSTATE_VERTEX_BUFFERS where we set NullVertexBuffer.
It shouldn't matter here, as there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for null vertex buffers.
We can assume an internal buffer and request isl's vertex buffer MOCS.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we missed
setting a non-zero MOCS was in 3DSTATE_CONSTANT_ALL packets which fully
disable all constant buffers. (If any constant buffer was present, we
would set an actual MOCS value.)
MOCS really shouldn't matter here, as there are no actual constant
buffers to be cached. That said, it should be harmless to do so, and
we can just assume a generic MOCS for internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
isl now uses info->mocs regardless of whether there's any actual
depth/stencil/HiZ buffers involved, so pass it a legitimate one,
rather than zero. When we have entirely NULL surfaces, we just
default to isl's MOCS value for an internal depth buffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is disabled constant buffers, where MOCS shouldn't matter, as
there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for these null constant
buffers; we can just assume a generic MOCS for internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
isl now uses info->mocs regardless of whether there's any actual
depth/stencil/HiZ buffers involved, so pass it a legitimate one,
rather than zero. We just assume a generic internal MOCS when we
have entirely NULL surfaces.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is SURFTYPE_NULL surfaces, where MOCS really shouldn't matter,
as there's no actual surface to be cached.
That said, it should be harmless to set MOCS for these null surfaces;
we can just assume a generic MOCS for internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is SURFTYPE_NULL depth, stencil, and HiZ buffers, where MOCS
really shouldn't matter, as there's no actual surface to be cached.
That said, it should be harmless to set MOCS for these null surfaces.
We now set the one provided in info->mocs regardless of whether any
buffers actually exist or not.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is SURFTYPE_NULL surfaces, where MOCS really shouldn't matter,
as there's no actual surface to be cached.
That said, it should be harmless to set MOCS for these null surfaces;
we can just assume a generic MOCS for internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
The Broadwell PRM says: "Constant Buffer Object Control State must
always be programmed to zero."
This patch changes the MOCS field in gen8.xml to be "mbz" type, so that
it's impossible to set it to a non-zero value.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
There are some fields which Must Be Zero, and we don't want to allow
setting them from the template struct, but we do want them in the XML
to allow them to be decoded properly, and for documentation purposes.
This adds a new "mbz" type, much like "mbo", except it doesn't set
anything in the struct. We also update the decoder to handle it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
The GLSL spec says it's an error if a shader statically writes to these
2 variables.
Until this commit, Mesa refused to link a shader if it had an unused
function writing to one of these variables while another (used) function
wrote to the other.
This commit adds an option to perform dead function elimination after
the intra-stage linking step but before performing these checks.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12897>
1. advertise high hit rate cache combinations, and we should limit the
caches to those only require device memory pool alloc
2. use size = 1 to ask for buffer memory requirements so that we do a
sanity check on our assumption of returned size and alignment. For
implementations don't meet our assumption, continue without cache.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13428>
this check was illegal because the usage bits weren't yet populated,
so add another check after usage bits are determined to figure out if
CUBE_COMPATIBLE can be applied
additionally, checking sample counts was never needed since the spec
prohibits CUBE_COMPATIBLE use with multisampling
zink DEBUG: ERR: 'Validation Error: [ VUID-vkGetPhysicalDeviceImageFormatProperties-usage-requiredbitmask ] Object 0: VK_NULL_HANDLE, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x991b3105 | vkGetPhysicalDeviceImageFormatProperties: value of usage must not be 0. The Vulkan spec states: usage must not be 0 (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-vkGetPhysicalDeviceImageFormatProperties-usage-requiredbitmask)'
Fixes: 71494c4874 ("zink: only mark resources as cube-compatible if supported")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12580>
The test names are definitely unique (deqp has specific prefixes, piglit
uses '@' as a separator instead of '.'), so we can just have a single file
regardless of test type. Merges the two groups of xfails together so you
can't mix up which file to edit (I certainly have), and so that we don't
need to introduce yet another set of files when we add gtest for libva.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13517>
While there seems to be no bug with the state things are today, I was
recently doing some debugging and put an iris_bo_wait() before a
bo_close() in iris_bufmgr_destroy(), which caused an issue since the
bo_deps_lock mutex had already been destroyed.
Since there are quite a few things we do with the bufmgr after
destroying the mutexes, I figured we should probably postpone mutex
destruction in order to be a little safer against future code
modifications like the one I just did.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13494>
This should make it easier to tune the runtime, and enable KHR-GL* tests
in the future. (Not done currently because something in KHR-GL* causes
oomkiller).
This drops the redundant FDO_CI_CONCURRENT settings, since the default on
these boards is 4 anyway.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13504>
The original implementation in os_memory_fd.c always uses memfds.
Replace this by using the already existing os_create_anonymous_file in
order to support older systems or systems without memfd.
Fixes: 1166ee9caf ("gallium: add utility and interface for memory fd allocations")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13331>
This seems simpler and probably faster.
This also fixes a warning for these CTS tests:
dEQP-VK.pipeline.creation_feedback.graphics_tests.vertex_stage_geometry_stage_delayed_destroy_fragment_stage_delayed_destroy
dEQP-VK.pipeline.creation_feedback.graphics_tests.vertex_stage_geometry_stage_fragment_stage
because we no longer set found_in_application_cache=false for pipelines
with NGG GS.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13528>
This was not quite correct in that our checks for the allowed cases
were not checking that there were no other peripheral access other
than the ones allowed.
For example, we allowed wrtmuc signal and TMU write other than
TMUC, and we also allowed TMU read and VPM read/write. But we
cannot allow wrtmuc with TMU write other than TMUC and at the
same time a VPM write for example, so we can't just check if we
have a combination of allowed peripherals, we still need to check
that those are the only ones in use by the combined instructions.
Another example is that even if we allow a TMU write (other than TMUC)
with a wrtmuc signal, the resulting instruction must still have just
one TMU write other than TMUC, but we were allowing the merge if one
instruction signaled wrtmuc and the other wrote to tmu other than tmuc
without testing if the combined result would have 2 tmu writes.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13527>
OpenGL 3.0 requires the driver can draw to 8 simultaneous render
targets. Similarly, OpenGL ES 3.0 requires the driver can draw to 4
simultaneous render targets. Fix the version computation logic to take
this into account.
On Mali T720, we support ~all features of OpenGL ES 3.1 except we only
support a single render target. Mali T720 should advertise OpenGL 2.1
and OpenGL ES 2.0 only. With the previous logic, it incorrectly
advertised OpenGL ES 3.1.
v2: Lie about the minimum for GL 3.0 to make freedreno a3xx happy. Add
Emma's reviewed-by.
v3: Update the Mali T720 CI expectations. There are tests that pass on
GLES3 but not GLES2. Unclear if these are dEQP bugs or Mesa bugs, lima
hits the same issues. Add them to the known fails
Note to mesa-stable maintainers: this downgrades the OpenGL version
advertised on Mali T720. As such, this patch should apply to the
unreleased 21.3 (Eric) but should NOT be backported to any released Mesa
versions (21.2 or older should NOT have this patch). This is a bit of a
compromise; Emma agreed with this plan on IRC.
Reported-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net> [v2]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> [v2]
Cc: 21.3 mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13455>
nir_lower_blend was written against the OpenGL ES 3.2 specification,
which does not support blending SNORM render targets. The ES spec
says that non-floating point buffers get clamped to [0, 1] before
blending. The story is not so simple: SNORM buffers are blendable in
OpenGL and must clamped to [-1, 1] rather than [0, 1]. Handle this case.
NIR does have the fsat_signed_mali instruction to clamp to [-1, 1], but
it is only implemented in Panfrost, and this pass is in common code.
Open code it instead. Panfrost optimizes the open coded version, so this
is good enough.
Fixes SNORM subtests of Piglit arb_texture_view-rendering-formats.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13499>
Some Mali GPUs can avoid storing to the tilebuffer if src alpha = 0, and
can replace blending with a store if src alpha = 1. This saves power in
the common case of alpha blending. Add predicates to check if these
optimizations are valid for a given blend equation.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13508>
Variable workgroup size works by compiling as much SIMD variants as
possible and then selecting the right one during dispatch (when the
actual workgroup size is passed to us).
Instead of replicating the logic in a separate function, reuse the
same logic for regular SIMD selection. And move function for that
together with the remaining simd selection functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13249>
Clean up the logic and move it to functions that work with prog_data
attributes to select the right SIMD. This shouldn't change any
behavior compared to the original.
Having it extracted will allow reuse by Task/Mesh and make it easier
to write tests.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13249>
zink_resource_has_binds() only checks descriptor binds, and this doesn't
include streamout or fb bindings, so call these functions from the specific
rebind points to ensure those cases are also checked
fixes#5541
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13490>
this migrates existing uses of zink_descriptor_layout_key to use
zink_descriptor_pool_key instead, which ends up being more helpful overall
since it puts all the pool-related data into a single struct
this also has a(n incredibly small) benefit of removing VkDescriptorPoolSize[6]
from every program data struct and deduplicating it into the descriptor
pool key set
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13483>
When compiling for x86 with MSVC, Vulkan API entry points follow the
__stdcall convention (VKAPI_CALL maps to __stdcall), which uses the
following name mangling:
_<function_name>@<arguments_size>
Fix the vk_entrypoint_stub()/alternatename definitions accordingly.
Fixes: 6d44b21d4f ("vulkan: Fix weak symbol emulation when compiling with MSVC")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13516>
The streamout_config SGPR is used to determine if streamout is enabled.
This fixes a GPU hang with various transform feedback tests:
- dEQP-GLES3.functional.transform_feedback.*
- KHR-GL46.transform_feedback.api_errors_test
- KHR-GL46.draw_indirect.basic-draw*-xfbPaused
- KHR-GL46.geometry_shader.api.draw_calls_while_tf_is_paused
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13514>
AFBC is keyed to the format. Depending on the hardware, we'll get an
Invalid Data Fault or a GPU timeout if we attempt to sample from an
AFBC-compressed RGBA8 texture as R32F (for example).
Fixes Piglit ./bin/arb_texture_view-rendering-formats_gles3 with AFBC.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13205>
We need to know the internal (physical) formats used for AFBC of a given
logical format, in order to check format compatibility and determine if
we need to decompress AFBC for conformance.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13205>
According to mali_kbase, all Bifrost and Valhall GPUs are affected by
issue TSIX_2033. This hardware bug breaks the INTERSECT frame shader
mode when forcing clean_tile_writes. What does that mean?
The hardware considers a tile "clean" if it has been cleared but not
drawn to. Setting clean_tile_write forces the hardware to write back
such "clean" tiles to main memory.
Bifrost hardware supports frame shaders, which insert a rectangle into
every tile according to a configured rule. Frame shaders are used in
Panfrost to implement tile reloads (i.e. LOAD_OP_LOAD). Two modes are
relevant to the current discussion: ALWAYS, which always inserts a frame
shader, and INTERSECT, which tries to only insert where there is
geometry. Normally, we use INTERSECT for tile reloads as it is more
efficient than ALWAYS-- it allows us to skip reloads of tiles that are
discarded and never written back to memory.
From a software perspective, Panfrost's current logic is correct: if we
clear, we set clean_tile_writes, else we use an INTERSECT frame shader.
There is no software interaction between the two.
Unfortunately, there is a hardware interaction. The hardware forces
clean_tile_writes in certain circumstances when AFBC is used.
Ordinarily, this is a hardware implementation detail and invisible to
software. Unfortunately, this implicit clean tile write is enough to
trigger the hardware bug when using INTERSECT. As such, we need to
detect this case and use ALWAYS instead of INTERSECT for correct
results.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13205>
The gl_FragColor lowering in the fragment shader depends on the number
of render targets, which can change every set_framebuffer_state.
set_framebuffer_state thus needs to force a rebind of the fragment
shader.
Fixes a regression in Piglit fbo-drawbuffers-none gl_FragColor -auto
-fbo from enabling AFBC on Mali G52.
Fixes: 28ac4d1e00 ("panfrost: Call nir_lower_fragcolor based on key")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13498>
This uses the new property for AFBC we've added. The AFBC quirk is
applied only to v4, and we only set dev->has_afbc on v5+ so this is not
a regression. It now respects the hardware-specific AFBC disable.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13497>
u_format defines depth formats as having depth in .x, mesa/st samples for
depth or stencil in .x (not making use of any other channels).
util_make_fs_blit_zs() looks for depth or stencil in .x. The MSAA path
was the exception looking for it in .z or .y, which was causing drivers to
need to splat their values out to the other channels. This should be
better on hardware that can emit shorter messages for sampling just the
first channels.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13446>
Specifically this enables these VK_FORMAT_FEATURE bits:
VK_FORMAT_FEATURE_TRANSFER_SRC_BIT
VK_FORMAT_FEATURE_TRANSFER_DST_BIT
VK_FORMAT_FEATURE_SAMPLED_IMAGE_YCBCR_CONVERSION_LINEAR_FILTER_BIT
VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_MINMAX_BIT
VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT
VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT
VK_FORMAT_FEATURE_MIDPOINT_CHROMA_SAMPLES_BIT
VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT
Fixes the following tests:
dEQP-VK.api.info.format_properties.g8_b8_r8_3plane_420_unorm
dEQP-VK.api.info.format_properties.g8_b8r8_2plane_420_unorm
dEQP-VK.api.info.image_format_properties.2d.optimal.g8_b8_r8_3plane_420_unorm
dEQP-VK.api.info.image_format_properties.2d.optimal.g8_b8r8_2plane_420_unorm
Additionally allows 339 tests in dEQP-VK.ycbcr.* to go from Skip to
Pass.
[ Connor: Fake support for 3-plane formats, fixup modifiers path ]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6792>
The blob seems to always emit this, even though it seems to only be used
when rendering to the special planar formats (which we only do in the
blit path). Based on the LRZ prefix it might used in other cases though.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6792>
It's not actually used for 2d blits, it's supposed to mirror
RB_MRT_BUF_INFO[0].COLOR_FORMAT and seems to be used only when rendering
to the special planar formats, although the blob name seems to suggest
it's connected to LRZ.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6792>
If the application first uses nontrivial divisors, the driver emits
the vertex shader VA to the upload BO rather than directly via the
user SGPRs locations. But, if the vertex input dynamic state changes,
the driver might select a different VS prolog that no longer needs
nontrivial divisors.
In this case, the driver needs to re-emit the prolog inputs because
otherwise the VS prolog will jump to the PC that is emitted via the
user SGPR locations, and the previous one was somewhere in the
upload BO...
This fixes a GPU hang with Bioshock and Zink.
Fixes: d9c7a17542 ("radv: enable VK_EXT_vertex_input_dynamic_state")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13377>
This was not considering the possibility that the driver has called
nir_before_block() or nir_after_block() to update the cursor, in which
case the cursor link points to the instruction list header and not
to an actual instruction.
Fixes incorrect debug-assert crash in:
dEQP-VK.graphicsfuzz.cov-increment-vector-component-with-matrix-copy
Fixes: 265515fa62 ("broadcom/compiler: check instruction belongs to current block")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13467>
Tigerlake revision 0 is an early stepping that should not be used in
production anywhere, so this code was only used for hardware bringup.
We can drop the unnecessary workarounds. This also keeps them from
triggering on early steppings of other Gfx12 parts.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13266>
Avoids assert:
../src/compiler/glsl_types.cpp:1134: static const glsl_type *glsl_type::get_array_instance(const glsl_type *, unsigned int, unsigned int): Assertion `glsl_type_users > 0' failed.
caused by us trying to compile built-in shaders (ie. clear, gmem<->mem,
etc) before clover has initialized glsl_types. But we don't need these
shaders for compute-only contexts.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13300>
A TSY barrier becomes effective at the point of the next thread switch,
so if we have one coming after a previous thread switch we need to
be careful not to emit it in its delay slots, or we would be effectively
moving the barrier earlier than intended.
Fixes simulator assert crash in:
dEQP-VK.graphicsfuzz.two-for-loops-with-barrier-function
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13468>
We want these functions to take a Vulkan format and return a
pipe_format, but tu6_plane_format() was getting redundantly called on
the result of copy_format() and copy_format() was also getting called
twice with image to image copies. Pull these functions further up the
call chain so that they're only called once.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13379>
This is prettier in the log files, less shell code, and for non-suite mode
adds checking that the driver has the right git sha1. Also, no need for
suites to have a DEQP_VER to say which dEQP we should run for the renderer
check.
The version checks can help us make sure that GL version exposed doesn't
accidentally regress, and the ".*git" checks that we're using a git
version of Mesa rather than something that snuck in through distro
packages.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13372>
Less artifacts and less time running linker. The
load_store_vectorizer test is still split since we need to update
gitlab-ci scripts to skip certain tests in certain builds. Added a
TODO with the concrete suggestion.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13414>
When we have a series of loads/stores we were creating a constant for
each one, which isn't great. Furthermore, because the nir pass puts the
offset constant at the top of the shader, it resulted in extra register
pressure and spilling when that happened inside a loop. Fix this by
using the base/offset form of stp and ldp.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13307>
this is a wild race condition, but it's possible for these to get their
final unref, enter their destructor, and then get a cache hit while waiting
on the lock to remove themselves from the cache
in such a scenario, a second, normal check of the refcount will suffice,
as the increment is atomic, and the value will otherwise be zero
fixes crashes in basemark
cc: mesa-stable
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13442>
the final ref on pools is owned by their program struct(s), and liveshader
cache can trigger shader deletion after a context is destroyed, so
attempting to prune pools here may end up deleting them before the
last ref is actually removed
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13430>
this was a bit confusing to read, and I originally left it in the hybrid
path to enable fallbacks to push descriptors in hybrid mode. the problem with
that idea is that it's impossible: the constant buffer set is the one set
that will never, ever trigger a fallback, so leaving it there just leaves
room for error and confusion
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13350>
While at it also remove CL_LUMINANCE and CL_INTENSITY, which are optional
but also quite broken.
Also advertize all formats we can already support and make the list easier
to read. Also adds support for newer formats.
v2: fixup packing for non-8 bits (airlied)
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13424>
Patch 77c2b022a0 removed lowering of 64-bit vertex attribs to 32bits.
This has thrown TGSI translation off the guard for 64bit attrib.
This lead to fail/crash of 1000+ piglit tests.
This patch basically fixes 64 bit attrib index for TGSI shader by adding placeholder
for second part of a double attribute.
It fixes all regressed piglit tests.
A big help from Charmaine to fix this regression
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: 77c2b022a0 ("st/mesa: remove lowering of 64-bit vertex attribs to 32 bits")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13363>
This will help us create staging resources with a Y8 format and avoids
calling into the Vulkan-level entrypoints which will have to be changed
to use vk_image and vk_image_view.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13359>
This is mostly based on tu_image_view. The notable difference is that we
don't handle choosing the correct plane out of multiple planes when
indicated by the aspect, which means that there is no equivalent of
VK_IMAGE_ASPECT_PLANE_1 etc. This is expected to be done in the driver,
and note that freedreno gallium handles this very differently anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13359>
The following chain of events results in an incorrectly sized buffer
persisting beyond its useful lifetime, and causing visual artifacts.
buffer is attached at size A
window is resized to size B
rendering takes place for size B
window is resized back to size A
swapbuffers with damage is called
In this scenario, update_buffers fails to recognize that the surface it's
about to commit is a different size than it has rendered. The
attached_width and attached_height are set incorrectly, and periodic
flickering is observed.
Instead, we set a boolean flag at time of resize and use this at the time
we latch the window dimensions as surface dimensions to decide whether to
discard stale buffers.
Signed-off-by: Derek Foreman <derek.foreman@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13270>
This uses a C++ template to compute the memcmp size at compile time,
which is important for getting inlined memcmp.
There are 4 different key sizes now:
GE with inlined uniforms: 68 bytes
GE without inlined uniforms: 52 bytes
PS with inlined uniforms: 28 bytes
PS without inlined uniforms: 12 bytes
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13285>
ps is for the pixel shader, while ge is for VS, TCS, TES, and GS.
si_shader_key: 68 bytes
si_shader_key_ge: 68 bytes
si_shader_key_ps: 28 bytes
The only notable change is that si_shader_select_with_key is changed
to a C++ template. Other changes are trivial.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13285>
Instead of using gsamplerND types for sampled images, use the new
gtextureND types for sampled images and reserve gsamplerND for combined
image+samplers. Combined image+sampler bindings still get a gsamplerND
type.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
With nir_var_image, we've now run out of bits in our packed blob for
deref instructions. We could revert to an unpacked blob or we could be
a bit more clever about how we encode deref modes and pack them into 5
bits.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
With the `gtest` protocol meson will add some extra arguments to the
test to generate better junit results, which may be useful. This
protocol is only available in meson 0.55.0+, so keep using the default
`exitcode` protocol for meson older than that.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8484>
This is broken for bindless images declared as local variables. It
turns out nir_variable::data::bindless is only used for uniforms and we
already assume anything in nir_var_function_temp or similar is bindless.
We could try to make a tricky assert but now that we have everything
else passing but now that we've got everyone converted the extra
validation probably isn't necessary.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13384>
INTEL_DEBUG is defined (since 4015e1876a) as:
#define INTEL_DEBUG __builtin_expect(intel_debug, 0)
which unfortunately chops off upper 32 bits from intel_debug
on platforms where sizeof(long) != sizeof(uint64_t) because
__builtin_expect is defined only for the long type.
Fix this by changing the definition of INTEL_DEBUG to be function-like
macro with "flags" argument. New definition returns 0 or 1 when
any of the flags match.
Most of the changes in this commit were generated using:
for c in `git grep INTEL_DEBUG | grep "&" | grep -v i915 | awk -F: '{print $1}' | sort | uniq`; do
perl -pi -e "s/INTEL_DEBUG & ([A-Z0-9a-z_]+)/INTEL_DBG(\1)/" $c
perl -pi -e "s/INTEL_DEBUG & (\([A-Z0-9_ |]+\))/INTEL_DBG\1/" $c
done
but it didn't handle all cases and required minor cleanups (like removal
of round brackets which were not needed anymore).
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13334>
originally, a slab attempts to reclaim a single bo. there are two outcomes
to this which can occur:
* the bo is reclaimed
* the bo is not reclaimed
if the bo is reclaimed, great.
if the bo is not reclaimed, it remains at the head of the list until it can
be reclaimed. this means that any bo with a "long" work queue which makes it
into a slab will effectively kill the entire slab. in a benchmarking scenario,
this can occur in rapid succession, and every slab will get 1-2 suballocations
before it reaches a bo that blocks long enough for a new slab to be needed.
the inevitable result of this scenario is that all memory is depleted almost instantly,
all because pb assumes that if the first bo in the reclaim list isn't ready, none of them
can be ready
for drivers like radeonsi, this happens to be a fine assumption
for drivers like zink, this is entirely not workable and explodes the gpu
Cc: mesa-stable
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Witold Baryluk <witold.baryluk@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13345>
If an <override> overrides the definition of a field, don't emit
encoding for both the override's definition and the fallback. (See
"SAMP" in #cat5-src3). It is harmless currently, because (in this
case) it will just re-encode the low bits of "SAMP". But when we
start asserting on that the field being encoded fits in the allowed
number of bits, the re-encoding of the fallback field definition
will start triggering asserts.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13353>
The only big change is that lower_vars_to_explicit no longer assigns
a driver_location for images. That means that the storage for the
format/order loads is no longer implicitly "allocated" in the middle
of the kernel args. Instead, manually add the storage for that to the end
of the input args buffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
We don't use it for bindless images because the uniforms in that case
just contain a bindless handle and aren't an actual image. Bound
images, on the other hand, go in the nir_var_mem_image class.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Instead of making images have a well-defined size, insert a dummy
variable of the appropriate type which we can use for the parameter
block layout. This will work much better when we switch over to
nir_var_mem_image.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Storage images will start using nir_var_mem_image but sampled images
still use nir_var_uniform. If we're going to rewrite types, we need to
rewrite the modes as well. Otherwise, nir_validate will get grumpy and
drivers might get confused.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Instead of walking all uniforms and handling images as a special case,
walk "normal" uniforms first and images as a second pass. This lets us
use nir_foreach_image_variable which will survive the upcoming refactor.
While we're at it, use nir_foreach_image_variable in
brw_nir_lower_gl_images too.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Scratch patching code in iris_upload_dirty_render_state (see MERGE_SCRATCH_ADDR
calls) assumes that in all shader stages derived_data field stores 3DSTATE_XS
packet first.
This is not true for TESS_EVAL (DS), so we end up patching 3DSTATE_TE
instead of 3DSTATE_DS leading to DWordLength becoming 11 instead of 9
(9 == 3DSTATE_DS.DWordLength, 2 == 3DSTATE_TE.DWordLength, and 9|2 == 11),
and hardware hanging on the next instruction.
Fix this by reversing the order of packets for TESS_EVAL stage.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5499
Fixes: 4256f7ed58 ("iris: Fill out scratch base address dynamically")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13358>
(ported from iris - airlied)
The MI_COPY_MEM_MEM version of resource_copy_region has known bugs:
It's failing to set valid_buffer_range correctly
It's missing iris_emit_buffer_barrier_for() for the source/destination, so there may be missing flushes.
There are some bad interactions with the tile cache and VF using L3.
Even with those fixed, if you expand the "no more than 16 bytes" restriction to allow copies up to 1024 bytes, then it starts failing Piglit tests on Icelake.
We could probably fix this. However, I had originally only measured a 0.689096% +/- 0.473968% (n=4) speedup in Shadow of Mordor's OpenGL port, which is already fairly small, especially before adding missing flushes. Further, some of that likely came from not switching between render and compute...which we'll soon be able to avoid thanks to BLOCS.
Folks were also worried that MI_COPY_MEM_MEM can't be pipelined, and that stalling the command streamer may actually slow things down, especially as the GPUs become more powerful. We aren't really sure about this, but it's another concern.
So, let's just get rid of this optimization. It seemed like a good idea at the time, but it's just causing issues for very little gain.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13374>
When eglReleaseThread() is called from application's
destructor (API with __attribute__((destructor))),
it crashes due to invalid memory access.
In this case, _egl_TLS is freed in the flow of
_eglAtExit() as below but _egl_TLS is not set to NULL.
_eglDestroyThreadInfo
_eglFiniTSD
_eglAtExit
_run_exit_handlers
exit
Later when the eglReleaseThread is called from
application's destructor, it ends-up accessing
the freed _egl_TLS pointer.
eglReleaseThread -> in libEGL_mesa
eglReleaseThread -> in libEGL(glvnd)
destructor() -> App's destructor
To resolve the invalid access, setting the _egl_TLS
pointer as NULL after freeing it.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5466
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13302>
This sets the conformance version to 0.0.0.0 for GPUs that have
incomplete support for vulkan, so that it's easier to check if vulkan is
fully supported by a GPU at runtime for applications/libraries.
$ vulkaninfo|grep conf
MESA-INTEL: warning: Ivy Bridge Vulkan support is incomplete
conformanceVersion = 0.0.0.0
Signed-off-by: Clayton Craft <clayton@craftyguy.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13275>
The burden was put on the caller, which caused:
- Reading of tess levels back in TCS not accounting for component
- Reading patch outputs in TES account for component twice
Fixes vkd3d tests:
- test_tessellation_read_tesslevel
- test_tessellation_primitive_id
- test_line_tessellation_dxbc
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13338>
We already have code to deal with non-client-visible objects but we were
asserting if it didn't fall into one of the clearly mappable error
cases. However, we didn't have a mapping for VK_ERROR_NOT_PERMITTED
which can happen during object creation. Let's just be sloppy and drop
the assert. Worst case, the client gets an error with no object.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13341>
VK_ERROR_INITIALIZATION_FAILED can happen as part of device creation and
isn't really an instance error in that case.
VK_ERROR_EXTENSION_NOT_PRESENT, on the other hand, is always an instance
thing and we should handle it as such.
Fixes: 0cad3beb2a ("vulkan/log: Add common vk_error and vk_errorf helpers")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13341>
Mapping unimplemented entrypoints to a global function pointer variable
initialized to NULL is a bit cumbersome, and actually led to a bug
in the vk_xxx_dispatch_table_from_entrypoints() template: the !override
case didn't have the right check on the source table entries. Instead of
fixing that case, let's simplify the logic by creating a stub function
and making the alternatename pragma point to this stub. This way we get
rid of all those uneeded xxx_Null symbols/variables and simplify the
tests in vk_xxxx_dispatch_table_from_entrypoints().
Cc: mesa-stable
Fixes: 98c622a96e ("vulkan: Update dispatch table gen for Windows")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13348>
Two carchase compute shaders (shader-db) and two Fallout 4 fragment
shaders (fossil-db) were helped. Based on the NIR of the shaders, all
four had structures like
for (i = 0; i < 1; i++) {
...
for (...) {
...
}
}
All HSW+ platforms had similar results. (Ice Lake shown)
total loops in shared programs: 6033 -> 6031 (-0.03%)
loops in affected programs: 4 -> 2 (-50.00%)
helped: 2
HURT: 0
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 143692018 -> 143692006 (-0.0%)
SENDs in all programs: 6947154 -> 6947154 (+0.0%)
Loops in all programs: 38285 -> 38283 (-0.0%)
Cycles in all programs: 8434822225 -> 8434476815 (-0.0%)
Spills in all programs: 191665 -> 191665 (+0.0%)
Fills in all programs: 298822 -> 298822 (+0.0%)
In the presense of loop unrolling like this, the change in cycles is not
accurate.
v2: Rearrange the logic in the if-condition to read a little better.
Suggested by Tim.
Closes: #5089
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13323>
Using recommended values based on performance studies across a range
of workloads.
Rework:
* Always enable geometry distribution
* Set ListCutIndexEnable if primitive restart is enabled
* Set distribution mode based on TEEnable
* Add comment explaining the 3DSTATE_VFG bits (Sagar)
v2:
- Emit 3DSTATE_VFG dynamically based on primitive restart (Ken)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12091>
The basic vertex+fragment shader state uses the packet
GL_SHADER_STATE, but when geometry shader are involved, the packet
used is GL_SHADER_STATE_INCLUDING_GS.
Without this commit any program using a geometry shader would dump
their shader state (and their shader state record and attribues) as
binaries.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13269>
At that point we didn't call all the v3dv lowerings. So the reference
nir shader used to call the v3d compiler could be different.
Note that at that point the nir shader is only available for internal
shaders (like gs multiview).
This specifically affected multiview tests that wrote gl_PointSize, as
the nir shader for the geometry shader were wrongly exposing
per_vertex_point_size as false, as we were basing our check on the
nir_shader_info, and that was gathered calling nir_shader_gather_info
at pipeline_lower_nir.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13325>
echo'DEQP_SUITE must be set to the name of your deqp-gpu_version.toml, or DEQP_VER must be set to something like "gles2", "gles31-khr" or "vk" for the test run'
Every kernel uprev should update 3 image tags, located at two files.
:code:`.gitlab-ci.yml` tag
^^^^^^^^^^^^^^^^^^^^^^^^^^
-**KERNEL_URL** for the location of the new kernel
:code:`.gitlab-ci/image-tags.yml` tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-**KERNEL_ROOTFS_TAG** to rebuild rootfs with the new kernel
-**DEBIAN_X86_TEST_GL_TAG** to ensure that the new rootfs is being used by the Gitlab x86 jobs
Development routine
-------------------
1. Compile the newer kernel locally for each platform.
2. Compile device trees for ARM platforms
3. Update Kconfigs. Are new Kconfigs necessary? Is CONFIG_XYZ_BLA deprecated? Does the `merge_config.sh` override an important config?
4. Push a new development branch to `Kernel repository`_ based on the latest kernel tag used in Gitlab CI
5. Hack `build-kernel.sh` script to clone kernel from your development branch
6. Update image tags. See `Updating image tags`_
7. Run the entire CI pipeline, all the automatic jobs should be green. If some job is red or taking too long, you will need to investigate it and probably ask for help.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.