Trying to use Zink with the DRM winsys is problematic at best. In
theory all the pieces should be there but there are bugs around
synchronization, etc. This copies the current behavior of X11 which
also refuses to initialize DRM on Zink.
This is particularly important for Nouveau users which are getting Zink
via the loader returning "zink" for get_driver_for_fd(). On Wayland
today, we go ahead and load zink and attempt to use the DRM EGL paths,
which are broken. On X11, it fails to load the driver and then the EGL
device initialization code re-tries with zink which succeeds. Because
the second try is using Zink from the start, it's equivalent to if the
user had set MESA_LOADER_DRIVER_OVERRIDE=zink and they get the kopper
path, which is more reliable.
This absolutely not what we want because it means we don't load the
driver the loader requested and are dependent on fallback paths to get a
driver loaded at all. However, the real solution in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36014
is too complex to backport.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13498
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36119>
If there is a separate stencil in use, the resource invalidation
flag was not being removed for the depth buffer as rsc was assigned
to the separate stencil.
Fixes: 6ff509593c ("v3d: Only apply TLB load invalidation on first job after FB state update")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36030>
(cherry picked from commit 7d51a10cda)
rustfmt would change not only lib.rs, which is output by this script,
but also all of the other files in the library. The problem is, those
other files are inputs to lib_rs_gen so the script was changing its own
inputs which meant ninja needed to be run twice before we reached a
fixed point. Remove rustfmt for now to prevent this issue.
Fixes: 591b5da4 ("nouveau/headers: Run rustfmt on generated files")
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36121>
(cherry picked from commit 9b275cdbcc)
The border color of the rv770 gpu behaves the same way as
the evergreen border color. This change updates the software
accordingly.
This change is enabled for all the pre-evergreen gpus.
This change fixes 120 piglit tests. The rv770 ci is updated
as well.
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34502>
(cherry picked from commit 5bee7c0b12)
This is the backport of 0c0b978938 "radeonsi: set NEVER as
the depth compare func if depth compare is disabled".
The function r600_tex_compare arguments are updated with the "const"
keyword.
This change fixes the test below which was broken after 0c6e56c391:
khr-gl4[5-6]/incomplete_texture_access/sampler: fail pass
Fixes: 0c6e56c391 ("mesa: (more) correctly handle incomplete depth textures")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35968>
(cherry picked from commit d9baadcfb5)
this addresses an ancient race condition where unmapping memory
in one thread at the same time memory is mapped in a different thread
could proceed without synchronization and result in the second thread
writing to unmapped memory
this was the actual cause of #12533
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36076>
(cherry picked from commit 841080ed42)
The Vulkan spec says:
"If bindingCount is zero or if this structure is not included in
the pNext chain, the VkDescriptorBindingFlags for each descriptor
set layout binding is considered to be zero. Otherwise, the
descriptor set layout binding at
VkDescriptorSetLayoutCreateInfo::pBindings[i] uses the flags in
pBindingFlags[i]."
Fixes recent VKCTS dEQP-VK.api.maintenance3_check.*.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13503
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36044>
(cherry picked from commit 36879c4f99)
First, we handle the case where GetMemoryFdKHR fails. This is unlikely
and, if it's a Mesa driver it probably won't stomp the FD but we should
be extra careful. Then, we can close the dma-buf file immediately after
we call drmIoctl() on it, ensuring we don't leak the dma-buf file
descriptor if drmIoctl() fails. If ImportSemaphoreFdKHR() fails, then
we need to clean up the sync file.
Fixes: d4f8ad27f2 ("zink: handle implicit sync for dmabufs")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36048>
(cherry picked from commit de4224a57c)
The program info contains all sorts of flags concerning the compilation state of
the program, and those need to be reset when the program is going to be recompiled
when its source changes.
For example, before this change, after info.flrp_lowered got set following the
first call to glProgramStringARB, flrp lowering would not happen for any subsequent
updates to the program string, leading to compilation failures.
Signed-off-by: Charlotte Pabst <cpabst@codeweavers.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35523>
(cherry picked from commit be6b49449c)
The consumer of the Android surface may or may not be display. e.g. it
can also be a media encoder. When BufferQueue makes the allocation, it
takes the gralloc usage bits from both the client API (EGL/Vulkan) and
the consumer side.
Cc: mesa-stable
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35785>
(cherry picked from commit 8f4c938c1e)
The composer usage bit is automatically added by the surface consumer
side when the consumer is SurfaceFlinger. e.g. if the swapchain is
connected with a media encoder surface, the consumer side would append
encoder usage bit instead.
Fixes: c406d53858 ("vulkan/android: Add common vkGetSwapchainGrallocUsage{2}ANDROID")
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35785>
(cherry picked from commit 495156079c)
Descriptor set layout lifetime can be shorter than what the
implementation requires. One example is :
* create descriptor set layout
* create graphics pipeline library
* destroy descriptor set layout
* link optimize library in a final pipeline
The last step might need the descriptor set layout information again.
We've so far worked around this by taking a reference on the
descriptor set layout in the pipelines. But we forgot that descriptor
set layouts have pointers to samplers (for immutable & embedded
samplers).
We could take a reference to samplers but that sucks for various
reasons :
- it consumes dynamic state heap space
- it could cause issues with capture-replay placement
So instead we copy the information from the samplers that might be
needed in cases like link optimization. This includes :
- ycbcr conversion state (used for NIR lowering)
- embedded sampler data (to recreate the sampler)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35955>
(cherry picked from commit 67e452669e)
If the last block is empty, nir_block_last_instr returns NULL, which
sets the cursor to NULL, which crashes.
I think this can't crash currently because if xfb is present, there is
always at least 1 output store in the last block due to
lower_io_vars_to_temporaries, but that won't be true after we stop
calling it in a later commit.
Fixes: fa9cee4247 - glsl: implement lower_xfb_varying() as a NIR pass
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
(cherry picked from commit 9083e8b984)
From the Vulkan spec:
`If pColorAttachmentLocations is NULL, it is
equivalent to setting each element to its index
within the array.`
Use similar logic to what we do in
CmdSetRenderingInputAttachmentIndices to handle
this behaviour properly.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35948>
(cherry picked from commit 1ceded0c83)
This issue was generating unwanted write accesses that
could overwrite previous operations.
Note: This functionality could also be tested with
nir_lower_wrmasks. This problem seems to only affect
the ssbos.
This change was tested on cypress, barts and cayman. Here are the tests fixed:
khr-gl4[3-6]/compute_shader/pipeline-pre-vs: fail pass
khr-gl4[5-6]/direct_state_access/queries_functional: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/advanced-cast-cs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/advanced-cast-fs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_storage_buffer_object/advanced-switchbuffers-cs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_storage_buffer_object/advanced-switchprograms-cs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_storage_buffer_object/basic-operations-case1-cs: fail pass
khr-gl4[3-6]/shader_storage_buffer_object/advanced-switchbuffers-cs: fail pass
khr-gl4[3-6]/shader_storage_buffer_object/advanced-switchprograms-cs: fail pass
khr-gl4[3-6]/shader_storage_buffer_object/basic-operations-case1-cs: fail pass
khr-gl4[4-6]/texture_buffer/texture_buffer_max_size: fail pass
khr-gles31/core/compute_shader/pipeline-pre-vs: fail pass
khr-gles31/core/shader_image_load_store/advanced-cast-cs: fail pass
khr-gles31/core/shader_image_load_store/advanced-cast-fs: fail pass
khr-gles31/core/shader_storage_buffer_object/advanced-switchbuffers-cs: fail pass
khr-gles31/core/shader_storage_buffer_object/advanced-switchprograms-cs: fail pass
khr-gles31/core/shader_storage_buffer_object/basic-operations-case1-cs: fail pass
khr-gles31/core/texture_buffer/texture_buffer_max_size: fail pass
khr-glesext/texture_buffer/texture_buffer_max_size: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35830>
(cherry picked from commit 9e5d11bff3)
the semaphore stage is VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
so the src access barrier must also use this in order to ensure it happens
after the acquire
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35940>
(cherry picked from commit 69b5abee14)
We haven't wired this up in the Midgard compiler, so we can't expose
sample shading on Midgard GPUs. This all seems fixable, because the KILL
instruction can update the coverage without the kill-flag (yeah, a bit
confusing naming), but until someone puts in the time to wire up that,
let's just disable the functionality to avoid crashes.
Fixes: 6bba718027 ("panfrost: Advertise SAMPLE_SHADING")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35881>
(cherry picked from commit 504e511c44)
Per Ken Graunke, corruption issues with push
constants for render batches on Gen12 graphics
have been observed and worked around by re-emitting
push constants at the start of the batch buffer.
We're seeing similar issues with compute batches,
so we'll apply the same work-around.
Fixes corruption reported in Blender on ADL/RPL
CC: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35873>
(cherry picked from commit 8fd008a45f)
The Midgard compiler only deals with sized NIR types for image loads and
stores. Since we already have nir_get_nir_type_for_glsl_base_type()
which can provide us with the corresponding sized type, let's just use
that, and drop the extra table.
This fixes the following piglits on Mali-T760:
- spec/ext_texture_compression_s3tc/getteximage-targets 2d s3tc
- spec/ext_texture_compression_s3tc/getteximage-targets cube s3tc
Fixes: 9123ee0f18 ("st/mesa/pbo: Set src type on image_store")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35882>
(cherry picked from commit 96d124092f)
The chances of this happening are near zero with the way we do surface
ops today but I have seen it in the wild and this is apparently a rule.
The hardware throws an illegal instruction encoding if it sees 255.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35895>
(cherry picked from commit 19bee26056)
This looks like a hw bug on GFX10.3-GFX11.5 because RB+ seems to only
work as expected when all channels (RGBA) are written. With that format,
RGB channels must be all set or unset but setting the A channel is
legal so far.
This will reduce rendering performance with that format but it's the
less intrusive solution for now. This might be revisited in the near
future, also with more VKCTS coverage.
This has been tested and verified on GFX10.3 (NAVI21) and GFX11
(NAVI31) and GFX12 (NAVI48), unfortunately I don't have GFX11.5 but
let's assume it's broken there too.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13371
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35631>
(cherry picked from commit 10ef9c6a80)
Maximum texture dimension is 2^16, but we're limited by the 32-bit
fields that are used to pass strides/sizes in various descriptors.
Assuming RGBA32_FLOAT is the biggest format we support, that gives us a
16k-1 image size for 2D and cube map, and 512 for 3D.
Change our GetPhysicalDeviceImageFormatProperties2() implementation so
that smaller formats can still advertise bigger image sizes.
Fixes: d5ed77800e ("panvk: Fix GetPhysicalDeviceProperties2() to report accurate info")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35555>
(cherry picked from commit e25a91d919)
Piglit arb_texture_buffer_object-render-no-bo was generating
gpu resets because the uniform stream was missing the last
Fragment Shader uniform. So it was reading instead of the last
fragment shader uniform the first uniform of the vertex shader.
And using that unrelated VS uniform as the sampler address where
the texture should be read.
So now if a buffer object is not bound for a texture buffer object
we write the texture state base address to 0 (NULL) so the default
texture state is used.
So only is needed to set the 4 lower bits of the tmu_p0 with
the bit-mask of word enables.
Fixes: bb8285c258 ("v3d: add support for no buffer object bound")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35847>
(cherry picked from commit 0f8c681c5c)
As per the old comment:
"These formats correspond to the similarly named MESA_FORMAT_*
tokens, except in the native endian of the CPU. For example, on
little endian __DRI_IMAGE_FORMAT_XRGB8888 corresponds to
MESA_FORMAT_XRGB8888, but MESA_FORMAT_XRGB8888_REV on big endian."
Fixes: 7e10601786 ("dri: Redeclare __DRI_IMAGE_FORMAT_* as PIPE_FORMAT_*")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35814>
(cherry picked from commit 2ca46c7a7a)
nvidia hardware can't render to linear surfaces except under some
very limited circumstances, one of those is if Z is enabled.
However there appears to be some combination of gnome-shell, and
prime (with 2 nouveau cards) where we end up getting through the
GL API to the situation where we try this. This in a production
build causes the kernel to crash with a GR error.
However there existed a period of time where the hw/kernel due to
some other random hw misconfiguration didn't crash when this happened
and doing this was prefect fine. (linear + tiled Z).
This restores the userspace code to do this and just ignores the
Z buffers if we are asked for linear rendering, and seems sufficient
to fix the problem.
I do understand this is a workaround, but I think it's reasonable to
add to the nouveau GL driver at this time since we don't want to
maintain if for ever and it probably should fix a bunch of wierd
user problems with multi gpu and nouveau.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35221>
(cherry picked from commit 06e8db646a)
The new layout affects the whole buffer so it needs to be done
on a full clear.
This fixes this piglit test on a RX 6800 XT:
ext_framebuffer_multisample-accuracy 6 depth_resolve small depthstencil
Fixes: 75a03d733a ("radeonsi: simplify and fix enable_tc_compatible_htile_next_clear logic")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35582>
(cherry picked from commit 04d283c628)
Otherwise, outputs_read/outputs_written might not be up-to-date
(mostly after nir_remove_dead_variables) and remove_point_size() might
reach an assertion later because the output variable isn't found.
It seems better to run nir_shader_gather_info() at the very end of
radv_optimize_nir() which can change a lot of things anyways.
No fossils-db changes.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35707>
(cherry picked from commit 30ccd97cd2)
`cc: mesa-stable` instead of `fixes:` because several commits have
modified this but keeping this bug:
- 06e57e3231 ("virtio: Add vdrm native-context helper") made
an unconditional copy of subdir(virtio)
- cede4e7ac3 ("meson: Only include virtio when DRM available")
introduced a new condition, which doesn't cover everything that was
needed
- other commits made more changes
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35723>
(cherry picked from commit d0c7bea727)
When panfrost_resource_init_afbc_headers() fails, freeing the newly
created resource is not enough, because we need to unreference its BOs.
This will also take care of freeing its resource label.
Also replace instances of FREE() in error-handling paths with
panfrost_resource_destroy(), as it is capable of handling partially
initialised resources.
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: e3f2bc7963 ("panfrost: handle mmap failures")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34224>
(cherry picked from commit 32b128be01)
The snorm formats are not compatible with the srf flag
which was set by the emit_image_load_or_atomic() function.
In this specific case, "use_const_fields" is not set which
implies that the format definition is local. The other
supported formats do not require the srf flag as well.
This change was tested on cypress, barts and cayman. Here are the tests fixed:
khr-gl4[2-6]/shader_image_load_store/basic-allformats-load: fail pass
khr-gl4[2-6]/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/basic-allformats-loadstorecomputestage: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
khr-gles31/core/shader_image_load_store/basic-allformats-loadstorecomputestage: fail pass
khr-gles31/core/shader_image_load_store/basic-alltargets-loadstorecs: fail pass
deqp-gles31/functional/image_load_store/2d/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d_array/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/2d_array/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/3d/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/buffer/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/buffer/format_reinterpret/rgba8_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/cube/format_reinterpret/r32f_rgba8_snorm: fail pass
deqp-gles31/functional/image_load_store/cube/format_reinterpret/rgba8_rgba8_snorm: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35548>
(cherry picked from commit d27ed38d1a)
The mode r10g10b10a2_snorm processed as vertex on palm at the
hardware level doesn't follow the current standard. Indeed, the .w
component (2-bits) is not calculated as expected. The table below
describes the situation.
This change fixes this issue by adding three gpu instructions at
the vertex fetch shader stage. An equivalent C representation and
a gpu asm dump of the generated sequence are available below.
.w(2-bits) expected palm
0 0.0 0.000000
1 1.0 0.333333
2 -1.0 0.666667
3 -1.0 1.000000
w_out = (4.*w_in > 1. ? 1. : 4.*w_in) - (w_in > 0.5 ? 2. : 0.);
0002 00000008 A0080000 ALU 3 @16
0016 00000C02 A0000CC0 1 y: MOV*4_sat __.y, R2.w
0018 801F8C02 600004A0 w: SETGT*2 __.w, R2.w, 0.5
0020 839FC4FE 60400010 2 w: ADD R2.w, PV.y, -PV.w
Note: The rv770 and cypress don't need this correction. This is
definitely a hardware change between these gpus.
This change was tested on palm, barts and cayman. Here are the tests fixed:
spec/arb_vertex_type_2_10_10_10_rev/arb_vertex_type_2_10_10_10_rev-array_types: fail pass
deqp-gles3/functional/draw/random/124: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/normalize/int2_10_10_10/components4_quads1: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/normalize/int2_10_10_10/components4_quads256: fail pass
khr-gl43/vertex_attrib_binding/basic-input-case5: fail pass
khr-gl44/vertex_attrib_binding/basic-input-case5: fail pass
khr-gl45/vertex_attrib_binding/basic-input-case5: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32427>
(cherry picked from commit e8fa3b4950)
`getopt_long()` returns an `int`, not a `char`; putting the value in
a `char` before comparing it to `-1` was making the comparison always
fail, resulting in the invalid codepath taken that then fails with:
option `-' is invalid: ignored
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
(cherry picked from commit 99e8d804bf)
`subprocess.Popen()` returns immediately, and the subprocess might not
have finished by the time `stdout` is read on the next line, spuriously
failing the tests.
`subprocess.check_output()` makes sure the output is available before
returning, solving this issue; it additionally raises an error if the
subprocess failed, giving a better error than a failed diff later in the
script.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
(cherry picked from commit de6ab1beda)
Clients are expecting the color info to be fully filled when the api
exists. Give proper defaults for the metadata to stay aligned with
legacy backends.
Also amend the missing ChromaSiting cases.
Fixes: ee42e2166d ("android: Introduce the Android buffer info abstraction")
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35613>
(cherry picked from commit 64d18f84b0)
The MEMORY_BARRIER instruction has some issues, where we end up
dead-code eliminating it before it gets to do what it's supposed to do.
But even if we fix that, we have issues where we can end up inserting
flow control into it, which isn't going to work because we have nothing
to emit here either.
So let's rework this to a special-cased NOP instruction, which is marked
as a scheduling barrier. The beneft here is that NOPs are already properly
handled when it comes to flow control.
Note that this isn't perfect either; this only prevents memory operations
from crossing the scheduling barrier. We should really prevent any
operation with observable side effects from crossing the barrier. This
includes things like reading clocks etc.
But that's a larger change, and it's a step in the right direction to get
this to no longer be dead-code eliminated. So let's put this band-aid on
for now.
Fixes: f77a50e45e ("pan/bi: add a MEMORY_BARRIER pseudo-instruction")
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35502>
(cherry picked from commit 18893a250f)
It might be the case that both the branch and exec mask write in a
divergent branch block are removed. try_remove_simple_block() might then
try to remove it, but fail because it has multiple logical successors.
Instead, just skip these blocks.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
(cherry picked from commit 5344abbc56)
u_blitter sets a viewport transform with depth range [-1,1], which is
outside the [0,1] range that is allowed by opengl.
The mali hardware docs state that setting the LOW_DEPTH_CLAMP register
outside of [0,1] is undefined behavior. We haven't observed any problems
with this so far, but better to fix it.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 810135fb42 ("gallium/u_blitter: Fix depth.")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35225>
(cherry picked from commit b8c7fcda27)
The SPIR-V spec is pretty clear that coordinates on subpass attachments
are relative to the current pixel. They're required to be zero but we
should stay consistent with ourselves (we already do this for image
intrinsics) and with the spec.
Fixes: 84b08971fb ("nir/lower_input_attachments: lower nir_texop_fragment_{mask}_fetch")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35551>
(cherry picked from commit 2c13e1e655)
There's nothing in NIR which guarantees that the deref is the first
source or that the coordinate is the second. Use
nir_tex_instr_src_index() to get the actual indices.
Fixes: 84b08971fb ("nir/lower_input_attachments: lower nir_texop_fragment_{mask}_fetch")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35551>
(cherry picked from commit 9a52b9372c)
Turnip when cross-compiled for i386 needs to be built with SSE2 as a
minimum spec, as it uses clflush unconditionally. Make sure to pass in
the sse2_args, which will be empty on Arm64 targets.
Fixes: 7231eef630
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35621>
(cherry picked from commit f872cbea37)
When running `ffmpeg -vaapi_device /dev/dri/renderD128 -f lavfi -i smptebars=duration=5:size=1280x720:rate=30 -vf format=rgba,hwupload,scale_vaapi=format=nv12,hwdownload testout.mp4`,
the vpe is asked to transform into NV12, which cannot be done with a blit. `si_vpe_construct_blt` fails, but then it gracefully falls back into `vlVaPostProcCompositor` and finishes the task correctly, but not before logging the error:
SIVPE ERROR ../mesa-25.1.3/src/gallium/drivers/radeonsi/si_vpe.c:1095 si_vpe_construct_blt Failed in checking process operation and build settings(9)
for each frame that is processed. Since this is expected, my original thought was to demote this to a warning. But looking at all the reasons it could fail for, there already is a warning (or error) logged, so it seems to me the best thing to do is remove the error entirely
There may be a better approach here, and I'm all ears.
Fixes: e85a6b6a63 ("radeonsi/vpe: check reduction ratio")
Reviewed-by: Peyton Lee <peytolee@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35449>
(cherry picked from commit e1bcd0f4a5)
This shouldn't matter if FDM is actually enabled, because in that case
the pipeline must enable the bit and we dirty FDM state at the
beginning, but pipelines can enable FDM even if the renderpass they're
used in doesn't use FDM and in that case we still need to use the FDM
path to duplicate the viewports. Fix the case where a different pipeline
is bound that enables FDM without actually using FDM.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35581>
(cherry picked from commit 60843bc806)
The max dimensions are in units of pixels, not bytes. But the x
coordinate shift is based on aligning the address/offset to 64.
Rework the buffer clear loop to iterate in terms of pixels, but
with the x dimension shift based on converting aligned offset
to pixels.
Fixes OpenCL-CTS test_buffers.
Fixes: dafc4476f7 ("freedreno: Implement fast clear_buffer for Adreno 6xx and 7xx")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35557>
(cherry picked from commit 8d13fc447e)
DCC tilings info needs to be set for all surfaces, including
depth/stencil. But because this is a C union, settings those fields
for depth/stencil surfaces might accidentally overwrite HiZ info.
This fixes rendering issues with RADV_DEBUG=nohiz.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35515>
(cherry picked from commit 251b23f6c2)
The frontend does not know that etnaviv may keep multiple shadows around
for a resource, so it will always pass in the base resource as blit source
and destination. For those blits to work as expected by the API we need to
work out which shadow is the most recent one and use those as blit source
and destination resources.
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35526>
(cherry picked from commit ede41372f4)
etna_copy_resource() and etna_copy_resource_box() are used to keep the
internal shadow copies of a resource up to date. They are supposed to
always use the RS or BLT engines to do the copy, never requiring any
fallbacks or fake format handling. They should also work regardless of
the current render condition state. So instead of going through the
pipe_context blit hook, directly call the RS or BLT blit hook on the
etna_context.
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35526>
(cherry picked from commit d4780f03fc)
This reverts commit 57ea689273.
This optimization is only sound when the operands of iadd are unsigned.
It turns out this is not always the case.
While the particular failure I was seeing was fixed by changing the
unsigned shifts to signed ones, I don't believe this is sound either. So
it's better to disable it for now until we find a better solution.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 57ea689273 ("ir3: optimize SSBO offset shifts for nir_opt_offsets")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34324>
(cherry picked from commit 29eb9ec7b7)
This prevents a regression from the next commit which would write
garbage for combined image+sampler descriptors and that might break
capture&replay.
It seems also more robust to write zeroes than garbage overall.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35455>
(cherry picked from commit 22e06d65d7)
The referenced commit was a step in the right direction, but not
complete.
ac_build_image_opcode returns a vec<4> or a struct<vec<4>, int>
so we can simplify visit_tex. We just need to map these 4/5 values
to the expected layout from NIR.
eg: depth + TFE would produces "<d, x, x, x>, t" so it has to be
transformed into <d, t>.
nir_texop_fragment_mask_fetch_amd + sparse doesn't exist, so it's
another opportunity for simplification.
This is required to get KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup_texture_2d_depth_component16
working properly.
The same test fails with ACO so it probably needs a change in the
same area.
Fixes: c0ef2aa7f8 ("DEPENDENCY: ac/llvm: fix sparse code handling")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35206>
(cherry picked from commit 4a84ebfcb1)
If the clone_append was to a chunk of the same u_trace that gets
process_chunk()ed after where we're cloning from, then the payloads would
have been unreffed in the previous chunk's cleanup_chunk().
Fixes use-after-frees with turnip gmem rendering that resulted in
corrupted payloads.
Fixes: 14e45cb21e ("util/u_trace: refcount payloads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35379>
(cherry picked from commit 6e97df1d76)
The previous fix 0cae8d372e is the right way to proceed, but it
should also apply when index_size is non-zero.
This change was tested on palm and cayman. Here is the test fixed:
spec/arb_multi_draw_indirect/arb_draw_elements_base_vertex-multidrawelements -indirect: fail pass
Fixes: 0cae8d372e ("r600: don't set an index_bias for indirect draw calls")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34726>
(cherry picked from commit a640b7233c)
The ISA docs don't mention this, but instead of always truncating
like other integer conversions, this opcode actually uses the single
precision rounding mode.
We could continue to use the opcode and set the rounding mode to rtz
in lower_to_hw_instrs, but I think I should just concede that f2u8
isn't worth the effort.
Fixes: 9bb10b58 ("aco: use v_cvt_pk_u8_f32 for f2u8")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
(cherry picked from commit d95e90ab5f)
We were allocating these, but never freeing the actual CSOs here.
Let's wire things up so we delete the data when we destroy the
hash-table. Because we don't have access to the context in that
callback, we can't call the pipe-level function to delete a CSO,
but luckily we don't actually need the context for the
driver-logic. So let's add an internal helper for that.
Fixes: ae3fb3089f ("panfrost: Add infrastructure for internal AFBC compute shaders")
Fixes: f39194cdd3 ("panfrost: support MTK 16L32S detiling")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35336>
(cherry picked from commit fb0a422be2)
At draw time, if the number of BOs is bigger than 2048, the current
job submission is forced.
The 2048 limit has been validated to be big enough to not be reached
in most of the scenarios. Only a couple of CTS tests get over this
threshold.
So the new V3D_JOB_MAX_BO_HANDLE_COUNT is defines as 2048 and
V3D_JOB_MAX_BO_REFERENCED_SIZE is defined as 768MB.
This forced submission is useful to handle scenarios where the client
application is not calling glFlush() or where SwapBuffers() is a NOP
because of not having a window surface. In this case, the CLE can
grow indefinitely until the system runs out of memory resources.
This approach is followed by different drivers forcing the flush
of CL when it reaches a defined size because of HW limitations.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227
Cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35042>
(cherry picked from commit ed16884bfa)
Mark when at least one job for the current active FBO has already been
submitted since the last framebuffer state update.
With this we can apply TLB load invalidation only to the first
job that is submitted to the same FBO. Not applying TLB
loads invalidation on follow-up jobs targeting the same framebuffer
state.
With this we avoid doing incorrect invalidations when we force
a job submission for a reason not related with a new framebuffer bind.
Cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35042>
(cherry picked from commit 6ff509593c)
Before we move a UBO load to a previous location in the block we take a
reference to the instruction after it so we can continue the loop from
there, however, if the load we just moved was already the last instruction
in the block we just want to break the loop right there.
Fixes crashes with shaders from http://flightradar24.com
Fixes: 8998666de7 ("broadcom/compiler: sort constant UBO loads by index and offset")
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35333>
(cherry picked from commit c059c721fb)
0e3e5146cf ("intel/brw: Use correct instruction for value change check
when coalescing") enabled some new cases that exposed a pre-existing
bug that would turn something like this :
mul.sat(16) %789:F, %787:F, %788:F
mov.g.f0.0(16) %790:F, %789:F
(+f0.0) sel(16) %800:UD, %790:UD, 0u
into this :
mul.sat(16) %790:F, %787:F, %788:F
mov.g.f0.0(16) null:F, null<8,8,1>:F
(+f0.0) sel(16) %800:UD, %790:UD, 0u
The mov[] array can contain the same instruction because it's repeated
for each REG_SIZE writes and a SIMD16 instruction will write 2
REG_SIZE.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0e3e5146cf ("intel/brw: Use correct instruction for value change check when coalescing")
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35276>
(cherry picked from commit a51d061c00)
in a scenario like:
* begin_rendering(cbuf1:store=DONTCARE, cbuf2)
* draw
* remap(cbuf2, NULL)
* draw
* end_rendering
cbuf1 will be poisoned at the end of the renderpass, but the corresponding
clear call to trigger the poisoning will not be able to detect that this
texture is being written by an async fs, causing a write hazard
unremapping the fb here ensures that all attachments are fb-referenced
as expected in order to guarantee threads sync before memory is poisoned
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35319>
(cherry picked from commit d8a6ec5985)
When binning with a GS, both VS and GS are active. This means that we
could have to use the safe-const variant for the GS. However we only
emitted VPC state for the binning case with the "normal" GS variant.
Emit the VPC state with the safe-const variant too, and select between
the state variants at link time.
This fixes a few tests like
dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.32struct_to_8struct.uniform_uint_geom
with TU_DEBUG=gmem,forcebin.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35294>
(cherry picked from commit 723a1fabac)
If the render area is restricted to a section of the framebuffer, there
is no need to consider all the framebuffer size when configuring the
supertiles, as only the supertiles coordinates of the affected area will
be submitted.
This allow to create supertiles smaller than the ones in case
considering the full screen, reducing the tiles that need to be
processed.
This also fixes https://gitlab.freedesktop.org/mesa/mesa/-/issues/13218.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35257>
(cherry picked from commit d30a6f8102)
So far the driver was configuring the supertiles to be less than 256.
But actually, there can be up to 256, not strictly less than 256.
There is one restriction though: the frame width or height in supertiles
must be less than 256.
It also moves this limit to the limits file, which is shared by v3d and
v3dv.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35257>
(cherry picked from commit 2cac70558d)
This mirrors AMDVLK. 128-byte alignment is possible, but DOOM: The Dark
Ages screws up scratch allocation with alignments <256 bytes.
Fixes hangs in DOOM: The Dark Ages.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35152>
(cherry picked from commit dac6f09451)
This try to mitigate the HiZ GPU hang by increasing a timeout. Loosely
based on PAL but I can confirm it delays the hang when
BOTTOM_OF_PIPE_TS is used as a workaround.
This must be emitted when the GFX queue is idle.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35212>
(cherry picked from commit 47f5d25f93)
Applications using Mesa built with LLVM 20.1.4 fail to start with
strange segmentfaults/bus errors when radeonsi driver is used. The last
piece of stacktrace looks like
- pipe_reference_described
- pipe_reference
- radeon_bo_reference
- radeon_ws_bo_reference
- radeon_lookup_or_add_real_buffer
Coredump shows the pointer dst passed to pipe_reference_described() is
either unaligned or even invalid, which is the reason of crashing. The
crash goes away when Mesa is built without optimization.
Looking through the related functions, it's found that
radeon_ws_bo_reference() contains unsafe type cast from radeon_bo to
pb_buffer_lean: though the former's first field is just the later, this
violates strict aliasing rules as pb_buffer_lean isn't compatible with
radeon_bo. Such violation ultimately results in miscompilation.
Let's take the address of pb_buffer_lean field, avoiding the unsafe
cast. It's still required to cast pb_buffer_lean back to radeon_bo since
radeon_bo_reference may update the pointer, which is safe as radeon_bo
contains a pb_buffer_lean member and C language permits access members
through a pointer in type of the container.
Fixes: 6d913a2bcc ("r300,r600,radeonsi: switch to pb_buffer_lean")
Link: https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Aliasing-Type-Rules.html
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35249>
(cherry picked from commit b1d81a7df1)
In panfrost_clear_depth_stencil and panfrost_clear_render_target, we
start the blit context before binding the clear targets. If we don't
legalize AFBC beforehand, we get a recursive blit crash. panfrost_clear
does not need this because the resource should already be legalized in
panfrost_batch_add_surface.
Fixes the following piglit tests with pan_force_afbc_packing:
- spec@arb_clear_texture@arb_clear_texture-base-formats
- spec@arb_clear_texture@arb_clear_texture-simple
- spec@arb_clear_texture@arb_clear_texture-sized-formats
Fixes: 17a62ff993 ("panfrost: legalize afbc before blitting")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
(cherry picked from commit 104ea2e4cf)
In 59a3e12039, we changed the UBO->push optimization in panfrost to
only push UBOs that are available in a CPU buffer. We require
first_ubo_is_default_ubo, to ensure that UBO0 will be a user buffer. We
weren't setting this flag for the image conversion shaders, so got an
assertion failure compiling them. This can be triggered by the
panvk_force_afbc_packing driconf option.
The conversion shader info UBO isn't exactly a "default" UBO in the
sense of being lowered from uniforms, but it is a user buffer, so
setting the flag should be fine.
Fixes: 59a3e12039 ("panfrost: do not push "true" UBOs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
(cherry picked from commit bed54fa402)
The BSpec page for Structure_RENDER_SURFACE_STATE says:
"For typed buffer and structured buffer surfaces, the number of
entries in the buffer ranges from 1 to 2^27. For raw buffer
surfaces, the number of entries in the buffer is the number of
bytes which can range from 1 to 2^30. After subtracting one from
the number of entries, software must place the fields of the
resulting 27-bit value into the Height, Width, and Depth fields as
indicated, right-justified in each field. Unused upper bits must be
set to zero."
According to the vkd3d-proton developers, this is what is happening
with the applications:
"There's also the problematic case of games using typed descriptors
but passing non-typed buffer descriptors, which is an extremely
common app bug that works on all D3D12 drivers that we need to work
around by creating typed views."
Previously, we had an assert() to check for "num_elements > (1 <<
27)", but that assert was preventing us from running games such as
Marvel's Spider-Man Remastered and Assassin's Creed: Valhalla in Debug
mode. So not only I removed the assert, but I also made the code clamp
num_elements to the maximum of (1 << 27) based on my incorrect
interpretation of the paragraph quoted above from BSpec.
What I did not realize was that num_elements is being used just to
calculate Structure_RENDER_SURFACE_STATE Height, Width and Depth, and
our register bit fields on SKL and newer are big enough to fit any
number of num_elements up to 2^32, not only 2^27. Clamping
num_elements results in an incorrect value for S.Depth, which
generates visual corruption in some games.
On Marvel's Spider-Man Remastered, without this patch the texture of
the asphalt in some streets (like the very first one you jump to when
the game starts) gets rendered incorrectly.
Testcase: vkd3d-proton/d3d12/test_large_texel_buffer_view
Link: https://github.com/HansKristian-Work/vkd3d-proton/issues/2071
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12827
Fixes: f3c7e14f09 ("isl: don't assert(num_elements > (1ull << 27))")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35032>
(cherry picked from commit ecc90e1bb3)
We should use the cl_slice code to get proper validation, which also makes
it simpler to read out data and gets rid of some UB there.
This also fixes CL_KERNEL_EXEC_INFO_SVM_PTRS with param_value being null.
Cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32942>
(cherry picked from commit 35a9829391)
Normal fmin doesn't make any promises about NaN, common additionally
doesn't make any promises about infinities. Would be nice to hook that
up to codegen but lowering them to normal works for now.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34941>
(cherry picked from commit 4b1c824b67)
For partial secondary cmdbufs, we emit FBDs/TDs in the primary cmdbuf
before calling the secondary. In order to set the provoking vertex mode
correctly here, we need to look at the mode set by pipelines bound in
the secondary cmdbuf.
This leaves one edge case: reemitting FBDs/TDs in a secondary cmdbuf
after a flush. If the secondary cmdbuf only contains vk_meta draws,
without ever binding a pipeline, we won't know which provoking vertex
mode to use here. This is actually okay, because in that case the
provoking vertex mode doesn't matter for any of the draws in the
secondary, and the FBDs/TDs will be reemitted on the primary with the
correct mode.
Fixes: 7a9f14d3c2 ("panvk: advertise VK_EXT_provoking_vertex")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 65406cf500)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
Because we advertise provokingVertexModePerPipeline=false, the provoking
vertex mode must be set the same for all pipelines used in a renderpass.
vk_meta doesn't care about the provoking vertex mode, but the vulkan api
doesn't provide a way to express this, so it always sets
PROVOKING_VERTEX_MODE_FIRST (the vulkan default). This causes an
assertion failure when vk_meta is used in a renderpass where the
application sets PROVOKING_VERTEX_MODE_LAST.
There are a few different cases here, that need different handling. The
simplest is when vk_meta is used after the first application draw, in
which case we can just ignore the state passed by vk_meta and use the
existing state.
Fixes: 7a9f14d3c2 ("panvk: advertise VK_EXT_provoking_vertex")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 4d99346477)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
The tiler OOM exception handler allocated a region of memory to dump
save/restored registers. For defining more functions in the future, we
allocate a register dump region for each subqueue, that can hold the
largest number of registers needed by any functions executed on that
subqueue.
This does mean that we cannot have function calls more than one deep. If
we ever need nested function calls, we will have to consider a real
stack.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit d60c688317)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
tc_sync reuses the same batch, which breaks the current disambiguation
methods by returning !busy for work which is currently executing
on the reused batch
by also tracking the completed generation, this scenario is detected
and disambuguated
Fixes: 9cc06f817c ("tc: allow unsynchronized texture_subdata calls where possible")
(cherry picked from commit b89e0fa226)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35204>
During direct dispatch, we calculate the size of the WLS allocation
based on the number of WLS instances which is an unbounded calculation
on number of workgroups.
This leads to extreme allocation sizes and potentially
VK_ERROR_OUT_OF_DEVICE_MEMORY for direct dispatches with a high amount
of workgroups.
This change adds an upper bound to the number of WLS instances, using
the same value we assume for indirect dispatches.
Additionally, this commit fixes the WLS max instance calculation (which
should be per core).
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34979>
(cherry picked from commit 0a47a1cb6d)
This update is aimed at fixing pop-free clipping and follows
the advices by Vitaliy Kuzmin: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12440
This functionality requires calculating the value of the following two
registers: PA_CL_GB_HORZ_DISC_ADJ and PA_CL_GB_VERT_DISC_ADJ. These two
registers are available on all the gpus of the r600 family.
This code is built on the backport of radeonsi updates which are relevant
to this very functionality:
57e658d041 "radeonsi: rework how guardband registers are updated to decrease overhead"
146c2b7c28 "radeonsi: adjust clip discard based on line width / point size"
4d74432dd3 "radeonsi: don't discard points and lines"
63680471f9 "radeonsi: remove si_context::{scissor_enabled,clip_halfz}"
This change was tested on rv770, barts and cayman:
deqp-gles[2-3]/functional/clipping/line/wide_line_clip_viewport_center: fail pass
deqp-gles[2-3]/functional/clipping/line/wide_line_clip_viewport_corner: fail pass
deqp-gles[2-3]/functional/clipping/point/wide_point_clip: fail pass
deqp-gles[2-3]/functional/clipping/point/wide_point_clip_viewport_center: fail pass
deqp-gles[2-3]/functional/clipping/point/wide_point_clip_viewport_corner: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35052>
(cherry picked from commit df2c774a83)
We still allow mesh shader promote constant output to flat, but
mesh shader like geometry shader may store multi vertices'
varying in a single thread. So mesh shader may store different
constant values to different vertices in a single thread, we
should not promote this case to flat.
I'm not using shader_info.mesh.ms_cross_invocation_output_access
because OpenGL does not require IO to have explicit location, so
when nir_shader_gather_info is called in OpenGL GLSL compiler to
compute ms_cross_invocation_output_access, some implicit output
has -1 location which causes ms_cross_invocation_output_access
unset for it.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13134
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35081>
(cherry picked from commit 6f2a1e19da)
llvmpipe driver name needs to be added to the list triggering MESON_GEN_LLVM_STUB := true
due to swrast driver name being an invalid gallium driver
swrast driver name is still used for lavapipe vulkan driver
Fixes: a3909092 ("meson: drop deprecated `swrast` alias for softpipe+llvmpipe")
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35121>
(cherry picked from commit ff6b181c2d)
When an application issues a sparse binding operation, it may be the
case that the state the app is setting is the state that is already
there. In that case, both n_l3l2_binds and n_l1_binds are zero, so the
batch doesn't contain anything and, since 0802bbd486, we just skip
the batch submission and return.
The problem is that skipping the batch submission and returning
ignores the synchronization: there may be syncobjs that we have to
wait and, more importantly, there may be syncobjs that we have to
signal.
This case is exercised by vkd3d-proton's test suite, but I'm not aware
of any other workload that triggers it. This commit only affects
Meteor Lake and older, as TR-TT is only the default behavior for the
platforms running i915.ko.
Testcase: vkd3d-proton/d3d12/test_sparse_buffer_memory_lifetime
Fixes: 0802bbd486 ("anv/trtt: don't submit empty batches when there are no binds to do")
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35078>
(cherry picked from commit d77b49eb0a)
Freedreno limits set_global_binding() to 16 resource entries
(MAX_GLOBAL_BUFFERS), however RustiCL can pass more global resources
(e.g. OpenCL CTS test api / min_max_constant_args requires passing of
17). Follow example of other drivers and use dynamic array for global
bindings.
Backport-to: 25.1
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35122>
(cherry picked from commit 5c43cf823c)
The cubemap-to-array pass was changing variable types from samplerCubeArray
to sampler2DArray but leaving the corresponding deref instruction types
unchanged. This caused NIR validation to fail with "instr->type ==
instr->var->type" assertion.
Fix by updating both the variable type and the deref instruction type
to maintain consistency required by NIR validation.
Cc: mesa-stable
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35117>
(cherry picked from commit 86f7ce06be)
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.
So just handle load_push_constants in the backend and stop changing
things in Hasvk.
Fixes a bunch of tests in
dEQP-VK.descriptor_indexing.*
dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
(cherry picked from commit b036d2ded2)
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.
So just handle load_push_constants in the backend and stop changing
things in Anv.
Fixes a bunch of tests in
dEQP-VK.descriptor_indexing.*
dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
(cherry picked from commit df15968813)
These are benign style warnings. The code is generated by bindgen and it's a bug there that these
names get generated at all.
Silences these warnings since we can't do anything about them:
```
warning: method `use__raw` should have a snake case name
--> src/etnaviv/isa/isa_bindings.rs:358:19
|
358 | pub unsafe fn use__raw(this: *const Self) -> ::std::os::raw::c_uint {
| ^^^^^^^^ help: convert the identifier to snake case: `use_raw`
|
= note: `#[warn(non_snake_case)]` on by default
warning: method `use__raw` should have a snake case name
--> src/etnaviv/isa/isa_bindings.rs:1023:19
|
1023 | pub unsafe fn use__raw(this: *const Self) -> ::std::os::raw::c_uint {
| ^^^^^^^^ help: convert the identifier to snake case: `use_raw`
```
Fixes: 15a784689e ("etnaviv: isa: Generate Rust FFI bindings for asm.h")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34969>
(cherry picked from commit 040ef8f5c9)
As stated in the OpenCL standard, the lowest allowed values
CL_DEVICE_MAX_PARAMETER_SIZE and CL_DEVICE_LOCAL_MEM_SIZE in case of the
embedded profile are 1K. Limit the check to full profile only, in order
to stop forcing OpenCL 1.0 for embedded-profile device like Qualcomm
Adreno A702.
Backport-to: 25.1
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35073>
(cherry picked from commit 275a39b3c6)
On GFX11+, DISABLE_FOR_AUTO_INDEX=1 automatically disables primitive
restart enable for non-indexed draws.
On GFX10-GFX10.3 the hw considers primitive restart enable for
non-indexed draws and the driver must disable it explicitly.
GFX9 and older gens aren't affected but applying the change for them
simplifies the implementation.
To fix that, move emitting primitive restart enable at draw time
because it needs to know if the draw is indexed or not.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13037
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34996>
(cherry picked from commit 4d1fcd75f9)
`lp_build_round` intends to implement round with ties-to-even behavior,
as can be seen by its test's use of `nearbyint` to generate reference
values and by it use in implementing `nir_op_fround_even`.
Fixes: 0d3b285360 ("gallivm: use llvm intrinsics for 16-bit round/trunc/roundeven")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34937>
(cherry picked from commit eea3ed6a37)
The 2 helpers we're using for doing internal operations (copies,
command generation, etc...) can work on command buffers or lower level
batches.
When working with command buffers, the helpers should set the
preemption using genX(cmd_buffer_set_preemption) so that whatever
operation comes after toggles the state back to what it needs and we
minimize the toggles.
When working with batchs, the helpers should disable preemption using
genX(batch_set_preemption) and turn it back on when done.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35030>
(cherry picked from commit c570740272)
Otherwise we end up removing refcount even when we don't have a color
surface already for applications going from SRGB_NONLINEAR to
PASS_THROUGH dring runtime.
To reproduce the bug, start mpv with "--target-colorspace-hint=yes" then
set it to "no" during runtime with a keybind
Fixes: 789507c99c ("vulkan/wsi: implement the Wayland color management protocol")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34708>
(cherry picked from commit 033ce1bae1)
On v11+, because FROUND.v2f16 is gone we end up with precision issues.
We now lower ffract in bifrost_nir_algebraic instead of during common
algebraic to ensure lower_bit_size has been performed.
This fixes
"dEQP-GLES3.functional.shaders.builtin_functions.common.fract.vec2_lowp_vertex"
and
"dEQP-GLES31.functional.shaders.builtin_functions.common.fract.vec2_lowp_compute".
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34970>
(cherry picked from commit 60b131a712)
When the hardware doesn't natively support 32-bit predication, the
driver has a fallback which allocates a 64-bit predicate to the upload
BO in order to copy the original value.
But when conditional rendering is enabled in the stateCommandBuffer
which is used by preprocess() and the execute() is recorded also in the
stateCommandBuffer. If the preprocess() is recorded in a different
cmdbuf which is submitted before the cmdbuf that contains execute(),
the fallback (ie. alloc + COPY_DATA) will be performed after. This would
cause the predicate value to be always 0.
To fix that, keep track of the user predication VA which is the only
VA that needs to be used by DGC because it reads 32-bit from the shader.
This fixes a very weird corner case with vkd3d-proton.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13143
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34953>
(cherry picked from commit 3ca2f71f3d)
'X4.exe' is the executable. But there is also a script 'X4' that is used to
launch the game. This script is what steam uses.
This updates driconf to match that.
This also brings the executable in line with other configs for the game.
Fixes: 5532f13566 ("driconf: override vendor id for X4 Foundations on NVK")
Fixes: 8654a7727f ("driconf: set vk_zero_vram driconf for X4 Foundations")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34168>
(cherry picked from commit a87c9bc49e)
This optimization used to optimize the allocated space for descriptors
when immutable samplers are equal. Though, this was basically broken :
- descriptor copies were broken for combiner image sampler (or sampler)
with equal immutable samplers because 96 bytes were copied instead of
64 bytes (cf. the linked ticket). This could be fixed but it's not
worth it.
- the value returned by vkGetDescriptorLayoutSupport() was broken, it
should have been 96 with no immutable samplers (or when they aren't
equal)
This optimization was also not applied for descriptor buffers which is
the default for vkd3d-proton and Zink. DXVK doesn't use db but it
doesn't use immutable samplers, so basically only native vulkan games
would be concerned.
Note that immutable samplers would still be inlined in shaders if no
indirect access which should be 99.9% of the usecase.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11165
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34928>
(cherry picked from commit 69ff204422)
The hardware requires a power of two bpe. To do that, the driver
needs to adjust the pitch/offset/extent based on a texel scale factor
which only applies to 96-bits formats.
This fixes new VKCTS coverage.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34927>
(cherry picked from commit 4b73d7e817)
Storage access to images using LEA_TEX[_IMM] has limitations on some
fields in the texture descriptors, making them incompatible with the
descriptors required for texture access, specifically in the case
non-zero levels.
This change sets up two sets of texture descriptors for image views of
storage images, then picks the correct one when writing the image view
descriptors.
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34839>
(cherry picked from commit 7451bc3bef)
The width/height fields in the plane descriptors for v13 are missing
their minus(1) modifiers.
This change adds the missing modifiers, which implies also setting
default values to 1 due to how the Two-Plane YUV Overlay interacts with
the plane descriptors.
Fixes: ece01443e1 ("pan/genxml: Add v13 definition")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34839>
(cherry picked from commit 009e4c2eba)
The width/height fields in the plane descriptors for v12 are missing
their minus(1) modifiers.
This change adds the missing modifiers, which implies also setting
default values to 1 due to how the Two-Plane YUV Overlay interacts with
the plane descriptors.
Fixes: b6d5e01120 ("pan/genxml: Add v12 definition")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34839>
(cherry picked from commit e38eb00e4e)
The width/height fields in the plane descriptors for v10 are missing
their minus(1) modifiers.
This change adds the missing modifiers, which implies also setting
default values to 1 due to how the Two-Plane YUV Overlay interacts with
the plane descriptors.
Fixes: 486c341769 ("panfrost: Add architecture description XML for v10")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34839>
(cherry picked from commit 2542857259)
Right now llvmpipe only successfully supports single-plane formats,
limiting the number of supported YCbCr formats to a relatively small
number.
The implicit support for R8G8_R8B8 style subsampled RGB formats
causes the most common ones, YUYV and its variants, to chain up
to to lp_build_fetch_subsampled_rgba_aos() when importing (u)dmabufs
with EXT_image_dma_buf_import.
This code path currently has at least the following issues:
1. It doesn't support the YVYU/PIPE_FORMAT_R8B8_R8G8_UNORM and
VYUY/PIPE_FORMAT_B8R8_G8R8_UNORM, resulting in asserts/crashes.
2. The supported cases, YUYV and UYVY, end up with sub-optimal results
as they always return BT.601/narrow results, ignoring
EGL_YUV_COLOR_SPACE_HINT_EXT and EGL_SAMPLE_RANGE_HINT_EXT.
Stopping advertising support for those formats, as well as native support
for PIPE_FORMAT_YUYV and PIPE_FORMAT_UYVY, results in all four variants
taking fallback paths which happen to be much better supported.
An additional effect is that YUYV and UYVY are correctly advertised as
external only.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34775>
(cherry picked from commit 4051d4ef59)
The vdrm_vpipe_connect() prototype defined in vdrm.c doesn't match
vdrm_vpipe_connect() defined in vdrm_vpipe.c. This leads to a compilation
warning about the wrong proto when Mesa linked with enabled LTO. Fix the
vdrm_vpipe_connect() definition.
Fixes: bf0e3d6274 ("virtio/vdrm: Add vtest backend")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34819>
(cherry picked from commit daad392d5c)
intel_decoder_init() initializes intel_batch_decode_ctx so later
we can call decode functions but it depends on data stored in
brw/elk_isa_info but that was being allocated in stack
of intel_decoder_init() then when the decode functions were executed
it was accessing garbage at the brw/elk_isa_info memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ec2d20a70d ("intel/tools: Add helpers for decoder_init/disasm")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34776>
(cherry picked from commit 3e5a735d01)
Or current render target cache setting is to key on the binding table
index, meaning the HW associates a number in the range [0, 7] to a
RENDER_SURFACE_STATE description. If you want change the render target
0 between 2 draw calls, you need to insert a PIPE_CONTROL in between
the 2 draw calls with pb-stall + rt-flush in order to flush an writes
to a previous RENDER_SURFACE_STATE that has now becomed disassociated
with the [0, 7] number.
This PIPE_CONTROL taking care of the flush is dealt with in
cmd_buffer_maybe_flush_rt_writes(). This function diffs the current
BTI setup for render targets (first 0 to 7 BTIs) with what the next
fragment shader wants.
The issue here is we might have a render pass with 0 color attachments
and yet in 98cdb9349a we added one pointing to the render target 0,
but in the emit_binding_table() when we finally program the BTI, we
check the render pass color count and program a null surface state
instead of an actual surface state. And this leads to hangs because
the render target cache will end up with inconsistent state data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 98cdb9349a ("anv: ensure null-rt bit in compiler isn't used when there is ds attachment")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12955
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34603>
(cherry picked from commit 63f633557f)
On a7xx, the max constlen for compute is increased to 512 vec4s or 8KB,
however the size of the LB was not increased beyond 40KB. A quick
calculation shows that 8KB of consts multiplied by 2 banks plus the
API maximum of 32KB shared memory would exceed 40KB. This means that
we can't always use a constlen of 512, and sometimes have to fall back
to 256 when a lot of shared memory is in use.
In the future, we can use similar calculations to figure out how much
"extra" shared memory is available for the backend to spill to, but we
currently don't support spilling to shared memory.
Fixes: 5879eaac18 ("ir3: Increase compute const size on a7xx")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34746>
(cherry picked from commit ea9d694a7b)
This fixes a compilation issue in Marvel Rivals where the legalization
logic and the encoding logic don't line up, which results in an
assertion failure on this instruction:
r17 = hfma2 r17.xx -r18.xx 0x3c003c00
The fix here is a little overly restrictive because it turns out we
actually do have modifiers for all 3 sources. Those modifiers will
be added in later commits.
Fixes: 567cae69c3 ("nak: Add 16-bits float operations")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34750>
(cherry picked from commit 1ff7135691)
Most of the churn in this commit is changing unit tests that were
testing things that are now invalid.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122204 -> 17122669 (<.01%)
instructions in affected programs: 120669 -> 121134 (0.39%)
helped: 0 / HURT: 124
total cycles in shared programs: 895602370 -> 895613210 (<.01%)
cycles in affected programs: 17868974 -> 17879814 (0.06%)
helped: 35 / HURT: 85
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736518 -> 210743769 (+0.00%)
Cycle count: 30377733040 -> 30377699060 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66056852 -> 66056966 (+0.00%)
Totals from 1505 (0.21% of 706776) affected shaders:
Instrs: 1890151 -> 1897402 (+0.38%)
Cycle count: 48397408 -> 48363428 (-0.07%); split: -0.11%, +0.04%
Max live registers: 256821 -> 256935 (+0.04%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit e26270249b)
When I originally wrote that code, I didn't understand what a jerk NaN
can be.
v2: Remove the brw_type_is_uint stuff. This function is currently only
called for float types. In a later commit, integer types will be
supported but only for NZ and Z conditions. Noticed by Matt.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122197 -> 17122204 (<.01%)
instructions in affected programs: 1691 -> 1698 (0.41%)
helped: 0 / HURT: 4
total cycles in shared programs: 895602484 -> 895602370 (<.01%)
cycles in affected programs: 912964 -> 912850 (-0.01%)
helped: 2 / HURT: 2
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736388 -> 210736518 (+0.00%)
Cycle count: 30377728900 -> 30377733040 (+0.00%); split: -0.00%, +0.00%
Totals from 130 (0.02% of 706776) affected shaders:
Instrs: 169911 -> 170041 (+0.08%)
Cycle count: 18021210 -> 18025350 (+0.02%); split: -0.00%, +0.02%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit 0dab520a19)
A8_UNORM on v9+ is using RGBA8_UNORM as a pixel format with the
A8_UNORM clump format to dealing with the diffences between
RGBA8 and the actual A8 in-memory layout.
The problem is, LEA_TEX only loads the InternalConversionDescriptor
which contains only the pixel format, and that's what ST_CVT uses
to do the conversion, so we'll actually store 4 components instead
of one.
This shows up with
dEQP-VK.image.load_store.without_any_format.buffer.a8_unorm* after
enabling maintenance5.
For now I've turned off the image storage capability for A8_UNORM
on all gens, but I'd be fine disabling it only on v9+ if you think
that's preferable.
Fixes: d95423686f ("pan/format: Add PAN_BIND_STORAGE_IMAGE flag")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34648>
(cherry picked from commit 9d1262e108)
This is the most serious bug we've had in a long time due to a fundamental
misunderstanding of the hardware (due to incomplete reverse-engineering). It
caught me off guard.
The texture descriptor has "mode" bits which configure two aspects of how the
address pointer is interpreted:
* whether it is indirected, pointing to a secondary page table for sparse
* whether it writes texture access counters (for Metal's idea of sparse).
...Neither of these is a "null texture" mode.
So why did I see Apple's blob using a non-normal mode for null textures, and why
did I copy those settings?
1. Because the hardware texture access counters provide a cheap way to detect
null texture accesses after the fact, which I think their GPU debug tools
use. I'm not sure why release builds of the driver do/did that, but whatever.
2. Because I assumed Cupertino knew best and I didn't bother looking too close.
We can't use them here (without doing extra memory allocations), since then
the GPU will increment access counters. And since our null texture address used
to just be a pointer in the command buffer, that mean the GPU will trash
whatever memory happened to be 0x400 bytes after the start of the null texture
descriptor. The symptom being random faults.
This bug was caught when trying to use the zero-page instead, which raised a
permission fault when the GPU tried to write counts. Then I remembered the
sparse mechanism and had a bit of a eureka moment. Immediately followed by an
"oh, f#$&" moment as I realized how many random bugs could potentially be root
caused to this.
The fix is two-fold:
1. Use normal layout instead.
2. Set the address to the zero-page (which is a fixed VA) and detect null
textures by checking the address, instead of the mode.
The latter is a good idea anyway, but both parts needs to be done atomically to
maintain bisectability.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34703>
(cherry picked from commit 3eb7575679)
Since headless overrides create_mem, it needs to override finish_create
too. Fixes a segfault in nvk that was caused by us mixing
wsi_create_null_image_mem with wsi_finish_create_blit_context, which
would then call CmdCopyImageToBuffer with image->blit.buffer == NULL
Fixes a cts failure on nvk in:
dEQP-VK.image.swapchain_mutable.headless.2d.r8g8b8a8_unorm_b8g8r8a8_unorm_clear_copy_format_list
and several others
Fixes: 579578f10a ("vulkan/wsi/drm: Break create_prime_image in pieces")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34646>
(cherry picked from commit 60452e016e)
Currently mesa-clc bundles OpenCL headers from Clang only if the static
LLVM is used (which means Clang / LLVM are not present on the target
system). In some cases (e.g. when building in OpenEmbedded environemnt)
it is desirable to have shared LLVM library, but skip installing the
whole Clang runtime just to compile shaders. Add an option that forces
OpenCL headers to be bundled with the mesa-clc binary.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34551>
(cherry picked from commit 419a9e9d42)
I don't understand why, but somehow the changes in e38631ad...3d9ac270
is causing this 1) in a file that has not changed, and 2) on lines that
are indented identically, with the `for` in the macro being terminated
with a `;` semicolon by the caller, so it looks all good to me.
Silencing this allows me to get the release through, but I probably
won't look back either, so hopefully there won't be a legitimate
instance of that warning in the future 😇
Xe2 changed the MOCS field in few instructions, those now have a field
for the MOCS index and other the encryption enable bit but ISL returns
the combination of both aka MEMORY_OBJECT_CONTROL_STATE.
To minimize changes I have added 2 macros to extract the values
from the value returned by isl.
From all the instructions changed Mesa only make use of two, so the
other instruction will be handled in the next patch.
Cc: mesa-stable
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34592>
(cherry picked from commit 161c412a82)
si_destroy_context needs to call context->set_debug_callback(...) to
avoid the debug logs to access the destroyed context.
Adding this change introduced a different problem: when an aux context
is destroyed from si_destroy_screen, parts of the screen have been
freed already: the shader_compiler_queue_*.
c467a87e06 ("radeonsi: Destroy queues before the aux contexts") moved
the util_queue_destroy calls above the context destruction, but with
the 59a3f38ff6 change, it's not needed anymore: si_destroy_context
will finish the screen shader queues before proceeding with releasing,
so use-after-free isn't possible.
Fixes: 59a3f38ff6 ("radeonsi: clear the debug callback on ctx destroy")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12035
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34574>
(cherry picked from commit 2a381bbc3c)
Letting the shader offset instanceID by baseInstance works only if
the divisor is one. If the divisor is greater than one, the firstInstance
parameter shouldn't be applied this divisor, but it currently is. Zero
divisors are also problematic, in that they will force use of the
instance zero attribute all the time.
The only way to fix that is to tweak the offsets of the per-instance
attributes instead, like is done in the JM backend.
Fixes: 1570f0172e ("panvk: Fix base_{instance,vertex} handling")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34642>
(cherry picked from commit b2a8e3838d)
DRM_FORMAT_MOD_QCOM_COMPRESSED forces the image to be UBWC regardless
of what's better for perf, we should respect that.
The regression is seen in GTK4 when it tries to create tiny swapchain
images.
Fixes: fc50fb35b0
("tu,freedreno: Enable linear mipmap tail for UBWC images")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34628>
(cherry picked from commit 36f22cc951)
When doing the flushing, I forgot that because the staging buffer can be
used with different formats with different cpp, we need to make sure
that CCU is properly flushed and invalidated between each copy to the
staging buffer to prevent stale cache entries from creeping in, as the
CCU seems to rely on the cpp staying the same, even on a7xx which
dropped some of the other restrictions like using the same RT
index/layer. For "normal" user-visible copies this is done via
transitioning from UNDEFINED.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34611>
(cherry picked from commit ee10938bee)
We were forgetting to reset the map count to 0 in case of dyn_bufs in
create_copy_table.
This was causing invalid copy entries to be added to the table causing
invalid copies in most situation with holes in the set definition while
still binding set 0 or at worst an assert to be triggered in
cmd_fill_dyn_bufs.
This fixes "dEQP-GLES3.functional.ubo.*" and
dEQP-GLES31.functional.ubo.*" on PanVK+ANGLE.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: e350c334b6 ("panvk: Extend the descriptor lowering pass to support Valhall")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34652>
(cherry picked from commit 8d2e16cc11)
Fixes cts tests:
dEQP-VK.conditional_rendering.conditional_ignore.blit_image
dEQP-VK.conditional_rendering.conditional_ignore.blit_image_inverted
dEQP-VK.conditional_rendering.conditional_ignore.resolve_image
dEQP-VK.conditional_rendering.conditional_ignore.resolve_image_inverted
which were introduced in vk-gl-cts commit 4aa277c300
Fixes: 32f2317223 ("nvk: Use meta for doing blits with the 3D hardware")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34644>
(cherry picked from commit 2fc4c98aaf)
On v12+, the hardware report support for 8 levels but
effectively only support up to 4 levels.
In case more than 4 levels are used, it will default to 0xAA when
tile_size is 32x32 or lower, otherwise 0xAC when the tile_size is greater than 32x32.
This patch makes it that we now ensure that the bins can fit inside out
tiler budget and otherwise drop levels until it fit.
This also allows the hardware to decide the hierarchy on v12+
if we know it will fit.
This fixes "dEQP-GLES31.functional.fbo.no_attachments.maximums.all" and
dEQP-GLES31.functional.fbo.no_attachments.maximums.size" on v12+ but
also likely more if we were exhausting the memory budget.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34559>
(cherry picked from commit 92afeb37bf)
When VMMs do not support VIRTGPU_DRM_CAPSET_VENUS the capset data
remains zeroed. By requiring the stable wire_format_version 1 this can
be detected early without initialising the renderer.
Avoids triggering `assert(capset->supports_blob_id_0);` in debug builds
under such circumstances.
Cc: mesa-stable
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34613>
(cherry picked from commit 3d3ca9b65e)
rather than always early killing and then hitting pathological shuffle
situations, only early-kill when we can prove that we won't need to shuffle. it
turns out that's most of the time.
even with this heuristic, we still get hurt bad in shader-db due to extra moves.
but hopefully, the #s here are small enough that we can move on with our lives
and fix this source of known unsoundness.
this is tagged for backport as it's needed to avoid a perf regression with the
previous patch.
combined stats from this commit and the previous commit:
total instrs in shared programs: 2846065 -> 2852257 (0.22%)
instrs in affected programs: 618734 -> 624926 (1.00%)
total alu in shared programs: 2329477 -> 2335534 (0.26%)
alu in affected programs: 508119 -> 514176 (1.19%)
total gprs in shared programs: 894762 -> 901327 (0.73%)
gprs in affected programs: 36946 -> 43511 (17.77%)
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
(cherry picked from commit b1e86b3eae)
shader-db stats combined with next commit. this is the rip off the bandaid, next
is the optimize. split to enable bisecting.
the code we have to shuffle clobbered killed sources is broken and, after
thinking about that for a Long time, I don't see a reasonable way to fix it. But
if we late-kill sources - or model our calculations as-if we were late-killing
souces - we never have to shuffle onto a killed source and the problem goes away
entirely.
this is similar in spirit to what NAK does. it's not "optimal", but it's sane.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
(cherry picked from commit b88fe9b0c5)
This hurts us in two ways:
* slightly more spilling (not actually a big problem)
* slightly worse occupancy (the shaders that are "helped" here are from trying
less hard to fit at higher occupancy levels)
However, in exchange we get a LOT more flexibility in the RA.
total instrs in shared programs: 2847015 -> 2846065 (-0.03%)
instrs in affected programs: 84134 -> 83184 (-1.13%)
total alu in shared programs: 2330406 -> 2329477 (-0.04%)
alu in affected programs: 62305 -> 61376 (-1.49%)
total code size in shared programs: 20497326 -> 20491690 (-0.03%)
code size in affected programs: 586664 -> 581028 (-0.96%)
total gprs in shared programs: 894202 -> 894762 (0.06%)
gprs in affected programs: 8900 -> 9460 (6.29%)
total scratch in shared programs: 13292 -> 13304 (0.09%)
scratch in affected programs: 2924 -> 2936 (0.41%)
total threads in shared programs: 27819712 -> 27814272 (-0.02%)
threads in affected programs: 55296 -> 49856 (-9.84%)
total spills in shared programs: 907 -> 914 (0.77%)
spills in affected programs: 419 -> 426 (1.67%)
total fills in shared programs: 857 -> 862 (0.58%)
fills in affected programs: 389 -> 394 (1.29%)
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
(cherry picked from commit 7fad96d194)
Since MR !33891 EGL only supports a software driver (LLVM).
Routine dri3_x11_connect at
src/egl/drivers/dri2/platform_x11.c fails if DRI3 is not
available. So at that location variable *allow_dri2 should be set.
Looking at the major codition, we see it is not executed
if LIBGL_DRI3_DISABLE is set. In that case the hardware driver
is activated as desired. Previously this was not needed.
Also it is not practical, and not necessary.
I do not understand the major condition, so I did not change it.
This causes some duplicate coding.
Fixes: 323bad6b18 ("egl/x11: split out dri2 init entirely")
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34530>
(cherry picked from commit 995dc61bf5)
This is now basically the same as the original VALUMaskWriteHazard, except
it now considers both VALU and SALU writes.
Now that it's a part of VALUMaskWriteHazard, differences from the original
VALU lanemask workaround are:
- it includes SALU reads after the write
- it includes VALU writes and SALU/VALU reads after the write which are
not lanemasks
- it combines s_waitcnt_depctr instructions when it's a read after both a
SALU write and a VALU write
- non-exec VALU SGPR reads reset the SGPRs read by VALU as a lanemask
- exec SGPRs are ignored
resolve_all_gfx11() is also finished.
fossil-db (navi31):
Totals from 21538 (27.13% of 79377) affected shaders:
Instrs: 27628855 -> 27552972 (-0.27%); split: -0.30%, +0.03%
CodeSize: 145968448 -> 145667616 (-0.21%); split: -0.23%, +0.02%
Latency: 209537805 -> 209509519 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 36304270 -> 36301624 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12623
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11480
Backport-to: 25.0
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529>
(cherry picked from commit ce2be5ab8e)
In a previous iteration of the spilling code, we added an extra check to
only spill across edges if the value being spilled is in the W set.
This was due to a misunderstanding of the modeling of S and W in Braun
and Hack. In the current implementation, we maintain the invariant that
every live value is in at least one of S or W so we don't need that
check but it was left in by mistake.
One exception to this rule was added when we special-cased constant
values. Now the invariant is that every live value is in S, in W, or is
a constant. When we made this change, the check we accidentally left in
bit us because now if a value is constant but not in W, it wasn't
getting spilled across the edge. This can result in a value getting
filled later which was never spilled, leading to undefined values.
Fixes: 7b82e26e3c ("nak: Don't spill/fill const values")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12993
Co-authored-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34563>
(cherry picked from commit de1ed48325)
I accidentally disabled compression on CPS surfaces marked as storage or
color attachment for all platforms, when this should only be limited to
Xe.
Fixes: 80f9b6 ('anv: CPB surfaces that are used as color attachments or for stores cannot be compressed')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34297>
(cherry picked from commit cbc1ec4f73)
These values are shared with xcb/dri2.h, and can't be changed
without breaking the legacy dri2 compatibility. This change
reverses partially the update done by 3b603d1646.
For instance this issue is triggered on dri2 i915 with
"piglit/bin/glx-copy-sub-buffer -auto" or
"piglit/bin/hiz-depth-read-window-stencil0 -auto".
Fixes: 3b603d1646 ("mesa_interface: remove unused stuff")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34561>
(cherry picked from commit 60a31156b0)
Introduces a backwards dataflow analysis pass to determine when a
certain register is always written to prior to being read in a
similar manner to SSA liveness but performed after RA which we can
use to determine when we can insert (last) on src regs on A7XX.
Passing VK-CTS: dEQP-VK.pipeline.*
Signed-off-by: Mark Collins <mark@igalia.com>
Co-Authored-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25077>
Floating point SEL.CMOD may flush denorms to zero. We don't have enough
information at this point in compilation to know whether or not it is
safe to remove that.
Integer SEL or SEL without a conditional modifier is just a fancy
MOV. Those are always safe to eliminate.
See also 3f782cdd25.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
The condition modifier on SEL means something completely different than
it means on MOV. On MOV it means to modify the flags based on the value
written to the destination. On SEL it means to compare the sources using
that mode and pick the result (i.e., as min() or max()) without
modifying the flags.
The resulting MOV should not have a condition modifier for the same
reason it (already) doesn't have a predicate. This bug was found by
inspection, so I added a unit test.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
Floating point SEL.CMOD may flush denorms to zero. We don't have enough
information at this point in compilation to know whether or not it is
safe to remove that.
Integer SEL or SEL without a conditional modifier is just a fancy
MOV. Those are always safe to eliminate.
See also 3f782cdd25.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209903490 -> 209903492 (+0.00%)
Cycle count: 30546025224 -> 30546021980 (-0.00%); split: -0.00%, +0.00%
Max live registers: 65516231 -> 65516235 (+0.00%)
Totals from 2 (0.00% of 706657) affected shaders:
Instrs: 3197 -> 3199 (+0.06%)
Cycle count: 361650 -> 358406 (-0.90%); split: -10.05%, +9.15%
Max live registers: 300 -> 304 (+1.33%)
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
The condition modifier on SEL means something completely different than
it means on MOV. On MOV it means to modify the flags based on the value
written to the destination. On SEL it means to compare the sources using
that mode and pick the result (i.e., as min() or max()) without
modifying the flags.
The resulting MOV should not have a condition modifier for the same
reason it (already) doesn't have a predicate. This bug was found by
inspection, so I added a unit test.
No shader-db or shader-db changes on any Intel platform.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
I was surprised this had any affect on Intel GPUs because we have been
unconditionally performing this optimization in the backend since June
2014.
Once that error is fixed (later in this MR), this change prevents a
couple dozen regressions in shader-db and around 90 regressions in
fossil-db. Many of the regressions in fossil-db were loss of SIMD32, and
that can be a big deal.
v2: Add 64-bit too. Suggested by Alyssa.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16970141 -> 16970139 (<.01%)
instructions in affected programs: 40 -> 38 (-5.00%)
helped: 2 / HURT: 0
total cycles in shared programs: 914617580 -> 914617548 (<.01%)
cycles in affected programs: 3428 -> 3396 (-0.93%)
helped: 2 / HURT: 0
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 30546028462 -> 30546025224 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237017827 -> 237017731 (-0.00%)
Totals from 83 (0.01% of 706657) affected shaders:
Cycle count: 3042978 -> 3039740 (-0.11%); split: -0.13%, +0.02%
Non SSA regs after NIR: 78997 -> 78901 (-0.12%)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
Our strategy of covering the entire address space in buffer views
requires that we be able to create very large buffer views. The
pre-Maxwell texture unit doesn't allow for this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34542>
If shared RA is used, it may have handled some phis. These are already
ignored by regular RA in handle_phi but were used before that in
potentially dangerous ways. More specifically, the interval of such phis
was accessed which may cause an out-of-bounds read since it was never
created. Fix this by skipping such phis earlier.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: c6a932d4b3 ("ir3/ra: handle phis with preferred regs first")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34503>
There have been multiple issues related to accessing intervals through
invalid register names. This usually results in a (difficult to
diagnose) out-of-bounds access. Wrap all the interval accesses in a
helper where we can assert that the name is in-bounds.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34503>
We can't have merged shaders where the first part is compiled using ACO
and the second part is compiled using LLVM.
Add ACO-specific main shader parts to fix that.
This happens when ACO is enabled for gfx12 streamout where GS can be paired
with a previous shader compiled by LLVM.
Fixes: 8ba718fb7d - radeonsi/gfx12: use ACO for streamout because it's faster
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34491>
TFU units is doing a readahead of 64 bytes. This is causing invalid read
MMU errors that can be observed at the nightly full Vulkan runs on
Broadcom devices.
04:13:59.969: [ 85.623205] v3d 1002000000.v3d: MMU error from client TLB (3) at 0x4869000, pte invalid
04:14:05.408: [ 91.019321] v3d 1002000000.v3d: MMU error from client TLB (3) at 0x5209000, pte invalid
04:14:05.413: [ 91.031662] v3d 1002000000.v3d: MMU error from client TLB (3) at 0x7521000, pte invalid
Although the log reports the TLB the real culprit is the TFU. A fix
to the kernel was submitted to fix AXI ID on V3D 4.2 and 7.1
So doing an over-allocation of 64-bytes at v3dv_AllocateMemory is
the simplest method to make these MMU errors itp disapear.
Running ./deqp-vk for an hour, we can see that ~%40 of allocations
would need an extra page (4096 bytes) to accomodate this 64 bytes
padding.
Fixes: ca330f7f04 ("v3dv: implement VK_EXT_memory_budget")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34475>
Read-only ZS optimization can only happen if the ZS tile buffer is not
written, which can only be known when the fixed-function settings is
set.
Change pan_earlyzs_get() to take an enum instead of a boolean and
differentiate ZS-read and ZS-read-with-readonly-optimization-allowed.
Fixes: 25a993731087 ("pan/earlyzs: Support the shader ZS read-only case and its optimization on v10+")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34480>
The setting of the clean_pixel_write_enable flag in pan_prepare_rt
was not consistent with the crc valid calculations in pan_emit_fbd.
This caused the crc_valid flag to not be accurate, causing transaction
elimination to fail.
Fixes: eac8f1d460 ("Revert "panfrost: Disable CRC by default"")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34408>
When texture_index and sampler_index are over 32, we can't really check
for them in a single 32-bit word. This happens among other things when
Panfrost uses preload shaders on v9 and later. Otherwise, we trigger
undefined behavior.
We're already doing this for textures in one case, let's be consistent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
In commit 292ac71a4a ("nir/lower_tex: handle deref casts"), we avoided
using texture_index when a texture instruction contained a variable
deref. There's no good reason why this should be done to some of the
lowering, but not all.
So let's fix up code-paths that were added after this change to do the
same.
The first two patches here crossed paths with the commit that introduced
texture_mask, so it's not strange that the change was missed. The last
one seems to have just copied what was done around it, propagating the
issue.
Fixes: 880b00dc59 ("nir/lower_tex: Add support for lowering YUYV formats")
Fixes: 1358d93650 ("nir/lower_tex: Add support for lowering Y41x formats")
Fixes: 65d6f5aed2 ("nir: add options to lower y_vu, yv_yu, yx_xvxu and xy_vxux")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
Log all operations with the information used to decide whether they
are supported or unsupported. Include tensor data types, conv2d fused
activation and dilation parameters to debug output.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34472>
CP is very slow on GFX12 and parsing the packet header is the main
bottleneck. Using paired context regs reduce the number of packet
headers and it should be more optimal.
It doesn't seem worth when only one context reg is emitted (one packet
header and same number of DWORDS) or when consecutive context regs are
emitted (would increase the number of DWORDS).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34421>
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
headers.
- Parts of libcl.h are moved to new headers that are incomplete CL-safe
variants of libc headers.
- A couple of util headers are changed to remove now unnecessary
__OPENCL_VERSION__ guards and make more headers CL safe.
- Drivers now include src/compiler/libcl and use headers like
macros.h,u_math.h instead of libcl.h.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
This allows us to create temporary VGRFs that are larger than
MAX_VGRF_SIZE(devinfo), which will be split eventually. They may not
be split on the initial pass, because we may need LOAD_PAYLOAD lowering,
copy propagation, and so on to occur first. So we allow registers to
exceed that size initially.
The "Register allocation relies on split_virtual_grfs()" assertion in
brw_reg_allocate.cpp still asserts that all VGRFs which reach the
register allocator have been properly split.
One case where this is useful is for vectorizing convergent block loads.
We create temporaries to splat the SIMD1 values out to SIMD(N), which
can lead to some very large temporaries. However, copy propagation and
so on ultimately eliminate these and they'll get split down to proper
sizes or elided entirely in the end.
(Note: both this and the prior commits from this merge request are
needed to close the linked issue.)
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
Post-RA scheduling doesn't use liveness analysis, so we continue using
MAX_VGRF_SIZE(devinfo). But for pre-RA scheduling, we now use
live->max_vgrf_size.
This helps get us to a place where we can emit arbitrarily large VGRFs
early on in compilation, but which will be split and cleaned up prior to
register allocation. It may also allocate smaller arrays in practice
since MAX_VGRF_SIZE(devinfo) assumes the worst case scenario for things
we actually could need to allocate.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
We're already looking at this data to calculate the per-component
vars_from_vgrf[] and vgrf_from_vars[] mappings, so just record the
largest VGRF size while we're here. This will allow passes to size
arrays based on the actual size needed, rather than hardcoding some
fixed size. In many cases, MAX_VGRF_SIZE(devinfo) is larger than
necessary, because e.g. vec5 sparse sampling results aren't used.
Not hardcoding this means we can also temporarily handle very large
VGRFs which we know will be split eventually, without having to
increase the maximum which is ultimately used for RA classes.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
VK_KHR_sampler_ycbcr_conversion is a dependency from the
VK_EXT_image_drm_format_modifier spec. panvk arch < 10 still
doesn't support it, so VK_EXT_image_drm_format_modifier should
not be exposed.
Otherwise, a Vulkan validation error is triggered for users of
VK_EXT_image_drm_format_modifier and it may cause applications
to fail to create a device.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34458>
This verifies that the requested allocation doesn't exceed the maximum
in cases where the size passed to `clSVMAlloc()` isn't a multiple of the
provided alignment. It also clamps the maximum allocation to `i32::MAX`,
which prevents overflowing `pipe_box`'s `width` field.
Both of these changes prevent possible undefined behavior on 32-bit
systems due to violation of `Layout` prerequisites.
v2: use safe layout creation for maintainability, add a few comments
v3: use Layout utils for aligned size calc, split out max alloc changes
v4: use `checked_compare()` for alloc/size comparison
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34166>
In order to enable higher MSAA modes, we're going to have to perform
some calculations on how to budget the (sometimes) limited tile-buffer
space.
Due to limited tilebuffer space, we need to prioritize a bit here.
First, we reserve space for 4x MSAA for all formats. Then we try to fit
8 color attachments into the tile-buffer. And then finally, we calculate
how many extra multi-sample buffers we can fit into the rest.
The reason we reserve 4x MSAA first, is that this is required by all
Vulkan versions. It also prevents us from regressing existing features.
Then we try to pick 8 color attachments next, because that's required by
Vulkan 1.4 as well as Vulkan Roadmap 2024 and D3D12. Vulkan Roadmap 2022
requires 7 as well.
This adds helpers that implements this, which can be used by both the
Gallium and the Vulkan driver. It's really benefitial if both of these
drivers prioritize the same way here.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33925>
We also have a budget for the tile size for depth-buffers. It's
currently hard to trigger issues with this than for color-buffers,
but this becomes important when we support larger MSAA counts.
We also need to take a bit of care for stencil-only attachments, because
they also count against a limit here. We really only care about the
sample counts here, because the stencil buffer budget is always a
quarter of the depth-buffer budget, and always uses a single byte per
sample.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33925>
There's two limitations we have to cater to:
1. The HW needs at least one render-target. We can disable write-back for
it, but it needs to allocate tile-buffer space for it.
2. The HW can't have "holes" in the render-targets.
In both of those cases, we already set up dummy RGBA8 UNORM as the format,
and disable write-back. But we forgot to take this into account when
calculating the tile buffer allocation.
This makes what we program the HW to do consistent, meaning we don't end
up smashing the tile-buffer space. We might be able to do something
better by adjusting how we program these buffers, but let's leave that
for later.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33925>
These GPUs had their tilebuffer sizes listed at twice their actual
values. While that still works, it ends up disabling pipelining in some
cases. This gives a significant performance hit, compared to using the
correct values.
But, it turns out to be hard or impossible to trigger at the moment, due
to the limited number of MSAA samples we support. Once that changes,
this is a lot easier to trigger, so let's fix it up.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33925>
This is an n-queen pattern, where no two values should be on the same
row or column. But this and the second to last element has the same y
component, and neither has the negative one.
Let's fix this up by setting the first value to the negative value. This
matches the D3D 16x sample pattern.
Fixes: a61fb62966 ("panfrost: Upload sample positions on device init")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33925>
Fixes two known issues:
- We did not lower invalid swizzles for IADD.v4s8, triggered in the CTS by
enabling uniformAndStorageBuffer8BitAccess and storageBuffer8BitAccess in
panvk.
- We did not lower invalid swizzles for IMUL.v4i8, triggered by
dEQP-VK.spirv_assembly.instruction.compute.mul_extended.(un)signed_8bit
on bifrost.
The old logic was missing several other instructions, so there may be
additional bugs that we don't know about.
There are no cases where the new behavior will keep swizzles that would
have been lowered previously, so this change should not introduce any
new bugs with valhall.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33416>
Primary reason to do this is to make codegen using the swizzle names in
bifrost/ISA.xml simpler. A secondary benefit is that dependent code can
now use the swizzle name that matches the context, making things a
little more readable.
We may want to consider giving widens separate values later, so that
va_lower_constants and bi_opt_constant_fold can fold them correctly, but
I don't know of current bugs caused by this.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33416>
Valhall supports all combinations of ftz/preserve denorm behavior
between FP16 and FP32 except FP16=ftz, FP32=preserve. Because of this,
we can't advertise independent denorm behavior.
Even with INDEPENDENCE_NONE, it is still possible for shaders to set
denorm behavior for one size and leave the other size unspecified.
Previously we were defaulting to preserve for any unspecified size, but
with FP16=ftz, we need to default unspecified FP32 to preserve.
When advertising INDEPENDENCE_NONE, the CTS checks that the
shaderDenormFlushToZeroFloat* and shaderDenormPreserveFloat* features
are equal for all sizes, so we need to advertise the same supported
denorm behavior for FP64 even though we don't support FP64 at all.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33660>
With vkCmdPipelineBarrier, it's possible to specify a barrier with
pipeline stages but without any memory barriers. These might not be
practical, but are legal Vulkan code.
Barriers like this are currently ignored in mesa, as we only convert
barriers with passed memory barriers into vkCmdPipelineBarrier2.
This commit adds handling of execution only barriers by converting them
into a memory barrier without access masks.
Fixes: 97f0a4494b ("vulkan: implement legacy entrypoints on top of VK_KHR_synchronization2")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34187>
Fix the nginx cache snippets - I'd missed the file nesting somehow.
Tested on a debian:bookworm image with nginx-full installed, checked
that we could pull an arbitrary external site, as well as S3, as well
as GitLab artifacts.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34341>
Fixes some upcoming CTS tests for texture clears.
* some drivers will attempt to issue clears with zero range
and hit asserts/crashes (spec clarification for negative
values)
* fix error thrown with negative values to match spec
* fix cases for clearing generic compressed formats
* fix negative case of using color format while having
depth/stencil internalformat and vice versa
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34428>
On Valhall we can optimize lower waits, which waits for both readers and
writers, into resource_waits which only wait for writers, allowing
threads accessing read-only resources to execute concurrently.
Let's use that on LD_TILE instructions so we can optmize the read-only
case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
In order to dynamically load the content of the tile buffer, we need
to know the target (color, depth or stencil) and the conversion to
apply. Let's define the load_input_attachment_{target,conv}_pan
intrinsics so we can dissociate the logic lowering input attachment
loads into load_converted_output_pan, and the part optimizing the shader
when input attachment map is passed at compile time.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
We take the color attachment remapping into account when emitting
blend descriptors, and we make sure we re-emit those when this color
attachment map is dirty.
We also need to take the remapping into account when checking the
render targets written by the fragment shader, hence the addition of
a color_attachment_written_mask() helper.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Re-order things in panvk_deserialize_shader() to avoid declaring local
variables for stuff we feed the panvk_shader with. The only exception
is pan_shader_info, because we need to know the shader stage to call
vk_shader_zalloc(), which if part of pan_shader_info.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Some upcoming changes in the runtime will make it impossible to rely
on the pipeline or runtime information to know whether a fragment
shader has input attachments.
Instead we gather that information at compile time and store it in our
shader bind_map.
At runtime we check whether the fragment shader has input attachments
and whether those map to the runtime depth/stencil input attachments
to set the 3DSTATE_PS_EXTRA::PixelShaderKillsPixel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d2f7b6d5a7 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
For drivers using the render pass emulation provided by the
runtime, it's important to express the mapping between
depth/stencil/color attachments and input attachments using
VkRenderingInputAttachmentIndexInfoKHR, otherwise those drivers
have to special-case emulated render passes in their
CmdBeginRendering() implementation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Panfrost supports pushing uniforms to hardware uniform registers (RMU/FAU for
Midgard/Bifrost respectively). Since OpenGL uniforms are lowered to UBO #0, it
does this with a pass that pushes UBOs. That's good!
The pass also pushes 'true' OpenGL UBOs, since they look the same in the backend
at this point. This is where the trouble comes in:
- True UBOs are allocated in GPU BOs, not CPU allocated buffers. That means it's
write-combine memory, which we cannot read from efficiently (at least
depending on coherency details that were never plumbed through panfrost.ko and
unlikely to be replumbed now that panthor is the new hot stuff). So, pushing
true UBOs reduces GPU overhead at the cost of tremendous CPU overhead. This is
dubious... When I benchmarked this on MT8192 in early 2023, this pushing
improved FPS in SuperTuxKart but hurt FPS in Dolphin.
- True UBOs can be written on the GPU. In OpenGL, we have batch tracking
infrastructure to sort this mess out in theory. What this means is that
pushing UBOs requires us to flush writers AND STALL at draw-time. If this is
ever hit, our performance is utterly trashed. But it gets worse.
- True UBOs can be written in the same batch that reads them. For example, we
could bind a buffer as a transform feedback buffer, do a draw with XFB, then
rebind as a UBO and do a draw reading. This is where we collapse -- our logic
will flush the writer, which is the same batch we were in the middle of
enqueueing a draw to. When we try to push words, we'll crash with theatrics.
This could be solved by smartening the batch tracking logic but it's not
trivial by any means.
So, pushing true UBOs on the CPU is broken and can hurt performance. Stop doing
it!
Long term, the solution will be to push on the GPU instead. This avoids all of
these issues. This can be done with a compute kernel or with CSF instructions.
The Vulkan driver will likely have to do this for performance, since pushing
UBOs from the CPU is utterly broken in Vulkan for the above reasons.
I have a branch somewhere doing this on v9 but I'm doing this on NIR time to
unblock a core change that was crashing piglit due to this pile of unsoundness.
Let's fix the correctness issues first, then someone can look at recovering
performance later when we're not blocking unrelated work.
Fixes corruption in Piglit test
gles-3.0-transform-feedback-uniform-buffer-object, which writes a UBO with
transform feedback. (I suspect the test still doesn't pass for the same reason
it's broken on other tilers. But that's a better place to be than oodles of
memory corruption.)
According to CI, fixes spec@arb_uniform_buffer_object@rendering{-dsa}-offset.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34193>
ss->info.ubo_mask includes the push+sysval UBO so if there's a user
UBO bound at the same index as the push+sysval UBO, without this
change we end up writing a descriptor for the user UBO at that index.
Fixes: 3b3cd59f ("panfrost: Launch transform feedback shaders")
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34193>
For instance, this issue is triggered with "piglit/bin/fcc-blit-between-clears -auto -fbo":
Direct leak of 16400 byte(s) in 5 object(s) allocated from:
#0 0xb720689a in __interceptor_calloc (/usr/lib/libasan.so.6+0xb289a)
#1 0xaf10f896 in draw_create_fragment_shader ../src/gallium/auxiliary/draw/draw_fs.c:47
#2 0xaef64619 in i915_create_fs_state ../src/gallium/drivers/i915/i915_state.c:550
#3 0xae16a955 in ureg_create_shader ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2194
#4 0xae17f45f in ureg_create_shader_with_so_and_destroy ../src/gallium/auxiliary/tgsi/tgsi_ureg.h:150
#5 0xae17f45f in ureg_create_shader_and_destroy ../src/gallium/auxiliary/tgsi/tgsi_ureg.h:159
#6 0xae17f45f in util_make_fs_blit_zs ../src/gallium/auxiliary/util/u_simple_shaders.c:365
#7 0xaf13300e in blitter_get_fs_texfetch_depth ../src/gallium/auxiliary/util/u_blitter.c:1157
#8 0xaf13300e in util_blitter_cache_all_shaders ../src/gallium/auxiliary/util/u_blitter.c:1322
#9 0xaef6b738 in i915_create_context ../src/gallium/drivers/i915/i915_context.c:233
#10 0xacb33c49 in st_api_create_context ../src/mesa/state_tracker/st_manager.c:986
#11 0xac845740 in dri_create_context ../src/gallium/frontends/dri/dri_context.c:178
#12 0xac854d97 in driCreateContextAttribs ../src/gallium/frontends/dri/dri_util.c:631
#13 0xb6ce79a3 in dri2_create_context_attribs ../src/glx/dri2_glx.c:240
#14 0xb6c9606f in dri_common_create_context ../src/glx/dri_common.c:665
#15 0xb6ca4f00 in CreateContext ../src/glx/glxcmds.c:322
#16 0xb6ca5c0b in glXCreateNewContext ../src/glx/glxcmds.c:1449
Fixes: 1a69b50b3b ("i915g: Fix point sprites.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27570>
For instance, this issue is triggered with "piglit/bin/glx-multithread-texture -auto -fbo":
Direct leak of 256 byte(s) in 1 object(s) allocated from:
#0 0xb71eda62 in __interceptor_realloc (/usr/lib/libasan.so.6+0xb2a62)
#1 0xadd5a32f in tokens_expand ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:239
#2 0xadd5a32f in get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:262
#3 0xadd62519 in copy_instructions ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2079
#4 0xadd62519 in ureg_finalize ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2129
#5 0xadd64bde in ureg_get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2206
#6 0xade377d0 in nir_to_tgsi_options ../src/gallium/auxiliary/nir/nir_to_tgsi.c:4043
#7 0xade3da63 in nir_to_tgsi ../src/gallium/auxiliary/nir/nir_to_tgsi.c:3831
#8 0xaeb606c9 in i915_create_vs_state ../src/gallium/drivers/i915/i915_state.c:662
#9 0xac781a2c in st_create_common_variant ../src/mesa/state_tracker/st_program.c:720
#10 0xac78e8a4 in st_get_common_variant ../src/mesa/state_tracker/st_program.c:773
#11 0xac78fc10 in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:1259
#12 0xac78fc10 in st_finalize_program ../src/mesa/state_tracker/st_program.c:1345
#13 0xac790b1a in st_program_string_notify ../src/mesa/state_tracker/st_program.c:1378
#14 0xace457a9 in _mesa_get_fixed_func_vertex_program ../src/mesa/main/ffvertex_prog.c:1397
#15 0xac5ef8db in update_program ../src/mesa/main/state.c:281
#16 0xac5f0ece in _mesa_update_state_locked ../src/mesa/main/state.c:560
#17 0xac5f1653 in _mesa_update_state ../src/mesa/main/state.c:593
#18 0xacdf9fe2 in _mesa_DrawArrays ../src/mesa/main/draw.c:1403
Fixes: 487a493325 ("i915g: Add support for per-vertex point size.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27570>
For instance, this issue is triggered with "piglit/bin/fcc-blit-between-clears -auto -fbo":
Direct leak of 836 byte(s) in 1 object(s) allocated from:
#0 0xb71eb6f2 in malloc (/usr/lib/libasan.so.6+0xb26f2)
#1 0xaefadc78 in slab_add_new_page ../src/util/slab.c:179
#2 0xaefadc78 in slab_alloc ../src/util/slab.c:221
#3 0xaef7d461 in i915_texture_transfer_map ../src/gallium/drivers/i915/i915_resource_texture.c:789
#4 0xac9e931e in pipe_texture_map ../src/gallium/auxiliary/util/u_inlines.h:555
#5 0xac9e931e in _mesa_map_renderbuffer ../src/mesa/main/renderbuffer.c:494
#6 0xad49c5e4 in readpixels_memcpy ../src/mesa/main/readpix.c:260
#7 0xad49c5e4 in _mesa_readpixels ../src/mesa/main/readpix.c:898
#8 0xad5d8cfe in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:568
#9 0xad4a0caf in read_pixels ../src/mesa/main/readpix.c:1199
#10 0xad4a0caf in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1216
#11 0xad4a155b in _mesa_ReadPixels ../src/mesa/main/readpix.c:1231
or "piglit/bin/fcc-read-to-pbo-after-clear -auto":
Direct leak of 772 byte(s) in 1 object(s) allocated from:
#0 0xb726b6f2 in malloc (/usr/lib/libasan.so.6+0xb26f2)
#1 0xaf0adc88 in slab_add_new_page ../src/util/slab.c:179
#2 0xaf0adc88 in slab_alloc ../src/util/slab.c:221
#3 0xaf07aad7 in i915_buffer_transfer_map ../src/gallium/drivers/i915/i915_resource_buffer.c:75
#4 0xad10de74 in pipe_buffer_map_range ../src/gallium/auxiliary/util/u_inlines.h:398
#5 0xad10de74 in _mesa_bufferobj_map_range ../src/mesa/main/bufferobj.c:499
#6 0xad5677ce in _mesa_map_pbo_dest ../src/mesa/main/pbo.c:308
#7 0xad59be3b in _mesa_readpixels ../src/mesa/main/readpix.c:894
#8 0xad6d8cfe in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:568
#9 0xad5a0caf in read_pixels ../src/mesa/main/readpix.c:1199
#10 0xad5a0caf in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1216
#11 0xad5a155b in _mesa_ReadPixels ../src/mesa/main/readpix.c:1231
Fixes: e7a73b75a0 ("gallium: switch drivers to the slab allocator in src/util")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27570>
If you suspect a workload is failing because it needs more memory, you
can set ANV_SYS_MEM_LIMIT=100 to give it all the memory available.
This could make, for example, certain games start working (it really
depends on how much RAM you have and how much the game wants).
If you suspect a workload is too resource hungry, you can try to limit
it with ANV_SYS_MEM_LIMIT=30 (or some other value) to see if it can
deal with the more restricted environment and behave accordingly.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
"We paid for sixteen gigs of RAM, so we gonna use the whole damn
sixteen gigs of RAM!"
- My Mom
First, some history:
The Anv 50%-or-75% rule was originally added in 2017 by 060a6434ec
("anv: Advertise larger heap sizes"). When i915.ko started reporting
memory sizes in its ioctls, it didn't impose any restrictions: 100% of
SRAM was reported as available, so the restriction was in Mesa. When
xe.ko was introduced, it only reported 50% of the SRAM as available
through its ioctls, so commit b571ae6e7a ("intel: Make memory heaps
consistent between KMDs") adapted the code to not take an extra 25% of
the 50% that was already cut, and restricted i915.ko to 50% instead of
the 50%-or-75%. In Kernel commit d2d5f6d57884 ("drm/xe: Increase the
XE_PL_TT watermark"), xe.ko changed to reporting 100% of SRAM through
its ioctls, so we adapted Mesa to do the right thing depending on
which Kernel version was running.
While this was all happening, we were discussing about which behavior
was actually the best: restrict everything to 50% in order to avoid
issues when many things are running in parallel, or keep the
restriction only at 75% in order to allow high demanding workloads to
make full use of the hardware.
The way I see, if parallel applications are causing the system to run
out of resources, the user always has the option to kill applications
and use one thing at a time. On the other hand, if a single
application needs more than 50% of the SRAM and we don't allow it in
our heaps, the application will never work (unless, of course, the
user patches Mesa). So in this commit we go back to allowing
high-demanding applications to work by restoring the 50%-or-75% rule.
This commit is especially useful in systems with integrated graphics,
like LNL, where the option to upgrade RAM is not present.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
Kernel commit d2d5f6d57884 ("drm/xe: Increase the XE_PL_TT watermark")
changed how xe.ko reportes memory: its ioctls now report 100% of the
system RAM as available. Since our policy is to report 50% of the SRAM
as available for the heaps, add some code to check the amount reported
by xe.ko against the amount reported by the system, then act
accordingly.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
Before commit b571ae6e7a ("intel: Make memory heaps consistent
between KMDs"), we had the following policy for reporting Sytem RAM
memory sizes:
- For OpenGL, we reported the total available RAM.
- For Vulkan, we reported the total available RAM as:
- 50% of the total RAM if the total RAM was <= 4GB,
- 75% otherwise
- In addition, the Memory Budget (for VK_EXT_memory_budget) is 90%
of the "free" memory, which can be an extra 10% off of the 50% or
75%.
When xe.ko was added, one key difference was noted: while i915.ko
reported the "real" RAM memory sizes in its ioctls, xe.ko reported
only 50% of the system RAM as available. Because of that (and other
reasons, see this discussion on MR 28513), commit b571ae6e7a decided
to unify the behavior by changing the Anv i915.ko rule to "always 50%"
instead of "50% or 75%". This also changed the Iris rule to 50%
instead of 100%.
In my research, I couldn't find any reason why this restriction should
also apply to Iris, so here we revert back to handling these size
restrictions on Anv only.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
This is necessary to appropriately uniformize the first component
access of a convergent vector. Without this, this is produced:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0:F, 0.5f
add(32) %22:F, %18+2.0<0>:F, 0.5f
This is the correct code:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0<0>:F, 0.5f
add(32) %22:F, %18+2.0<0>:F, 0.5f
Without 38b58e286f, the code generated was more incorrect, but happened
to work for this test case:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0<0>:F, 0.5f
add(32) %22:F, %18+0.4<0>:F, 0.5f
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 38b58e286f ("brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset")
Closes: #12969
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34427>
Indeed, this resource was assigned twice and was not properly freed.
For instance, this issue is triggered with:
"piglit/bin/glsl-fs-pointcoord -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: 0278d1fa32 ("gallium: add unbind_num_trailing_slots to set_vertex_buffers")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27572>
Add conditioning before making driver calls to be
able to workaround some of the fatal errors, such
as unboxing issues during or after snapshot load.
This enables invalidating a host dispatcher based
on the application state. A default error will be
returned for vulkan calls.
Builtin expectation function is used to reduce
performance cost of the checks.
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34418>
Track pipeline layout creation and destroy calls
to cleanup them correctly on device teardown.
Pipeline layouts require delayed delete operations for
VulkanQueueSubmitWithCommands feature which modifies order
of commands and they need to stay valid during recording.
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34418>
This tells it to actually panic instead of unwinding and returning NULL.
I find myself commenting out the unwind code pretty frequently so I can
get GDB to break at the panic. This should help avoid that extra debug
step.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34435>
We need different Temps for each resume shader, because registers aren't
preserved across resume boundaries.
This was likely fine in practice because arg registers are the same for
each shader, but resulted in invalid IR and asserts.
Fixes crashes in Indiana Jones RT with assertions enabled on GFX8.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34114>
VUID-VkDescriptorSetAllocateInfo-pSetLayouts-09380 says that :
"If pSetLayouts[i] was created with an element of pBindingFlags
that includes VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT,
and VkDescriptorSetVariableDescriptorCountAllocateInfo is included
in the pNext chain, and
VkDescriptorSetVariableDescriptorCountAllocateInfo::descriptorSetCount
is not zero, then
VkDescriptorSetVariableDescriptorCountAllocateInfo::pDescriptorCounts[i]
must be less than or equal to
VkDescriptorSetLayoutBinding::descriptorCount for the corresponding
binding used to create pSetLayouts[i]"
But applications like are not following the spec. RADV doesn't apply
that limit and allocates if there is enough space in the pool. Let's
just do the same.
Note that this issue got resolved with a vkd3d-proton change :
a7ac1a7d2f
But since this change is deleting more code than it adds, might as
well go with it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12185
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32305>
Created stand alone tool for sampling gfx data on regular
intervals. Tool has inner loop that performs sampling every N
useconds. Press any key to end sampling. Results will be dumped
when intel_monitor exits.
First application of intel_monitor will be to collect eu stall
data. Perhaps more applications can be added at a later date.
How to use:
0. Set sysctl dev.xe.observation_paranoid=0
1. Clean shader cache and launch gfx INTEL_DEBUG=shaders-lineno.
Redirect stderr to asm.txt.
2. When gfx app ready to monitor, begin capturing eustall data by
launching `intel_monitor -e > eustall.csv` in separate console.
3 When done collected, close intel_monitor by pressing any key.
4. Correlate eustall data in eustall.csv with shader instructions in
asm.txt by matching instruction offsets. Use data to determine which
instructions are stalling and why.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
Xe2+ GPUs have support for eu stall sampling perf debug feature.
This feature allows driver to collect count and reasons for why
EUs are stalled on GPU. Stall data is cross referenced with ip
address within individual shaders so it is possible to know which
instructions in which shaders are generating stalls. This should
be a very useful feature for debugging performance of slow shaders.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
Add support for dumping shader asm containing instruction line numbers
matching offsets within instruction state pool buffer. Offsets
should match values collected from eu stall sampling. This is
required for match eu stall data with individual shader instructions.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
logicop_func=COPY is different from logicop_enable due to overriding blending.
maintain the info we need to implement properly. fixes
dEQP-VK.pipeline.shader_object_unlinked_binary.logic_op_na_formats.r32g32b32a32_sfloat.copy_blend
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34426>
This pulls out the logicop_func variable from the options struct, so we can
modify it in the next commit in a central place. It then refactors out the
format variable from the options struct since we end up duplicating
options->format[rt] a zillion times and passing in both an options struct and a
logicop func override is confusing so this will just make everything neater and
self-contained next commit.
no functional change.
Cc'd to make the next commit cherrypickable.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34426>
A few Android tools are based on/assume the datasource names
gpu.renderstages and gpu.counters. It is less effort to align with that
naming for Android builds than to chase down those tools and fix them,
not to mention account for new tools that may be created in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34330>
This is the backport of eca57f85ee ("radeonsi: fix
gl_ClipDistance and gl_ClipVertex for points").
This change was tested on rv770, palm, barts and cayman. It
fixes 450 khr-gl tests and 64 khr-gles tests on evergreen
and cayman gpus. Here is the list:
spec/glsl-1.20/execution/clipping/vs-clip-vertex-primitives: fail pass
spec/glsl-1.30/execution/clipping/vs-clip-distance-primitives: fail pass
spec/glsl-1.50/execution/compatibility/clipping/gs-clip-vertex-primitives-points: fail pass
khr-gl(3[0-3]|4[0-5])/clip_distance/functional: fail pass
khr-gl(33|4[0-5])/cull_distance/functional_test_item_[0-8]_primitive_mode_points_max_culldist_[0-7]: fail pass
khr-gles3/clip_distance/functional: fail pass
khr-gles3/cull_distance/functional_test_item_[0-8]_primitive_mode_points_max_culldist_[0-7]: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34403>
This is the backport of 9c49550163. This rounding functionality
is available on all the gpus of the r600 family.
This change was tested on rv770, palm and cayman. This change fixes
at least the "turn-on-off" tests on all these gpus and it does not
add any regression. Here are the tests fixed on palm:
spec/ext_framebuffer_multisample/interpolation 6 centroid-edges: fail pass
spec/ext_framebuffer_multisample/interpolation 8 centroid-edges: fail pass
spec/ext_framebuffer_multisample/turn-on-off 2: fail pass
spec/ext_framebuffer_multisample/turn-on-off 4: fail pass
spec/ext_framebuffer_multisample/turn-on-off 6: fail pass
spec/ext_framebuffer_multisample/turn-on-off 8: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34403>
When executing CopyBufferToImage or CopyImage with multiple regions of
both depth and stencil aspects targeting an interleaved depth stencil
image, we must split the regions into one copy-command for each aspect
and add a barrier between them to avoid a write-after-write race.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 5067921349 ("panvk: Switch to vk_meta")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34384>
The colorspace SRGB_NONLINEAR could be added twice when querying
available formats, leading to duplicate entries and VulkanCTS WSI test
failures.
Fixes: 789507c99c ("vulkan/wsi: implement the Wayland color management protocol")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34410>
Funny story... this is how regular b2f was implemented before Curro
implmented the `MOV dst:F -src:D` method 9 years ago (see
3ee2daf23d).
Eliminating the type conversion in the arithmetic operation enables the
next commit.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
This commit prevents code quality regressions in the next
commit. Without this, some fragment shaders in Batman: Arkham Origins
have code like:
shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V
...
and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW
...
add(8) g56<1>D -g52<8,8,1>D 1D
transformed to
shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V
...
and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW
...
mov(8) g56<1>D -g52<8,8,1>D
...
and(8) g57<1>UD ~g56<8,8,1>D 0x00000001UD
Propagating through the negation allows the added MOV to be deleted.
shader-db:
All Intel platforms had simlar results. (Lunar Lake shown)
total instructions in shared programs: 16968020 -> 16968019 (<.01%)
instructions in affected programs: 281 -> 280 (-0.36%)
helped: 1 / HURT: 0
total cycles in shared programs: 914598850 -> 914598832 (<.01%)
cycles in affected programs: 5398 -> 5380 (-0.33%)
helped: 1 / HURT: 0
A single Blender vertex shader was affected.
fossil-db:
Lunar Lake, Tiger Lake, Ice Lake, and Skylake had similar results. (Lunar Lake shown)
Totals:
Instrs: 209894650 -> 209894651 (+0.00%)
Cycle count: 30545958586 -> 30545952860 (-0.00%)
Totals from 2 (0.00% of 706657) affected shaders:
Instrs: 3582 -> 3583 (+0.03%)
Cycle count: 1875100 -> 1869374 (-0.31%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Subgroup size: 9906400 -> 9906416 (+0.00%)
Totals from 2 (0.00% of 805770) affected shaders:
Subgroup size: 16 -> 32 (+100.00%)
Two compute shaders in Hogwarts Legacy were affected.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
This is mostly defensive. If a convergent value ever ended up as a
source of a DDX or DDY, the eu_emit code will ignore the stride. This
will result in bad code being generated.
No shader-db or fossil-db changes on any Intel platform.
v2: DDX and DDY will always be float, but brw_imm_for_type only works
with integer types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Ken
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
The vast majority of the callers want channel = 0. During the
development process, using this default parameter value saved a lot of
pain in rebasing. However, it seems to be more trouble than it's worth.
Issue #12464 occurred because LNL was merged while this code was in
review. As a result, one caller of get_nir_src that wanted channel = -1
was not inspected closely, and it got the default channel = 0 instead.
To prevent this happening in the future (with possible branches still
yet to be merged, for example), remove the default parameter. This will
force the inspection of any callers that don't have an explicit channel
parameter. Hopefully that will prevent more problems.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
The source of nir_intrinsic_load_barycentric_at_offset is a vector, so
-1 should be passed to get_nir_src. This is also done for texture
sampling intrinsics.
I skimmed the other user of get_nir_src, and I believe they are
correct. This one was just missed as LNL support landed an many, many
rebases of the original MR occurred.
v2: Fix another get_nir_src call. Suggested by Lionel.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Closes: #12464
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
Bifrost keeps a cache of information about buffers being
used as indices. Unfortunately, it was not keeping information
about the size of the indices (probably because this rarely
changes). If a program deliberately re-interprets the indices
as a different type (e.g. UNSIGNED_INT instead of UNSIGNED_SHORT)
then we will use incorrect values from the cache. This actually
showed up in a test program we were running.
Fix by saving the index size in the cache key.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34011>
This issue seems to be specific to textureGather() which could
fail when processing some surfaces. These surfaces are configured
with non-standard one and zero swizzles. The gpu doesn't support
this very specific setup with all the possible hardware formats.
This change selects a compatible configuration when this is
possible.
This change was tested on palm, barts and cayman. This change
fixes the 216 remaining arb_texture_gather tests:
spec/arb_texture_gather/texturegather/.*-zero-.*: fail pass
spec/arb_texture_gather/texturegather/.*-one-.*: fail pass
spec/arb_texture_gather/texturegatheroffset/.*-zero-.*: fail pass
spec/arb_texture_gather/texturegatheroffset/.*-one-.*: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34293>
After ca09c173f6, util_blitter_clear_render_target() requires
a call to util_blitter_save_fragment_constant_buffer_slot().
The r600 implementation was using the same sequence with
util_blitter_clear_depth_stencil() which does not need this
call. This was the cause of the refcnt imbalance.
For instance, this issue is triggered with:
"piglit/bin/ext_clear_texture-stencil -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: ca09c173f6 ("gallium/u_blitter: remove UTIL_BLITTER_ATTRIB_COLOR, use a constant buffer")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34292>
The vtn_ssa_value for a cmat is not backed by a nir_def, but by a nir_variable, so
can't be used directly when calling a function. In most cases the cmat is used by
reference so code will take the value of deref for it (which is a `nir_def`).
When passing a cooperative matrix to a function by value, let the caller pass the deref
value, and the callee copy to a new local variable from that deref.
Fixes: b98f87612b ("spirv: Implement SPV_KHR_cooperative_matrix")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34364>
The `meson configure` command is used in `.gitlab-ci/meson/build.sh` to
show a summary of the build configuration, however when the script is
re-used locally on systems when there is a pager the commands blocks
for user input, which can be annoying.
Pass the `--no-pager` option to `meson configure`, so that it never
blocks for user input.
This does not change the behavior in the CI, it just makes the script run
uninterrupted also on local invocations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34289>
The cid loop in the previous implementation stopped at n_counters for a
given category, even though cid is a global id that does not start
counting from zero at the beginning of each category. As a result, we
missed most of the counters outside of the first category.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: 513d1baaea ("pps: Panfrost pps driver")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34202>
Starting with Rust 1.83 this benign warning is show when compiling the pest dependency:
```
warning: elided lifetime has a name
--> pest/src/iterators/pairs.rs:330:70
|
89 | impl<'i, R: RuleType> Pairs<'i, R> {
| -- lifetime `'i` declared here
...
330 | ) -> Filter<FlatPairs<'i, R>, impl FnMut(&Pair<'i, R>) -> bool + '_> {
| ^^ this elided lifetime gets resolved as `'i`
|
= note: `#[warn(elided_named_lifetimes)]` on by default
```
Meson, at least as of version 1.7.0, unfortunately does not suppress warnings originating in dependencies.
Upstream has resolved the warning in 2.8.0, so update to that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34368>
__LINE__ can be inconsistent when using different compilers. This patch
changes the test runner to do a simple string find/replace of the test
source file instead of looking for the line where the reference string
starts.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33980>
There is a special address defined in kernel from which ALWAYSON
counter could be read.
Blob uses this sequence to read it:
getone #l15
mov.s32s32 r2.y, -4096
mov.s32s32 r2.z, 131071
(rpt5)nop
ldg.u32 r2.w, g[r2.y], 1
ldg.u32 r2.y, g[r2.y+4], 1
(sy)(ss)mov.s32s32 r48.x, (last)r2.w
mov.s32s32 r48.y, (last)r2.y
l15:
Passes:
dEQP-VK.glsl.shader_clock.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29860>
The temporary syncobj created in the fast path of kgsl_queue_submit was
not being destroyed, and potentially being assigned to multiple syncobjs
without being properly duplicated. This could lead to a use-after-free
or double-free since multiple syncobjs could be assigned the same FD.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34328>
This reverts commit 0342d34bdb which
introduced a regression in the Turnip's KGSL backend, causing various
sync issues since KGSL doesn't advance the GPU timeline when a submit
without cmdbufs is made. A comment explaining the issue was added to the
code, and the fast path is reintroduced.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34328>
Using constant-size array instead of variable-length array is preferred
due several issues with the latter.
Particularly, for this case using VLA generates several warnings by
static analyzer: passed-by-value struct argument contains uninitialized
data (e.g., field: 'file').
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34050>
Most compositors work with damage in the buffer local coordinate space.
This change spares the compositors some work converting the provided
INT32_MAX x INT32_MAX damage region to the buffer local coordinate
space. It has no significant performance impact, but it'd still be nice
to use wl_surface_damage_buffer() if possible.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34227>
Most compositors work with damage in the buffer local coordinate space.
This change spares the compositors some work converting the provided
INT32_MAX x INT32_MAX damage region to the buffer local coordinate
space. It has no significant performance impact, but it'd still be nice
to use wl_surface_damage_buffer() if possible.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34227>
When running under DXVK or vkd3d, the texture coordinate rounding behavior
should match D3D expectations. On Adreno, this behavior can be toggled
through the SP_TP_MODE_CNTL register.
A driconf-based option is introduced to help set the relevant register flag
that enables this behavior.
This fixes the cause of test_sampler_rounding test case failure in vkd3d on
Turnip's side, but a small change in vkd3d is also required, so the test
failure expectation isn't removed yet.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33987>
This commit makes radv use vk_video_derive_h265_scaling_list, which properly
applies default scaling lists whenever they're needed. It also simplifies
update_h265_scaling function into a simple memcpy. The firmware interface
struct and Vulkan's StdVideoH265ScalingLists struct both have identical memory
layouts, so it's not neccessary divide it into multiple copies with offsets.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34096>
H265 specification defines default scaling lists to use whenever scaling lists
are not specified in neither sps nor pps. Currently drivers ignore this
requirement and set the lists to zero. This commits adds a helper function
vk_video_derive_h265_scaling_list (similar to its h264 counterpart) that
selects either sps or pps lists and falls back to default values if neither
were specified. The default values were taken from ITU-T H265 specification
(revision 8), section 7.4.5.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34096>
The intervening_saturating_copy test is removed. The defs version of the
pass does not handle this case. It should not occur often in practice
anyway. Copy propagation and brw_nir_opt_fsat should prevent this
scenario from happening.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 212677275 -> 212677278 (+0.00%)
Cycle count: 30466062848 -> 30466056040 (-0.00%)
Totals from 1 (0.00% of 706300) affected shaders:
Instrs: 1343 -> 1346 (+0.22%)
Cycle count: 411664 -> 404856 (-1.65%)
v2: Stop counting ip. The non-defs part of the pass was the only thing
that used it.
v3: Also delete "if (block != def->block) continue;" code. I noticed
this while working on some other changes to this function. It's the last
thing in the loop, so it's totally useless. Delete some other spurious
continues too.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
v2: Add support for WE_all instructions... this already just worked, so
I only had to delete the check and the FINISHME comment.
v3: Use logic more like def_analysis::update_for_reads to determine when
to not insert LOAD_REG instructions. Based on a suggestion by Ken.
v4: Eliminate "store" from all the names since STORE_REG does not exist
anymore. Fold insert_load_reg into brw_insert_load_reg. Elminate extra
call to s.def_analysis.require() after progress. Pull a loop-invariant
check out of the inst->srouces loop. Drop call to
brw_opt_split_virtual_grfs after lowering load_reg. All suggested by
Caio.
v5: Assert that LOAD_REG doesn't already exist in
brw_insert_load_reg. Update comment before fully_defines. Both
suggested by Caio.
v6: Don't explicitly special-case SHADER_OPCODE_MEMORY_STORE_LOGICAL.
Move the inst->dst.file != VGRF check earlier to avoid the loop over
sources. Both suggested by Ken. Move the call the brw_insert_load_reg
a little bit later, and explain why it's at that location. Suggested
by Caio.
v7: Many changes to the for-each-source loop in brw_insert_load_reg.
Removes incorrect multiplication of s.alloc.sizes with reg_unit. Adds
checks for matching SIMD size and NoMask in the search for pre-existing
LOAD_REG of same value.
v8: Add some unit tests. Suggested by Caio.
shader-db:
Lunar Lake
total instructions in shared programs: 16923237 -> 16921895 (<.01%)
instructions in affected programs: 450565 -> 449223 (-0.30%)
helped: 251 / HURT: 377
total cycles in shared programs: 910428418 -> 889920590 (-2.25%)
cycles in affected programs: 719248184 -> 698740356 (-2.85%)
helped: 9076 / HURT: 9082
total fills in shared programs: 2242 -> 2218 (-1.07%)
fills in affected programs: 116 -> 92 (-20.69%)
helped: 2 / HURT: 0
total sends in shared programs: 848635 -> 848421 (-0.03%)
sends in affected programs: 810 -> 596 (-26.42%)
helped: 10 / HURT: 0
LOST: 82
GAINED: 78
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19875784 -> 19871694 (-0.02%)
instructions in affected programs: 1050091 -> 1046001 (-0.39%)
helped: 251 / HURT: 2403
total cycles in shared programs: 905328238 -> 882446458 (-2.53%)
cycles in affected programs: 682736344 -> 659854564 (-3.35%)
helped: 7869 / HURT: 7911
total spills in shared programs: 5512 -> 5032 (-8.71%)
spills in affected programs: 1830 -> 1350 (-26.23%)
helped: 8 / HURT: 0
total fills in shared programs: 5648 -> 4782 (-15.33%)
fills in affected programs: 3312 -> 2446 (-26.15%)
helped: 8 / HURT: 0
total sends in shared programs: 1032942 -> 1032722 (-0.02%)
sends in affected programs: 572 -> 352 (-38.46%)
helped: 10 / HURT: 0
LOST: 138
GAINED: 53
Tiger Lake
total instructions in shared programs: 19711930 -> 19715591 (0.02%)
instructions in affected programs: 1040623 -> 1044284 (0.35%)
helped: 317 / HURT: 2474
total cycles in shared programs: 862988990 -> 860573870 (-0.28%)
cycles in affected programs: 612392461 -> 609977341 (-0.39%)
helped: 7447 / HURT: 7686
total sends in shared programs: 1034763 -> 1034555 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 56
GAINED: 143
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20545461 -> 20545220 (<.01%)
instructions in affected programs: 422405 -> 422164 (-0.06%)
helped: 180 / HURT: 459
total cycles in shared programs: 872697345 -> 866874523 (-0.67%)
cycles in affected programs: 573117917 -> 567295095 (-1.02%)
helped: 6783 / HURT: 6980
total spills in shared programs: 4335 -> 4336 (0.02%)
spills in affected programs: 90 -> 91 (1.11%)
helped: 1 / HURT: 2
total fills in shared programs: 4194 -> 4196 (0.05%)
fills in affected programs: 463 -> 465 (0.43%)
helped: 1 / HURT: 2
total sends in shared programs: 1079446 -> 1079238 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 117
GAINED: 37
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209708136 -> 209695617 (-0.01%); split: -0.02%, +0.01%
Send messages: 10927753 -> 10927640 (-0.00%)
Cycle count: 30540172048 -> 30427084732 (-0.37%); split: -0.99%, +0.62%
Spill count: 511621 -> 510932 (-0.13%); split: -0.22%, +0.08%
Fill count: 621166 -> 618440 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35574784 -> 35648512 (+0.21%); split: -0.06%, +0.26%
Max live registers: 65453860 -> 65453140 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 75374990 -> 35195764 (-53.31%)
Totals from 503284 (71.25% of 706391) affected shaders:
Instrs: 180203778 -> 180191259 (-0.01%); split: -0.02%, +0.01%
Send messages: 9699732 -> 9699619 (-0.00%)
Cycle count: 30080349592 -> 29967262276 (-0.38%); split: -1.01%, +0.63%
Spill count: 511584 -> 510895 (-0.13%); split: -0.22%, +0.08%
Fill count: 621120 -> 618394 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35443712 -> 35517440 (+0.21%); split: -0.06%, +0.27%
Max live registers: 52566092 -> 52565372 (-0.00%); split: -0.01%, +0.00%
Non SSA regs after NIR: 70110949 -> 29931723 (-57.31%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
This prevents assertion failures in brw_eu_emit in a later commit in
this MR. Even though they have not been previously observed, these
assertion failures could happen even without that commit.
No shader-db or fossil-db changes on any Intel platform.
Fixes: 04e1783278 ("brw: Call brw_fs_opt_algebraic less often")
v2: Add SHUFFLE. Suggested by Ken. Fixed indentation.
v3: Update BROADCAST exec_size after rebasing on "brw/build: Use SIMD8
temporaries in emit_uniformize".
v4: Explain why munging the exec_size is correct.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
The changes to try_copy_propagate will be removed later in the series.
v2: Fix up some comments to note that offset != 0 is allowed only when
stride == 0. Apply same offset=0 restriction in try_copy_propagate_def
too. Allow copy propagation if the source is either a def or
UNIFORM. Don't copy prop a load_reg through a non-def value.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
load_reg is something like load_payload except it has a single
source. It copies the entire source to the destination. Its purpose is
to convert a non-SSA VGRF into an SSA value. This copy is marked as
volatile so that it will act as a scheduling barrier.
v2: Fix some typos in the commit message. Eliminate the
brw_builder::LOAD_REG overload that returns a brw_inst*. This is
unlikely to ever be used. Add some checks to brw_validate. All
suggested by Caio.
v3: Force the source and destination types of the LOAD_REG to by
integer. This will (eventually) simplify the creating of unit tests for
the pass that adds LOAD_REG instructions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
v2: Move to immediately before the main optimization loop. Most
importantly, this is after the first call to DCE.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Non SSA regs after NIR: 237045283 -> 100183460 (-57.74%); split: -58.12%, +0.39%
Totals from 701423 (99.26% of 706657) affected shaders:
Non SSA regs after NIR: 236868848 -> 100007025 (-57.78%); split: -58.17%, +0.39%
Suggested-by: Ken
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
device-select-layer needs to obtain the display server's preferred
display device, and has so far relied on wl_drm for this. wl_drm is
superseded by linux-dmabuf with some Wayland servers having dropped
support for wl_drm entirely.
Implement linux-dmabuf as preferred mechanism for obtaining the main
device, with wl_drm support retained as a fallback for now.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34219>
Starting with sRGB meant we would refcount to -1 if an application
chooses PASS_THROUGH. Instead, just initialize with PASS_THROUGH so the
initial refcount of 0 reflects reality.
Previously, we would segfault if an application chose PASS_THROUGH at
swapchain initialization then switched to a color managed colorspace
later in the runtime, because we would increment refcount from -1 -> 0
and this would result in not creating a new color managed surface.
Fixes: 789507c99c ("vulkan/wsi: implement the Wayland color management protocol")
Signed-off-by: llyyr <llyyr.public@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34353>
Calling dri2_x11_setup_swap_interval with swap_available = false
sets the min/max/default swap interval values to zero.
EGL_MIN/MAX_SWAP_INTERVAL is always reported as 0 and the interval
value set by eglSwapInterval gets clamped to 0.
Set swap_available to true before calling dri2_x11_setup_swap_interval,
as was done before.
Fixes: c00701c83a ("egl/x11: unify swrast/kopper/dri3 paths a bit")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34235>
We need to remember which streamout buffers and streams were enabled,
even if the shader doesn't actually write any outputs to them,
because the API requires that we count vertices created by this shader
towards queries against those streams.
That information can be gathered by nir_gather_xfb_info_with_varyings
from the original NIR I/O variables that we get from the frontend,
but it isn't included in any intrinsics so would be otherwise lost here.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
On i686, where VK_USE_64_BIT_PTR_DEFINES is unset and Vulkan handles are
represented as 64-bit integers instead, the code used the wrong format
specifier, causing a build error.
Fixes: 7fb31361f4 ("Handle external fences in vkGetFenceStatus()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34124>
Per spec, logic operations between fragment values and color attachments
should be disabled when attachments are using float or sRGB formats.
Regardless of attachment's format, enabled logic operations should keep
blending disabled.
Fixes: dEQP-VK.pipeline.*.logic_op_na_formats.*
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34212>
Context flushes can be caused by all kinds of operations that aren't
obvious to a GL API user. As those are quite heavy-weight operations
it is nice to have some insight into how many of those are happening
per frame. Add a sw query to make this information easily accessible.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34350>
Shared RA might insert new defs to be handled by regular RA (e.g.,
shared spills). However, their interval offsets were not initialized
which caused their intervals to sometimes be mistakenly matched with
those containing offset 0. Fix this by calling index_merge_sets after
shared RA and modifying that function to only index new defs in that
case.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33319>
shpe is a bit of a special instruction: it's not really a terminator
(i.e., it does not perform a jump) but it does have to stay at the end
of its block. Up to now, we tried to enforce this by creating const
write barriers on shpe; the assumption being that everything that
happens in the preamble ends in a write to the const file so shpe stays
at the end. Alas, it turns out this is not true: things like sampler
prefetches do not write the const file and nothing was preventing those
from being scheduled after shpe.
Instead of trying to create even more barrier dependencies, fix this by
making shpe a terminator. Both sched and postsched treat terminators
specially to make sure they always stay at the end of their block.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34290>
In order to implement FDM offset, we will have to offset the viewport
and scissor in the binning pass. In order to do this, we have to pass a
bin with nonsensical negative offsets to the patchpoint function, which
would result in asserts when patching the load/store sequences. But we
don't really need to patch these anyways as they are unused during
binning, so add the ability to skip them when binning. FS params and
some implementations of CmdClearAttachments (that don't contribute to
visibility) can similarly be skipped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33500>
For FDM offset, we will need to expand the number of bins by 1, which
can change how pipes are allocated. We don't necessarily know whether
FDM offset will be used when creating the VkFramebuffer, so we'll have
to create two different configs when FDM is enabled. Split out the parts
that are affected by the number of bins into a separate "VSC config"
struct that will be duplicated with FDM offset.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33500>
nir_opt_vectorize could replace swizzled movs with vectorized movs in a
different block. If this happens with swizzled movs in a then block, it
could leave this block empty. ir3 assumes only the else block can be
empty (e.g., when lowering predicates) so make sure ifs are in that
canonical form again.
This fixes empty predication blocks in some shaders, for example:
predt
predf
...
prede
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34272>
At least on all a6xx/a7xx, mad.f32 and mad.f16 are not fused. This means
that when the sources of a NIR ffma are all uniform we can split it in
two to execute it on the scalar ALU. This is important to reduce
register pressure and make more preambles executed early.
On fossil-db the statistics are mostly a wash as expected, but with
early preambles increasing dramatically:
Totals:
MaxWaves: 2249180 -> 2249230 (+0.00%); split: +0.01%, -0.01%
Instrs: 49668884 -> 49662951 (-0.01%); split: -0.12%, +0.11%
CodeSize: 103662656 -> 103831154 (+0.16%); split: -0.22%, +0.38%
NOPs: 8502571 -> 8495568 (-0.08%); split: -0.61%, +0.53%
MOVs: 1554442 -> 1538804 (-1.01%); split: -2.01%, +1.01%
Full: 1820906 -> 1814292 (-0.36%); split: -0.39%, +0.03%
(ss): 1168628 -> 1165868 (-0.24%); split: -1.01%, +0.78%
(sy): 616751 -> 616521 (-0.04%); split: -0.52%, +0.49%
(ss)-stall: 4384397 -> 4361662 (-0.52%); split: -1.44%, +0.93%
(sy)-stall: 17850227 -> 17858949 (+0.05%); split: -0.58%, +0.63%
Early-preamble: 102262 -> 115702 (+13.14%)
Cat0: 9375820 -> 9367978 (-0.08%); split: -0.57%, +0.48%
Cat1: 2470212 -> 2454318 (-0.64%); split: -1.28%, +0.64%
Cat2: 18673655 -> 18707106 (+0.18%)
Cat3: 14227810 -> 14211106 (-0.12%)
Cat5: 1424184 -> 1424150 (-0.00%)
Cat7: 1404718 -> 1405808 (+0.08%); split: -0.39%, +0.47%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34115>
There are three copies of this function, all of them have the same
memory leak in them. Instead of fixing them one by one, just use a
common implementation for all three, since they already all have a
shared helper lib.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34118>
With "classic" renderpasses, the VkFramebuffer's layerCount must be 1 if
multiview is enabled. We accidentally rely on this to not disable GMEM
for multiview, and possibly for other things too. Apparently the dynamic
rendering equivalent, VkRenderingInfo::layerCount, can be anything when
multiview is enabled, and some CTS tests set it to the number of views.
Sanitize it when constructing the internal framebuffer for dynamic
rendering.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34080>
We were a bit too conservative and fully disabled LRZ for when stencil
or blending were involved. There is no need to fully disable LRZ
in those cases, only LRZ writes should be disabled.
The final rules are:
LRZ is DISABLED until depth attachment is cleared when:
- Depth Write + changing direction of depth test
e.g. from OP_GREATER to OP_LESS;
- Depth Write + OP_ALWAYS or OP_NOT_EQUAL;
- Clearing depth with vkCmdClearAttachments;
- Depth image is a target of blit commands.
- (pre-a650) Not clearing depth attachment with LOAD_OP_CLEAR;
- (pre-a650) Using secondary command buffers;
LRZ WRITE is DISABLED until depth attachment is cleared when:
- Depth Write + blending (color blend, logic ops, partial color mask, etc.);
- Fragment may be killed by stencil;
LRZ is disabled for CURRENT draw when:
- Fragment shader side-effects (writing to SSBOs, atomic operations, etc);
- Fragment shader writes depth or stencil;
LRZ WRITE is DISABLED (via LATE_Z) for CURRENT draw when:
- Fragment may be via killed alpha-to-coverage, discard, sample coverage;
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33851>
This adds the latency information provided by NVIDIA. This is copied
from excel spreadsheets provided to Red Hat.
This fully passes CTS on Turing TU104 with no regressions.
I'm sure future use of some instructions like IMAD may require some
changes to this, but it should be functionally complete.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573>
Delays greater than 15 needs to be encoded into a nop following the
instruction. These delays will start to happen when we add accurate
latency handling and with certain instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573>
Otherwise we schedule this sort of thing wrong,
r0 = iadd3 r0 c[0x0][0x0] rZ
r0 = shf.l.w.i32 r0 rZ 0x2
r0 p0 = iadd3 r0 c[0x1][0x0] rZ
since raw latencies are more important than waw, but we go do a
waw for the first two instructions instead of a raw which is correct.
Fixes: 2d4e445099 ("nak/calc_instr_deps: Rewrite calc_delays() again")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573>
When we are using compute resolve, we can get
values the CTS does not expect due to the value
we end up writing for UNORM in
`nir_image_deref_store`.
Make the compute resolve rounding path match with
the output of the fragment shader resolve path,
by going through the same FP16 RTZ conversion as
we do for UNORM/SNORM formats.
This is why VK_EXT_sample_locations CTS was
failing on > GFX9.
On <= GFX9, I am assuming we are falling back to
RESOLVE_FRAGMENT, due to DCC stuff, which is why
it works there.
I tested a handful of images from the Vulkan CTS
for the sample locations and resolve tests for
diff UNORM formats from the qpa file forcing
FRAGMENT and with this change.
With this change, we now match on the compute
resolve path the same sha for the ones I compared
with ImageMagick `identify`.
CTS passes for: *resolve*, *image_clearing* and
*sample_locations* on RX 7900XTX.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28237>
This is based on Timur Kristof's code, but there are a lot of differences.
The idea is that it doesn't just compute an intersection between a point
and a triangle. It computes the *distance* between a point and a triangle
and it does so in screen space. It accurately takes the subpixel precision
of the rasterizer into account, so that it works optimally at all
resolutions, all MSAA modes, and all quant modes.
The distance computation is only approximated because it only considers
the infinite lines going through triangle edges. However, it seems to be
more than sufficient in practice because the existing rounding-based small
prim culling compensates for it.
The performance improvement is up to 10% in some geometry-bound tests,
though targeted microbenchmarks can show a lot more than that.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33361>
Migrate the panfrost-g52-vk job to mt8186-corsola-steelix-sku131072, a
new device in LAVA. This DUT is faster than the Khadas VIM3 device it
replaces, and since more of these devices are available in LAVA, reduce
the DEQP_FRACTION and increase the parallelism for the pre-merge job.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33996>
We currently just assume that textureCompressionETC2 and
textureCompressionASTC_LDR are always supported. And while that's true
for all the G52s, G610s abd G310s we've seen out in the wild, it's not
guaranteed to be true. An SoC vendor might disable support for one of
these formats.
So let's check properly, just for good measure.
Fixes: d970fe2e9d ("panfrost: Add a Vulkan driver for Midgard/Bifrost GPUs")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34206>
The panfrost_emit_plane function expects an array, and Coverity
complains about passing a pointer instead. Yeah, that's a bit nit-picky,
but it's easy enough to use an actual array here instead of trying to
fudge it.
This should be a non-functional change.
CID: 1636773, 1636744
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34156>
Enforce a default job timeout of 1 second, to make jobs which don't
explicitly specify a timeout insta-fail, rather than potentially hanging
around for an hour.
Container builds get the full hour as they can run long and are not run
in pre-merge context, and LAVA jobs also get the full hour as they have
multiple internal timeout mechanisms which aim to fast-fail jobs once
they actually start. However, as they just queue jobs to an external
host (shared with other projects like KernelCI), these timeouts aren't
reflected into the GitLab CI definitions.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34280>
quick_shader is for some reason now excruciatingly slow on the Microsoft
runners, but fine on the GStreamer runners. Until we can figure out why
this is happening - 27min runtimes instead of 3min - just keep them over
there.
Dozen is miraculously unaffected.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34280>
Most of our build jobs have had a timeout of 1 hour since 6425b6e3d4,
which pushed the timeout as high as possible to allow for LTO building
taking forever. Since then, we've pulled LTO back to only running on the
fedora-release nightly job.
This tries to bring us back to the ideal from 322a83f321, where the
build jobs all have very low timeouts both for the overall job as well
as the build section we run. So if anything goes wrong - apart from
fedora-release - we'll just assume the runner has some environmental
damage, give up, and try again.
Of the build-for-tests jobs, all but the ASan and UBSan jobs regularly
complete in around 5 minutes, apart from debian-testing which is our
critical-path build job for almost all x86-64 testing. This is obviously
not good, but is tracked in mesa#12544.
The build-only jobs not using sanitizers also typically complete in
3-4 minutes, with the exception of debian-clang-release and
debian-x86_32 which are closer to 10 minutes. That's not ideal, but
they're also not currently on the critical path, so we can live with
that.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34280>
Buffer resources are quite special as they are only one dimensional,
always linear, don't have miplevels or array slices, never have a
texture or render compatible sibling, don't ever use TS.
The gallium context interface acknowledges this fact by providing
separate entry points for buffer maps/unmaps/flushes.
Provide a specialized etna_buffer_resource as a much more lightweight
alternative to the fullblown etna_resource and implement buffer
maps/unmaps in the same straight forward, direct map manner that is
hidden inside all the tiling, TS and resource sibling handling in
etna_transfer_map/unmap. It is expected that further map optimizations
can be added on top of this simple implementation much more easily
than in the merged buffer/texture transfer code.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34061>
A begin/end sequence is something like (it's all macros based):
radeon_begin(cs);
radeon_emit(PKT3(PKT3_DRAW_INDEX_AUTO, 1, cmd_buffer->state.predicating));
radeon_emit(vertex_count);
radeon_emit(V_0287F0_DI_SRC_SEL_AUTO_INDEX | use_opaque);
radeon_end();
This is loosely based on RadeonSI (see !8653 (a0978fff)) and it seems
indeed faster overall.
The main goal of this rework is to re-use the same logic as RadeonSI
for paired packets on GFX12 (also GFX11 dGPUs) because it's supposed
to be way faster, especially on GFX12 where the CP is slow. The other
goal is to share more cmdbuf emission between both drivers in the near
future.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34229>
This is probably a little more code but we're about to add real data for
Turing+ so it's better to have things contained like this. Since Volta
and earlier will always remain hacks, we might as well have those hacks
in the per-SM files rather than pretending we have a general thing in
sched_common.rs.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34302>
We're about to add instruction latencies which are going to be in their
own files because they're also massive. This makes things follow a bit
more of a module structure where sm70.rs is the thing that ties it all
together, sm70_encode.rs is the encoder, and smXX_instr_latencies.rs
will be the individual latency files.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34302>
After ca09c173f6, util_blitter_clear_render_target() requires
a call to util_blitter_save_fragment_constant_buffer_slot().
The radeonsi implementation was using the same sequence with
util_blitter_clear_depth_stencil() which does not need this
call. This was the cause of the refcnt imbalance.
For instance, this issue is triggered with:
"piglit/bin/ext_clear_texture-stencil -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: ca09c173f6 ("gallium/u_blitter: remove UTIL_BLITTER_ATTRIB_COLOR, use a constant buffer")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34291>
This enables support for NV12, which are really useful when
dealing with hardware video decoders. This patch makes use
of the integrated YUV tiler to convert multi-planar to YUYV.
The binary blob uses the same method to deal with multi-planar
YUV formats. Other formarts will be added in a follow-up patch.
Tested with kmscube (nv12-1img) and the following gstreamer pipeline:
gst-launch-1.0 filesrc location=/tmp/test.mp4 ! qtdemux ! v4l2slh264dec ! video/x-raw,format=NV12 ! glimagesink
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Peter Frühberger
Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3418>
By default mesa now only supports DRI3.
If Xorg is not activating DRI3, it will show an almost empty screen, which is not working.
No error message is given.
This e.g. happens at Debian package
xserver-xorg-video-intel at the i915 gallium driver.
This commit generates error messages to explain the problem.
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33679>
A pipeline barrier which contains an image layout transition like
COLOR_ATTACHMENT_OPTIMAL -> TRANSFER_DST_OPTIMAL on compute queue
would just hang. Such a barrier is useless in practice but it's legal.
Prevent GPU hangs by skipping FCE or FMASK_DECOMPRESS when it's not
on the graphics queue.
Fixes dEQP-VK.synchronization2.layout_transition.compute_transition*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34231>
It is important to get D/S only draw calls to bypass invoking
the fragment shader. The public documentation for Adreno states:
"Hint the driver to engage Fast-Z by using an empty fragment
shader and disabling frame buffer write masks for renderpasses
that modify Z values only."
"The GPU has a special mode that writes Z-only pixels at twice
the normal rate."
We are promised a big performance improvement in this case.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33735>
This affects:
- generic jobs (sanity, rustfmt, shader-db, docs, etc.)
- linux image builds
- linux mesa builds
- software renderer tests
- android tests
- virgl & venus tests
Marge pipelines get high priority, nightly pipelines get low priority,
and everything else is in between.
(Hardware test farms have their own mechanisms.)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34264>
This changes the static cycle count estimate so that it takes into
account estimated variable latency instruction delays. Statistics from
before this commit are not comparable to statistics generated after
this commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32311>
To get us started, this is designed to be pretty much the simplest thing
possible. It runs post-RA so we don't need to worry about hurting
occupancy and it uses the classic textbook algorithm for local (single
block) scheduling with the usual latency-weighted-depth heuristic.
-14.22% static cycle count on shaderdb
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32311>
The problem this fixes is currently hidden because of the order in
which we run nir_lower_tex & intel_nir_lower_texture. The issue is
that nir_lower_tex removes the LOD source in some cases and the second
run of nir_lower_tex can add it back.
This is also only needed on Gfx12.5+ if the LOD is present.
Finally move all of the texture lowering to the postprocess phase. No
need to run this multiple times.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
The previous concept was to emit the non-deferred shader part
first, including the culling code, and then modify the
non-deferred part accordingly.
This caused some issues because it was really impossible to tell
which sysvals the deferred part needs after DCE, so we had to
run an additional cleanup pass afterwards.
The new concept is to prepare the deferred part first by applying
reusable variables (from the non-deferred part) and run DCE.
This opens the possibility to accurately gather info about what
the deferred part needs.
This idea is further expanded in the next commits.
Fossil DB stats on Navi 21:
Totals from 17 (0.02% of 79377) affected shaders:
Instrs: 18063 -> 18064 (+0.01%)
CodeSize: 93368 -> 93372 (+0.00%)
Latency: 49889 -> 49899 (+0.02%); split: -0.01%, +0.03%
SALU: 2416 -> 2417 (+0.04%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073>
Calculate the IP ranges of the shader as an analysis pass. This will
later replace the existing tracking of start_ip/end_ip as the blocks are
changed (and the defer/adjust scheme to avoid too much churn when that
happen).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
It was used by the pass tests to verify output with TEST_DEBUG=1,
replace it with brw_print_instructions().
The output is slightly different (not printing IP, not reordering the
blocks), we can add those features as we need, but given the usage was
already very reduced, don't bother with that until need arises.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
Drivers are expected to ignore unknown structs in pNext chains. Venus
is a bit weird because we advertise features based on the host driver
and so we have code for all sorts of things which may not be supported
by the host driver. When globalPriorityQuery is unsupported, we
shouldn't even attempt to return anything. Currently, we just crash in
this case because vn_physical_device::global_priority_properties is an
uninitialized pointer. While we're here, initialize it to NULL if it's
invalid.
Fixes: e488b5e45e ("venus: support VK_KHR_global_priority")
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34218>
When we're using the PRIME path and using vkCmdCopyImageToBuffer to copy
to a linear image, the buffer memory is what's shared with the window
system. For legacy drivers that depend on memory signaling via
wsi_memory_signal_submit_info, we need to tell the driver to signal the
buffer memory, not the image memory or else the window system may wait
on a driver-internal buffer and not wait for the copy to complete.
Cc: mesa-stable
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34218>
a1b05991 ("radv/rt: Flush L2 after writing internal node offset on GFX12")
did this for radv-internal CP writes - we also need to do this for PLOC
sync data initialization which is done in the common framework.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34178>
The mirror needs to be reversed because the rotation is applied
before the mirroring.
VAAPI docs:
Mirroring of an image can be performed either along the
horizontal or vertical axis. It is assumed that the rotation
operation is always performed before the mirroring operation.
Cc: mesa-stable
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34140>
This should probably be done different, params should probably be considered immutable,
and this should be moved into the command buffer, also this gets set on decode paths as well
which might not make sense.
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34204>
Because if FMASK_COMPRESS_1FRAG_ONLY is set, the FMASK decompress
operation actually doesn't occur. Note that DCC_DECOMPRESS implicitly
decompresses FMASK.
This fixes an issue on GFX10-GFX10.3 which is uncovered by enabling
VK_EXT_sample_locations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33639>
FMASK_DECOMPRESS also implies FAST_CLEAR_ELIMINATE, so it can run first.
The only exception is fast-clear for color images that have DCC and
FMASK but without comp-to-single (only GFX10) because FMASK_DECOMPRESS
can't eliminate DCC fast-clears.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33639>
comp-to-single supports MSAA since a while and it's useless to perform
a fast clear eliminate for these fast color clears.
Only GFX10-GFX10.3 are affected because these are the only GPUs that
support DCC with MSAA with FMASK.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33639>
Mesa is built twice in the same debian-android job, once for aarch64 and
once for x86_64 to catch as many build regressions as possible.
However the install dir used for the two builds is the same, and this
results in a mix of aarch64 and x86_64 artifacts ending up in
install.tar, because .gitlab-ci/prepare-artifacts.sh is called at the
end of the second build.
Having two separate jobs for aarch64 and x86_64 build would be cleaner
but it would also use more resources.
Since the aarch64 libraries are not used for anything for now, a cheaper
workaround is to build x86_64 first and just call prepare-artifacts.sh
after first build.
This way the aarch64 build will still be done to catch regressions, but
the artifacts won't end up in install.tar which is also more consistent
with the fact that S3_ARTIFACT_NAME only has x86_64 in the name
(mesa-x86_64-android-${BUILDTYPE}).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34234>
We were never setting has_multiview. It's not actually necessary anyway,
since we can just do the optimization we were trying to do whenever
num_views is 1 instead.
This doesn't affect the actual fragment size, which was already correct,
only gl_FragSizeEXT.
Fixes: 6f2be52487 ("tu, ir3: Handle FDM shader builtins")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33991>
When bin merging we maximize instead of minimize the VSC pipe size,
which means that we fail when there are too many pipes instead of when
the pipes are too large. This means that we need to calculate
binning_possible differently, and we need to skip
tu_tiling_config_update_pipes() when binning is impossible because
otherwise we will write out-of-bounds.
Fixes: 3fdaad0948 ("tu: Implement bin merging for fragment density map")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34196>
Most implementation defined behaviors have been properly defined via
new maintenance extensions, and the mapping support is the only one
left on the table...in a very unfortunate way for historical reasons.
Meanwhile, update the host driver list for the latest that has been
tested working. Two major KVM improvements have also been documented
along with the setups that rely on those.
Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34225>
Previously we were printing this information whenever the driver
started, but that proved to noisy.
For example, if running thousands of tests, this would cause thousands
of warnings messages to be printed. (Assuming the driver was built in
debug mode.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34243>
We will move this check into the `intel_dev_info` tool. Unfortunately,
this means we will be much less likely to notice inconsistencies, but
the current strategy has proven to be far too noisy.
For example, if the driver was built in debug mode, then when test
suites are running thousands of tests, the current approach can lead
to thousands of messages being printed.
Closes: mesa/mesa#12141
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34243>
Previously we'd burn 4 minutes waiting for curl to bang its head
trying to get results which would never emerge. Change this so we don't
fail on a 404, and also pull the retry down quicker, so we'll spend a
maximum of 2 minutes waiting for a server to come back, and no minutes
when the results aren't there to be fetched.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34244>
The new s3-proxy we have on fd.o is just a redirecting agent: it queries
the policy and, if allowed, returns a URL with temporary credentials
where the client can pull the real data from.
This means that nginx would see /fdo-opa/mesa-rootfs/misc?key=foo and
/fdo-opa/mesa-rootfs/misc?key=bar as distinct objects, not knowing that
it could cache only on the base URL.
Strip the query params from the nginx cache key so we can actually do
caching.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34244>
RenderDoc has a hard-coded limit of 16B for descriptor captures and
we're currently burning 24B. Reducing is pretty easy, though, since
storage doesn't support multi-plane. Any storage image views will use
VK_IMAGE_ASPECT_PLANE_N_BIT.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34128>
We detect whenever a method hasn't changed from one generation to the
next and just `pub use` the older generation's method struct and any
enum types associated. This keeps each mthd module independently usable
with all necessary types while reducing the number of unique Rust types
and associated trait implementations generated. This drops the size of
libnvidia_headers.rlib from 84 MiB to 22 MiB and will hopefully make
Rust code a little less expensive to build.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34136>
This adds proper Method and Field classes and uses them instead of
the pile of dictionaries we had before. This is probably faster and
definitely more readable. I've verified with cl9097.h that this makes
no differenct to the generated C or Rust besides whitespace.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34136>
The low-end Turing cards (TU117 for sure) are SM73, not SM75. These are
the cards on which we have tensor cores but they're so slow as to be
almost useless. We're not back-porting this because nouveau currently
returns 75 for these cards and because Volta binaries work fine on
Turing so they'll just be a little slower. We have, however, seen this
advertised by NVRM so we want our handling of shader models to be
correct.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34201>
It's possible that the source is uniform but the destination is not. In
this case, we need to insert a copy or else we might accidentally
propagate a uniform into some place we don't expect it.
This fixes a bunch of fp64 KHR-Single-GL46.subgroups.arithmetic.* tests.
Fixes: d09d3f5246 ("nak/from_nir: Emit uniform instructions when !divergent")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34201>
Currently it is not possible to mmap() the exported dma-bufs from etnaviv
for writing, through the GBM APIs, such as gbm_bo_get_fd(). etna_bo_dmabuf()
calls drmPrimeHandleToFD() only with DRM_CLOEXEC flag, omitting DRM_RDWR.
A typical call sequence, ending in etna_bo_dmabuf, for illustration:
gbm_bo_get_fd -> gbm_dri_bo_get_fd -> dri2_query_image ->
dri2_query_image_by_resource_handle -> etna_resource_get_handle
-> etna_bo_dmabuf.
Signed-off-by: Nikolas Zimmermann <nzimmermann@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34213>
In 35df3925ca ("brw: ensure VUE header writes in HS/DS/GS stages") I
misread the PRMs and thought that the VF would initialize the header.
What actually happens is that the VF does not write valid values in
there and the PRMs explicitly say that the VS shader should overwrite
whatever is in there.
We could avoid writing the header in some cases when no HW is going to
read back the header. For example with rendering disables through
3DSTATE_STREAMOUT::RenderingDisable. But those cases are dynamic and
the compiler is not able to tell. So just always write the header.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 35df3925ca ("brw: ensure VUE header writes in HS/DS/GS stages")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12880
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34211>
GL functions were exported by luck because we incorrectly used
link_with in meson, which exports functions only if the static lib is
used. link_whole guarantees that the functions are always exported
even if the static lib is unused.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34002>
- include the generated header directly instead of including
MAPI_ABI_HEADER
- remove the MAPI_MODE_BRIDGE macro
- must move code into .c files since we don't have the macros anymore
- clean up #includes
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34002>
Now the key consists of format + high order index from different pNext
chain structs. To be noted. we still don't want to cache modifiers to
avoid preparing dynamic length internal storage for per format
modifiers. The good thing is that most of the queries against modifier
support are one-time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34207>
Xe3 different SKUs can have different max_subslices_per_slice and
Xe KMD topology uAPI only provide us the available subslices.
Therefore, to correctly calculate the available slices, we need
max_subslices_per_slice to match the hardware.
This change retrieves this information from hwconfig for Xe3+.
This avoids adding all the PTL intel_device_info variants.
Additionally, the PTL topology values are currently embargoed and
cannot be hard-coded in public source code.
This could be simplified if we decide to apply max_slices and
max_subslices_per_slice to all platforms that hwconfig is required.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33328>
I've also included the CI infrastructure used by each farm, in case an
issue is affecting all farms of a kind, like I don't know, a bug in the
http proxy used by lava & baremetal farms :]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34195>
../src/asahi/compiler/agx_nir_lower_address.c:82:30: runtime error: passing zero to ctz(), which is not a valid argument
#0 0x56175dca5684 in pass ../src/asahi/compiler/agx_nir_lower_address.c:82
#1 0x56175dca2eda in nir_function_intrinsics_pass ../src/compiler/nir/nir_builder.h:164
#2 0x56175dca308c in nir_shader_intrinsics_pass ../src/compiler/nir/nir_builder.h:191
#3 0x56175dca68ae in agx_nir_lower_address ../src/asahi/compiler/agx_nir_lower_address.c:167
#4 0x56175dc885c0 in agx_optimize_nir ../src/asahi/compiler/agx_compile.c:3063
#5 0x56175dc9650d in agx_compile_shader_nir ../src/asahi/compiler/agx_compile.c:3827
#6 0x56175dc52148 in main ../src/asahi/clc/asahi_clc.c:359
#7 0x7fdf1c343249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#8 0x7fdf1c343304 in __libc_start_main_impl ../csu/libc-start.c:360
#9 0x56175dc44760 in _start (/builds/mesa/mesa/_build/src/asahi/clc/asahi_clc+0xbd0760)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33685>
Different versions of Android might have different ways of getting the
GLES runtime version, so factor this out to a function, so that the
mechanism can be changed in a centralized way.
Also rename MESA_RUNTIME_VERSION to GLES_RUNTIME_VERSION because this is
really what is being retrieved, in the future we might have a similar
check for the vulkan Mesa driver.
While at it remove mentions of SurfaceFlinger in some comments since the
mechanism to retrieve the versions is irrelevant for the purposes of the
checks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34111>
The intel vulkan driver is always built by the `debian-android` job,
since it may be needed for some future job, push it unconditionally, it
does not hurt to have it in the Android system.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34111>
Since ANGLE is always built for Android, always push it even if it is
not going to be tested directly.
Besides simplifying the script, this is also because ANGLE is going to
be mandatory anyway starting from Android 15+ and not having it in the
Android system might cause unexpected failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34111>
Remove old mesa and ANGLE libraries before pushing new ones, and do this
using a trailing wildcard, because some versions of Android might have
versioned libraries like /vendor/lib64/egl/libEGL_mesa.so.1 which should
also be removed to avoid any confusion when loading the freshly pushed
ones.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34111>
Some of the commands in cuttlefish-runner.sh, like updating mesa and
ANGLE, are not specific to cuttlefish, in general they can be executed
on any Android device under test.
So split those commands out of cuttlefish-runner.sh and put them into an
android-runner.sh script.
For example, when testing a physical Android device instead of a virtual
device, a mesa-ci job will call android-runner.sh directly instead of
cuttlefish-runner.sh
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34111>
The color format needs to be compatible with depth or stencil. Also
the depth/stencil format was incorrect when it's the source.
Fixes dEQP-VK.api.ds_color_copy.*
and VKD3D_TEST_FILTER=test_copy_texture.
Fixes: d4ff011b12 ("radv: advertise VK_KHR_maintenance8")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34142>
Our QA team extensively tested Helldivers 2 on AMD RX 7800 XT/RX 7600
with many different presents and didn't get any GPU hangs. Few users
also reported the game being very stable without this workaround.
Few other users reported issues with the workaround itself (like
pstate not correctly restored etc), so let's remove it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34164>
Need to properly fill layered api properties while adjusting query
wrapping based on maint7 feature enablement. Venus has to conditionally
advertise maint7 support only when renderer side vk is 1.2 or supports
VK_KHR_driver_properties.
Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33960>
Previously, when rebuilding LLVM for Android, the script would delete the
freshly built LLVM install. This caused the android_build container to
lack LLVM unless the container was rebuilt again, this time without
rebuilding LLVM, and instead just downloading the tarball.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34180>
Ninja was already included in the android_build container, so there's no
need to put it in the Ephemeral packages, which only meant that it was
removed at the end of the LLVM build.
Also, add the usual image tag header at the top of
build-android-x86_64-llvm.sh.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34180>
We can't do earlyz_with_discard when using fbfetch as we can have TLB reads
that occur before the discard. This can result in implicit Z writes which make
the setmsf instruction emitted as a result of the discard invalid.
Fixes: 332b313547 ("v3d: enable framebuffer fetch")
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34151>
Starting from VCN 2+ (ie. RDNA1+), video encode/decode extensions are
enabled by default if the firwmares are up-to-date.
GFX6-9 firmwares will probably never be fixed and video extensions will
remain experimental because it won't be possible to pass VKCTS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34144>
pipe_surface_width and pipe_surface_height helpers actually need
a pipe surface with a texture attached instead of just passing
directly the pipe_surface we get from the st.
The compressed formats are still broken unfortunatelly, so add them
to CI fails for now.
Partial fix for: 9d359c6d10
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34087>
The damage region can be useful to optimize the "resolve" step that we have on
imx6q (GC2000) because there isn't any tiling compatible with both render and
scanout or an any GPU when scanning out a linear buffer since we don't support
linear PE.
This improves fps for e.g `weston-simple-egl` by factor 2 (~30 fps -> ~60 fps).
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33226>
When updating delays, we'd update all dst regs based on reg_elems.
However, when wrmask has gaps, this would update delays for regs that
aren't actually written. Fix this by skipping regs for which the
corresponding wrmask bit is zero.
Note that this wasn't just a performance issue but could result in
illegal code because the delay is reset to zero for tex/sfu
instructions. For example, the following (post-legalization) code was
observed in the wild:
(rpt1)add.f r1.w, (r)r2.w, (r)c3.z
sam.base0 (f32)(w)r2.x, r3.y, s#0, t#1
rcp r2.x, r2.x
Here, the add would result in a required delay for r2.x which would then
be cleared by the sam (even though it doesn't write to it), resulting in
insufficient delay before the rcp.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 61b2bd861f ("ir3: Rewrite nop insertion")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34107>
The call looks up a Program and creates it if it doesn't
already exist. However we weren't locking the hash between looking
up the name and adding it to the hash so it could be possible
another thread also generated the same name.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: 842c91300f ("mesa: enable GL name reuse by default for all drivers except virgl")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34091>
The call looks up an ATIShader and creates it if it doesn't
already exist. However we weren't locking the hash between looking
up the name and adding it to the hash so it could be possible
another thread also generated the same name.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: 842c91300f ("mesa: enable GL name reuse by default for all drivers except virgl")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34091>
The calls look up a renderbuffer and create it if it doesn't
already exist. However they weren't locking the hash between looking
up the name and adding it to the hash so it could be possible
another thread also generated the same name.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: 842c91300f ("mesa: enable GL name reuse by default for all drivers except virgl")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34091>
The calls look up a framebuffer and create it if it doesn't
already exist. However they weren't locking the hash between looking
up the name and adding it to the hash so it could be possible
another thread also generated the same name.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: 842c91300f ("mesa: enable GL name reuse by default for all drivers except virgl")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34091>
The calls look up a texture object and create it if it doesn't
already exist. However they weren't locking the hash between looking
up the name and adding it to the hash so it could be possible
another thread also generated the same name.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: 842c9130 ("mesa: enable GL name reuse by default for all drivers except virgl")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34091>
Allow us to parse instructions in a form we currently generate
```
dpas.8x8(8) g55<1>F g47<1,1,0>F g31<1,1,0>HF g39<1,1,0>HF { align1 WE_all 1Q $4 };
```
Regions are not really needed, but this will be handled in a later patch
(that will also stop printing the regions).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34031>
The spec allows to create aliased disjoint image for a specific plane of
a multi-planar image, and the format can be R8. When querying memory
requirement of such image, VkImagePlaneMemoryRequirementsInfo is not
required to be chained although it has the disjoint bit.
This change fixes to look for aspect info from plane memory info only
when that's chained. The implementation can be passive here as the spec
VU has sufficient guarantees for the validity around. See below VU for
details:
- VUID-VkImageMemoryRequirementsInfo2-image-01589
- VUID-VkImageMemoryRequirementsInfo2-image-01590
- VUID-VkImageMemoryRequirementsInfo2-image-02279
- VUID-VkImageMemoryRequirementsInfo2-image-02280
Meanwhile, the existing disjoint check for size info is kept as is for
the special handling of VK_FORMAT_D32_SFLOAT_S8_UINT.
Test: dEQP-VK.ycbcr.plane_view.memory_alias.* pass with venus-on-panvk
Fixes: 412c286331 ("panvk: Enable multiplane images and image views")
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34134>
house protocols
- Remove some duplicate definitions (replaced with virgl_hw.h include,
which is also represented in gfxstream host code)
- Also removed the capset_ids from virtgpu_gfxstream_protocol.h. They
aren't needed to build guest-side driver, and are planned to be merged
to virtgpu_drm.h
Reviewed-By: Gurchetan Singh <gurchetansingh@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34116>
Per Vulkan spec 7.9. Host Write Ordering Guarantees, queue submission
commands automatically perform a domain operation from host to device
for all writes performed before the command executes. That is to say,
host updates to the mappings can occur after the end of the command
recording and must be flushed implicitly at submission boundary.
Before this change, necessary cache flushes could be missed once the
app starts reusing pre-recorded command buffers. e.g. a simple buffer
copy cmd while the app only updates the source buffer mapping in
different submissions. This changes backs out most of the current
version of cache flush reduction while still assigning LATEST_FLUSH_ID
to at least the final batch itself. This aligns with panfrost_batch
submit behavior on the gallium side.
Test: dEQP-VK.synchronization*.timeline_semaphore.* pass w/o flakiness
via venus-on-panvk
Fixes: 28e4d22497 ("panvk/csf: Pass a non-zero flush-id to benefit from cache flush reduction")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34093>
If the last owner thread has just unset the alive status and released
the watchdog, the new owner thread could have acquired to abort
unexpectedly if the ownership transfer occurs right before the next
owner's warn order. So we must set watchdog alive for new owner so that
it can properly check ring alive status in the next warn order.
Cc: mesa-stable
Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34135>
That tag was supposed to allow these jobs to run faster, but these
runners are currently having disk issues, and the normal runners look
like they're plenty fast enough (at least right now since almost nobody
runs ci jobs ^^).
We might revert this later, but for now let's merge this to unblock CI.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34120>
The current `s3cp` implementation does not work anymore after the
migration, and instead of fixing it and propagating the fix down to us,
it's simpler to directly use `curl`.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34120>
Register spilling can cause us to require thread local storage (tls).
However, we were not adjusting the tls stack size space to account for
the tls needed for the extra xfb shader when transform feedback is
needed. We noticed this when testing register allocation in the
OpenGL CTS (for testing we had forced spilling where none happened
before).
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33935>
Both GC7000 GPUs have the RA_WRITES_DEPTH feature, which needs a bit
more prodding to have valid fragcoord.zw components present in the
shader. This has been fixed by the previous commit, so we can remove
the related fails from the CI expectation.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34081>
On GPUs with the RA_WRITE_DEPTH feature, passing Z and/or W values
to SH can be gated. It doesn't have any impact on performance, so
maybe it's just to be able to free those register slots for other,
currently unknown, values. For now simply enable passing both Z and
W to SH unconditionally to make those GPUs behave like the ones
without the RA_WRITE_DEPTH feature.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34081>
This currently treats coarse and fine derivatives the same, but Qualcomm
needs to know whether just coarse derivatives are used or fine
derivatives/quad ops are also used. Rename this to
needs_coarse_quad_helper_invocations make clear the difference from the
new field, needs_full_quad_helper_invocations.
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: 264d8a6766 ("ir3: Set need_full_quad depending on info.fs.require_full_quads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33862>
We no longer need finalize_nir and thus we don't need to support
texcoord as well. This is a nice rs state cleanup.
This effectivelly reverts commits
0ac6801970 and
d4b8e8a481. Also import the previous
location fixup from the state tracker, which was removed when the
unconditional nir_opt_varying pass was introduced.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33961>
This was added so we could report compile failures. Since we can
now just do that simply from create_vs/fs_state there is no need
for finalize_nir anymore.
Move the optimization loop to the beginning of create_vs/fs_state.
This could be probably optimized a bit more, but right now there
should be no functional change, we can improve the pass order later.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33961>
The register is usually a few dwords ahead of the memory value used by
the kernel, which can lead to an inaccurate calculation of where the SQE
is.
To compensate for the more accurate rptr, increase the lookback
slightly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34063>
For RB, it's not convenient to use a gpuaddr because of how the GPU
addresses wrap around. Instead pass the host address to the renamed
highlight_addr(), so that we can use it directly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34063>
This is needed for SPIR-V 1.6 support in OpenCL. This capability enables
the Uniform and UniformId decorations which prior were Shader only.
The CTS ends up using those decorations on function arguments, but we can
just ignore handling them there for now.
Fixes the spirv16_uniformdecoration_uniform and
spirv16_uniformdecoration_uniformid CL CTS test inside test_spirv_new.
Fixes: bb6d371c0e ("rusticl: support SPIR-V 1.5 and 1.6")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34004>
It seems the hardware requires a dummy PS state with a noop FS,
otherwise it might just hang. This used to work just fine on older
gens.
Note that RadeonSI refuses to draw if VS or PS is missing and AMDVLK
seems to also always emit this state. So, this might be a bug that AMD
didn't encounter at all.
This fixes a GPU hang during loading with Ghostwire: Tokyo.
Backport-to: 25.0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34070>
This change began by fixing an old regression related to the dceil
functionality. This issue affected palm. Now, this change adjusts
the software fp64 support to make it fully operational.
This change was tested on palm and barts. This change fixes 561
"piglit run all" tests. The khr_gl tests are fixed as well (243 tests).
Here is a summary:
spec/arb_gpu_shader_fp64/execution/built-in-functions/*
spec/arb_gpu_shader_fp64/execution/fs-isnan-dvec: fail pass
spec/arb_gpu_shader_fp64/execution/gs-isnan-dvec: fail pass
spec/arb_gpu_shader_fp64/execution/vs-isnan-dvec: fail pass
spec/glsl-4.00/execution/built-in-functions/*
spec/glsl-4.10/execution/conversion/*
khr-gl4[3-5]/compute_shader/fp64-case1: fail pass
khr-gl4[0-5]/gpu_shader_fp64/builtin/*
Fixes: aed6a39c10 ("glsl: Retire dround lowering.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33708>
The key change is to use a builder to write the expected shader result
and compare that. To make this less error prone, a few helper functions
were added
- a way to allocate VGRFs from both shaders in parallel, that way the
same brw_reg can be used in both of them;
- assertions that a pass will make progress or not, and proper output
when the unexpected happens;
- use a common brw_shader_pass_test class so to collect some of the helpers;
- make some helpers work directly with builder.
The idea is to improve the signal in tests, so that the disasm comments
are not necessary anymore. For example
```
TEST_F(saturate_propagation_test, basic)
{
brw_reg dst1 = bld.vgrf(BRW_TYPE_F);
brw_reg src0 = bld.vgrf(BRW_TYPE_F);
brw_reg src1 = bld.vgrf(BRW_TYPE_F);
brw_reg dst0 = bld.ADD(src0, src1);
set_saturate(true, bld.MOV(dst1, dst0));
/* = Before =
*
* 0: add(16) dst0 src0 src1
* 1: mov.sat(16) dst1 dst0
*
* = After =
* 0: add.sat(16) dst0 src0 src1
* 1: mov(16) dst1 dst0
*/
brw_calculate_cfg(*v);
bblock_t *block0 = v->cfg->blocks[0];
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
EXPECT_TRUE(saturate_propagation(v));
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
EXPECT_EQ(BRW_OPCODE_ADD, instruction(block0, 0)->opcode);
EXPECT_TRUE(instruction(block0, 0)->saturate);
EXPECT_EQ(BRW_OPCODE_MOV, instruction(block0, 1)->opcode);
EXPECT_FALSE(instruction(block0, 1)->saturate);
}
```
becomes
```
TEST_F(saturate_propagation_test, basic)
{
brw_builder bld = make_shader(MESA_SHADER_FRAGMENT, 16);
brw_builder exp = make_shader(MESA_SHADER_FRAGMENT, 16);
brw_reg dst0 = vgrf(bld, exp, BRW_TYPE_F);
brw_reg dst1 = vgrf(bld, exp, BRW_TYPE_F);
brw_reg src0 = vgrf(bld, exp, BRW_TYPE_F);
brw_reg src1 = vgrf(bld, exp, BRW_TYPE_F);
bld.ADD(dst0, src0, src1);
bld.MOV(dst1, dst0)->saturate = true;
EXPECT_PROGRESS(brw_opt_saturate_propagation, bld);
exp.ADD(dst0, src0, src1)->saturate = true;
exp.MOV(dst1, dst0);
EXPECT_SHADERS_MATCH(bld, exp);
}
```
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33936>
While nullity of the CheckedPtr object is checked, writing to a raw
pointer safely requires that several other invariants be satisfied, so
it should be marked as unsafe to reflect that.
v2: reorder commits for cherry-picking and remove alignment check
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33989>
std::slice::from_raw_parts requires that the slice pointer be non-null,
even when the slice contains zero elements. Failing this invariant is
undefined behavior.
v2: reordered commits to allow cherry-picking bugfixes
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33989>
clGetSupportedImageFormats will write as many supported formats as are
discovered at present, regardless of the value of num_image_formats.
This could result in writing out-of-bounds memory.
v2: reordered commits to allow cherry-picking bugfixes
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33989>
This test already passed when executed standalone, but hit a issue triggered
by switching between fast and slow ZS clears when executed together with other
dEQP tests. This issue has been fixed by the previous commit, so we can drop
the fail from the CI expectation now.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34029>
When a slow/fast/slow clear sequence is executed on a surface, the second
slow clear will not regenerate the clear command if the clear value of the
fast clear is the same as the one used for the second slow clear, as the
current stored surface clear value is the same as the new clear value.
The command generated on the first slow clear however may have used a
different clear value, which is now submitted unchanged to the hardware on
the second slow clear.
Fix this by only generating the clear command if there is no valid one
already. If we already have a valid clear command simply update the fill
value in that command with the new clear value. This has some marginal
overhead, but has been chosen over the alternative of adding more state by
remembering the last slow clear value.
Cc: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34029>
The problem occurs with a series of instructions build the subgroup
invocation value :
mov(8) g23<1>UW 0x76543210V
add(8) g23.8<1>UW g23<8,8,1>UW 0x0008UW
add(16) g23.16<1>UW g23<16,16,1>UW 0x0010UW
Our register spilling code operates on physical registers (64B on
Xe2+) and using the brw_inst::is_partial_write() helper only considers
32B registers. So the spiller doesn't see that the add(16) instruction
is doing a partial write and ends up discarding the previous value.
You can reproduce the issue by running a test like :
INTEL_DEBUG=spill_fs ./deqp-vk -n dEQP-VK.compute.pipeline.cooperative_matrix.khr_a.subgroupscope.constant.uint8_uint8.buffer.rowmajor.linear
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: aa494cbacf ("brw: align spilling offsets to physical register sizes")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33642>
Cayman has a non-deterministic behavior issue which is
visible with the test below (arb_shader_image_size).
The tests fail randomly at the "fragment" test category.
Anyway, if the "compute" category is removed, the same
tests are working flawlessly.
The "compute" part of the driver was interfering with the
graphic pipeline. The culprit is the packet PKT3_DEALLOC_STATE
which puts the gpu in an incorrect state to perform some
graphic operations.
This change fixes this problem by issuing a PKT3_CLEAR_STATE
packet just after the PKT3_SURFACE_SYNC packet. As explained
by d51dbe048a PKT3_DEALLOC_STATE is mandatory on cayman to
avoid a gpu hang at the PKT3_SURFACE_SYNC stage.
This correction makes tests like
"spec@glsl-4.30@execution@built-in-functions@cs-.*" to pass
in an utterly deterministic way without random failures.
This change removes around 500 random failures for a
"piglit run all".
For instance, this issue is triggered on cayman with
"piglit/bin/arb_shader_image_size-builtin -auto -fbo".
Fixes: d51dbe048a ("r600g/compute: Emit DEALLOC_STATE on cayman after dispatching a compute shader.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33973>
Venus can only enable VK_EXT_display_control after using common vk_sync,
unless we add new layered implementation in common. Like how I replaced
the common android present impl, but no bandwidth at this point.
Fixes: 89ec6c4d8f ("venus: add a few more trivial extensions")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34037>
This optimization seems broken because eg. v_s_log_f32 uses SGPRs
for both the source and destination but applying OMOD seems to require
VGPRs.
This fixes a GPU hang when launching Enshrouded on GFX1201.
No fossils db changes on GFX1201.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027>
The premise of the change was wrong: the case where the defs analysis
was required was rare and requiring the analysis inside just the
case we care was being used for another analysis too. So for now,
the change doesn't really helps. I'll revisit this whole pass later on.
This backs out commit 6e19215810.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34039>
A lot of people (including me) misinterpreted the conformanceVersion
field for so long. The Vulkan spec wasn't very clear either but it's
going to be clarified soon.
VkConformanceVersion is actually unrelated to the official CTS
conformance process in Khronos. It just reports the latest CTS version
that the driver can pass, not more.
For GFX8+, RADV should be passing CTS 1.4.0.0 on all GPUs because we
validated this CTS version recently for Vulkan 1.4.
For GFX6-7, which only suppports Vulkan 1.3, RADV should also be
passing CTS 1.4.0.0, because newer versions of the CTS can be used
to validate a driver against an older version of the spec, so
it's perfectly fine to report a higher CTS version than the Vulkan version.
Newer CTS versions likely can't pass 100% due to a DGC bug that I still
need to fix.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12799
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34018>
Instead of calling `require()` every instruction, call it once per pass.
Even though the defs are cached (i.e. we are not re-calculating them every
instruction), this prevents the extra check and the call to analysis
validation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34010>
Otherwise, the size of the EGLSurface and the drawable may get out of
sync if kopper needs to re-create the swapchain at a different size.
This can cause problems with things like eglSetDamageRegionKHR() where
the core EGL code clamps them to the size in the EGLSurface.
With Wayland, it's up to the client to choose a size and resize by
creating a new EGLSurface with a different size. Only on X11 can we
get a resize side-band like this.
Normally, without kopper, this goes the other direction where the X11
EGL code will detect a surface size change in dri2_x11_query_surface()
and it invalidates the drawable if they've changed, forcing
re-allocation. Kopper, however, works more like the DRI2 path where we
just get handed buffers at some size decided by X11 and have to deal
with them. In the DRI2 path, the size is unconditionally updated by
dri2_x11_get_buffers(). This is roughly equivalent, updating the size
right after every call to kopperSwapBuffers().
Fixes: 8ade5588e3 ("zink: add kopper api")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12797
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34015>
- brw_lower_3src_null_dest: Allocating a new destination, so include
INSTRUCTION_DATA_FLOW class.
- brw_lower_alu_restriction: Removing instruction, so include
INSTRUCTION_IDENTITY. No details are changed so remove
INSTRUCTION_DETAIL.
- brw_lower_vgrfs_to_fixed_grfs: Changing source and destination
numbers, so include INSTRUCTION_DETAIL.
- brw_lower_send_gather: Insert new instructions (scalar register) and
change sources and other information on existing ones. So include
INSTRUCTION_DETAIL and INSTRUCTION_IDENTITY. Promote to INSTRUCTIONS.
- brw_opt_eliminate_find_live_channel: Can change source, so include
INSTRUCTION_DATA_FLOW.
- brw_opt_copy_propagation_defs and brw_opt_cse_defs: Both can remove
instructions, so include INSTRUCTION_IDENTITY. Promote to
INSTRUCTIONS.
- brw_opt_saturate_propagation: Instruction can have `sat` modified,
and operands can have type modified, so include INSTRUCTION_DETAIL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33993>
Replace global path variables with ProjectPaths dataclass
- Add explicit file existence check before loading YAML
- Enhance tag retrieval by checking environment variables first
- Add logging for better debugging of tag selection process
- Remove redundant file existence check in main function
- Improve error handling for missing conditional tags file
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33863>
This reverts commit 1cc2c738bb
We originally changed some aliases into functions so scripts could use
them without needing to be sourced, keeping the environment cleaner.
However, this broke `x_off`, which is supposed to stop debug logs
(xtrace output) from showing in the console. The function version still
triggered xtrace before disabling it, while the alias correctly
redirected the logs to `/dev/null`.
It also fixes the `bin/ci/update_tag.py` script to be able to reuse the
aliases via double sourcing the setup-test-env.sh and the respective
build script.
Reported-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33863>
Venus query records have been properly propagated from nested cmds
already, so no special care is needed here for qfb optimizations.
Test:
- dEQP-VK.api.command_buffers.*nested*
- dEQP-VK.conditional_rendering.*nested*
- dEQP-VK.draw.dynamic_rendering.nested_*
- dEQP-VK.multiview.*nested*
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33992>
This matches Intel driver. Chromium always sets RT format to YUV420
which would cause us to not report other formats as supported.
Only check that the RT format is actually supported when creating
config, but don't limit supported surface formats.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34001>
This allows the exported fds to be mapped for writing. This is needed
for virtgpu native ctx support where the fds are mapped rw when the
mappings are added to the guest by kvm. This aligns with other mesa
drivers, and unblocks the extended testing with venus on top.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34017>
We also want to run Android CTS in the Android jobs.
Since the Android CTS is quite large, download it and strip it down to
only contain the interesting tests, so to reduce the space taken in the
container image.
Eventually we might want to have android-cts be run via deqp-runner
itself, but for now add a proof-of-concept mechanism which calls the
android-cts directly and uses an ad-hoc handling of expectations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33499>
To run deqp-runner in cuttlefish we do something similar to
deqp-runner.sh but adapted to be used on Android via adb.
Isolate those adapted commands in an android-deqp-runner.sh script so
that in the future it will be easier to compare with deqp-runner.sh and
evaluate if deqp-runner.sh and android-deqp-runner.sh could be possibly
consolidated into a single script.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33499>
In .gitlab-ci/cuttlefish-runner.sh 32bit libraries were removed but they
were not being replaced with newer ones, however this caused some
problems because by default the x86_64 target in AOSP is still
multi-library and for example the 32bit zygote process ended up crashing
because of the missing 32bit libraries, causing a general system
instability.
Since the CI is only building 64bit libraries for the android target,
use an x86_64_only cuttlefish product which only has components and
libraries built for the 64bit target, this avoids dealing with 32bit
EGL/Vulkan libraries at all, preventing any possible cause of
instability.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33499>
Disable the modem simulator in cuttlefish, it is not needed for testing
the graphics subsystem and avoids opening a few vsock ports which
reduces the chance of collisions in case of multiple instances of
cuttlefish running concurrently.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33499>
Cleaning up and stopping cuttlefish before launching it is not strictly
necessary when using gitlab shared runners.
This can be added back later when we have a better justification.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33499>
A significant CPU performance bottleneck in mesa GL is refcounting atomics:
even with the current pinning attempts, eliminating them can yield huge
performance gains (easily verified by running drawoverhead with return false at the top of pipe_reference_described()).
This is a proof of concept for removing refcounts from gallium objects,
namely sampler views and resources. Sampler views were smaller in scope,
so I started there. This MR alone is not expected to noticeably affect
performance, though if applied to all drivers/frontends,
it would enable a bunch of code deletion for crazy samplerview
refcounting hacks currently used to try circumventing the existing overhead.
Co-authored-by: Karol Herbst <kherbst@redhat.com
Co-authored-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33813>
When translating atomic instructions, the base type of the imageView can be
float only for image_atomic_exchange. If a float type image is used with other
atomic instructions the results are undefined.
Enforce this check in the shader translator and don't emit any instruction if
it fails.
Fixes crash in piglit test arb_shader_image_load_store@invalid.
Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33749>
The nir to tgsi translator flattens all constants in the nir shader into uint32
immediates. In the svga driver, the vgpu10 shader translator then packs all
these immediates into a constant buffer, and also optimizes it to prevent
repetitions by only emitting a 32-bit constant once.
This can cause problems with double sized constants, since either the lower or
higher 32-bits of different 64-bit constant can be identical, and in the constant
buffer that repeating 32-bit value will be emitted only once, so a 64-bit
constant gets split into two non-contiguous 32-bit values.
When this 64-bit constant is then invoked by a double instruction live ddiv or
dmul, the source register can now have invalid swizzles like .xz or .xw since
its 32-bit components are not contiguous.
We have seen this happen in the piglit test -
spec@arb_gpu_shader_fp64@execution@glsl-fs-loop-unroll-mul-fp64
which emits invalid swizzle values for double instructions.
To fix this, introduce a new option in nir to tgsi shader translator that
preserves uint64 constants. When a 64-bit immediate is translated into svga
shader code, its 32-bit components are contiguous and aligned in the constant
buffer, so accessing them only emits valid swizzles .xy and .zw.
Other drivers using the nir to tgsi shader translater should not see any change
in the tgsi shader emitted unless they too explicitly invoke the
keep_double_immediates option like svga.
Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33749>
During translation of tgsi shaders to vgpu10 shader code that is sent to
svga device, we may get as input a double instruction with incorrect
swizzles such as xzxz. In this case we have a workaround to move the value
in that register to a temporary register with an xyzw swizzle.
However the functions that check if the instruction has double source or
destination did not check for all instructions, such as DDIV, so if
incorrect swizzles are sent in the shader tgsi code then the same
incorrect swizzle is also emitted in the vgpu10 shader code.
Fix this by adding all the double instructions in double checking
functions.
Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33749>
The extra `all_free` that now remains in the list will be ignored in the
loop below anyway so there is no need to have complex code to try to
remove it.
This also means it becomes possible to set things like
`-D video-codecs=all_free,vc1dec`.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33949>
- Fix the checks for emulation (based on presence of the extension
on the host)
- Add flag in gfxstream_vk_physical_device, otherwise the real device
extensions are not properly filtered when communicating with the host.
- The "function" version of the check in ResourceTracker can eventually
just check the flag once mesa and gfxstream objects are combined
- Remove the duplicate getPhysicalDeviceFormatProperties2 impl, this is
covered by the ResourceTracker impl
- Add ResourceTracker impl for getImageDrmFormatModifierPropertiesEXT
- Remove isDmaBufImage flag from VkImage_info, and clean up all the code
associated with this flag. In on_vkCreateImage, all required info is
avaialble from the extMemImageCi::handleType. In on_vkAllocateMemory,
this is all associated with the tiling of the dedicatedImage for the
allocation
Reviewed-By: Gurchetan Singh <gurchetansingh@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33944>
With secondary command buffers, inherited rendering can be used but
it's basically impossible to know if the depth/stencil attachment
enabled HiZ/HiS. But it's required to disable WALK_ALIGN8 to avoid
GPU hangs.
This assumes that HiZ/HiS is enabled for inherited rendering as long
as a depth/stencil attachment is used. It's not the most optimal
approach but it's not supposed to hurt either.
This fixes a GPU hang with
dEQP-VK.dynamic_rendering.primary_cmd_buff.basic.contents_secondary_cmdbuffers
and friends.
GFX1200 isn't affected because it doesn't support HiZ/HiS.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33986>
We currently report a deviceName as e.g. "Mali-G610 (Panfrost)", but
panfrost has nothing to di with the physical device, and the suffix
doesn't belong there at all.
So let's remove that suffix from PanVK. This results in output like this
from vulkaninfo:
---8<---
VkPhysicalDeviceProperties:
---------------------------
apiVersion = 1.1.305 (4198705)
driverVersion = 25.0.99 (104857699)
vendorID = 0x13b5
deviceID = 0xa8670000
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = Mali-G610
pipelineCacheUUID = <snip>
---8<---
We already sort of namedrop Panfrost in the driver properties:
---8<---
VkPhysicalDeviceDriverPropertiesKHR:
------------------------------------
driverID = DRIVER_ID_MESA_PANVK
driverName = panvk
driverInfo = Mesa 25.1.0-devel (git-136dd9f985)
conformanceVersion:
major = 1
minor = 4
subminor = 1
patch = 2
---8<---
While this might techically speaking be a regression, PanVK has been
marked as experimental until Mesa 25.0. But to reduce the risk of people
starting to depend on this behavior, let's also backport this change to
the 25.0 release.
The patch looks a bit funny, because we add the " (Panfrost)"-suffix in
common code, and this moves it to the Gallium driver. But effectively,
this means PanVK is the only driver that sees a change of behavior.
Backport-to: 25.0
Reviewed-by: John Anthony <john.anthony@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33972>
We have a common implementation for this, let's just use that.
Similar to the previous commit, this is a bit silly. But if we ever get
in a situation where VK_EXT_display actually makes sense, this stuff
should "just work", so let's enable it for good measure.
Tested-by: Alexandre ARNOUD <aarnoud@me.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33916>
It seems the common WSI code does all that's really needed here for us
already. Enabling this lets me run vkmark on PanVK.
This is a bit silly, because what actually happens here is that we end
up passing -1 as the display_fd to wsi_device_init(). This in turn leads
us to returning zero usable displays, which renders the extension
somewhat useless. But it is better than not supporting the extension, and
not supporting applications who have a hard depdendency on it fail, like
is the case with vkmark.
Tested-by: Alexandre ARNOUD <aarnoud@me.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33916>
We're currently exposing a bunch of extensions that requiring Vulkan
1.1, and we'll soon enough do the same for Vulkan 1.2. Instead of having
to update each of these extensions separately once we add new Vulkan
version support for some gens, let's use a single variable for this
instead.
And while we *could* query the exposed vulkan version and do this a bit
more "automatically", this makes it easy to leave some needless checks
behind if the baseline version changes. Leaving this as a arch check in
this function should make it a bit more obvious when the check can be
removed.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33971>
This was only needed on Sandybridge. We can delete the brw code,
and replace the generic devinfo bit with a helper inside the elk
compiler itself.
Thanks to Iván Briano for noticing we still had dead brw code for this.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
We were not using the minimum values from devinfo for anything. For
tessellation control, the minimum value is 0, so we continue taking
MAX2 of that with 1 when tessellation is enabled so we have at least
something guaranteed to be present. For geometry, the minimum value
is already non-zero (and updated by the previous patch).
This will have the side-effect of raising the minimum number of URB
entries for geometry stages. This is currently not known to fix
anything, but should be more closely following the documentation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
We've been programming our minimum number of URB entries for geometry
shaders to 2, but it appears that we should have been setting 8 on
Broadwell and later. Additionally, there's a workaround on Skylake
and later that requires us to add flushing (which we haven't) or use
a minimum of 16 URB entries.
This alone will not fix anything, as nothing reads this devinfo field
presently (will be fixed in the next commit).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
As we added new platforms, the device info macros evolved over time
Most platforms had a "FEATURES" macro, some had a "HW_INFO" macro,
a few had macros for URB entries - some with min entries only, some
with min and max, some including the .urb = { ... } braces, others not.
Thread counts or subslice info was sometimes considered FEATURES,
sometimes HW_INFO, sometimes inserted only in the final structure.
FEATURES macros often inherited from an ancestor platform, but not
necessarily the prior platform - many were based on GFX8_FEATURES.
Many redundantly set the same feature bits as prior platforms.
This patch aims to clean up the situation, so it's a little more
organized, especially if you look at multiple generations. Macros
are now split into several separate pieces:
1. The FEATURES macro only has architectural features, such as LSC,
ray tracing support, 64-bit integers, flat CCS, and so on. Thread
counts, subslice info, and URB sizes that may vary by SKU are not
included here. This makes it easy for one platform to inherit the
features from the previous, while not pulling in that extra data.
2. THREAD_COUNTS macros contain maximum thread counts from the
3DSTATE_VS documentation and so on.
3. URB_MIN_MAX_ENTRIES macros contain the entire URB configuration,
including .urb = { ... }.
4. PAT_ENTRIES macros (on modern platforms) contains our choice of which
PAT entries to use for various types of resources.
5. CONFIG macros combine all of the above into a tidy bundle for use
in defining various structures, and may also include the platform
macro or simulator ID for convenience.
On recent platforms where hwconfig tables exist, items #2-3 could
potentially be dropped and filled in from there instead. For XEHP+
where we require hwconfig, we instead have a PLACEHOLDER_THREADS_AND_URB
macro that makes it clear that these values are updated from hwconfig.
One nice thing is that the bits that could (or do) come from hwconfig
tables are now cleanly separate from those that do not (i.e. platform
feature support, PAT entry selection, and so on).
This patch does not touch GFX7 or earlier macros. We could probably
offer a similar treatment there, but they're generally working and not
quite as complex.
To verify that this commit does not have unintentional changes, I
recommend running
objdump -s build/src/intel/dev/libintel_dev.a.p/intel_device_info.c.o
before and after this commit, and diffing the output. The devinfo
structures produced are identical.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
intel_device_info_init_common calculates this for Gfx9+ based on
max_threads_per_psd and slice information. Mark it as zero in the
structures to make clear that the value there isn't useful, and make
it easier to diff binaries for the next commit's refactors.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
The documentation for 3DSTATE_URB_HS has 0 as the minimum number of HS
URB entries for all platforms. See BSpecs 32162, 47137, 56271 for
Gfx6-11, Xe, and Xe2-3, respectively.
This should silence warnings about our device info field not matching
the hwconfig tables.
Notably, nothing in our drivers currently uses this value so it cannot
have a functional impact.
Fixes: 4064b5546b ("intel/dev: reduce warning noise from urb settings")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This code, since the dawn of time, has had a redundant check for gen5-6
separate stencil in the final else clause:
} else if (doing separate stencil on gen5-6) {
return compact
} else {
if (doing separate stencil on gen5-6)
return compact
...
}
We can eliminate that one. The else clause then has a single if, so it
can be folded into the "else if" ladder alongside the others.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
Compositors sometime try to import BOs with lower alignments than 128B.
This seems particularly common in the case of cursor images but it can
also happen on other BOs allocated by the old nouveau GL driver. As
long as we avoid rendering to them (which NVK will do), the
texture/image hardware is fine as long as they're at least 32B-aligned.
Panicing in this case isn't very nice to compositors.
Backport-to: 25.0
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33990>
When importing libdrm_radeon code [1][2] it was somehow missed
that what libdrm has in one r600_pci_ids.h, Mesa has split
into r600_pci_ids.h and radeonsi_pci_ids.h. So, devices
with ids from radeonsi_pci_ids.h were not considered valid for
radeon_surface_manager_new.
This commit changes that, thus fixing radeonsi for these
devices.
[1] commit 1299f5c50a
[2] commit 3aa7497cc0
Fixes: 1299f5c50a
Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33940>
to make it easier for people (especially newcomers to the project) to add review
tags, we need a database mapping gitlab usernames to author names & emails. that
way, if someone just comments "rb" or whatever, there's a direct way to look
that up. this comimt adds a list of current contributors with the following
methodology:
1. first, I grabbed all names + emails of recent authors, with mailmap applied,
as proxy for active contributors:
$ git log --since=2025-01-01 --pretty='%aN,%aE,'|sort | uniq
2. then, I scraped usernames via the gitlab api attempting to match by name. I
don't want to hammer the gitlab api too much which is why I tried to keep the
list in #1 as small as possible.
import gitlab
import subprocess
import tempfile
import sys
import urllib.request
import csv
gl = gitlab.Gitlab('https://gitlab.freedesktop.org', private_token=...)
names = {}
with open('dump.csv') as csvfile:
spamreader = csv.reader(csvfile)
for row in spamreader:
if len(row) == 3:
names[row[0]] = row[1]
for name in names:
users = gl.users.list(search=name)
print(', '.join([name, names[name]] + [u.username for u in users]))
3. finally, I fixed up various data issues by hand. there were cases of both
people with multiple usernames (I tried to pick the one that's actually in
use), and people whose name on their profile does not match the name in their
commits (I tried to determine the username from searching gitlab manually,
but dropped a number of such authors when it was nontrivial to figure out. I
am a regular reviewer across the tree so if I don't recognize your name
you're probably not that active, sorry.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33896>
..or "the one where Alyssa gets jealous by b4".
Those of us who have stuck around a while have a habit of just commenting "rb"
or "ab" on MRs. which raises the question for everyone else of what name/email
to use. I've personally built up a collection of 36 (!!) different
shell aliases to apply different people's trailers. I think other people do
similarly.
This calls for better tooling. This patch adds a little script for applying
review trailers given a fuzzy match on the reviewer's name. I recommend
contributors alias it to something like `mrb`, then you can do things like:
mrb alyssa
mrb -a faith
to add a review tag for me or an acked-by tag for Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33896>
It is expected that inputs and prefetches are always in the first block.
However, ir3_create_empty_preamble would create blocks before the first
one, leaving inputs after the preamble. This causes issues with
(probably among others) spilling/RA where precolored inputs could
illegally reuse the spill base register.
Fixes RA validation failures on a7xx for
dEQP-VK.ray_query.multiple_ray_queries.vertex_shader
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: f3026b3d3e ("ir3: add some preamble helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33977>
For empty BVHs we shouldn't emit any leaf nodes, but there is one
invocation to encode the root node. Guard leaf node encoding so that
invocation doesn't try writing any leaves.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33985>
Prime blit can be used in setups like venus on lavapipe over vtest. It's
native env so Venus relies on renderer side driver to tell about the pci
info, while lavapipe doesn't implement that extension, which ends up
with mismatched gpu thus prime blit.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33956>
The swizzle check was too strict, we actually don't care about the
swizzle on the constant source at this point, it is only checked
later whether the constant source actually has the correct form.
So this effectively enables INV and BIAS presub on R300/R400.
RV370 stats:
total instructions in shared programs: 85379 -> 84948 (-0.50%)
instructions in affected programs: 15669 -> 15238 (-2.75%)
helped: 336
HURT: 81
total presub in shared programs: 1318 -> 2991 (126.93%)
presub in affected programs: 797 -> 2470 (209.91%)
helped: 0
HURT: 514
total omod in shared programs: 387 -> 384 (-0.78%)
omod in affected programs: 9 -> 6 (-33.33%)
helped: 3
HURT: 0
total temps in shared programs: 13290 -> 13243 (-0.35%)
temps in affected programs: 1388 -> 1341 (-3.39%)
helped: 91
HURT: 52
total consts in shared programs: 81922 -> 81855 (-0.08%)
consts in affected programs: 173 -> 106 (-38.73%)
helped: 67
HURT: 0
total cycles in shared programs: 126746 -> 126560 (-0.15%)
cycles in affected programs: 30752 -> 30566 (-0.60%)
helped: 255
HURT: 124
LOST: shaders/godot3.4/22-69.shader_test FS
GAINED: shaders/ck2/172.shader_test FS
GAINED: shaders/tesseract/389.shader_test FS
GAINED: shaders/tesseract/393.shader_test FS
GAINED: shaders/unity/64-DeferredPointShadows.shader_test FS
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33915>
Previously asynchronous descriptor set allocation is only enabled when
the VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT bit is not set.
However, some engine would use that bit but alloc/free with identical
descriptor set layout. So this change extends the async set alloc to
cover that since the spec has guaranteed no fragmentation there.
Besides, a pool before any descriptor set free is also considered w/o
fragmentation. so this change extends to cover here as well. Both
would also help with dEQP run time since all descriptor pools involved
are with that bit set.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33966>
We get a kernel message "You are adding an unorder point to timeline!"
on many CTS runs. This stems from us SIGNALing the queue syncobj then
WAITing but not reseting it. It is assumed by the time we get to
panvk_queue_submit_init_signals() that the value is 0, however it is 1
due to the previous calls.
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 5544d39f ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33943>
VEGA10, RENOIR, NAVI10, RAPHAEL and NAVI31 are covered, they passed
100% of 25 runs each.
NAVI21 and VANGOGH still don't enable video testing in CI because I
got few hangs during my last stress test. Need to be stress tested
again.
Note that the kernel in Mesa CI is too old and doesn't have latest
firmwares that should fix the remaining failures.
GFX6-8 have different issues like GPU hangs on Polaris10, so it's not
yet enabled in CI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33968>
At least, NAVI10, NAVI21 and NAVI24 are affected by this what looks
like a hardware bug when primitive restart is changed and no context
registers are written between draws. It seems the hardware doesn't
consider primitive restart at all in this situation.
Adding SQ_NON_EVENT(0) as suggested by Marek seems to fix it reliably
without introducing any overhead. It's basically a NOP packet that adds
a small delay.
Fixes new VKCTS coverage dEQP-VK.transform_feedback.primitive_restart.*.
Also fixes this old vkd3d-proton issue.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7258
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33929>
The boot action was wrapping the deploy action, which could cause
timeout misalignment. For example, the boot `GitlabSection` timeout was
shorter than the deploy timeout in LAVA, leading to cases where LAVA
jobs were canceled during their own retry mechanism.
By splitting these actions, we can align the timeouts properly,
preventing interference and unnecessary job cancellations.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33906>
There are some jobs that were missing the FARM variable, which is useful
to lava_job_submitter.py to classify how it should interact with each
LAVA server and how it should assemble the job definition.
Right now, we use a set of regexex with the RUNNER_TAG variable, but
that is error-prone.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33888>
We check for iamge layouts and feedback loops when we bind image
resources but not queue families. If the resource isn't on the graphics
queue, we need to add it to need_barriers so we can transition it back
to our queue.
Fixes: d4f8ad27f2 ("zink: handle implicit sync for dmabufs")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33952>
Otherwise, we'll transition to QUEUE_FAMILY_FOREIGN and then forget that
we left it on the foreign queue and never transition back the next time
we use the resource. This was kind-of okay with Wayland compositors
because they always re-import the BO so it's always fresh and they pick
up on the queue transfer the first time. X11, on the other hand, does
not re-import BOs so they get stuck in this weird QUEUE_FAMILY_FOREIGN
limbo until something happens to randomly trigger a layout transition
check and then we find it and do the transition. We should mark them as
needing a barrier the moment we transition to QUEUE_FAMILY_FOREIGN.
Fixes: d4f8ad27f2 ("zink: handle implicit sync for dmabufs")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33952>
Implement all commands involved. No need to scrub anything in the RT
pipeline info since it has been well validated by the VUs related.
The nature of VkDeferredOperationKHR plays well with venus multi-ring
support. So later we can properly define our own concurrent limits for
RT pipeline creations.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33907>
This commit adds genxml to Lima, by copying mostly from Asahi. The definition of the texture descriptor has been moved there.
v2:
- remove mipaddress handling, use "shr(6)" modifier
- indent TD parser 8 spaces
v3:
- added copyright
- renamed ushort to unorm16
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33385>
With SPIR-V 1.6 OpIsHelperInvocationEXT was effectively replaced with
Volatile loads of the HelperInvocation built-in variable.
This patch also drops the distinction between is_helper_invocation and
load_helper_invocation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33492>
The "JIP:" and "UIP:" markers were being ignored, so was possible
to switch their order in the text but the parser would act as the
same. Just fix the order now and enforce it through the parsing.
Since we are here, remove the "Jump:" and "Pop:" that are not used
for Gfx9+ anymore.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33522>
And use brw_builder(brw_shader *) and brw_builder() constructors
where possible.
The way tests are written, it is necessary to initialize an "empty"
builder -- which is later replaced by a proper one. Default parameter
NULL make that initialization implicit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
We already merged it below, when the library has both fragment output
interface and fragment shader state (which is when we'd compute it
anyway). Setting it twice is probably harmless but also confusing and
useless.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33902>
fd_pipe_new2 can segfault when trying to set the is_64bit flag on new
pipes. This can happen when the current GPU is not be listed in the
fd_dev_recs table because it's not supported by mesa, but is supported by
the kernel.
Add a helper function to test if the current GPU is in the supported table,
and use it in fd_pipe_new2.
Signed-off-by: Loïc Minier <loic.minier@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33830>
The fast path for kgsl_queue_submit when there are no command buffers
and only sync objects led to breakage for two reasons:
* The fast path was not properly handling duplication of the merged sync
object assigned to signalled `kgsl_syncobj`(s), which could lead to
multiple `kgsl_syncobj`s owning the same FD and consequently issues
such as double close of that FD leading to UB. This is fixed by moving
to the slow path as it always produces a timestamp sync object which
can be trivially duplicated.
* The Vulkan specification requires that drivers strictly follow the
order of submission of command buffers and consequently the order of
semaphore signal/wait operations. Since no submission was being made
to the kernel, subsequent submissions could be executed without waiting
for wait/signal operations from previous submissions to complete.
As both of these issues are fixed by moving to the slow path, this patch
removes the fast path in favor of the more correct slow path.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33894>
RC_FILE_INPUT is pretty much just a RC_FILE_TEMPORARY with an initial
value in it. So we regalloc it the same way we do normal temps, however
for unknown reasons (probably to have a bit more readable shader dumps)
we still keep the RC_FILE_INPUT type even though its the same as
temporary. This is handled correctly when emitting the machine code,
however, it was not taken into account in shader stats.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33817>
The free list must be re-initialized. Found the bug while running:
dEQP-VK.ray_tracing_pipeline.acceleration_structures.device_compability_khr.gpu_built.top
where it invokes VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT to purge
the cmd pool resources, and the next alloc still gets cache hit with the
"empty" list.
Fixes: e2c4bafccc ("venus: free query batches for VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33908>
If PXP initialization is not completed and application requested a
protected context the GEM_CONTEXT_CREATE will wait up to 250ms for
PXP to finish initialization but if that do not happens it will
return a error and set errno to EIO.
This patch add the missing retry handling.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30723>
Implement 2 optimizations for branches:
1) if unconditional branch target is a block that only has
unconditional branch, propagate the target
2) optimize following contruction:
block 1:
...
if (cond) block 3
block 2:
branch block N
block 3:
...
into
block 1:
...
if (!cond) block N
block 2:
block 3:
...
Note: optimization 1) significantly improves runtime of if ladders, but
it is not visible in shader-db because we do not track shortest/longest
path and it doesn't always create dead code (usually just a single
instruction)
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33754>
On a7xx, const registers should be loaded via the preamble instead of
uploaded by the driver to the const state. This commit implements this
by adding a new pass that emits the consts created by ir3_cp to a
sequence of stc instructions in the preamble.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32454>
On a7xx, the immediates that get promoted to const registers will be
initialized in the preamble instead of being part of the const state. So
technically, we won't need the immediate state that is part of the const
state anymore on a7xx. However, it is still a convenient place for
ir3_cp to store the immediates that should be promoted to const
registers before they are lowered to the preamble.
This causes one issue: the binning pass isn't allowed to modify the
const state while it's perfectly fine for it to use different immediates
compared to the non-binning pass on a7xx. Even pre-a7xx this is fine as
long as the size of the immediate buffer is the same.
To allow the binning pass to modify its immediate state while keeping
its const state immutable, this commit moves the fields related to
immediates into a new struct. Runtime checks are added to enforce that
the size of the immediate buffer is the same for the binning and
non-binning variant pre-a7xx.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32454>
The AttrAccess structure provided inputs for similar instructions, some
inputs were used only in a subset of instructions, needing asserts and
dummy values.
This commit flattens the struct directly in the instructions removing
the unused fields and cleaning up the code.
Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33899>
Currently spilling code operates on individual ops rather than on
instructions, and as a result it may create a redundant load_temp op if
an instruction references spilling register several times.
Similarly, it creates multiple stores if there are multiple ops in the
instruction that write different components of the register.
Check whether the instruction already contains a necessary load_temp or
store_temp and reuse it if possible.
shader-db:
total instructions in shared programs: 27718 -> 27673 (-0.16%)
instructions in affected programs: 2786 -> 2741 (-1.62%)
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.50 x̃: 1
helped stats (rel) min: 0.39% max: 5.33% x̄: 2.05% x̃: 0.80%
95% mean confidence interval for instructions value: -3.70 -1.30
95% mean confidence interval for instructions %-change: -3.09% -1.01%
Instructions are helped.
total loops in shared programs: 4 -> 4 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 390 -> 381 (-2.31%)
spills in affected programs: 145 -> 136 (-6.21%)
helped: 9
HURT: 0
total fills in shared programs: 1210 -> 1174 (-2.98%)
fills in affected programs: 149 -> 113 (-24.16%)
helped: 9
HURT: 0
LOST: 0
GAINED: 0
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33753>
this contains two behaviour changes:
* NDEBUG no longer set in debug builds (so asserts work in debug, but are still
stripped out in release as expected).
* macro map set properly for assertions to be reported with proper paths.
together this makes assertions do the right thing on Intel and brings us in
alignment with asahi+panfrost
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33879>
There's already code to verify that any compacted instruction
that we produce is equivalent to the original uncompacted
instruction -- including detailed output if it fails.
This patch enables this verification in debug build and will
abort in case it fails.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33821>
* ballot{,_relaxed}'s src[0] must always be a 32-bit value.
* read_invocation's src[0] must always be at most 32-bit value. While
the HW instruction always operates on a 32-bit value, it's important
to remember that it's just a data movement operation, so garbage in
the high bits of a 32-bit value representing a narrower value don't
present an issue.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33365>
All mature drivers report shader statistics in various places. GL drivers use
util_debug for shader-db's report script. VK drivers use executable statistics
feeding the report fossil script. Many drivers also have a magic env var to
dump the stats to stdout/stderr in addition to these standard forms.
Implementing any of these 3 reports requires doing brittle string processing in
C (GL, stdout) or piles of boilerplate (VK). Additionally, the logic gets
duplicated in every driver and duplicated between GL and VK.
And to add insult to injury, the information is duplicated *again* in the report
fossil script :'(
This commit introduces a new 'shader statistic framework' that aims to unify
statistics reporting across all drivers and across GL&VK. With the new approach,
a common XML file defines all the statistics for the tree. The common code
introduced here then autogenerates from that XML file an appropriate C header.
The header contains a C struct for each ISA, and autogenerated print/report
functions. Minimal driver integration is required: just filling out the stats
struct and calling the appropriate functions.
In this MR, 3 driver families are added as examples. Panfrost/PanVK and
Asahi/Honeykrisp are added as "complete" examples. Neither Vulkan driver
reported nontrivial executable statistics; with these changes, both report all
the same statistics that the GL drivers report. Turnip is also added partially -
it's not plumbed into ir3/gallium yet but just using the XML reduces boilerplate
a ton for Vulkan statistics.
[It is intended for this XML to be consumed also by shader-db's python scripts,
but that's not done here.]
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33814>
This seems to fix some artifacts, but we're not sure why, so it might not
be a correct or optimal solution.
fossil-db (navi31):
Totals from 28424 (35.81% of 79377) affected shaders:
Instrs: 30112910 -> 30348977 (+0.78%); split: -0.00%, +0.78%
CodeSize: 159542980 -> 160485336 (+0.59%); split: -0.00%, +0.59%
Latency: 221438396 -> 221500856 (+0.03%); split: -0.00%, +0.03%
InvThroughput: 38154231 -> 38159984 (+0.02%); split: -0.00%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33853>
The old code got the accumulation a bit wrong. For one thing, it always
accumulates with whatever was there instead of resetting to empty each
time. For another, it sets with with y and height with x when it writes
back to the resource. This is also all too complicated because it
converts between pipe_box, u_rect, and VkRect2D on every iteration.
Instead, there are helpers in util/box.h which will do most of this work
for us and they're correct. Let's just use them to get rid of the bugs
and make everything simpler and more obvious at the same time.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12194
Fixes: 3d38c9597f ("zink: hook up KHR_partial_update")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33855>
The old calculations are wrong. They add width+x and call that a width
the same with y and height. This is wrong but it's wrong in a way that
only ever increases damage so we never noticed it. However, util/box.h
has helpers for these operations which don't have this bug. Let's use
them and make the code simpler, more obvious, and correct. We also
weren't flipping the damage like we're supposed to and that was most
likely not getting noticed because of the over-damage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33855>
In case of constant lowering with halfswizzle sources, we were selecting
h01 causing an invalid instruction error to be yield later.
This can only be hit by conversion instructions and shouldn't be seen in
the wild (as this should be eliminated before entering the backend).
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 7d07fb9a67 ("pan/va: Handle 8-bit lane when lowering constants")
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33867>
If we pass zink=false to pipe_loader_drm_probe_fd, it could happen that
a Gallium driver that had been already discarded because of not
supporting the graphics CAP will be chosen.
To avoid that, explicitly ask pipe_loader_drm_probe_fd to choose the
zink Gallium driver.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30096>
Exposing a rendernode from a supported driver is not a sufficient
matching criteria to qualify as the render part of a renderonly
device, as the rendernode might only expose compute or 2D accel
capabilities.
Look for a screen that actually supports gallium graphics operations
to qualify as a renderonly screen.
v2 (Tomeu): Have pipe-loader return a list of FDs for kmsro to choose
based on capabilities.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30096>
it's technically possible for a resource to have no usage but for
batch usage to be set; this can occur if a resource is used,
its cmdbuf completes, but the batch state is not reset before the
resource is used again
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33848>
This reverts commit df1ff3c711. When
clients allocate BOs via GBM's gbm_bo_create(), they expect those BOs to
work with KMS without modifiers. Assigning them a modifier and hoping
that they then query that modifier even though they used the legacy API
to create the BO is pretty mean. In particular, this breaks KWin which
doesn't use modifiers if the KMS device doesn't support atomic.
Fixes: df1ff3c711 ("zink: enable single-plane modifiers for generic 2D exports")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33887>
We clone loads for uniforms for each user, which means that each
ppir_node will have its own, however an instruction may have several
nodes that need to load uniforms. Currently, in this case ppir compiler
will place all but one load uniforms nodes into a new instruction and
create an extra mov.
We can optimize that by checking whether load_uniform node in the instruction
is the same as the one we are trying to insert and reuse it. It is safe
to do so, because successors use pipeline register, so regalloc doesn't
care about it.
shader-db:
total instructions in shared programs: 27808 -> 27718 (-0.32%)
instructions in affected programs: 2652 -> 2562 (-3.39%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.90 x̃: 2
helped stats (rel) min: 0.56% max: 33.33% x̄: 5.12% x̃: 4.11%
95% mean confidence interval for instructions value: -3.78 -2.03
95% mean confidence interval for instructions %-change: -7.24% -2.99%
Instructions are helped.
total loops in shared programs: 4 -> 4 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 391 -> 390 (-0.26%)
spills in affected programs: 1 -> 0
helped: 1
HURT: 0
total fills in shared programs: 1213 -> 1210 (-0.25%)
fills in affected programs: 3 -> 0
helped: 1
HURT: 0
LOST: 0
GAINED: 0
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33746>
Ideally this would be passed in pic params as the values are
in sequence header, but using the surface size also works.
Also add sanity checks for frame size.
Fixes decoding av1-1-b8-22-svc-L2T1 and av1-1-b8-22-svc-L2T2.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33737>
This change was tested on cayman. This change fixes all the tests fixed by
the evergreen commit, and fixes the 32-bits integer tests as well:
spec/arb_texture_compression_bptc/texwrap formats bordercolor-swizzled/gl_compressed_rgb_bptc_signed_float, swizzled, border color only: fail pass
spec/arb_texture_compression_bptc/texwrap formats bordercolor-swizzled/gl_compressed_rgb_bptc_unsigned_float, swizzled, border color only: fail pass
spec/arb_texture_compression_bptc/texwrap formats bordercolor-swizzled/gl_compressed_rgba_bptc_unorm, swizzled, border color only: fail pass
spec/arb_texture_rg/texwrap formats-int bordercolor-swizzled/gl_r32i, swizzled, border color only: fail pass
spec/arb_texture_rg/texwrap formats-int bordercolor-swizzled/gl_r32ui, swizzled, border color only: fail pass
spec/arb_texture_rg/texwrap formats-int bordercolor-swizzled/gl_rg32i, swizzled, border color only: fail pass
spec/arb_texture_rg/texwrap formats-int bordercolor-swizzled/gl_rg32ui, swizzled, border color only: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33579>
This change fixes a regression introduced with 8b5d41cacb.
Indeed, lookup_resid was not updated.
This change was tested on palm and cayman. Here are the tests fixed:
khr-gl4[3-5]/shader_image_size/advanced-nonms-cs-float: fail pass
khr-gl4[3-5]/shader_image_size/advanced-nonms-cs-int: fail pass
khr-gl4[3-5]/shader_image_size/advanced-nonms-cs-uint: fail pass
khr-gl4[3-5]/shader_image_size/advanced-nonms-fs-float: fail pass
khr-gl4[3-5]/shader_image_size/advanced-nonms-fs-int: fail pass
khr-gl4[3-5]/shader_image_size/advanced-nonms-fs-uint: fail pass
khr-gl4[3-5]/shader_image_size/basic-nonms-cs-float: fail pass
khr-gl4[3-5]/shader_image_size/basic-nonms-cs-int: fail pass
khr-gl4[3-5]/shader_image_size/basic-nonms-cs-uint: fail pass
khr-gl4[3-5]/shader_image_size/basic-nonms-fs-float: fail pass
khr-gl4[3-5]/shader_image_size/basic-nonms-fs-int: fail pass
khr-gl4[3-5]/shader_image_size/basic-nonms-fs-uint: fail pass
Fixes: 8b5d41cacb ("r600/sfn: Use range_base for atomics and images")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33352>
Per https://github.com/llvm/llvm-project/issues/119967 these
headers are internal implementation details of libclc and were
never supposed to be installed. They are not available anymore
since LLVM 20. Instead opencl-c.h should be used.
There already ise a code path for including opencl-c.h, so always
use it.
This didn't work for me out of the box, because the build system
currently hardcodes the clang resource directory, which is incorrect
for Fedora at least. Fix this by using GetResourcePath +
CLANG_RESOURCE_DIR provided by clang instead. This is basically
the same as what is done in clc_helper.c
I've still retained the old behavior as a fallback just in case
(e.g. if clang is linked statically?)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33805>
RUN_COMPUTE_INDIRECT has been found to cause intermittent hangs, so
this change replaces it with RUN_COMPUTE and a set TASK_AXIS_X.
While this task axis might be suboptimal, the performance cost is
somewhat offset by RUN_COMPUTE not being an emulated command.
Fixes: 2ffc05d8d2 ("panvk: Add support for CmdDispatchIndirect")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33841>
RUN_COMPUTE_INDIRECT has been found to cause intermittent hangs, so
this change replaces it with RUN_COMPUTE and a set TASK_AXIS_X.
While this task axis might be suboptimal, the performance cost is
somewhat offset by RUN_COMPUTE not being an emulated command.
Fixes: 447075eeee ("panfrost: Add support for the CSF job frontend")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33841>
BC250 is known to have non-functional compute queue. Thousands
for Vulkan CTS tests fail, and many games are known to have visual
glitches. RADV_DEBUG=nocompute is the known workaround for all these
issues.
Disable compute queue for this chip in both radv and radeonsi.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.
It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html
Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
Skip the zero-index perfcntr collection only if CP_ALWAYS_COUNT is used,
since we want to collect that at the very end/start of collection. But
don't unwittingly ignore that perfcntr if CP_ALWAYS_COUNT isn't being
collected.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 86456cf0e6 ("tu: support exposing derived counters through VK_KHR_performance_query")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33852>
This change fixes the indirect draw 8-bits path which does
a conversion to 16-bits. This change is implemented to process
the parameters the same way as the other indirect draw paths.
This change was tested on palm and cayman. Here are the tests fixed:
deqp-gles31/functional/draw_indirect/draw_elements_indirect/indices/index_byte: fail pass
deqp-gles31/functional/draw_indirect/random/35: fail pass
deqp-gles31/functional/draw_indirect/random/45: fail pass
khr-gl40/draw_indirect/basic-indicesdatatype-unsigned_byte: fail pass
khr-gl41/draw_indirect/basic-indicesdatatype-unsigned_byte: fail pass
khr-gl42/draw_indirect/basic-indicesdatatype-unsigned_byte: fail pass
khr-gl43/draw_indirect/basic-indicesdatatype-unsigned_byte: fail pass
khr-gl44/draw_indirect/basic-indicesdatatype-unsigned_byte: fail pass
khr-gl45/draw_indirect/basic-indicesdatatype-unsigned_byte: fail pass
Fixes: d80701df8a ("r600g: Implement GL_ARB_draw_indirect for EG/CM")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32802>
As explained by d80701df8a, the r600 hardware doesn't implement
the base vertex offset. This change implements this offset as a
constant buffer entry shared with lds.
This change is inspired from 3511a51be0 ("freedreno/ir3: handle
VTXID_BASE for indirect draws").
Note: this feature requires at least evergreen.
This change was tested on palm and cayman. Here are the tests fixed:
spec/arb_draw_indirect/gl_vertexid used with gldrawarraysindirect: fail pass
spec/arb_draw_indirect/gl_vertexid used with gldrawelementsindirect: fail pass
spec/arb_multi_draw_indirect/gl-3.0-multidrawarrays-vertexid -indirect: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32769>
Add a specific timeout for the U-Boot action in LAVA job definitions for
rockchip devices. This ensures sufficient time for U-Boot to download
the kernel and set up early network, preventing potential job failures
due to timeout constraints.
This behavior started to happen since LAVA 2025.02 version.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33839>
After dae57e184a ("glsl,st/mesa: always lower IO for GLSL, unlower
IO for drivers"), the shaders updated by sfn_nir_legalize_image_load_store
on cayman could trigger a segmentation fault. The main issue is that
sfn_nir_legalize_image_load_store does not handle properly the ssa
dominance functionality and this issue is exacerbated by this last
mesa update.
This change makes the ssa dominance functionality operational.
This commit implements pass_flags to avoid an infinite loop.
For instance, this issue is triggered on cayman using this
environment variable NIR_DEBUG=validate_ssa_dominance with:
"piglit/bin/oes_egl_image_external_essl3 -auto -fbo":
NIR validation failed after r600_legalize_image_load_store in ../src/gallium/drivers/r600/sfn/sfn_nir.cpp
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32894>
Since glapi is part of libgallium.so, the offsets don't have to be immutable
anymore, but they still have to be fixed values within a build because both
gl_XML.py and genCommon.py use them and they should match AFAIK.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33634>
This fixes missing script dependencies that didn't trigger rebuilds when
those files were changed. To keep it simple stupid, all xml and python
files used by python scripts indirectly are now in a single global list.
All variables holding file names are also inlined, so that we use file
paths everywhere.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33634>
Calculation was wrongly walking uncompacted instructions, even if we had
some compacted in the middle, generating invalid size. Since we are
here just drop the instruction count, since in practice the caller will
have to walk the instruction stream anyway.
Fixes: 6267585778 ("intel/brw: Also return the size of the assembled shader")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33532>
This GPU is located in the same host as Tahiti, and was kindly donated
to the RADV project by Leonardo Frassetto (@DottorLeo).
It's good to finally making use of it, one year after receiving it \o/
On a side now, the skips are removed since they do not appear to be
reducing the chances of hanging once paired with the updated postamble
flushes.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33563>
After nir_lower_io we need to gather the info about 64 bit usage
to be up-to-date when deciding whether the remaining 64 bit IO ops
be lowered.
Before 89dad5618d ("gallium: add PIPE_CAP_CALL_FINALIZE_NIR_IN_LINKER")
the info was eventually updated to include the use of 64 bit values
also if only some IO was using this so that SFN was handling the code
correctly. As it seems with above patch this is not always the case
anymore, and we have to take care of it.
Fixes: 89dad5618d ("gallium: add PIPE_CAP_CALL_FINALIZE_NIR_IN_LINKER")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32774>
Turnip's current VK_KHR_performance_query implementation only exposed raw
perfcounters. These aren't exactly trivial to evaluate on their own.
This mode can still be used with the new TU_DEBUG_PERFCRAW flag. Existing
TU_DEBUG_PERFC now enables performance query mode where Freedreno's derived
counters are exposed instead. These default to using the command scope,
making them usable with RenderDoc's performance counter capture.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33208>
Freedreno's derived counters combine multiple perfcntrs into a more
sensible, human-friendly metric. This change picks up the counters
currently used in Freedreno's Perfetto producer and rolls them into a
more genericallly usable form.
First place of their use will be through VK_KHR_performance_query, but
the Perfetto producer should also be able to use this interface instead
of having the logic duplicated. For now the counters are available only
for a7xx devices.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33208>
When using vkGetQueryPoolResults for performance queries, the result of
each query is the VkPerformanceCounterResultKHR union, and the result of
each query should be written into that union according to the storage type
of the corresponding counter.
This fixes current behavior of using the generic write procedure, where
either 32-bit or 64-bit value is written depending on the specified query
result flags.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33208>
We currently keep a0.x/a1.x unnecessarily blocked until we have to spill
it, even if all its uses have been scheduled already. This causes
unnecessary spills and potentially schedules address writes later than
we'd like.
Fix this by keeping track of the number of uses that are still
unscheduled, unblocking address writes when it drops to zero.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33816>
This step is needed to support the same GPU model, but with a
different revision. A good example is the gc7000.
- imx8mp: model gc7000 with revision 6204 (uses RS)
- imx8mq: model gc7000 with revision 6214 (uses BLT)
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33811>
every vk driver wants these macros for executable statistics, so make them
common. there are two variants floating in-tree, a pure copy (radv) and a
formatted print (everyone else). we add both variants and then convert most
prints to copies where formatting isn't actually used. that has the benefit of
cleaning up trivial "%s" format strings in a bunch of places.
I didn't bother cleaning up the formatting in non-automatic-formatted drivers
because it's tedious and I'm planning to delete a lot of this driver code with
upcoming runtime work anyway. This is a step towards those runtime improvements.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33826>
Android's Vulkan loader provides KHR_android_surface and intercepts
during instance creation; it does not implement KHR_present_[wait/id]
nor does it pass the enablement of KHR_android_surface to the driver
so this check wasn't actually disabling KHR_present_[wait/id] for
Android.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33677>
> Allowing 1D and 3D images, and array images too, with DRM_FORMAT_MOD_LINEAR, is ok
> because VkImageDrmFormatModifierExplicitCreateInfoEXT::pPlaneLayouts is able to fully
> describe the image layout. IF miplevels == 1, which this patch continues to enforce.
Reviewed-by: Lina Versace <lina@kiwitree.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33668>
this is a nice idea, but there are apps/games that do not respect
hardware capabilities and yolo-bind fixed size buffers
fixes Ballionaire (2667120) launch on non-desktop drivers
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33819>
Beyond that, monolithic pipelines just bloat to incredible sizes,
destroying compile times for questionable, if any, runtime perf benefit.
Indiana Jones: The Great Circle has more than 100 stages and takes
several minutes to compile its RT pipeline on Deck when using monolithic
compilation, and yet separate shaders still end up faster (probably
because instruction cache coherency in traversal is better).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33818>
The first test verifies that, if possible, we don't emit unnecessary
renames/copies for temporaries where it's possible for them to stay
in their current register (if an operand is precolored to the register
the temporary is currently residing in).
The second test verifies that we correctly choose a non-clobbered
operand even if there is one fixed to the temporary's current register.
To minimize copies, we'll want to have the live copy of
%tmp0 in v[2] there, because v[0-1] gets overwritten.
The third test verifies that we add a copy to another free register and
rename if all possible precolored operands are clobbered.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
These traces are now stable enough to start running them again on TGL.
Additionally, add new lines between traces in preparation for adding
ADL coverage, and update zink-anv-tgl-traces-restricted to no longer
inherit rules from zink-anv-tgl-traces.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33793>
We would set the `src` flag on the interval of reloaded sources.
However, the interval might be merged with its parent when inserted and
the parent wouldn't have this flag set. This caused the parent interval
to potentially be reused to reload later sources. Fix this by setting
the `src` flag on the top-level interval after insertion.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33810>
Now that every ANGLE use is covered by tag consistency checks
(structured tagging), we don't need the USE_ANGLE flag anymore, because
if we have ANGLE_TAG set, it means that ANGLE is required in this job.
In detail, it means that the test job has inherited ANGLE_TAG from
`.container-builds-angle`.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33421>
Make structural tagging functions available for both test and build
scripts.
Introduces the update_tag.sh helper for listing, checking, and updating
deterministic tags.
Also adds the ci_tag_build_time_check and ci_tag_test_time_check
functions to validate tags during build and test phases, ensuring
consistent component versioning.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33421>
Alias are not exportable, in the current situation of the build scripts,
we use alias to deal with sections, but it infers that all build scripts
will be included in a bigger one that has already included
`setup-test-env.sh`.
With the structured tagging, we do a dry run of all build scripts, to
early check if the tagging is valid, before building stuff.
So changing alias to functions will not have an effect on the current
setup, but it also removes the need to reinclude library bash scripts in
some situations, as described above.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33421>
The android_build.sh script was not calling the container_pre_build.sh
and container_post_build.sh, we will need that to make the structural
tagging early checking to work. And also do the same cleanup and
configuration made for other container build jobs.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33421>
When spilling values, we can detect when a value is known to be constant
and avoid spilling it out to memory and/or GPRs by just re-materializing
the constant value instead of filling.
Shader-db stats:
Totals:
CodeSize: 30101168 -> 30052896 (-0.16%); split: -0.19%, +0.03%
SLM size: 146536 -> 146524 (-0.01%)
Static cycle count: 6952994 -> 6939532 (-0.19%); split: -0.30%, +0.10%
Spills to memory: 174139 -> 173625 (-0.30%)
Fills from memory: 174139 -> 173625 (-0.30%)
Totals from 555 (8.05% of 6891) affected shaders:
CodeSize: 18945520 -> 18897248 (-0.25%); split: -0.30%, +0.04%
SLM size: 128952 -> 128940 (-0.01%)
Static cycle count: 4344118 -> 4330656 (-0.31%); split: -0.47%, +0.16%
Spills to memory: 174139 -> 173625 (-0.30%)
Fills from memory: 174139 -> 173625 (-0.30%)
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33785>
Fixes: 692b5fa9f2 ("anv: Add shader to copy acceleration structures")
This commit fixes the future test "sparse_binding_structures" for
"header_bottom_address" for ray tracing pipeline.
Even on 48-bit ray tracing (Xe1/2), the software-defined part
instance_leaf_part1.bvh_ptr has to be in canonical form for copy.comp
to deference a bvh, which means we have to preserve the upper 16bits.
This is especially relevant in cases where the acceleration structure buffer
is located high, such as sparse buffer.
Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745>
Fixes: 2fe57947e3 ("anv: Implement encode shader to fit in ANV BVH")
This commit resolves the failures in the future tests
"sparse_binding_structures" for rayquery. Sparse buffers' heaps are
located high, and since it's in canonical form, the higher 16bits are
all set to 1. However, the existing encoder did not expect any non-zero
values at the higher 16bits. As a result, the instance flags got
corrupted, causing most triangle tests to fail.
Thanks for Paulo providing insights about sparse buffer properties.
Co-developed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745>
... since `BoxedHandleManager` should, well, manager the handles.
This simplifies `VkDecoderGlobalState` a little bit and should also
allow us to remove a bunch of functions that no longer need to
depend on `VkDecoderGlobalState`.
Test: cvd create --gpu_mode=gfxstream_guest_angle_host_swiftshader
Test: cvd snapshot_take --force \
--auto_suspend \
--snapshot_path=/tmp/snapshot1
Test: cvd reset -y
Test: cvd create --snapshot_path=/tmp/snapshot1
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33740>
Add vkGetFenceStatus and vkWaitForFences functions to the
global state tracking list for the host.
This will allow adding more functionality to the fences
and perform additional operations before waiting for and
signaling them.
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33740>
... to break the recursive behavior of the replay calling into
VkDecoderSnapshot so that locking and thread safety annotations can be
preserved in VkDecoderSnapshot.
Follow up to aosp/3412302.
Test: cvd create --gpu_mode=gfxstream_guest_angle_host_swiftshader
Test: cvd snapshot_take --snapshot_path=<>
Test: cvd create --snapshot_path=<>
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33740>
Bifrost LDEXP.v2f16 takes a 16-bit exponent, which requires messy
lowering. The codegen for this is quite bad currently, but would be
improved by implementing unpack_32_2x16_split_*, and by fusing
comparisons with CSEL.
The main alternative is converting to F32, then LDEXP.f32, then
converting back to F16. This has better codegen for dynamic exponents
currently, but worse in the common case with a constant exponent where
all the saturating cast logic can be folded.
Fixes dEQP-VK.glsl.builtin.precision_fp16_storage16b.ldexp.compute.vec2
when shaderFloat16 is enabled in panvk.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
On vulkan, truncating to S/U16 before converting is not valid, because
out-of-range conversions are specified to be correctly rounded. IEEE 754
requires that out-of-range values round to ±inf with RTNE and ±F16_MAX
with RTZ.
On gl, truncating is valid for U16->F16, because out-of-range int->float
conversions are undefined behavior. For S16->F16, it is not valid
because S16_MAX < F16_MAX, so some in-range values will be truncated as
well.
Instead, just handle S/U16->F16 as S/U16->F32->F16.
Fixes dEQP-VK.spirv_assembly.instruction.compute.convertstof.int32_to_float16_*
when shaderFloat16 is enabled in panvk.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: be74b84e6f ("pan/bi: Fill in some more conversions")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
Right now the aliasing/overlapping checks are only done with index
0. I guess that was done because variables don't get a different
internal location even if you have a different index.
But doing that, the checks would not detect a case like this:
layout(location = 0, index = 1) out vec4 color;
layout(location = 0, index = 1) out vec4 factor;
That was used on the following piglit parser test:
spec/arb_explicit_attrib_location/1.10/compiler/layout-13.frag
And as the spec included on that test, is a link error case:
" * if more than one varying out variable is bound to the same
number and index; or"
This commit executes the aliasing checks for index 1 too, and moves
the skip down, to only skip if the current variable and all previous
location-assigned variables has different index and location.
The bad news is that now such assigned variables need to be tracked on
OpenGL-ES. Before that commit that was avoided.
With this commit the mentioned parser test properly fails to link in
any driver.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33093>
If we insert
<code>
s_branch 1
s_branch Target
at the end of some block, and later hide an additional chained branch
after the existing one, then we have to update the 's_branch 1' to
also jump over the newly added branch.
Fixes: cab5639a09 ('aco/assembler: chain branches instead of emitting long jumps')
Closes: #12673
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
Venus has long supported creating swapchain image alias via binding. So
below are exposed without extra work needed:
- VK_EXT_surface_maintenance1
- VK_EXT_swapchain_maintenance1
Test: dEQP-VK.wsi.*.maintenance1.*
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33782>
Our name for this enum was brw_message_target, but it's better known as
shared function ID or SFID. Call it brw_sfid to make it easier to find.
Now that brw only supports Gfx9+, we don't particularly care whether
SFIDs were introduced on Gfx4, Gfx6, or Gfx7.5. Also, the LSC SFIDs
were confusingly tagged "GFX12" but aren't available on Gfx12.0; they
were introduced with Alchemist/Meteorlake.
GFX6_SFID_DATAPORT_SAMPLER_CACHE in particular was confusing. It sounds
like the SFID to use for the sampler on Gfx6+, however it has nothing to
do with the sampler at all. BRW_SFID_SAMPLER remains the sampler SFID.
On Haswell, we ran out of messages on the main data cache data port, and
so they introduced two additional ones, for more messages. The modern
Tigerlake PRMs simply call these DP_DC0, DP_DC1, and DP_DC2. I think
the "sampler" name came from some idea about reorganizing messages that
never materialized (instead, the LSC came as a much larger cleanup).
Recently we've adopted the term "HDC" for the legacy data cluster, as
opposed to "LSC" for the modern Load/Store Cache. To make clear which
SFIDs target the legacy HDC dataports, we use BRW_SFID_HDC0/1/2.
We were also citing the G45, Sandybridge, and Ivybridge PRMs for a
compiler that supports none of those platforms. Cite modern docs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33650>
This is part of the hashing key :
==25753== Uninitialised byte(s) found during client check request
==25753== at 0x93D29AE: blob_write_bytes (blob.c:164)
==25753== by 0x93A62C6: vk_pipeline_precomp_shader_serialize (vk_pipeline.c:722)
==25753== by 0x93AC55E: vk_pipeline_cache_add_object (vk_pipeline_cache.c:433)
==25753== by 0x93A691B: vk_pipeline_precompile_shader (vk_pipeline.c:875)
==25753== by 0x93A8FB9: vk_create_graphics_pipeline (vk_pipeline.c:1715)
==25753== by 0x93A9799: vk_common_CreateGraphicsPipelines (vk_pipeline.c:1860)
==25753== Address 0xf1adf82 is 82 bytes inside a block of size 152 alloc'd
==25753== at 0x64FA858: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25753== by 0x99AAC38: vk_default_alloc (vk_alloc.c:26)
==25753== by 0x93A403B: vk_alloc (vk_alloc.h:48)
==25753== by 0x93A406B: vk_zalloc (vk_alloc.h:56)
==25753== by 0x93A60A0: vk_pipeline_precomp_shader_create (vk_pipeline.c:680)
==25753== by 0x93A689D: vk_pipeline_precompile_shader (vk_pipeline.c:866)
==25753== by 0x93A8FB9: vk_create_graphics_pipeline (vk_pipeline.c:1715)
==25753== by 0x93A9799: vk_common_CreateGraphicsPipelines (vk_pipeline.c:1860)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9308e8d90d ("vulkan: Add generic graphics and compute VkPipeline implementations")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33792>
This is useful both for correctness (to ensure that things we think are
constant stay constant) and it improves performance a bit by reducing
register pressure and avoiding spilling.
Pipeline-db stats:
CodeSize: 29665072 -> 29437344 (-0.77%); split: -0.92%, +0.16%
Number of GPRs: 157124 -> 156082 (-0.66%)
SLM Size: 148900 -> 146436 (-1.65%)
Static cycle count: 6840286 -> 6805711 (-0.51%); split: -0.98%, +0.47%
Spills to memory: 177779 -> 173337 (-2.50%)
Fills from memory: 177779 -> 173337 (-2.50%)
Spills to reg: 17692 -> 16731 (-5.43%)
Fills from reg: 12013 -> 11897 (-0.97%)
Max warps/SM: 309128 -> 309456 (+0.11%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33771>
I'm not aware of any workloads that will be impacted by this change,
but let's keep our list of control flow instructions complete. A
shader-db run on MTL tells me nothing changes.
v2: "The scheduler relies on HALT not being considered control flow to
be able to move code past HALT instructions. Doing this would prevent
such optimization from happening and would reduce performance
dramatically in some cases." - Francisco.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021>
In `fossilize-replay --pipeline-hash 375a63e14afa96c4
fossils/fossil-db/steam-dxvk/f1_22_abu_dhabi.dx12vk-ultra.foz`,
`cf_count` would get decremented below zero. This would lead trying to
print `UINT_MAX` levels of indentation just a few lines below. I ran
out of disk space and patience before that finished. 🤣
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33748>
For setters, e.g. vk_set_physical_device_properties_struct used by venus
to fill all props, the out array storage comes from the driver, so we'd
assign directly. This change also fixes the template indent and drops an
unused arg.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33757>
The img-2-img and layout transition are trivial passthrough. For
img-2-mem and mem-2-img copies, host pointer has to be sized for proper
protocol encoding and decoding, and we have to either query or calculate
on our own based on VK_HOST_IMAGE_COPY_MEMCPY_EXT flag being used or
not.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33757>
No need to burn an entire PAGE_SIZE BO for an event. And in particular
the pattern of allocate + immediate mmap is expensive in a VM.
Suballocating cuts down the # of times we do this in
dEQP-VK.api.command_buffers.execute_large_primary from 10000 to 157,
avoiding problems with the test running up against watchdog timeout.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33721>
The issue with using video buffer associated data is that the data will
not be cleared when the buffer is removed from DPB. This will cause
issues if application tries to reuse such buffer (buffer that was
valid buffer in DPB in the past, but is currently not active in DPB)
as a dummy buffer for missing reference.
With Tier2 this works correctly because we allocate the DPB buffers
internally, but with UDT we use the video buffers directly for
references and so we need to make sure to only use the valid buffer
for a given index.
Instead of storing the buffer index as video buffer associated data,
use the render_pic_list array that we already have for keeping track
of active buffers in DPB.
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33641>
There are some tests that reaches out of memory (OOM) on purpose to
cover some fail cases.
But others that shouldn't are actually causing OOM too because we run
multiple tests in parallel, which increases the memory pressure.
This can affects other tests running in parallel, causing an increase of
the flakiness.
It is better to skip all of them
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33763>
It looks like even if we pass the header not present in the sampler descriptor,
it's not helping with the correct behavior of texelFetch.
Experiment on real HW shows that if we just zero out the header and include it
in the message, it helps with the correct behavior. I'm not sure if there is a
valid HW workaround for this one.
We can skip masking the sampler message header bits 4:0 but masking them out
doesn't hurt in this case.
Increasing number of parameter impact sampler performance, For example,
a sample message using 5 parameters will not be able to sustain the same
throughput as a sample message with only 4 valid parameters. We should
look out for any perf impact with respect to texel fetch.
This patch fixes ~3k tests involving texelFetch instruction on Xe3+
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33562>
If we have a single instruction that uses only combiner unit and previous
instruction doesn't use this unit, two instructions can be safely merged.
Implement compactification pass to do that.
The pass doesn't update instruction dependencies, so it should be run
right before codegen.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33570>
By actually setting the state packets according to the program data.
Also ensure that we correctly flag that the program may be dirty when
the geometry shader state changes
Fixes piglit tests: `spec@!opengl 3.2@gl-3.2-adj-prims * pv-first`
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33658>
Introduce an optimizer for ppir with 3 passes:
1) remove empty blocks: this one currently doesn't have any effect on
code generation, but it's required by other passes
2) remove redundant mov that is generated for store_output intrinsic when
possible
3) dead code elimination
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33636>
Discard block is the only block that we generate internally, and it
currently just gets an index of 0 which collides with the very first
block. It is not an issue for compiler, but an eyesore for debug output
for a program with discard_if.
Assign INT_MAX index for it.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33636>
Currently regalloc doesn't mark write destinations in the single
instructions as conflicting, as a result regalloc may assign the same
register to a multiple write destinations.
Before we started scheduling multiple root nodes into a single instruction
it was pretty much hidden. Fix it by marking destination registers as
conflicting if instruction has multiple writes.
Also stop handling a special case for output registers in regalloc and just
mark them as live in the last instruction of "stop" block(s)
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33636>
Recently, Indiana Jones and The Great Circle messed up this by adding
duplicates and this was causing the game to crash at launch.
Of course, this was an application bug that VVL was also able to catch
but I think maybe Mesa should ignore those instead of failing to create
the logical device.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33705>
The number of bound vertex buffers is now always equal to the number of
used buffers in the vertex elements state even if some buffers are NULL.
set_vertex_buffers doesn't unbind [count..last_count-1] buffers anymore.
bind_vertex_elements_state does that. It lets us remove code from
si_set_vertex_buffers.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27736>
Setting vertex elements before vertex buffers is a new requirement of gallium.
This is the only way to set the vertex elements state after vertex buffers
in st/mesa while setting the state before vertex buffers in tc_batch_execute.
A new TC call is added to set both vertex elements and vertex buffers.
Vertex buffers are filled by st/mesa first, and then the vertex elements
state is set in the same call. When TC calls it, it binds vertex elements
before vertex buffers.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27736>
Otherwise, the UUID changes for games that have shader-based drirc
workarounds and this breaks precompiled shaders on SteamDeck.
Instead, use this pdev cache key to compute the logical device hash
which is common to all pipelines.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33717>
As reported in issue #11825 the code that is meant to clean up old
cache dirs actually ends up creating an empty dir due to reusing
existing code to create the cache path required for the potential
cleanup.
Here we make the code more flexible allowing cache path strings
to be returned by the helpers if the directory already exists
or returning NULL if we don't want to create a new directory.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11825
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33699>
Make the "block after DO" more stable so that adding instructions after
a DO doesn't require repairing the CFG. Use a new SHADER_OPCODE_FLOW
instruction that is a placeholder representing "go to the next block"
and disappears at code generation.
For some context, there are a few facts about how CFG currently works
- Blocks are assumed to not be empty;
- DO is always by itself in a block, i.e. starts and ends a block;
- There are no empty blocks;
- Predicated WHILE and CONTINUE will link to the "block after DO";
- When nesting loops, it is possible that the "block after DO" is
another "DO".
Reasons and further explanations for those are in the brw_cfg.c comments.
What makes this new change useful is that a pass might want to add
instructions between two DO instructions. When that happens, a new
block must be created and any predicated WHILE and CONTINUE must be
repaired.
So, instead of requiring a repair (which has proven to be tricky in
the past), this change adds a block that can be "virtually" empty but
allow instructions to be added without further changes.
One alternative design would be allowing empty blocks, that would be
a deeper change since the blocks are currently assumed to be not empty
in various places. We'll save that for when other changes are made to
the CFG.
The problem described happens in brw_opt_combine_constants, and a
different patch will clean that up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
We never write the traversal shader address out to shader group handles,
so this is not necessary. On the flipside, it can cause conflicts if the
traversal shader is allocated in a range occupied by a replayed shader.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33711>
as previously discussed.
this is using "library CL" instead of kernel CL, which is the older way of doing
things. it works, it just has more boilerplate per-kernel than we'd want
otherwise. but library CL is basically free to integrate into a driver, whereas
kernel CL requires a lot more upfront investment. (I'm working on cleaning that
up but we're not quite there yet.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33362>
With some of the jobs migrated to the new brask and nissa devices, we can
increase zink-on-anv coverage on brya. Reduce the fraction of Piglit
tests and introduce fractional GLESCTS testing.
Also increase the parallelism of the zink nightly job, but lower its
FDO_CI_CONCURRENT variable to avoid OOMkills. To accommodate this,
decrease the parallelism of the anv-adl-full job.
Additionally, drop redundant HWCI_START_WESTON from full runs that
inherit the variable from their pre-merge jobs.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33671>
FDO_CI_CONCURRENT was getting overwritten by .intel-common-test
inheriting FDO_CI_CONCURRENT: 6 from .lava-test, so change the order of
these definitions to fix that.
This change unfortunantely means that GPU_VERSION has to be overwritten
in some cases.
Additionally, drop redundant .anv-test where .anv-angle-test is used.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33671>
Instead, only indicate that exec should be zero and do
so in the successive helper block. This allows to insert
the parallelcopies from logical phis directly before the
branch in break and continue blocks.
Totals from 56 (0.07% of 79377) affected shaders: (Navi31)
Latency: 2472367 -> 2472422 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 253053 -> 253055 (+0.00%); split: -0.00%, +0.00%
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
A proper emulation of a second queue requires handling of
wait-before-signal behavior of timeline semaphore. It's doable in Venus
but not that much useful since 1.4 requires a second transfer queue
family if not implementing hostImageCopy. So this change has limited
the second queue emulation as a workaround for android framework on
Android 14 and above.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33698>
There is a spec issue about this to clarify this behavior, but the current
wording can be interpreted that the platform always lists all extensions
supported by all drivers.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33667>
What we need is a way to tell GitLab "queue `build-only` jobs after
`build-for-tests` jobs have started", to make sure that `build-only`
jobs don't start before `build-for-tests` jobs and thus delays test jobs
and the overall pipeline.
The best I had found was "queue `build-only` jobs after *all* the
`build-for-tests` jobs have finished", but this introduces a larger
delay than we want, and causes `build-only` jobs to often be the last
ones to finish in a pipeline, after test jobs that respect the 15min
runtime limit.
Instead, we can tell GitLab "queue `build-only` jobs after the
`build-for-tests` jobs have been queued for X minutes", which is closer
to what we want, and in particular this ensures the correct order of
*starting* jobs as long as the CI is not overwhelmed and doesn't manage
to actually start a queued `build-for-tests` job within 5min, in which
case I'd argue we don't care about job order anymore because we have
bigger problems anyway and likely everything's going to timeout.
This also gets rid of the hard-to-maintain `.build-for-tests-jobs` list
of `needs:`, which also needed to be manually merged in half the jobs.
The trade-off is that we need to make a (shallow) copy of the
`.container+build-rules` list, that replaces all the `when: on_success`
with `when: delayed` + `start_in: 5 minutes`. This means that we'll need
to make sure the two lists of conditions remain identical, but this
seems more manageable; nevertheless, I added a comment to remind us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33564>
Imported linear images may have an arbitrary row pitch. As long as it is
aligned to 16 agx can support. Initialize `.linear_stride_B` from the
supplied parameter and let ail verify it.
Fixes gtk dmabuf based tests with a pitch aligned to 256.
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
`command_count` is under control of the vulkan application and can
become quite large. At a command count around 30000 the size of the
alloca() allocated buffers exceeds the default stack size of 16MB.
Fixes fixes segfaults in 'gtk:compare vulkan lots-of-offscreens-nogl*'
gtk 4 test cases which end up with a `command_count` around 32768.
Fixes: https://gitlab.freedesktop.org/asahi/mesa/-/issues/47
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
libwrap.dylib is helpful to trace control streams on macOS. When it was
originally implemented, we..
* supported macOS in our OpenGL driver and needed to actually exercise these
interfaces
* didn't have Linux support or hypervisor support or anything so needed the
traces to be utterly thorough
* only had a single macOS version to worry about
The landscape today is very different
* no macOS support in our driver stack
* we can trace registers via the hypervisor - libwrap.dylib is no longer
"correctness" bearing, it's just a convenience tool
* what counts is the hardware side - tracing all the macOS software structs is
not actually useful, the hypervisor is the right place to grab control regs
* piles of macOS versions, this code only ever worked properly on 11.x and 12.x,
but with m4 r/e coming up soon we need a lot more versions working.
So... we keep around libwrap.dylib, but slim it down to only decode the bare
minimum of macOS versioned structures, just enough to grab the control stream
pointer and dump that. This is a loss of functionality around CRs (but we have the
hypervisor as a much better way to grab CRs). In exchange it makes the code much
more manageable and less likely to break every 6 months.
So in exchange for all this deletion we also get things working again, this time
on 13.x. But porting back to 12.x or 11.x would be a very small diffstat given
the reduced focus of the new code.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
I dumped assembly generated by our driver with INTEL_DEBUG=shaders,
copied and pasted it into a lua file, tried to run it with
src/intel/executor, but the disassembler started telling me some
instructions were invalid.
This happened because we print the "compacted" flag in our assembly
text, so when brw_gram.y parses our assembly flag, it sees the
"compacted" flag and sets it to the instruction by calling
add_instruction_option(). But the executor tool never sets the
BRW_ASSEMBLE_COMPACT flag when it calls brw_assemble(), so when
brw_assemble() calls dump_assembly(), which calls brw_disassbemble(),
the disassembler gets confused and prints misinterpreted instructions
and calls them invalid.
It is not the job of brw_gram.y (our text assembly parser) to mark
instructions as compacted. Whatever is later assembling the
instruction is the entity that should decide if the instructions are
compacted or not. So in this patch we just ignore this flag.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33614>
The kernel+rootfs jobs previously downloaded the prebuilt kernel iamge,
but this was unnecessary as LAVA doesn't use them here, and the images
were never uploaded to S3. LAVA acquires the kernel in lava_submit.sh,
and baremetal downloads the required images and dtbs in baremetal_build.sh.
The kernel modules are still required for some devices.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33606>
Turns out we were missing the glapi bits, making it impossible to use get
the function pointers for this extension. Whoops?!
[daniels: Squashed in a618 SkQP fails, presumably caused by these not
being skipped anymore.]
Fixes: 9f5af68995 ("mesa/main: expose `EXT_multi_draw_indirect`")
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Tested-by: Chris Healy <healych@amazon.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33546>
Add support to rewrite shared atomics into compare-and-swap loops,
previously the nir_lower_atomics pass only supported global and ssbo
atomics.
Only freedreno irc3 reuses nir_lower_atomics, this change does not
impact their usage since they do not support shared atomics.
Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33572>
Adding the b2i(a) == 1 and b2i(a) != 1 patterns also helps prevent
regressions when spurious negations are removed from integer equality
comparisons, as is done in !33498.
v2: Make all variables part of the iteration instead of calculating some
of them. Suggested by Alyssa.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973331 -> 16973309 (<.01%)
instructions in affected programs: 266 -> 244 (-8.27%)
helped: 2 / HURT: 0
total cycles in shared programs: 915620774 -> 915620550 (<.01%)
cycles in affected programs: 4360 -> 4136 (-5.14%)
helped: 2 / HURT: 0
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748011 -> 209748003 (-0.00%)
Cycle count: 30514920286 -> 30514920400 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237334726 -> 237334710 (-0.00%)
Totals from 8 (0.00% of 706651) affected shaders:
Instrs: 16956 -> 16948 (-0.05%)
Cycle count: 261052 -> 261166 (+0.04%); split: -0.92%, +0.96%
Non SSA regs after NIR: 20000 -> 19984 (-0.08%)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
At least some Total War: Warhammer3 vertex shaders associate the
comparisons differntly, so the existing patterns were not triggered.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748654 -> 209748173 (-0.00%)
Cycle count: 30514333964 -> 30514361348 (+0.00%); split: -0.00%, +0.00%
Fill count: 622688 -> 622537 (-0.02%)
Max live registers: 65477039 -> 65477033 (-0.00%)
Non SSA regs after NIR: 237334768 -> 237334728 (-0.00%)
Totals from 512 (0.07% of 706651) affected shaders:
Instrs: 1000693 -> 1000212 (-0.05%)
Cycle count: 42174312 -> 42201696 (+0.06%); split: -0.15%, +0.21%
Fill count: 11456 -> 11305 (-1.32%)
Max live registers: 121599 -> 121593 (-0.00%)
Non SSA regs after NIR: 1253445 -> 1253405 (-0.00%)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
These are persistant objects that you can use to signal and wait over.
We need to import without VK_SEMAPHORE_IMPORT_TEMPORARY_BIT and we can't
throw away the Vulkan semaphore after each submit.
Fixes: 32597e116d ("zink: implement GL semaphores")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33549>
When the size of the signals[] array was changed to 3, the
signal_values[] array was not updated accordingly. If we have a
signal_semaphore and are presenting at the same time, this can lead to
an array overflow and the driver will read some random stack value as
the signal value. This is causing chromium to lock up when running
WebGL.
Fixes: 7f56fd9655 ("zink: it's kopperin' time")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33549>
From the Vulkan documentation, the queueFamilyIndex value will be
created with VkDeviceQueueCreateInfo. So let's avoid counting the
index value and just refer to the already-created value.
This will resolve crashes on some GPUs for various workloads.
v2: Needed to use GetDeviceQueue() in order to map the queueFamilyIndex
values. These values can be different when obtaining the queue used
for presentation, so we need to ensure we update the mapped
queueFamilyIndex value for the associated queue_data struct.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33487>
Not sure if this was intentionally left when block_check_for_allowed_instrs's
param was changed from bool to int, but it certainly was broken without the
previous commit for discards. Now those should work, so the (unintentional?)
special case can be removed.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33590>
MemScope::System has to synchronize with everything in the system,
including across PCIe so it's horribly slow. MemScope::GPU, on the
other hand, only has to synchronize within the GPU. This is way faster
and still satisfies all of Vulkan's requirements because Vulkan never
allows CPU<->GPU access without full semaphores and barriers.
Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33649>
Implementing alpha-to-coverage dithering broke the non-dithering case.
(Discovered by accident, not really a big deal since it's almost always
enabled and can only be disabled by using a Nvidia GL extension, and
can't be disabled with Vulkan.)
Fixes: ad4635d6ef
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33586>
This hasn't been hooked up to the build since we deleted autotools back
in 2019. It's effectively dead code anyway, as GLX is not a moving
target, and at this point is it easier to modify the generated code
directly than to modify the generator. xserver is encouraged to copy
the generators from 2019 into its own build if it wants, or -
preferably, in this GLX greybeard's opinion - find a prettier codegen
solution in the process of finishing GL 3.0 support.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33623>
Using memcpy with the max size generates a global-buffer-overflow, as
the performance counter strings are smaller than the max size.
Instead, use a string copy function to get a copy.
This was detected with address sanitizer enabled and running vulkaninfo.
Fixes: 3e8b2fe053 ("broadcom/simulator: Add DRM_IOCTL_V3D_GET_COUNTER to simulator")
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33627>
We can't use the local variable key to insert in the hashtable, as the
key needs to be persistent for future searches.
This makes a copy of the key in the pipeline, which is kept persistent
in the hashtable.
This fixes a stack-buffer-overflow.
Backport-to: 25.0
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33630>
_mesa_make_current will use st_flush(ctx) to execute pending
commands before switching to the new context.
Since we can't have multiple threads using a pipe_context at
the same time, we must finish glthread to avoid having the
unmarshalling thread executing at the same time.
It's fixing random crashes where a thread would do:
st_destroy_context ->
_mesa_make_current ->
st_glFlush(save_ctx) ->
tc_execute_batch
While there's a glthread unmarshalling thread that's still
adding commands to TC.
Fixes: 08d97aadd1 ("st/mesa: fix texture deletion context mix-up issues (v2)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33552>
On v11+, V2F32_TO_V2F16 doesn't exist anymore.
This commit ensure we stop using it on every codepath except when a
vectored conversion is prefered. (v9-v10)
Instead, we use FADD.F32 to handle data conversion thanks to the
swizzle defined for the destination.
This also work on older Valhall gens, so let's follow that
logic when we only have one component used.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33608>
Validation was checking that if an instruction was accessing FAU RAM,
only one 64-bit slot was accessed, and if it was accessing a FAU special
value, only one was accessed, however it was not checking if both RAM
and special were used, which is only allowed in messaging instructions
except ATEST and BLEND.
Fixes Piglit:
spec/ati_fragment_shader/ati_fragment_shader-render-ops/mov c0.r
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Fixes: fd1906afea ("pan/va: Add FAU validation")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33608>
Previously deqp tests with *.multisampled_image_sparse_residency.* would
crash with "Unknown image intrinsic" because
nir_intrinsic_bindless_image_sparse_load was not handled in the lowring
code.
This commits handles MSAA sparse residency lowering as with other cases.
Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Fixes: 7604697ec6 ("nvk: Implement shaderStorageImageMultisample")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33625>
Add support for GL_OVR_multiview's GL_MAX_VIEWS_OVR which can be
accessed with glGetIntegerv().
MaxViews is accessed via the hash table set up by get_hash_params.py as
a constant (MAX_VIEWS_OVR) using GL_MAX_VIEWS_OVR.
v2: Add this patch (thanks to Mike's guidance)
v3: Drop unnecessary enum size element in OVR_multiview.XML
v4: Switch to CONST(MAX_VIEWS_OVR) instead of gl_constants::MaxViews
(Marek's suggestion)
Fixes: 328c29d600 ("mesa,glsl,gallium: add GL_OVR_multiview")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: James Hogan <james@albanarts.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32992>
Implement the OVR_multiview framebuffer attachment parameters in
get_framebuffer_attachment_parameter():
- GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_NUM_VIEWS_OVR: This reads the
attachment's NumViews.
- GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_BASE_VIEW_INDEX_OVR: This reads the
attachment's Zoffset, but only if NumViews is non-zero.
This allows apitrace (PR 937[1]) to show the correct layers for
multiview framebuffer attachment surfaces, as well as to show this
information in the framebuffer attachments state.
[1]: https://github.com/apitrace/apitrace/pull/937
Fixes: 328c29d600 ("mesa,glsl,gallium: add GL_OVR_multiview")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: James Hogan <james@albanarts.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32992>
The OVR_multiview spec specifies the INVALID_VALUE error to be generated
by FramebufferTextureMultiviewOVR if:
"- <texture> is a two-dimensional array texture and <baseViewIndex> +
<numViews> is larger than the value of MAX_ARRAY_TEXTURE_LAYERS."
Implement this in check_multiview_texture_target(), similar to the test
in check_layer().
Fixes: 328c29d600 ("mesa,glsl,gallium: add GL_OVR_multiview")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: James Hogan <james@albanarts.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32992>
The OVR_multiview spec adds the following condition for framebuffer
completeness:
"The number of views is the same for all populated attachments.
{ FRAMEBUFFER_INCOMPLETE_VIEW_TARGETS_OVR }"
So add a condition to _mesa_test_framebuffer_completeness to check that
all attachments have identical NumViews. This avoids an infinite
recursion between zink_clear() and zink_clear_depth_stencil() in the
event of an incomplete FBO.
Fixes: 328c29d600 ("mesa,glsl,gallium: add GL_OVR_multiview")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: James Hogan <james@albanarts.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32992>
NumViews needs considering along with the other attachment data when
reusing a multiview framebuffer texture attachment (i.e. shared depth
and stencil texture).
The depth and stencil attachments should match in all respects including
NumViews before reusing the existing one, and NumViews should also be
copied when reusing.
This avoids an infinite recursion between zink_clear() and
zink_clear_depth_stencil() in the case of reuse of a multiview
depth/stencil attachment.
Fixes: 328c29d600 ("mesa,glsl,gallium: add GL_OVR_multiview")
Signed-off-by: James Hogan <james@albanarts.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32992>
Symlinking $CARGO_HOME to /usr/local/bin made rustup uninstaller delete
the entire folder, causing mysterious build errors, so let's do the
traditional .cargo/env sourcing to make rustup available to the rest of
the build scripts.
Also make sure that required scripts run the shell's rcfile to be able
to setup the PATH correctly.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33519>
Use `rustup self uninstall -y` instead of manually removing folders to
ensure a proper cleanup of the rustup installation, including cargo and
init command injections in shell rc files.
Failing to do so can cause issues, such as bash failing to run in a `set
-e` environment due to a missing `$HOME/.cargo/env`, for example.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33519>
In 2183bc73a6 ("nvk: Use suld for EDB uniform texel buffers"), we
started using suld instead of tld for EDB uniform texel buffers because
we needed it for correctness. However, it's slow as mud. Using
suld.constant seems to fix the performance regression. I don't know if
it's quite tld performance, but it's close.
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33612>
This is way faster than suld.sys, which is what we're using today. So
far I haven't seen it matter for anything but texel buffers but it
likely helps some app somewhere.
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33612>
This should never happen as the client should always give us aligned
addresses. However, in the off chance that it does, aligning down is
probably safer than aligning up as it won't cause the top end of the
range increase and potentially fault.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33610>
The tricks we play for texel buffers with VK_EXT_descriptor_buffer don't
work with tld with very large buffers. suld, on the other hand, doesn't
seem to have these limitations.
Fixes: 3b94c5c22a ("nvk: Lower descriptors for VK_EXT_descriptor_buffer buffer views")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33610>
We can theoretically hit this if CmdProcessGeneratedCommandsEXT is
called with a state command buffer that doesn't have compute shader set
if execute commands bind a shader. We do, however, need to still call
nvk_cmd_upload_qmd() because it also uploads push constants and we need
those regardless of whether or not there's a shader bound.
Fixes: 976f22a5da ("nvk: Implement CmdProcess/ExecuteGeneratedCommandsEXT")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33610>
For the instructions we parse with brw_gram.y, don't unconditionally
call brw_eu_inst_set_cond_modifier(). Do it like we do in
brw_generator::generate_code() and only call it if we have a
cond_modifier to set.
Why? Because for ONE_SRC instructions, CondCtrl (bits 95:92) only
exists if Src.IsImm is false. If Src.Imm is true, then bits 95:64 are
actually Src0.ImmValue[63:32]. If we unconditionally call
brw_eu_inst_set_cond_modifier(), we'll end up zeroing bits 95:92 for
ONE_SRC instructions with 64bit immediates. See BSpec page
Structure_EU_INSTRUCTION_BASIC_ONE_SRC (56880).
This issue can be reproduced with src/intel/executor if you try to
have the following instruction:
mov(16) g10<1>Q 0xfedcba9876543210:Q { align1 WE_all 1H };
our parser will end up zeroing the top bits, so the value of the
immediate will be 0x0edcba9876543210.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33559>
It's entirely possible that the same GEM handle is referenced across
different Vulkan object instances. As per the warnings in xf86drm.h,
the caller is responsible for reference counting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33451>
Since Xe2, the registers are bigger and even the instruction
structures got updated to have 6 bits.
The way I detected this issue was when I tried to use
src/intel/executor to add the following instruction:
add(8) g6.8<1>UD g4<8,8,1>UD 0x00000008UD { align1 WE_all 1Q I@1 };
Executor would read this and end up emitting an add with dst being
g6<1>UD instead of what we wanted. It turns out that inside
brw_gram.y, at dstoperand and dstoperandex we do:
$$.subnr = $$.subnr * brw_type_size_bytes($4);
which would overflow subnr back to 0.
The overflow doesn't seem to be a problem with code we emit directly
(unlike the code we parse, like above) due to the fact that we seem to
treat Xe2 registers as smaller all the way until we call phys_nr() and
phys_subnr() during code generation. The phys_subnr() function can
generate a value that would overflow reg.subnr, but this value is
never written back to reg.subnr, it's just returned as an unsigned
int.
Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33539>
The intent here was to check if the tile we're trying to merge
vertically (prev_y_tile) has already been merged horizontally into a
neighboring tile, but I used the slot_mask which also contains the tiles
that have been merged into the prev_y_tile, so the check was too
conservative and would fail even if another tile had been merged into
prev_y_tile. This meant that we would fail to ever create 2x2 regions of
tiles. Fix this by just testing prev_y_tile's bit in the mask.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33534>
Even though we always try to merge a horizontally or vertically adjacent
tile, when we try to merge a vertically adjacent tile it may not
actually be adjacent because it was merged horizontally and the current
tile wasn't or vice versa. We have to detect this and reject merging it.
Fixes: 3fdaad0948 ("tu: Implement bin merging for fragment density map")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33534>
This custom builder implements fine-grained instance node bounds
calculation by looking at all AABBs at tree depth 2.
Shaves off 0.3ms in the start scene for Indiana Jones: The Great Circle
on Deck (roughly 29.1ms->28.7ms).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32797>
This allows drivers to inject custom functions to calculate the bounds
of instance nodes. For example, this can be used to determine instance
bounds by transforming the AABBs of all child nodes at some level in the
BVH. When instance transforms contain rotations of close to 45°, this
can yield a tighter AABB than just taking the instance's top-level AABB
and rotating it.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32797>
For decode this is also done in decode_video.
This breaks if app doesn't call vkCmdEncodeVideoKHR before end, eg:
vkCmdBeginVideoCodingKHR
vkCmdControlVideoCodingKHR
vkCmdEndVideoCodingKHR
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33582>
KGSL unconditionally supports preemption so we cannot ignore it.
On a6xx, we have to emit VSC addresses per-bin or make the amble include
these registers, because CP_SET_BIN_DATA5_OFFSET will use the
register instead of the pseudo register and its value won't survive
across preemptions. The blob seems to take the second approach and
emits the preamble lazily. We chose the per-bin approach but blob's
should be a better one.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12627
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33580>
Combiner unit runs after fmul/smul/fadd/sadd units and it can consume
the results that previous units wrote to the registers. So prefer
placing scalar mul into combiner unit and predecessors (if any)
into other units
shader-db:
total instructions in shared programs: 29072 -> 27698 (-4.73%)
instructions in affected programs: 11237 -> 9863 (-12.23%)
helped: 163
HURT: 0
helped stats (abs) min: 1 max: 42 x̄: 8.43 x̃: 4
helped stats (rel) min: 0.64% max: 30.00% x̄: 13.03% x̃: 11.76%
95% mean confidence interval for instructions value: -9.89 -6.96
95% mean confidence interval for instructions %-change: -14.09% -11.97%
Instructions are helped.
total loops in shared programs: 2 -> 2 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 367 -> 372 (1.36%)
spills in affected programs: 16 -> 21 (31.25%)
helped: 1
HURT: 2
total fills in shared programs: 1208 -> 1224 (1.32%)
fills in affected programs: 51 -> 67 (31.37%)
helped: 2
HURT: 2
LOST: 0
GAINED: 0
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33568>
Fix multiple issues with atan in disassembler:
- arg1_en field in combiner unit actually seems to be a bit indicating
that one of sources is vector (e.g. for atan_pt2, or multiplication)
- atan2 has 2 arguments, not one
- properly handle all instruction variants
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33540>
Apparently fast path cannot handle mismatched mutability and we
should use CP_BLIT which has SP_PS_2D_SRC_INFO.MUTABLEEN to signal
src mutability. Previously it was partially handled by
tu_attachment_store_mismatched_swap.
Fixes: a104a7ca1a
("tu: Handle non-identity GMEM swaps when resolving")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33514>
If the program writes to shared variables after all reads, in the last
block of the program, no one will ever read the value we write. We can
just eliminate these dead writes.
(Thanks to Faith Ekstrand for improving the ends_program() conditions.)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33452>
glx/egl "multibuffers" denotes if server side support DRI3
multi plane and modifiers which is version >= 1.2. But now
it returns true just when DRI3 version >= 1.
This causes problem when xserver with amdgpu DDX which only
support DRI3 1.0, so "multibuffers" gets set unexpectedly,
and client send DRI3 >= 1.2 request to server which gets
unimplemented error.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31657>
multibuffers_available denotes xserver side support for
DRI3 and Present protocols which should not affect client
side support for dmabuf import/export.
This is for xserver with amdgpu DDX in which case
multibuffers_available will be false, but dmabuf import/export
should be enabled to support applications like mpv which use
dmabuf import for vaapi decoded buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31657>
This is part of VK_EXT_mutable_descriptor_type but we never did anything
with it. Since we use local memory for descriptor sets, copying from
them means reading VRAM through a WC map and it's pretty expensive.
Using malloc() for HOST_ONLY should be a nice perf boost for things
which give us the hint.
This massively improves the performance Dragon Age: The Veilguard,
taking it from 7 FPS to 25 FPS on an RTX 4060.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12622
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33571>
More are found missed from prior maint5 support. This change has
properly initialized the maint5 props as well as fixing its new
VkPipelineCreateFlags2CreateInfo integrations.
Verified with dEQP-VK.*maintenance5*
Fixes: be6fece6e1 ("venus: enable VK_KHR_maintenance5")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33566>
This machine registration command makes it possible to check for the
specified list of machine tags rather than just doing a full comparison
of tags between the expected state and the current state.
This is beneficial for multiple reasons:
* It enables having more than one GPU in a host, and we let the machine
registration container unbind the unwanted GPU before Mesa CI even
executes anything
* It makes it possible to alter the boot process so as to use a kernel
with a different architecture than the default kernel of CI-Tron uses
* Adding or modifying tags which are unused by a job won't fail the
first job after the new machine registration container lands.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33551>
This is not conformant and it can cause hard to debug issues or hide
existing bugs. Getting rid of this limit will allow lavapipe to use the
common bvh building framework since the ploc build shader has a loop
that waits to start the next phase.
cc: mesa-stable
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31426>
iceil can return bogus (negative) values in case there's an overflow
(or a NaN). This would then take forever to run due to a couple billion
loop iterations.
Use unsigned minimum instead which will clamp iterations to max aniso
(not sure if that makes more sense than clamping negative values to 0,
probably doesn't really matter).
Fixes: 350a0fe632 ("llvmpipe: Use a simpler and faster AF implementation")
Reviewed-by: Brian Paul <brian.paul@broadcom.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33537>
Previously DGC alignment requirements declared by
getGeneratedCommandsMemoryRequirementsExt were not also reported by
getDeviceBufferMemoryRequirements for preprocess buffers.
This fixes 1554 dEQP-VK failures related to device-generated commands
that previously failed with "DGC alignment requirement larger than
preprocess buffer alignment requirement".
Fixes: 976f22a5da ("nvk: Implement CmdProcess/ExecuteGeneratedCommandsEXT")
Reviewed-by: Faith Ekstrand <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33555>
This should use the stride and offset from transfer, because
the values from resource_get_info may not match the mapped
memory if the driver uses staging texture for transfer.
This also gives us data size and we don't need to calculate
it for each format.
Unfortunately we only know the values when mapping the buffer,
but VAAPI requires the values when creating the image and at
that point we don't know the usage yet (read/write).
Do a dummy map of all planes first time DeriveImage is called
for each surface and cache the values for subsequent calls.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32326>
This function is no longer the only concept we have of "Vulkan version",
so let's rename it to reflect that it's only about the API-versin. We
don't really need to specify that it's about Vulkan versions, that seems
pretty obvious here.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33254>
This will be needed in order to check off passing the VK CTS properly.
Please note, this does *not* mean that we are formally conformant, only
that we have passed the VK CTS at least once. Those are not the same
thing.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33254>
Previously, we stored the full color output before lighting, then compute
lighting and store just the .xyz of the result to .xyz.
We can save followup optimization work to clean up the unused .w
calculations during lighting, and DCEing the first .xyz store if we just
store .w when it's done, and only do lighting on .xyz. Some of that
redundant store work may not have been happening on all backends.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33501>
bi_opt_mod_prop_backward tries to propagate values backwards, but
stops checking for uses when it reaches the SSA definition. For
ordinary blocks that's fine, but for loops the definition can come
after a PHI that uses the value. This causes incorrect code to be
generated in shaderdb test `shaders/skia/2134.shader_test`. Fix this
by special casing PHI instructions, in a manner similar to done in
asahi/compiler/agx_optimizer.c.
This bug has been present a long time, so we want it back-ported to
stable.
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33483>
Normally the driver & compiler work together to use as few
3DSTATE_VERTEX_ELEMENTS/VERTEX_BUFFER_ELEMENT data as possible.
The compiler ignores unused bits and driver avoids emitting the
corresponding elements in 3DSTATE_VERTEX_ELEMENTS.
For device generated commands, we want an 3DSTATE_VERTEX_ELEMENTS
programming that is independent from the shader so that we can
implement indirect pipeline binding without complicating the
generation shader as well as emitting fewer generated commands.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>
It turns out that the change from CENTER_Y to CENTER_X for
422 YUV didn't actually happen until generation 14 of the
hardware, not generation 10 as some documents claimed. This
fixes the failing piglit tests ext_image_dma_buf_import-sample_yuv
associated with 422 formats (which apparently we aren't running on CI).
Fixes: 23aa784c
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33516>
When using VK_EXT_acquire_drm_display, the Vulkan API user must provide
the DRM master fd that will be used. This operation is performed through
`vkAcquireDrmDisplayEXT()` in which `drmFd` will be assigned to
`wsi->fd` and will be used for privileged operations.
This means that, when we are creating the physical device, we need to
open a DRM primary node (as the specification states that "The provided
drmFd must correspond to the one owned by the physicalDevice."), but it
doesn't need to be the DRM master.
Therefore, when using VK_EXT_acquire_drm_display, keep the primary fd
open and don't check if the fd is the DRM master.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33096>
Based on observations of the generated assembly, this instruction appears to:
- Swizzle the 8/16 component vector in src0 according to the pattern defined in src1.
- Apply a enable mask from src2 to selectively modify elements.
I encountered this instruction while experimenting with _viv_asm and
packed types.
Here is one exmaple kernel:
kernel void k(global int* out, int a, int b) {
_viv_char2_packed s;
_viv_asm(MOV, s.x, s, a);
_viv_asm(MOV, s.y, s, b);
out[0] = s.x + s.y;
}
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33474>
We currently assume the implicit offset shift is always 2. However, this
shift is actually based on the type, making sure the offset fields are
in units of the type size. The full offset calculation is as follows:
((SRC2<<SRC2_SHIFT) + OFF)<<TYPE_SHIFT
Where SRC2, SRC2_SHIFT, and OFF are instruction fields while TYPE_SHIFT
is implicit and derived from the TYPE field.
This commit implements (dis)assembly support for this, adopting the
syntax used by the blob.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33503>
For bin merging, we will have to first sample all bins in the pipe, then
determine which bins can be merged, then iterate over bins. Combine all
of the information required to render a bin into a tu_tile_config struct
and pass it down to tu6_emit_tile_select(). This will let us more
flexibly construct a list of bins later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33230>
This was causing a crash on
"dEQP-VK.graphicsfuzz.cov-function-large-array-max-clamp" where the
test was trying to allocate ~6GB of TLS.
Considering we were already doing something identical before those changes,
we can just add nir_lower_scratch_to_var before nir_lower_vars_to_scratch
to get the expected behavior (Cleaning LLVM spilling mess around pan_pack)
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 1619fc596a ("bi: Optimize scratch access")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33510>
When a command buffer ends with pending barriers we were emitting a
serialized noop job, but this only works to ensure serialization of
follow-up CL jobs, it won't do what we want if the barrier was
intended for compute or TFU transfers for example. Fix this by
merging the barrier state into follow-up jobs in the same queue
submission.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33507>
Instead of the cmd_buffer. Also, rename it to drop the cmd_buffer
reference, make it a public helper, make it accumulate the
barrier state instead of overwriting it and make it return whether
it actually applies a barrier into the job.
We will use this new public helper in a follow-up change from the
queue to better handle barriers at the end of a command buffer.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33507>
For devices that load shader consts through preamble, HLSQ_INVALIDATE_CMD
should be used to invalidate VS state before CP_DRAW_INDIRECT_MULTI. This
avoids previous consts loaded through CP_LOAD_STATE6_GEOM for non-indirect
draws to affect the consts needed for the current indirect draw.
Fixes two failing vkd3d-proton test cases on a750:
test_vertex_id_dxbc
test_vertex_id_dxil
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32924>
Make setting the ANGLE_ARCH variable optional by providing a default
based on DEBIAN_ARCH, while keeping it possible to override it, which is
expected to be necessary for the Android-arm64 build.
Exclude unnecessary third party dependencies in the .gclient file, which
allows us to delete our first local patch. Thanks to Yuly Novikov for the
suggestion.
Use -no-history for gclient sync, which is equivalent to git's --depth=1
argument. This greatly speeds up the process of fetching sources.
Thanks to this speedup fetching third_party/catapult is no longer an
issue, allowing us to remove our second local patch.
Since we're no longer applying local patches, use ANGLE_REV and
/angle/version as the base for our version check on Android.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Antonio Ospite <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33467>
ANGLE really wants to use its own toolchain and sysroot by default.
Unfortunately, for AArch64, that toolchain is actually a cross-compiling
toolchain designed to be hosted on x64, which is ... not what we want.
Use the system toolchain, and since we're not using the bundled
compiler, also don't use LLVM's libc++ and abseil, since those don't
always work with the system toolchain.
v2 (Valentine)
* Only use native toolchain on linux
* Contain clang-19 environment variables within a subshell
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Antonio Ospite <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33467>
Handle phis in two groups: first those which already have a preferred
reg set and then those without. The second group should be rare but by
handling them last, they don't accidentally occupy a preferred reg of
another phi, preventing excessive copying in some cases.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33489>
The as_build and related functions only appear when MESA_GPU_TRACES=
perfetto is set. By default, when running an RT workload for profiling,
these traces should be recorded alongside other trace points. This
commit ensures that acceleration structure build events are properly
captured when running an RT workload.
v2(Michael Cheng): Move this logic up to anv_device_init_accel_struct_build_state
v3(Michael Cheng): Set emit_markers = true and let the generated
functions handle the check for u_trace_enable and intel_gpu_tracepoint
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33461>
This lets us remove some special casing from update_renames and make it
simpler. Doing this in update_renames was also fragile, since the
parallelcopies vector is not final when update_renames is called, so it
might not have been safe to do so in the end.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33444>
It seems we have some issue with the driver, due the high number of
(random) flakes that appear in different jobs, and that eventually are
causing issues with Marge.
While we don't identify where is the problem, let's disable the job to
avoid interferences with Marge.
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33490>
init_queue() calls cleanup_queue() if anything fails in the middle, which
means finish_render_desc_ringbuf() will be automatically called if
init_render_desc_ringbuf() failed. Get rid of the the error path and
return directly instead. The one exception we have is the dev_addr
allocation, which needs to be explicitly freed if an error occurs between
util_vma_heap_alloc() and pan_kmod_vm_bind().
Reported-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33471>
Between Rust's love of the stack, the size of Rust objects, and the
number of parameters we have to pass in order to sort out lifetime
issues, Rust recursion can be quite expensive. Combine that witn
Windows' tiny stack sizes and this call is blowing out the stack on
games running on DXVK and VKD3D-Proton. This gets rid of this bit of
recursion and replaces it with a loop and a worklist.
Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33472>
This code is old, copied from the old nouveau GL driver. As of Pascal,
we have have 32k images so we need 32k scissors as well. Use the
max_image_dimension() helper instead of hard-coding it.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33478>
Add current time when generating the stream handle initial value.
When running inside PID namespace there can be multiple processes
in the system that will share the same PID and with current code
this could result in the same stream handle being used at the same
time from different processes.
This can easily happen with Flatpak when running two instances of the
same application - both processes will have the same PID and we
will use the same stream handles.
For older UVDs kernel will reject the CS if we use duplicated handles.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12575
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33374>
Indeed, has_atomics is not yet initialized at the time of the
call of r600_init_shader_caps(). This change fixes this issue.
For instance, this issue is triggered with
"piglit/bin/clearbuffer-depth-cs-probe -auto -fbo":
clearbuffer-depth-cs-probe: ../src/gallium/drivers/r600/evergreen_state.c:5039: evergreen_emit_atomic_buffer_setup: Assertion `resource' failed.
Aborted
Fixes: 7cd606f01b ("r600: add r600_init_screen_caps")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33438>
The offset is measured in bytes. Some of the code here acted as though
it were measured in src.type units. Also modify the assertion to check
that all extracted bits come from data in the immediate value.
Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Fixes: da395e6985 ("intel/brw: Fix extract_imm for subregion reads of 64-bit immediates")
Yes, I missed this error *twice* in code review.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33049>
Add support for GL_EXT_protected_textures to create protected
texture in OpenGL ES 3.2. This enables allocating standard
GL textures as protected surfaces. This allows use-cases such
as depth, stencil, or mipmapped textures to be supported as
destinations for rendering within a protected context.
Signed-off-by: Saroj Kumar <saroj.kumar@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33358>
Starting from Android 14 (Android U), framework HWUI has required a
second graphics queue to avoid racing between webview and skiavk. For
non-Android, we leave the second queue emulation behind a debug option.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30985>
Line smoothing should only be enabled for line primitives.
So far we were only checking the pipeline topology, but this is not
enough if there is a geometry shader, as it can change the primitive
from line to anything else, or the other way around.
This fixes several failures in
dEQP-VK.draw.renderpass.non_line_with_params.*.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33395>
Command buffer private object destroy callbacks receive a 64-integer so their
signature should respect that to avoid alignment issues when passing pointers.
This is the same we were already doing for color pipelines, but now for D/S
pipelines too.
Fixes crash on 32-bit build with:
dEQP-VK.synchronization2.op.single_queue.fence.write_clear_attachments_read_copy_image_to_buffer.image_128x128_d16_unorm
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33463>
The proper name for the meson options changed to meson.options in Meson
1.1. Since we don't support older versions of Meson anyway, let's just
rename the options-file to the new name.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33445>
Since the binding value can be any 32-bit number, we cannot assume that
it is <= 27 bits. We need 64-bit keys to accommodate a 32-bit binding.
This will also provide more bits to store the subdesc id, which will be
needed for multiplane texture and sampler descriptors.
Fixes: 7bea6f86 ("panvk: Overhaul the Bifrost descriptor set implementation")
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32563>
We're about to start lowering these in the IR, at which point the
scheduler will see SEND instructions with fence messages. Previously,
we handled those in the generator, and didn't handle the virtual opcodes
here, letting them fall through to the default case of 14 cycles.
These new numbers are completely fabricated, matching the times we have
for atomic operations. This is basically what we did for LSC atomics.
While it may not be accurate, it's at least better than 14 cycles.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
Memory fences do not refer to an element of a binding table. Rather,
the reason we had "BTI" in these opcodes was to distinguish what in
modern terms are called UGM (untyped memory data cache) vs. SLM
(cross-thread shared local memory) fences.
Icelake and older platforms used the "data cache" SFID for both
purposes, distinguishing them by having a special binding table
index, 254, meaning "this is actually SLM access". This is where
the notion that fences had BTIs came in. (In fact, prior to Icelake,
separate SLM fences were not a thing, so BTI wasn't used there either.)
To avoid confusion about BTI being involved, we choose a simpler lie: we
have Icelake SLM fences target GFX12_SFID_SLM (like modern platforms
would), even though it didn't really exist back then. Later lowering
code sets it back to the correct Data Cache SFID with magic SLM binding
table index. This eliminates BTI everywhere and an unnecessary source.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
For some reason, we were using UW type for the destination of memory
fences at the generator level, while in the IR we selected UD.
There are some comments in the documentation for the message about it
writing the notification register to the destination, which is 32-bit.
Prior to Xe2, bits 31:16 were Reserved/MBZ. But on Xe2, all 32 bits
are populated with actual data.
I don't know whether this will fix anything in practice, but it seems
like a better plan to use UD. Often we used UW types to avoid having
the destination region of sends span too many registers, but we're in
SIMD1 here, so it shouldn't matter.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
brw_memory_fence() overrides the instructions generated by the
MEMORY_FENCE or INTERLOCK opcodes to be force_writemask_all with
exec_size == 1. But the IR was emitting it in SIMD8 (regardless
of dispatch width). Instead, just emit the IR as SIMD1/NoMask so
the IR matches what we actually generate. Have size_written indicate
that the entire destination is written, however, as it is ultimately
going to be a SEND that writes a whole register.
We were also using a UD register for the source of
FS_OPCODE_SCHEDULING_FENCE when the generator overrides it to UW,
so just specify UW in the IR as well so that they line up.
Also add validation for MEMORY_FENCE/INTERLOCK that we've done the
exec_size and masking right in the IR.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
Rather than using a bit in the generic fs_inst data structure, we can
simply set a source on our logical FB write messages. (We already do
so for many other cases.)
In the repclear shader, setting this wasn't actually having an effect,
as we were setting it on a SHADER_OPCODE_SEND message which ignored it.
(We had already correctly set the bit in the message descriptor.)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
On armhf integer divide by 0 can raise SIGFPE, whereas on aarch64
it just returns 0. This has become an issue because the recently
added panfrost_init_screen_caps always calls pan_gpu_time_to_ns to
calculate caps->timer_resolution, whereas before we only called it
when PIPE_CAP_TIMER_RESOLUTION was queried, and only OpenCL
does that (and not always).
Fixes: 205669e3a9 ("panfrost: add panfrost_init_screen_caps")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33435>
Noticed one by chance and searched for any others with find that were
clearly not meant to be executable.
For the curious:
33aa039acf changed texstore.c to
executable.
ed176e2c71 introduced si_vpc.c and
si_vpc.h which have always been executable.
d0e5203855 changed lava-gitlab-ci.yml to
executable.
328c29d600 introduced OVR_multiview.xml as
executable.
ac912b3754 introduced
OVR_multiview_multisampled_render_to_texture.xml as exectuable.
Signed-off-by: Dudemanguy <random342@airmail.cc>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33436>
Prevent following warning if not running as a normal user:
Failed to create /home for shader cache (Permission denied)---disabling
disk_cache_delete_old_cache() is going to create first the cache directory
using disk_cache_generate_cache_dir(). From mkdir_if_needed(), the stat()
of "/home" is failing with "Permission denied" under some circumstances
when using Firefox.
Fixes: #12168
Fixes: c3bc6991d2 ("util/disk_cache: Delete the old multifile cache if using the default.")
Signed-off-by: Benjamin ROBIN <dev@benjarobin.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32886>
The release branch contains only what was on the staging branch first,
so testing it again is a waste of resources.
To do this, we split the rule into specifically "default branch" and
"staging branch", and "release branch" gets dropped by virtue of no
longer being caught by any rule.
Cc: mesa-stable
Reviewed-by: Martin Roukala <None>
Reviewed-by: Dylan Baker <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33411>
It's too late to run all the tests by then, the release has been made
based on the staging pipelines results
Cc: mesa-stable
Reviewed-by: Martin Roukala <None>
Reviewed-by: Dylan Baker <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33409>
Previously this used the {Min,Max}Tile{Rows,Cols} as returned by the
driver capabilities. Those parameters should be used to determine
implementation supported tile configurations for a specific resolution.
In the case of header coding, the {min,max}Log2Tile{Rows,Cols} should be
derived exactly as the AV1 spec defines it.
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Sil Vilerino <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32762>
Forcing a dedicated sparse queue is problematic in real-world scenarios.
In the current implicit sync world for sparse updates, we can rely on
submission order.
For use cases where an application can take advantage of the separate
sparse queue to do "async" updates, the existing implementation works
well, but problems arise when trying to implement D3D-style submission
ordering. E.g., when a game does sparse on a graphics or compute queue,
we need to guarantee that previous submissions, sparse update and future
submissions are properly ordered.
The Vulkan way of implementing this is to:
- Signal graphics queue to timeline N (i.e. last submission made)
- Wait on timeline N on the sparse queue
- Do sparse updates
- Signal timeline N + 1 on sparse queue
- Wait for timeline N + 1 on graphics queue (can be deferred until next
graphics submit)
This causes an unavoidable bubble in GPU execution, since the
existing sparse queue ends up doing:
- Wait pending signal. The implication here is that all previous GPU
work must have been submitted.
- Do VM operations on CPU timeline
- Wait for semaphores to signal (this is required for signal ordering)
- ... GPU is meanwhile stalling in a bubble due to GPU -> CPU -> GPU roundtrip.
- Signal semaphore on CPU (unblocks GPU work)
Letting the GPU go idle here is not great, and we can be screwed over by bad thread scheduling.
Another knock-on effect is that the graphics queue is now forced into
using a thread for submissions. This is because when the graphics queue
wants to wait for timeline N + 1, the sparse queue may not have
signalled the timeline yet on CPU, so effectively, we have created a
wait-before-signal situation internally in RADV. Throwing another thread
under the bus is not great either.
Just letting the queue in question support sparse binding solves all
these issues and I don't see a path forward where the D3D use case can
be solved in a separate queue world.
It is also friendlier to the ecosystem at large. RADV is the only driver
I know of that insists on separate sparse queues and multiple games
assume that graphics queue can support sparse.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33166>
Prepare for building ANGLE for Android, which requires applying a
patch to fix building with minimal dependencies.
Rework the existing script by allowing passing the ANGLE_TARGET and
ANGLE_ARCH environmental variables to specify the target platform,
Android or Linux, and also allow building for arm64 in addtition to
x64 by setting ANGLE_ARCH.
Also build the GLESv1 compatibility mode library (libGLESv1_CM),
which is required for EGL testing on Android.
Based on work by Antonio Ospite.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Eric Engestrom <None>
Reviewed-by: Antonio Ospite <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33377>
After commit 83d1553391 (ci: Don't build Vulkan for GL dEQP, 2025-01-29) deqp
does not build cleanly anymore for the Android target.
Fix that by updating the
build-deqp-gl_Build-Don-t-build-Vulkan-utilities-for-GL-builds.patch patch.
Reviewed-by: Eric Engestrom <None>
Reviewed-by: Antonio Ospite <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33377>
The radv-stoney-angle-full was unintentionally inheriting the fraction
from the pre-merge job.
Also use the correct manual rules definition while we're here, and use
consistent naming for the restricted rules.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Eric Engestrom <None>
Reviewed-by: Antonio Ospite <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33377>
Lionel added a neat debugging tool. Let's make it work with the new-style
hashing approach too, since nir_printf_fmt is a lot more convenient than needing
to define a dedicated CL function to access printf (although that works too).
We remove the old non-hashed path, because it has no more functional users --
hashing is a hard requirement with vtn_bindgen2, which Intel has now switched
to.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33380>
It's unclear to me whether this is dead code that should be removed or
dead code that should be used, so I just marked it as unused to remove
a few thousand warnings when compiling.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33384>
This switches to disk_cache instead of our own mechanism which only
stored meta shaders when the logical was destroyed.
Meta shaders are still stored separately from the application shaders
because they are common to all applications on a given GPU/Mesa version.
The default cache is 32MiB which should be large enough.
This fixes massive stuttering in FF7 Rebirth but all apps are
technically affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33370>
The fix in e7ac1094f6 to emit preamble defs in the correct block would
move the cursor of the builder that is later used to insert descriptor
prefetches, emitting them at the wrong place. Fix this by resetting the
cursor before emitting the prefetches.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: e7ac1094f6 ("ir3: rematerialize preamble defs in block dominated by sources")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33399>
Somewhere between writing c1510ad72e ("nak: Optimize bindless to cbuf
textures on Volta+") and me rebasing it a year later, we switched to
using the NV-specific ldc_nv intrinsic for cbuf loads. It's basically
the same as load_ubo but we're detecting the wrong intrinsic so the
optimization does nothing.
Fixes: c1510ad72e ("nak: Optimize bindless to cbuf textures on Volta+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33398>
These have been switching between failing and passing recently. Not
really sure what's going on here, but we don't want the CI to flip
randomly between failing and passing, so let's mark them as flakes.
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33381>
I think this was broken as there might be a store_output with
less than 4 components to a location that shouldn't be smoothed
anyway (i.e. not the first one).
nir_lower_poly_line_smooth now handles the case where the first location
doesn't have 4 components.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33340>
The if is really short so it should really use a conditional select,
but this pass is called too late go through all the usual lowerings
and opts.
Foz-DB Navi21:
Totals from 1128 (1.42% of 79377) affected shaders:
MaxWaves: 29358 -> 29342 (-0.05%)
Instrs: 552306 -> 549668 (-0.48%); split: -0.58%, +0.10%
CodeSize: 2796392 -> 2782360 (-0.50%); split: -0.59%, +0.08%
Latency: 2574361 -> 2566482 (-0.31%); split: -0.47%, +0.16%
InvThroughput: 644047 -> 647500 (+0.54%); split: -0.18%, +0.72%
Copies: 37521 -> 36460 (-2.83%); split: -2.92%, +0.09%
Branches: 12009 -> 10157 (-15.42%)
VALU: 350886 -> 349199 (-0.48%); split: -0.64%, +0.16%
SALU: 104459 -> 105415 (+0.92%); split: -0.00%, +0.92%
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33340>
Now using the same model as the compute shader.
As a result we temporarily disable the use of the Inline register for
providing push constants on Task & Mesh shaders. Since that register
is also available on the compute shader we'll try to find a way to use
the same mechanism for all 3 shaders in another MR and bring back that
optimization.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32895>
Some drivers implement luminance as RGBA. Since the code checks the
renderbuffer format instead of the internal format this can cause the
query to incorrectly return "signed" on the Alpha component for signed
luminance formats.
Fixes the following Piglit test for various drivers (at least Panfrost
and V3D):
spec/ext_packed_float/query-rgba-signed-components
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33372>
By having the DUTs download and extract from a CI-Tron artifact, we
deduplicate the downloading of the build artifact across all DUTs from
a CI farm, leading to quicker and more reliable jobs, and lower
bandwidth usage on both FD.o and the CI gateway.
Inside the CI-Tron infra, this should also drastically reduce the job
submission time by removing needless copies (executorctl -> executor,
executor -> S3, S3 -> B2C, and even B2C -> NBD when applicable).
As an additional bonus, the size of install.tar is reduced by virtue of
zstd providing better compression than zip/deflate.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32927>
Rather than downloading the full history of deqp just to check a merge
base, let's ask github for this information directly.
This fixes the deqp build on arm32 platforms which do not have enough
address space to run git fetch on such a large repo.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32927>
Both test-gl and test-vk install a bunch of stuff which is required to
just run tests. Instead of copy and pasting a bunch of random stuff into
derived containers, just keep it in the base container. Technically this
makes both containers very slightly larger, but the additions here pale
into comparison with 700MB of mostly-unused Proton, 400MB of deqp-vk
mustpass, etc.
v2 (Martin Roukala):
* Move spirv-tools to the list of dependencies as it is needed by
python3-renderdoc
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32927>
Use command:
find . -type d \( -path "./.git" -o -path "./docs/relnotes" \) -prune -o -type f -exec sed -i 's/PIPE_SHADER_CAP_\([A-Za-z0-9_]*\)/pipe_shader_caps.\L\1/g' {} +
find . -type d \( -path "./.git" -o -path "./docs/relnotes" \) -prune -o -type f -exec sed -i 's/PIPE_COMPUTE_CAP_\([A-Za-z0-9_]*\)/pipe_compute_caps.\L\1/g' {} +
Also remove value type in pipe_compute_caps doc because they
are explicit in struct now.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33176>
To remove ir_type param when get_compute_param. These caps depend
on IR type and used by clover only (only clover query with
PIPE_SHADER_IR_NATIVE, others query with PIPE_SHADER_IR_NIR).
Only r600 and radeonsi support PIPE_SHADER_IR_NATIVE.
These caps can be removed when clover is deprecated.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33176>
Vulkan defines the line rasterization to *always* use perpendicular
rather than aligned line ends (unless otherwise specified by
VK_EXT_line_rasterization). So let's remove the code that conditionally
sets the bit, we always want the default value (0) here.
It might seem confusing because we kinda named this field wrong. It's
really about perpendicular vs aligned line ends. That's a cleanup we
might want to deal with later, but deleting the assignment is sufficient
to fix this issue. This is also what we do for v10.
This was probably just copied from the Gallium-driver, where this logic
is more or less correct.
Fixes: d970fe2e9d ("panfrost: Add a Vulkan driver for Midgard/Bifrost GPUs")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33373>
The check in check_multiview_texture_target() whether numViews <= 0 (as
required by the OVR_multiview spec) is never triggered since it is only
called by frame_buffer_texture() when numviews > 1, as numviews of 0 is
passed in by non multiview FramebufferTexture functions. Such cases are
incorrectly treated as non-multiview attachments.
Tweak frame_buffer_texture() to take an extra bool argument "multiview"
to distinguish between a multiview call with numviews=0, and a
non-multiview call.
Fixes: 328c29d600 ("mesa,glsl,gallium: add GL_OVR_multiview")
Signed-off-by: James Hogan <james@albanarts.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33346>
Fix the FBO attachment completeness test to ensure that multiview
attachments have all views referring to layers in range of the
underlying texture.
The OVR_multiview spec states:
Add the following to the list of conditions required for framebuffer
attachment completeness in section 9.4.1 (Framebuffer Attachment
Completeness):
"If <image> is a two-dimensional array and the attachment
is multiview, all the selected layers, [<baseViewIndex>,
<baseViewIndex> + <numViews>), are less than the layer count of the
texture."
Fixes: 328c29d600 ("mesa,glsl,gallium: add GL_OVR_multiview")
Signed-off-by: James Hogan <james@albanarts.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33346>
This is complicated by two things: mediump varyings, and the lack of u16
regfmt support in LD_VAR.
With mediump, a load(_interpolated)_input with a 16-bit dest size may
either be an explicit 16-bit type or a mediump type lowered by
nir_lower_mediump_io. With explicit 16-bit types, we write 16-bit values
in the VS, but with mediump we write 32-bit in the VS (for messy
reasons). bi_emit_load_vary needs to distinguish these cases by checking
for a mediump type, and set the appropriate source_format to convert the
type on the LD_VAR_BUF path. Types like 'mediump uint16' are luckily not
allowed.
The missing u16 regfmt for LD_VAR means that we take the obvious
approach for 16-bit int varyings of emitting 16-bit int formats in the
attribute descriptor and loading them to u16. Instead, we just
write/read all 16-bit varyings as f16 regardless of type. Unlike with
mediump, we don't need to do any 32bit->16bit conversion when loading in
the FS, so as long as we use the same type between the attribute
descriptor and LD_VAR, the conversion is a no-op and the mismatch
doesn't matter.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33078>
For Bifrost and newer, we always write mediump varyings from a 32-bit
source in the VS. This is needed because the FS does not unconditionally
lower mediump to 16-bit.
Previously we worked around this in panvk by replacing 16-bit formats
with 32-bit in emit_varying_descs, but once we support
storageInputOutput16, we will need to preserve 16-bit formats for
explicit 16-bit varyings.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33078>
The mesa screenshot layer attempts to use VK_FORMAT_R8G8B8_UNORM by
default. Using this, we can directly & efficiently write out to a
PNG file without further modifications. However, some GPUs don't
support the given format, so for those that don't, we'll attempt to
use VK_FORMAT_R8G8B8A8_UNORM, which will require some work to ensure
the alpha values are set to opaque to make RGB comparisons easier.
If both surface formats fail, a more descriptive failure
will be shown.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33295>
For Xe3+ the registers are tightly packed to make better use of GRF
space, so add a statistic to keep track of how many registers were used.
For previous versions this is not useful since the code is spreading
the registers among the whole space.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33311>
We do not have the TEX semaphore there anyway so the benefits are not as
high as with R500 and the chances of running out of TEX indirections are
just too high.
This will increase the register pressure in some shaders, but I believe
the gained shaders are worth it and there is also some cycles reduction
in some cases. I'm not sure how to optimize this further without
actually clonning the shader before the pair shceduling and than doing a
trial and error to see if there is some compromise where we can just hit
the indirection limit to not group it too much...
Shader-db RV410:
total instructions in shared programs: 112800 -> 112825 (0.02%)
instructions in affected programs: 5024 -> 5049 (0.50%)
helped: 23
HURT: 19
total temps in shared programs: 18170 -> 18244 (0.41%)
temps in affected programs: 1365 -> 1439 (5.42%)
helped: 39
HURT: 34
total cycles in shared programs: 169535 -> 166806 (-1.61%)
cycles in affected programs: 14229 -> 11500 (-19.18%)
helped: 84
HURT: 4
LOST: 0
GAINED: 8
GAINED: shaders/godot3.4/34-59.shader_test FS
GAINED: shaders/lightsmark/25.shader_test FS
GAINED: shaders/lightsmark/28.shader_test FS
GAINED: shaders/lightsmark/34.shader_test FS
GAINED: shaders/this-war-of-mine/144.shader_test FS
GAINED: shaders/this-war-of-mine/145.shader_test FS
GAINED: shaders/tropics/432.shader_test FS
GAINED: shaders/tropics/462.shader_test FS
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33275>
if there is no heap with device-local and host-visible, then
rebar cannot exist. the previous detection did not account for
the rebar heap using the device-local fallback, which of course
would have the same size as the device-local heap and pass the threshold
check
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33359>
Rather than assert (and otherwise write past the array size), guard against
this (and miscompile the shader), to make the code more robust.
This mimics the behavior of exceeding the cond_stack size (and other similar
stacks) - the if_stack is only used together with the cond_stack, the behavior
should be the same.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33338>
The gen_header.py script is failing for older versions of python3 such
as python 3.5. Two issues observed with python 3.5 are ...
1. Python 3 versions prior to 3.6 do not support the f-string format.
2. Early python 3 versions do not support the 'required' argument for
the argparse add_subparsers().
Fix both of the above so that older versions of python 3 still work.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28780>
The handle and addr fields of drm_xe_sync is defined as the union:
union {
__u32 handle;
__u64 addr;
};
When initialized on the stack on certain implementations, setting
.handle will leave the upper bits of .addr/the overall union
uninitialized causing exec calls to fail with:
[drm:xe_sync_entry_parse [xe]] Ioctl argument check failed at drivers/gpu/drm/xe/xe_sync.c:136: upper_32_bits(sync_in.addr)
Somewhat awkward but init .addr first to 0 and then set the handle after
the struct init.
Cc: stable
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33172>
There were 4 fields:
- key: now will be passed explicitly, so we can reuse the existing
more general fs_visitor constructor;
- input_vue_map: used only by the client code brw_compile_gs, so
create it separatedly as a local variable;
- two unsigned parameters: just put them inside a nested struct in the
shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33228>
OpenCL C defines work-item functions to return a scalar for a particular
dimension. This is a really annoying papercut, and is not what you want for
either 1D or 3D dispatches. In both cases, it's nicer to get vectors. For
syntax, we opt to define uint3 "magic globals" for each work-item vector. This
matches the GLSL convention, although retaining OpenCL names. For example,
`gl_GlobalInvocationID.xy` is expressed here as `cl_global_id.xy`. That is much
nicer than standard OpenCL C's syntax `(uint2)(get_global_id(0),
get_global_id(1))`.
We define the obvious mappings for each relevant function in "Work-Item
Functions" in the OpenCL C specification.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
This is a rewrite of vtn_bindgen. For now the two tools live in parallel, to
give Intel time to migrate off v1.
For a refresher, the classic vtn_bindgen reads a SPIR-V and generates a .h
containing nir_builder stubs for each exported function. The stub inserts an
unimplemented nir_function with the proper signature into the shader, and adds a
"call" to that function. The driver is responsible for linking with the library
later, which is annoying.
vtn_bindgen2 instead generates a .c/.h pair. The header are just prototypes with
identical signatures to what we have now. The .c implementations, however, are
very different. Instead of generating unimplemented nir_function, the
implementations contain the actual code (as serialized NIR, deserialized
on-the-fly). There is no linking step, nor a library nir_shader that the driver
has to keep around.
The programming model here is that this is "just" nir_builder ... just a
massively more competent way of using nir_builder.
Additionally, the whole SPIR-V -> optimized lowered serialized NIR step is now
all common code. There's no longer anything target-specific, and it's
disentangled from the nir_precomp infrastructure.
That means drivers can use CL with zero integration, except a few meson.build
rules. This gives a very gentle on-ramp to CL for drivers. (Note: that applies
only for library-style CL. For precompiled kernel-style CL, that still requires
significant driver integration. I do have plans there, though. Also,
printf/abort support requires a minimal amount of driver code.)
Furthermore, this unblocks the use of CL library functions in common code. That
makes this an important step towards common code geom/tess or maybe saner
raytracing.
For drivers already using classic vtn_bindgen, porting to vtn_bindgen2 should
just be deleting all your linking/deserializing code. The .cl's are unchanged,
as are the function prototypes exposed.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
When the core supports unified instruction memory, don't clamp individual
shader sizes to 256 instructions. Allow shaders to make full use of the
state instruction memory, as long as both VS and FS fit into the memory
region.
Allows to run the shaders from glmark2 terrain from state instruction
memory, so we don't need to use icache mode on GC3000 and makes the app
work on GC2000, which doesn't have icache but unified instruction memory.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
Currently we statically partition the unified intsruction memory range
into a VS and a FS range, with each of the shader types getting half
of the memory. Change this to place the FS instructions right behind
the VS instructions, which potentially allows larger shaders to be
executed from the instruction memory states when VS are FS are
unbalanced in size, which is quite common.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
According to _InitializeContextBuffer() in the downstream kernel driver
all GPUs claiming to support more than 256 shader instructions have unified
instruction memory and the shader range registers to partition usage of
this unified memory region.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
There is no reason to emit those states on framebuffer changes. While the
framebuffer format change might change the fragment shader due to R/B
swapping in the shader, this triggers compilation of a new shader variant
which will dirty the shader state as needed.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
Some cores with the the instruction cache feature, such as the GC3000 found
on the i.MX6QP, have a wrong instruction limit encoded in hardware. The HWDB
entry for this core has the correct number (512). Fixup all cores with the
instruction cache feature to report at least 512 instructions, which was
already assumed when configuring the VS/FS instruction state memory split in
other parts of the driver.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
The existing code relies on assert to identify when a cs overlow
occurs. On builds without asserts, a cs overflow won't be detected
and it will likely lead to a hang.
Reporting a preemptively a PIPE_UNKNOWN_CONTEXT_RESET error seems
ok as the context is lost anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33288>
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.
The old approach also stops worrking if passes reorder instructions.
This patch proposes a solution which should not regress performance just
like the old approach. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
- Check if a new version of Mesa is available which might have fixed the problem.
- If you can, check if the latest development version (git main) works better.
- Check if your bug has already been reported here.
- For any logs, backtraces, etc - use [code blocks](https://docs.gitlab.com/ee/user/markdown.html#code-spans-and-blocks), GitLab removes line breaks without this.
- For any logs, backtraces, etc - use [code blocks](https://docs.gitlab.com/user/markdown/#code-spans-and-blocks), GitLab removes line breaks without this.
- Do not paste long logs directly into the description. Use https://gitlab.freedesktop.org/-/snippets/new, attachments, or a pastebin with a long expiration instead.
- As examples of good bug reports you may review one of these - #2598, #2615, #2608
@@ -44,7 +44,7 @@ Example:
### System information
Please post `inxi -GSC -xx` output ([fenced with triple backticks](https://docs.gitlab.com/ee/user/markdown.html#code-spans-and-blocks)) OR fill information below manually
Please post `inxi -GSC -xx` output ([fenced with triple backticks](https://docs.gitlab.com/user/markdown/#code-spans-and-blocks)) OR fill information below manually
- Check if a new version of Mesa is available which might have fixed the problem.
- If you can, check if the latest development version (git main) works better.
- Check if your bug has already been reported here.
- For any logs, backtraces, etc - use [code blocks](https://docs.gitlab.com/ee/user/markdown.html#code-spans-and-blocks), GitLab removes line breaks without this.
- For any logs, backtraces, etc - use [code blocks](https://docs.gitlab.com/user/markdown/#code-spans-and-blocks), GitLab removes line breaks without this.
- Do not paste long logs directly into the description. Use https://gitlab.freedesktop.org/-/snippets/new, attachments, or a pastebin with a long expiration instead.
- As examples of good bug reports you may review one of these - #2598, #2615, #2608
@@ -16,7 +16,7 @@ The title should effectively distinguish this bug report from others and be spec
### System information
Please post `inxi -GSC -xx` output ([fenced with triple backticks](https://docs.gitlab.com/ee/user/markdown.html#code-spans-and-blocks)) OR fill information below manually
Please post `inxi -GSC -xx` output ([fenced with triple backticks](https://docs.gitlab.com/user/markdown/#code-spans-and-blocks)) OR fill information below manually
- OS: (`cat /etc/os-release | grep "NAME"`)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.