Add a check to vaDeriveImage to see if a non-interlaced buffer was
created successfully. Otherwise, return an error, since we won't be able
to derive an image from the interlaced buffer.
Prevents a null pointer dereference from occurring on some NVIDIA cards,
reported by Alexander Kapshuk.
v2: Check for PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE support (Ilia)
Fixes: fcb558321e ("frontends/va: Derive image from interlaced buffers")
Signed-off-by: Thong Thai <thong.thai@amd.com>
Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8320>
Especially on GFX10 we can avoid pretty much all L2 flushes.
However, we have to do L2_METADATA invalidations instead. We
do that every time we could possibly be reading new DCC/HTILE info
from the L2 cache in shaders.
Benchmark results (Basemark, high preset, navi10 on profile_standard, which
is slower than a navi10 on default settings, so please don't compare these
to random navi10 results you find):
before:
5932
5928
5937
after:
6011
6013
6009
So this looks like a >1% increase (mean 5932.3 -> 6011.0, about +1.3%).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
This way we're properly using the Vulkan barrier paradigm instead
of guessing ad hoc which caches need to be flushed. This is more robust
against cache policy changes, as we no longer have to revisit all the
meta operations every time.
Note that a barrier has both a src and a dst part, so
barrier:
flush src
meta op
flush dst
becomes
barrier:
flush barrier src
flush meta op dst
meta op
flush meta op src
flush barrier dst
There are also some places where we've been able to replace a CB flush
with a shader flush, because that is what the Vulkan rules require.
(It turns out that in the cases where the CB flush mattered, either the
app sets the bit in one of the relevant flushes, or it was needed as a
result of an optimization that we counteracted in the previous patch.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
To cancel the optimization in radv_dst_access_flush if these helpers
get used by meta operations.
We could also remove that optimization, but I think this triggers less
often, as all SHADER_WRITE flushes on images not supporting STORAGE should
come from meta operations.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
This extends the TLB based blit to support both depth and stencil
buffers.
v2:
- Amend comment for further clarification (Iago)
- Remove parentheses (Iago)
- Remove condition so separate stencil blit is done (Iago)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8304>
Vulkan only guarantees 4-byte alignment of the offset for vkCmdDrawIndirect,
while CP_LOAD_STATE.EXT_SRC_ADDR requires 16-byte alignment, so we have to
copy the indirect parameters to a correctly aligned buffer.
The blob does essentially the same, but emits an indirect CP_LOAD_STATE
with src = SS6_UBO and EXT_SRC_ADDR = 0xe0000, and only for the
first dispatch.
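To make the alignment mismatch concrete, here is a minimal CPU-side
sketch. The helper and struct names are hypothetical (not turnip's code),
and the memcpy only stands in for whatever GPU-side copy the driver
actually emits:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Vulkan only guarantees that the indirect offset is 4-byte aligned, but
 * CP_LOAD_STATE.EXT_SRC_ADDR needs 16-byte alignment. */
#define LOAD_STATE_ALIGN 16u

/* Hypothetical helper: can CP_LOAD_STATE read the parameters in place? */
static bool can_load_state_directly(uint64_t indirect_addr)
{
    return (indirect_addr % LOAD_STATE_ALIGN) == 0;
}

/* Hypothetical 16-byte-aligned destination for the three dispatch sizes
 * (x, y, z) plus one padding dword. */
struct aligned_dispatch_params {
    _Alignas(16) uint32_t dims[4];
};

/* CPU stand-in for the copy into a correctly aligned buffer. */
static void copy_dispatch_params(struct aligned_dispatch_params *dst,
                                 const uint32_t src[3])
{
    memcpy(dst->dims, src, 3 * sizeof(uint32_t));
    dst->dims[3] = 0;
}
```

An offset like 0x1004 is valid for Vulkan but not for EXT_SRC_ADDR, so it
takes the copy path.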
Fixes:
dEQP-VK.compute.indirect_dispatch.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8444>
Fix defects reported by Coverity Scan.
uninit_member: Non-static class member progType is not initialized
in this constructor nor in any functions that it calls.
uninit_member: Non-static class member insn is not initialized in
this constructor nor in any functions that it calls.
uninit_member: Non-static class member data is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7390>
The Android Vulkan loader needs this symbol, so the addition of the
linker script broke Vulkan for Android.
(For non-Android builds: I checked that having a non-existent symbol in
the linker script works fine and doesn't put the symbol in the library.)
Fixes: 41bb6459d3 ("radv: restrict exported symbols with static llvm")
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8437>
The lower_double and lower_int64 passes don't lower all 64-bit IO ops,
nor the merging to and splitting from 64-bit values. So here is a set of
lowering passes that take care of this, and also of merging IO that might
have been split.
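Conceptually, the splitting and merging look like the following C sketch,
similar in spirit to NIR's unpack_64_2x32/pack_64_2x32 opcodes. The helper
names are made up for illustration, and the low-half-first order is an
assumption:

```c
#include <stdint.h>
#include <string.h>

/* Split a 64-bit value into two 32-bit halves (low half first). */
static void split_u64(uint64_t v, uint32_t out[2])
{
    out[0] = (uint32_t)v;         /* low 32 bits */
    out[1] = (uint32_t)(v >> 32); /* high 32 bits */
}

/* Merge two 32-bit halves back into the original 64-bit value. */
static uint64_t merge_u64(const uint32_t in[2])
{
    return (uint64_t)in[0] | ((uint64_t)in[1] << 32);
}

/* A double travels the same way, via its bit pattern. */
static void split_f64(double d, uint32_t out[2])
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    split_u64(bits, out);
}

static double merge_f64(const uint32_t in[2])
{
    uint64_t bits = merge_u64(in);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```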
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7824>
When there are no param exports in an NGG (or legacy VS) shader,
NO_PC_EXPORT=1 is set, which means PS waves can launch before
the current stage finishes.
If the current stage has any stores, we need to make sure to wait for
those before we allow PS waves to start, so that PS can read what
these instructions stored.
Fossil DB results on Navi 10:
Totals from 45 (0.03% of 136420) affected shaders:
CodeSize: 87224 -> 87404 (+0.21%)
Instrs: 16750 -> 16795 (+0.27%)
Cycles: 69580 -> 69760 (+0.26%)
VMEM: 8022 -> 8167 (+1.81%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7868>
Could still be improved a little. For example, 8-bit pack without
constants could be:
(s_pack_ll(x, z) & 0x00ff00ff) | ((s_pack_ll(y, w) & 0x00ff00ff) << 8)
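A quick C model showing that the two-pack sequence above matches a plain
byte pack. It assumes s_pack_ll stands for s_pack_ll_b32_b16 (low 16 bits
of the first operand in the low half, low 16 bits of the second in the
high half); the function names are illustrative:

```c
#include <stdint.h>

/* Model of s_pack_ll_b32_b16: result = { b[15:0], a[15:0] }. */
static uint32_t s_pack_ll(uint32_t a, uint32_t b)
{
    return (a & 0xffffu) | (b << 16);
}

/* Reference: pack the low byte of each of x, y, z, w into one dword. */
static uint32_t pack_u8x4_ref(uint32_t x, uint32_t y, uint32_t z, uint32_t w)
{
    return (x & 0xffu) | ((y & 0xffu) << 8) | ((z & 0xffu) << 16) |
           ((w & 0xffu) << 24);
}

/* The constant-free two-pack formulation. */
static uint32_t pack_u8x4_pack_ll(uint32_t x, uint32_t y, uint32_t z, uint32_t w)
{
    return (s_pack_ll(x, z) & 0x00ff00ffu) |
           ((s_pack_ll(y, w) & 0x00ff00ffu) << 8);
}
```

The mask keeps byte 0 and byte 2 of each pack (x and z, or y and w), and
the shift by 8 slots y and w into bytes 1 and 3.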
fossil-db (Sienna):
Totals from 136 (0.10% of 139391) affected shaders:
CodeSize: 279776 -> 278144 (-0.58%)
Instrs: 50742 -> 50470 (-0.54%)
Cycles: 211560 -> 210472 (-0.51%)
SMEM: 3607 -> 3557 (-1.39%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8421>
Looks like m2mf bails if a line is >64k in width for tiled textures
(even if only a sub-section is copied as long as any part is beyond the
64k mark).
Fixes a number of GLES3 accuracy tests that created 8k-wide textures which
were read out as RGBA32_UINT, leading to problems.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8396>
This happens during "3d" blit operations, where we must reinterpret it
as color in order to support stencil/depth masking. However, the hardware
isn't necessarily amused by this, especially when multiple draws are
queued up. Throw in serialize calls to get it to flush out previous draws.
This was noticeable in the test
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_stencil,
although the 3d blit path had to be forced on nvc0, where it's much
rarer.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8396>
Under some very rare circumstances, the OP_EXPORT will refer to a def
provided by a mov. When we then try to make the defining op write to the
export directly, it blows up. Reuse the existing setDst helper which
handles this and more for the long encoding.
Fixes dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8396>
This patch also replaces lower_negate with lower_ineg / lower_fneg.
The fneg semantics have been clarified as of Version 1.5, Revision 1
of the SPIR-V specification, which means that the previous lowering
to fsub is not a viable solution anymore, and is replaced with
lowering to fmul(x, -1.0).
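One concrete difference between the two lowerings is IEEE 754 signed
zero: under round-to-nearest, 0.0 - (+0.0) is +0.0, whereas negating +0.0
must give -0.0. A small C illustration (function names are hypothetical):

```c
#include <math.h>
#include <stdbool.h>

/* Old lowering: fneg(x) -> fsub(0.0, x). */
static float fneg_via_fsub(float x) { return 0.0f - x; }

/* New lowering: fneg(x) -> fmul(x, -1.0). */
static float fneg_via_fmul(float x) { return x * -1.0f; }

/* True when a and b are equal, including the sign of zero. */
static bool same_float(float a, float b)
{
    return a == b && !signbit(a) == !signbit(b);
}
```

For non-zero finite inputs both lowerings agree; only the zero case
separates them here.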
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6597>
The fau_index field contains the lower 4 bits of the 64bit constant,
which allows one to reuse the same clause constant slot from different
bundles if the upper 60 bits match. That doesn't work for constants
referenced from the same instruction or for constants referenced from
two instructions that are part of the same bundle though, since the
fau_index is shared in that case.
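A hypothetical model of that sharing rule, with made-up names (not the
bifrost packer's code), treating the slot as holding the upper 60 bits and
fau_index as supplying the low 4 bits:

```c
#include <stdbool.h>
#include <stdint.h>

/* same_fau_index: both references come from the same instruction, or from
 * two instructions in the same bundle, so they share one fau_index. */
static bool can_share_slot(uint64_t a, uint64_t b, bool same_fau_index)
{
    if ((a >> 4) != (b >> 4))
        return false;   /* upper 60 bits differ: separate slots needed */
    if (same_fau_index)
        return a == b;  /* shared fau_index: low 4 bits must match too */
    return true;        /* each bundle encodes its own low 4 bits */
}
```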
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8418>
This doesn't have much of an effect by itself, but with a future change it
helps avoid a pathological case in Assassin's Creed Valhalla and an RDR2
shader.
fossil-db (Sienna):
Totals from 55074 (39.51% of 139391) affected shaders:
SGPRs: 3515076 -> 3567744 (+1.50%); split: -0.01%, +1.51%
CodeSize: 206942120 -> 206941868 (-0.00%); split: -0.00%, +0.00%
Instrs: 39625900 -> 39625837 (-0.00%); split: -0.00%, +0.00%
Cycles: 1640088780 -> 1640088828 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4070
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8416>
Drivers that don't support conditional rendering can't really require
respecting conditional rendering, so let's not ask for it to be
respected in the first place.
This fixes a problem where util_can_blit_via_copy_region started
unconditionally rejecting all blits that originate from
glBlitFramebuffer, even for drivers where this can't possibly be a
problem.
Fixes: 767f70dfe1 ("gallium/util: fix util_can_blit_via_copy_region for conditional rendering")
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8378>
Found a case where we mapped a range too many.
Per the comment the constraint is:
/* [first, last] is exactly the range of ranges that either overlap the
* new parent, or are adjacent to it. This corresponds to the bind ranges
* that may change.
*/
So after the ++last, ranges[last] should still be adjacent, which means
we need to test the post-increment value to see whether it is adjacent.
Failure case:
ranges:
0: 0 - ffff
1: 10000 - 1ffff
2: 20000 - 2ffff
3: 30000 - 3ffff
new range: 10000 - 1ffff
wrong first, last: 0,3
However, range 3 clearly isn't adjacent at all; the correct result is 0,2.
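A sketch of the corrected computation, under the assumption that the
ranges are sorted and contiguous as in the failure case above. Struct and
function names are hypothetical, and unlike the inclusive ends listed
above, the sketch uses half-open [offset, offset + size) ranges:

```c
#include <stdint.h>

struct range {
    uint64_t offset;
    uint64_t size; /* covers [offset, offset + size) */
};

/* Find [*first, *last]: the ranges that overlap or are adjacent to the new
 * range [offset, offset + size). */
static void find_affected_ranges(const struct range *ranges, int count,
                                 uint64_t offset, uint64_t size,
                                 int *first, int *last)
{
    int f = 0;
    /* Skip ranges that end strictly before the new range starts. */
    while (f + 1 < count && ranges[f].offset + ranges[f].size < offset)
        ++f;

    int l = f;
    /* Test the candidate we would move to (ranges[l + 1]), so that after
     * the increment ranges[l] still overlaps or touches the new range. */
    while (l + 1 < count && ranges[l + 1].offset <= offset + size)
        ++l;

    *first = f;
    *last = l;
}
```

A check of ranges[last] before the increment would instead also step onto
range 3, because range 2 still starts at or before the end of the new
range.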
Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7953>
Even if the FCE predicate is FALSE, we might still need to decompress
FMASK if compressed rendering was used. FMASK decompressions should
never be predicated.
This fixes a ton of CTS failures and a rendering issue with Control
when DCC+MSAA is force-enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8331>
This is mostly to get additional -Werror coverage to avoid introducing
unforced ILP32 or big-endian errors. i386 adds lavapipe, r600, nouveau,
zink, and all the classic drivers. ppc64le adds lavapipe and zink, and
also adds -Werror for symmetry with the other cross builds. s390x also
adds lavapipe and zink.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8394>
The gcc we're using (and quite possibly newer ones) throws a really
stupid error:
../src/gallium/drivers/nouveau/nouveau_buffer.c:765:22: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
buffer->address = (uint64_t)user_ptr;
But address is a uint64_t and user_ptr is a void *, so this is
completely, unambiguously safe to do. Apparently casting to uintptr_t
first squelches this, so do that instead.
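The pattern in isolation, as a minimal sketch with hypothetical helper
names. The warning fires on 32-bit targets, where void * is narrower than
uint64_t; uintptr_t is an integer type able to hold any void *, and the
widening to uint64_t is then an ordinary integer conversion:

```c
#include <stdint.h>

/* (uint64_t)ptr warns with -Wpointer-to-int-cast on 32-bit builds;
 * going through uintptr_t does not. */
static uint64_t ptr_to_u64(const void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

/* The reverse direction, for completeness. */
static void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```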
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8394>
When running gles3 deqp's with ETNA_MESA_DEBUG=deqp we fake streamout support.
CSO thinks that streamout is supported and calls ctx->pipe->set_stream_output_targets(..)
in cso_destroy_context(..), which results in a null-pointer access.
Add a stub to make development easier.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8382>
For color buffers, conditional fast clears can cause aux-state tracking
to lose information necessary for resolves later on.
For depth buffers, they never actually worked because they occurred
unconditionally. Even if they were conditional, they would suffer from
the same issues as color buffers.
Enables iris to pass the nv_conditional_render-clear-bug piglit test.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3565
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7762>
Apart from an issue with fast clears that will be addressed soon,
aux-state tracking with conditional rendering works because the
aux-state info needed for performing required resolves is never lost.
Add comments explaining how this works. Assertions are omitted to avoid
having to pass render_condition variables into
iris_resource_prepare_access and iris_resource_finish_write.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7762>
Until just recently ("vrend: Fix TGSI UIF/IF behavior"), virgl did "if
(any(bvec4(src0)))" instead of "if (src0.x != 0)", despite the tgsi.rst
documentation and tgsi_exec agreeing on the second form. It's harmless to
work around it, since NTT was apparently the only producer that didn't
scalar-swizzle the if condition.
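Why the workaround is harmless can be shown with a small C model (names
hypothetical): once the condition is scalar-swizzled to .xxxx, testing
whether any component is non-zero is the same as testing x alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* virgl's old reading of TGSI IF: true if any component is non-zero. */
static bool if_any(const uint32_t src[4])
{
    return src[0] || src[1] || src[2] || src[3];
}

/* The documented reading: only the x component is tested. */
static bool if_x(const uint32_t src[4])
{
    return src[0] != 0;
}

/* Replicate x into all four components (a .xxxx swizzle). */
static void swizzle_xxxx(const uint32_t src[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = src[0];
}
```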
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8373>
BRK/CONT don't take a label, as shown by tgsi_opcode_tmp.h and the lack of
any users of a label on those instructions in tree. I can't find any user
of ENDLOOP's label. Additionally, GLSL-to-TGSI apparently never set up
the BGNLOOP label, so even nvfx's usage probably wants us to not set it.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8373>
In DXIL, the FMA instruction only supports 64-bit operations. However,
back when we implemented support for this, there was only a single
switch for lowering all ffma instructions, so we couldn't easily use it.
But now that there are separate flags to lower ffma on 16, 32 and 64 bit,
we can lower 16 and 32 bit ffmas, and leave 64 bit ffmas alone.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8349>
When I originally added the FFMA opcode here, I added the FMAD opcode
instead of the FMA opcode. The reason for this is that it works on
32-bit values as well, so that seemed like a better fit.
But that's not correct, as the FMAD opcode isn't a fused operation, so
let's correct the opcode.
This isn't currently in use, because we currently lower away all ffma
opcodes on the NIR level, but that's about to change.
While we're at it, let's also update the opcode name to match the DXIL
documentation.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8349>
We don't support stencil-exports yet, and even once we do, we might
not support it on all hardware. So we really need an alternative plan
here, even when render_condition_enable is true.
Fixing this properly is much more involved, and depends on reworking
render-condition along the lines that we do in !7746 to support pausing
and resuming properly first. So let's do the minimal thing, which is to
allow this to work in cases where no render-condition is active.
Fixes: 767f70dfe1 ("gallium/util: fix util_can_blit_via_copy_region for conditional rendering")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4056
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8379>
To properly support multi-planar images, we don't want to set metadata
on anything other than the first plane. To achieve this radv currently
checks for the image TILING and assumes LINEAR means it's not the first
plane.
However this doesn't account for images with a single LINEAR plane. We
still want to set metadata on those, e.g. to properly set the scanout
bit in the tiling flags.
Instead of checking for LINEAR, check if the offset is zero. Only the
first plane has a zero offset on AMD.
This mirrors the radeonsi logic [1].
While at it, move the metadata declaration into the if block.
[1]: 6fecdc6dda/src/gallium/drivers/radeonsi/si_texture.c (L710)
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8086>
The lack of this broke scheduled pipelines, because they attempted
to create a meson-windows-vs2019 job, which couldn't work (because the
windows_build_vs2019 job doesn't exist in scheduled pipelines).
Fixes: 84c8a35aa2 "CI: Add Windows source dependency map"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8360>
NTT doesn't handle uniforms, and requires them to have been lowered to
UBOs. But for drivers that don't set
nir_shader_compiler_options::lower_uniforms_to_ubo to true, this won't
have happened yet. Neither Zink nor V3D sets this option, and in the
case of Zink this isn't trivial to change.
So let's lower uniforms to UBOs in this case in NTT instead.
Fixes: 03c60762f5 ("gallium/ntt: Fix load_ubo_vec4 buffer index setup.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4047
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8365>
When blitting just the stencil aspect, the source and destination
resources are prepared/setup twice. Move the unconditional resource
setup into the aspect_mask loop to avoid this.
In addition, use the aspect provided by the loop instead of the mask
provided by the info parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8340>
The table constructor and the table lookup were doing different things
for big-endian. This fixes MesaFormatsTest.FormatFromFormatAndType and
MesaFormatsTest.FormatMatchesFormatAndType failing to round-trip for
GL_RGBA / GL_SHORT, which we're not currently running in CI for s390x,
but which a subsequent commit will enable.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8353>
Compute the number of components of the destination vector from the
bit size when e.g. a 16-bit vec2 vertex fetch is split. This is
because the dst will be a v1, so the p_create_vector should be created
from two v2b for both sizes to match.
This prevents a regression from the next change which will split
typed vertex buffer loads on GFX6 and GFX10+.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8363>
There is a path to blit stencil buffers reinterpreting the stencil data
as an RGBA8888 or R8 float texture.
This works fine except for the case when the stencil buffer is
multisampled, and the blit operation needs to resolve it: an average of
the samples is done, which is incorrect, as only one sample must be
used.
This can be observed in the piglit test
`ext_framebuffer_multisample-unaligned-blit 2 stencil downsample -auto -fbo`,
specifically at the triangle borders.
To avoid this averaging, let's reinterpret the stencil data as RGBA8888
or R8 uint texture.
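A simplified model of the difference (hypothetical names; two samples and
rounding on the float path are assumptions). At a triangle edge, one
sample may hold the written stencil value and the other the cleared value,
and averaging produces a value neither sample had. The GL spec instead
has a single sample's value selected for integer and stencil sources:

```c
#include <stdint.h>

/* Float/unorm resolve: average the samples (rounded). */
static uint8_t resolve_average(uint8_t s0, uint8_t s1)
{
    return (uint8_t)(((unsigned)s0 + s1 + 1) / 2);
}

/* Uint resolve: pick one sample (sample 0 in this model). */
static uint8_t resolve_single_sample(uint8_t s0, uint8_t s1)
{
    (void)s1;
    return s0;
}
```

With samples 0x00 and 0xff, the averaged result is 0x80, a stencil value
that appears nowhere in the source.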
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8361>
Two main reasons:
As described in the previous commit, sending buffers to the Wayland
compositor as quickly as possible effectively results in mailbox
behaviour.
Also, doing the same as for MAILBOX present mode provides the following
benefits:
* We use more images in the swapchain, which avoids stalls on the client
side if the Wayland compositor directly uses the client buffers for
scanout.
* We wait for fences to signal before submitting a new buffer, which
avoids missing frames in the Wayland compositor due to fences not
signalling in time for a flip.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3673
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8197>
This allows Xwayland to forward buffers to the Wayland compositor ASAP
for fullscreen / undecorated windows, which in turn allows true mailbox
behaviour in the Wayland compositor.
Without this, Xwayland has to emulate the mailbox behaviour itself,
which it cannot do as well as the Wayland compositor by design.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8197>
LLVM expects that exec != 0 when entering loops and generates this code
that becomes an infinite loop if exec == 0:
    BB5_1:
        vcc_lo = (inverted terminating condition)
        s_and_b32 vcc_lo, exec_lo, vcc_lo
        s_cbranch_vccnz BB5_3 // jump if vcc != 0 (break statement)
        // ... loop body ...
        s_branch BB5_1
    BB5_3:
For non-monolithic VS before TCS, VS before GS, and TES before GS,
we set exec = (thread enabled mask), which sets 0 for HS-only and GS-only
waves, causing the infinite loop condition above.
Fix it as follows:
- set exec = ~0 at the beginning
- wrap the whole shader (LS and ES) in a conditional block, so that HS-only
and GS-only waves jump over it and never enter such a loop
The TES before GS hang can be reproduced with gfxbench:
    testfw_app --gfx egl -w 1920 -h 1080 --gl_api gles -t gl_tess
Fixes: 68d6d097f1 - radeonsi/gfx9: add GFX9 and VEGA10 enums
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8344>
reStructuredText (and Markdown) is painful to manipulate
programmatically; most Python parsers are geared towards writing Markdown
and generating HTML. I'd like to move the calendar updates to being
scripted, and for that, storing them as CSV will be convenient. This also
lets us considerably simplify the scripting that manipulates the table.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8341>
Stencil texture sampling (such as what we have to do for BlitFramebuffer)
is broken with UBWC enabled. We can't just take the
fd_resource_uncompress() path, because that's a blit just like
BlitFramebuffer.
Fixes failure in dEQP-GLES3.functional.fbo.msaa.2_samples.stencil_index8,
but also the uncaught rendering fails of 4_samples.stencil_index8 and
depth24_stencil8.
Prior to "911ce374caf0 freedreno/a6xx: Fix MSAA clear" we would usually
pass and occasionally flake-fail on this test, hence it being listed as a
flake (though the rendering was actually broken). Since that commit,
though, we consistently fail on a pixel of the broken rendering, which is
how the #freedreno-ci channel spam brought this to my attention.
Rob looked at the performance impact of this, and the worst case was a
hit of at most around 0.5% fps on trex.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8319>
Sync Android.mk GALLIUM_TARGET_DRIVERS names from kmsro meson.build,
notably adding the missing mediatek, meson and rockchip display drivers
names.
It also renames imx to imx-drm, as referenced in meson.build
and src/gallium/targets/dri/target.c.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7443>
Like SPIR-V and GL_ARB_sparse_texture2, these return a residency code. It
is placed in the destination after the rest of the result. If it's zero,
then the texel is resident. Otherwise, it's not resident.
Besides the larger destination and the residency code, sparse fetches
work the same as normal fetches.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7774>
Mark the test cases which aren't supported by ir3_parser.y explicitly,
so we notice future regressions. And likewise, fail when we see an
unexpected pass, so we don't forget to update the test vectors in the
future as ir3_parser improves.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
Note that this shows up a slight encoding difference compared to the test
vector extracted from blob deqp runs. We think these should be don't-care
bits. For now, add a note and replace the encoded value in the disasm
test.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
Well, really just resinfo. Dealing with the different ldib/stib syntax
for a6xx+ vs. earlier seems a bit too painful. But resinfo at least
gives us some encoding test coverage of this group of instrs.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
This appears to be ignored when writing to predicate registers (which I
guess makes sense, since they are boolean). So no real harm in setting
it, other than it makes some of the ir3_parser test vectors not match
the expected result for encoding.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
Currently ir3 (incl. emit_cat5()) expects the samp/tex src register to be
first, which requires some fixup for the parser to match.
TODO: we might want to revisit the src reg order when adding new instr
packing/encoding. For now, let's just make the parser match the rest of
ir3.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
There was some src2 vs src3 confusion, but since the syntax is like:
ldl.f32 rDst, l[rBase+off], ncomp
it makes more sense to call the offset src2 and ncomp src3, than the
way we had it. This is also easier to deal with for the ir3 assembly
parser.
Also, src_offset was only ever used by the assembly parser, and was
handled incorrectly in emit_cat6(), with the result that cat6 load instrs
would not work properly in (for example) computerator. Since we are
cleaning things up, drop src_offset and make the asm parser work in
the same way as the nir->ir3 frontend.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
Update the IR and packer to handle the additional cat0 fields, in
preparation for adding support in the assembler (which in turn prepares
for adding round-trip parsing/packing test coverage).
We don't actually use these yet from the ir3 compiler, but at least
this is one less thing to worry about when we start trying to use
them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
Various things that I noticed which were initially wrong with the xml
based disasm.
These come from a collection of unique instructions extracted from deqp
traces, which unfortunately loses the link back to the original test
case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8175>
Enable VRS 2x2 coarse shading when flat shading is used, as per the
idea and guidance given by Marek.
The is_flat_shading variable in struct si_shader_info is set
based on the data from the gather_intrinsic_info() function
and struct si_state_rasterizer. If is_flat_shading is set,
si_emit_db_render_state() enables VRS 2x2 shading in hardware.
v2: Fix review comments from Pierre-Eric. Code optimizations.
v3: Fix indentation style issue.
v4: Fix review comments from Marek. Fixed logical issue pointed
by Marek where info->is_flat_shading variable can be corrupted
and other code cleanup.
v5: Make the code compact as suggested by Pierre-Eric.
v6: Fix new review comments from Marek.
v7: use info->uses_interp_color variable fix from Marek.
v8: Fix coding style comment from Marek.
v9: Add uses_fbfetch_output check as suggested by Marek.
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8161>
We needed to do this anyway to finish enabling NTT in general, but more
importantly: when we enabled sending NIR to the draw module, that broke
PIPE_CAP_LOAD_CONSTBUF drivers in the select/feedback paths if LLVM was
disabled.
Fixes: 44b7e1497f ("st/mesa: don't generate TGSI for the draw VS because it now supports NIR too")
(along with the rest of this MR)
Closes: #3996
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>
I had a funny +1 in nir_to_tgsi's load_ubo lowering on the buffer index,
because I hadn't set lower_uniform_to_ubo for softpipe. This removes that
weirdness in favor of just using lower_uniform_to_ubo, regardless of
driver preference (which matters if a NIR-native driver had it set, and
then the gallium draw module triggered the non-LLVM TGSI fallback path
that hit NTT).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>
It was OK because right now we only execute in the first channel of the
CS, but if you wanted to extend that then you'd need to check each
channel. We already had what we needed for SSBOs, so just reuse it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>
GL by default gives you UB when you access a missing constbuf, and we were
crashing on debug builds in that case. More importantly, we were
assertion failing even under valid circumstances, when a !ExecMask channel
had a bad value for the indirect buffer index and we tried to load from it
anyway.
In removing the assertion, also sink the buf declaration to after we've
done the bounds check that determines that there's a constbuf actually
bound to this index.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>
There's not really a reason to directly map textures. Doing so
requires the texture to be allocated in system RAM instead of
video RAM, which means all GPU access to it would be needlessly slow.
Notably, the one texture type that was allocated this way is the
display target texture for the software driver path. Instead, use
pipe_transfer_map to be able to copy the texture to system RAM.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>
For non-CPU-accessible pipe resource types (DEFAULT/IMMUTABLE),
allocate non-CPU-accessible buffers directly from the cache_bufmgr.
Update the d3d12_bo creation to handle nonmappable buffers.
For CPU-write-only (DYNAMIC/STREAM), use the upload slab_bufmgr.
Update this slab manager to use CPU_WRITE | GPU_READ PB usage.
For CPU-read-write (STAGING), use the readback_slab_bufmgr.
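The usage-to-manager mapping above can be sketched as a single switch. The enums here are local stand-ins for gallium's pipe_resource_usage values and for the three buffer managers named in the message; they are assumptions for illustration, not the d3d12 driver's actual types.

```c
/* Local mirror of gallium's pipe_resource_usage values, defined here so the
 * sketch is self-contained. */
enum usage {
   USAGE_DEFAULT,
   USAGE_IMMUTABLE,
   USAGE_DYNAMIC,
   USAGE_STREAM,
   USAGE_STAGING,
};

/* Hypothetical tags for the three buffer managers named in the message. */
enum bufmgr_kind {
   BUFMGR_CACHE,         /* cache_bufmgr: non-CPU-accessible memory */
   BUFMGR_UPLOAD_SLAB,   /* slab_bufmgr: CPU_WRITE | GPU_READ */
   BUFMGR_READBACK_SLAB, /* readback_slab_bufmgr: CPU read-write */
};

static enum bufmgr_kind pick_bufmgr(enum usage u)
{
   switch (u) {
   case USAGE_DEFAULT:
   case USAGE_IMMUTABLE:
      return BUFMGR_CACHE;         /* GPU-only, never mapped by the CPU */
   case USAGE_DYNAMIC:
   case USAGE_STREAM:
      return BUFMGR_UPLOAD_SLAB;   /* CPU writes, GPU reads */
   case USAGE_STAGING:
   default:
      return BUFMGR_READBACK_SLAB; /* CPU reads back; also the fallback here */
   }
}
```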
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>
Readback (GPU write, CPU read) should use different CPU page
properties compared to upload (write-back vs write-combined).
A future commit will start to respect these PB usage flags.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>
Currently all buffers are allocated as mappable, but a future
commit will change that so that some buffers can be allocated
directly in non-CPU-accessible memory for improved performance.
Note that the returned pointer must be appropriately offset from
a 64-byte-aligned base pointer, so if offsets are used, the data
will be read/written to an offset region in the staging buffer.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>
The mantissa for a float doesn't contain enough data to accurately represent
the min/max values for some destination types. Instead of clamping before
converting, clamp after converting when coming from floats. This improves
conformance of CL conversions, specifically for float -> long/ulong with
int64 emulation enabled.
Refactors the limit determination from the clamp, so we can determine
limits for the dest type (int/uint) in both the source (float) and dest
type. The limit as a float is used for comparison, while the limit as a
dest type is used for bcsel.
An important detail is that the comparison is inverted to fge instead of
flt, so that when the comparison comes up equal, the bcsel chooses the
direct int/uint limit over the converted float, since the conversion
can't produce the exact min/max value.
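A scalar C sketch of the same idea for float -> int64, not the NIR lowering itself. One difference: in C, converting an out-of-range float to an integer is undefined behavior, so this version selects the limit before converting, whereas NIR can convert first and bcsel afterwards. Mapping NaN to 0 follows OpenCL's saturating-conversion rule; the function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* INT64_MAX (2^63 - 1) is not representable as a float; rounded to float
 * it becomes 2^63. Clamping the float to 2^63 and then converting would
 * overflow, so compare against the float form of each limit and return the
 * exact integer limit instead. */
static int64_t f2i64_sat(float x)
{
   const float max_as_float = 0x1p63f;  /* INT64_MAX rounded to float */
   const float min_as_float = -0x1p63f; /* INT64_MIN, exactly representable */

   /* fge rather than flt: on equality, pick the exact integer limit. */
   if (x >= max_as_float)
      return INT64_MAX;
   if (min_as_float >= x)
      return INT64_MIN;
   if (isnan(x))
      return 0; /* saturating conversions map NaN to 0 in OpenCL */

   /* Strictly inside (-2^63, 2^63) now, so the conversion is well defined. */
   return (int64_t)x;
}
```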
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8256>
Each transform feedback target should have a separate buffer
for an offset from which to resume, instead of just having
one buffer per binding point. Otherwise, if transform feedback
is paused and another tf object is bound, the offset of the
previous tf object would be lost.
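A hypothetical sketch of keeping resume offsets in each transform feedback object rather than in a single per-binding-point slot. The struct and function names are illustrative only; the point is that pausing object B cannot overwrite the offsets saved by object A.

```c
#include <stdint.h>
#include <string.h>

#define MAX_XFB_TARGETS 4 /* hypothetical number of xfb buffer targets */

/* Each transform feedback object owns its own resume offsets. */
struct xfb_object {
   uint32_t resume_offset[MAX_XFB_TARGETS];
};

/* On pause, save the current hardware write offsets into the paused object. */
static void xfb_pause(struct xfb_object *obj,
                      const uint32_t hw_offset[MAX_XFB_TARGETS])
{
   memcpy(obj->resume_offset, hw_offset, sizeof obj->resume_offset);
}

/* On resume, restore this object's offsets, not whichever object paused last. */
static void xfb_resume(const struct xfb_object *obj,
                       uint32_t hw_offset[MAX_XFB_TARGETS])
{
   memcpy(hw_offset, obj->resume_offset, sizeof obj->resume_offset);
}
```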
Fixes CTS tests:
dEQP-GLES3.functional.transform_feedback.*triangles*
Fixes Piglit tests:
gl-3.1-primitive-restart-xfb flush
gles-3.0-transform-feedback-uniform-buffer-object
arb_transform_feedback2-change-objects-while-paused
arb_transform_feedback2-change-objects-while-paused_gles3
ext_transform_feedback-intervening-read
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8281>
Each transform feedback target should have a separate buffer
for an offset from which to resume, instead of just having
one buffer per binding point. Otherwise, if transform feedback
is paused and another tf object is bound, the offset of the
previous tf object would be lost.
Fixes Piglit tests:
arb_transform_feedback2-change-objects-while-paused
arb_transform_feedback2-change-objects-while-paused_gles3
ext_transform_feedback-alignment 4
ext_transform_feedback-alignment 8
ext_transform_feedback-alignment 12
ext_transform_feedback-change-size offset-grow
ext_transform_feedback-change-size offset-shrink
ext_transform_feedback-change-size range-grow
ext_transform_feedback-change-size range-shrink
ext_transform_feedback-immediate-reuse-uniform-buffer
ext_transform_feedback-position *
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8281>
Some DRI extension features are enabled/disabled
based on capabilities of the gallium pipe_screen
associated with the DRI screen. Additionally, the
list of extensions enabled also varied based on
features requested by the screen creator. However,
prior to this change the extension list and
extension definition structures within it were
global variables, meaning the last screen
initialized ended up defining the DRI capabilities
of all screens.
This change instead stores a copy of the
extensions which vary per screen, as well as a
copy of the extension list itself in the gallium
DRI screen structure, allowing them to vary per
screen.
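A simplified sketch of the per-screen approach: global extension definitions become read-only templates, and each screen holds its own copies plus its own NULL-terminated list. The struct ext type and the EXAMPLE_* names are hypothetical stand-ins, not the real __DRIextension structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified extension record. */
struct ext {
   const char *name;
   int version;
};

/* Global templates are only read, never patched for a particular screen. */
static const struct ext example_ext_template = { "EXAMPLE_ext", 2 };
static const struct ext optional_ext_template = { "EXAMPLE_optional_ext", 1 };

#define MAX_SCREEN_EXTS 2

/* Per-screen copies of the varying extensions and a per-screen list. */
struct screen {
   struct ext example_ext;
   struct ext optional_ext;
   const struct ext *ext_list[MAX_SCREEN_EXTS + 1];
};

static void screen_init_extensions(struct screen *s, bool full_feature,
                                   bool want_optional)
{
   size_t n = 0;

   s->example_ext = example_ext_template;
   if (!full_feature)
      s->example_ext.version = 1; /* only this screen sees the downgrade */
   s->ext_list[n++] = &s->example_ext;

   if (want_optional) {
      s->optional_ext = optional_ext_template;
      s->ext_list[n++] = &s->optional_ext;
   }
   s->ext_list[n] = NULL;
}
```

Initializing a second screen no longer changes what the first one advertises, because neither screen writes to shared state.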
Closes: https://gitlab.freedesktop.org/drm/nouveau/issues/9
Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7175>
Quote from the OpenGL Shading Language spec, version 4.40, section 8.9.2
"Texel Lookup Functions":
> The offset value must be a constant expression.
So, until we start consuming SPIR-V shaders, it seems we don't need to
deal with non-constant offsets.
This means we can avoid lowering this away in some cases.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8327>
Reservations are accumulated for all shader stages in a program without
resetting them. But stream-out is completely orthogonal to all other
inputs and outputs, so they don't matter for this stuff at all.
So let's drop considering reservations here, and simply count how many
generic outputs we have here instead.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7986>
Destroying the blitter frees samplers, which pushes the sampler-handles
onto the batches' zombie-sampler lists. So if we want to properly clean
these zombie-samplers up, we need to first get them onto the list so
we'll know about them in time.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>
VDPAU specifies that x0/y0 is the top-left corner and x1/y1 the bottom-right,
and that x0/y0 are inclusive while x1/y1 are exclusive.
This commit removes the abs() usage and instead verifies that the VdpRects
passed by the user match the documentation. When they don't, they're treated
as empty rectangles.
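A small sketch of the size computation this implies. struct rect mirrors the field layout of VDPAU's VdpRect but is redefined here so the example is self-contained; the helper names are hypothetical.

```c
#include <stdint.h>

/* Same field layout as VDPAU's VdpRect. */
struct rect {
   uint32_t x0, y0; /* top-left, inclusive */
   uint32_t x1, y1; /* bottom-right, exclusive */
};

/* No abs(): corners given in the wrong order yield an empty rectangle
 * instead of being silently flipped. */
static uint32_t rect_width(const struct rect *r)
{
   return r->x1 > r->x0 ? r->x1 - r->x0 : 0;
}

static uint32_t rect_height(const struct rect *r)
{
   return r->y1 > r->y0 ? r->y1 - r->y0 : 0;
}
```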
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>
We had this strange 5-dword-per-stream storage for the single dword
current vertex count, due to copy and paste. We can make much cleaner
code by just having a 4-element array in the machine.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8283>
* int64 is a core type on Haiku (and potentially other platforms)
* rename to int64_avail matching other similar calls
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes:
- Sample shading now uses per-sample interpolation for colors if colors
are the only inputs. (this is the only case that was broken)
Optimizations:
- BC_OPTIMIZE (barycentric optimization) is now enabled with MSAA if colors
are qualified with both center and centroid. (BC_OPTIMIZE means that
the hardware skips initializing centroid (i,j) if they are equal to
center (i,j))
- If MSAA is disabled and at least 2 out of (center, centroid, sample) are
used by all inputs now including colors, center is forced for all inputs.
- If INTERP_MODE_COLOR is not used and the legacy GL shade model is flat,
the shader variant for flat shading is not generated.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8225>
If multiple rules could match, the rule that appears first in the file
is used.
Only Tiger Lake and Ice Lake are affected. Other platforms either have
a LRP instruction or can't run any shaders from shader-db that would
benefit.
v2: Fix issues created when this commit was rebased on top of
3c8934a644 ("nir/algebraic: add flrp patterns for 16 and 64 bits").
Noticed by Caio.
Tiger Lake and Ice Lake had similar results.
total instructions in shared programs: 20908672 -> 20908661 (<.01%)
instructions in affected programs: 419 -> 408 (-2.63%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.20 x̃: 3
helped stats (rel) min: 1.85% max: 3.19% x̄: 2.49% x̃: 2.65%
95% mean confidence interval for instructions value: -3.56 -0.84
95% mean confidence interval for instructions %-change: -3.24% -1.73%
Instructions are helped.
total cycles in shared programs: 473513940 -> 473513793 (<.01%)
cycles in affected programs: 7176 -> 7029 (-2.05%)
helped: 12
HURT: 0
helped stats (abs) min: 5 max: 22 x̄: 12.25 x̃: 12
helped stats (rel) min: 0.84% max: 3.24% x̄: 2.09% x̃: 1.80%
95% mean confidence interval for cycles value: -15.43 -9.07
95% mean confidence interval for cycles %-change: -2.57% -1.61%
Cycles are helped.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This prevents other transformations from converting them to 'a != 0'.
For example, both of these transformations can do this:
(('~flt', 0.0, ('fabs', a)), ('fne', a, 0.0)),
(('~flt', ('fneg', ('fabs', a)), 0.0), ('fne', a, 0.0)),
Both fsign(fabs(NaN)) and fsign(fneg(fabs(NaN))) should produce zero,
but, since 'NaN != 0.0' is true, cascading these transformations could
cause them to generate 1.0 or -1.0 respectively.
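The failure mode can be reproduced in scalar C. This sketch assumes, for illustration, that fsign(fabs(a)) is first expressed as b2f(0.0 < fabs(a)), after which the '~flt(0.0, fabs(a))' rule rewrites it to b2f(a != 0.0). fsign_ref encodes the NaN -> 0.0 result the message says these expressions should produce; it is not NIR's implementation.

```c
#include <math.h>

/* Reference fsign: NaN is neither > 0 nor < 0, so the result is 0.0. */
static float fsign_ref(float x)
{
   return x > 0.0f ? 1.0f : (x < 0.0f ? -1.0f : 0.0f);
}

/* fsign(fabs(a)): 0.0 for a == NaN. */
static float sign_of_abs(float a)
{
   return fsign_ref(fabsf(a));
}

/* After the comparison rewrite: b2f(a != 0.0). 'NaN != 0.0' is true, so
 * this yields 1.0 for NaN. */
static float sign_of_abs_rewritten(float a)
{
   return a != 0.0f ? 1.0f : 0.0f;
}
```

The two functions agree for every non-NaN input and differ only for NaN.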
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
OpenGL GLSL, OpenGL ARB assembly shaders, and DX9 are pretty loose about
the behavior in the presence of NaNs. Many GPUs that implement these
specifications do not even have a representation of NaN. However,
OpenCL and Vulkan SPIR-V are not so lax. Both actually have some
required behavior in the presence of NaN, and, of the two, OpenCL is the
most strict.
For years we have implemented SPIR-V by using the same comparison
opcodes as we use for OpenGL GLSL and OpenGL assembly shaders. This has
repeatedly caused problems where an optimization that is valid in the
NaN-relaxed world is not valid in Vulkan or OpenCL. To fix this, set
the "exact" flag on comparison instructions generated from SPIR-V.
This will block optimizations that may have different NaN behavior.
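As an illustration of an optimization that is valid in the NaN-relaxed world but not under SPIR-V rules, consider rewriting !(a >= b) as a < b. The sketch below is a hypothetical example of that class of rewrite, not a specific NIR rule: the two sides agree for all ordered inputs but differ as soon as an operand is NaN.

```c
#include <math.h>
#include <stdbool.h>

static bool not_ge(float a, float b) { return !(a >= b); }
static bool lt(float a, float b) { return a < b; }

/* The rewrite is only equivalent when neither operand is NaN, because every
 * ordered comparison involving NaN is false. */
static bool rewrite_is_safe(float a, float b)
{
   return !isnan(a) && !isnan(b);
}
```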
v2: Set the exact flag in the nir_builder, not in the vtn_builder.
v3: Add an assertion in vtn_handle_constant that the exact flag wasn't
set (because it's ignored). Rebase on 80163bbec3 ("nir/vtn: Support
OpOrdered and OpUnordered opcodes"). Mark the NIR generated for those
opcodes as exact as well.
v4: s/unused_exact/exact/ in a couple places, and assert that exact has
the expected value (true in one place, false in the other). Suggested
by Caio.
Closes: #3345
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Fixes: 8513b12590 ("nir/opt_if: split ALU from Phi more aggressively")
This commit doesn't really fix anything in 8513b12590. However,
without 8513b12590, a regression is triggered in RADV on No Man's
Sky. I want to ensure that this change is only applied on top of
8513b12590, and Fixes: seems the safest way to do that.
No shader-db changes on any Intel platform. This only affects SPIR-V,
and we have no OpenGL SPIR-V shaders in shader-db.
124 shaders in Shadow of the Tomb Raider (Steam "native") were hurt by 1
spill and 1 fill each.
All Intel platforms had similar results. (Tiger Lake shown)
Instructions in all programs: 155668276 -> 155685764 (+0.0%)
SENDs in all programs: 6474570 -> 6474570 (+0.0%)
Loops in all programs: 35271 -> 35271 (+0.0%)
Cycles in all programs: 3198055373 -> 3198628031 (+0.0%)
Spills in all programs: 231522 -> 231646 (+0.1%)
Fills in all programs: 347571 -> 347695 (+0.0%)
Vega
Totals:
SGPRs: 20955712 -> 20956756 (+0.00%); split: -0.02%, +0.03%
VGPRs: 13476920 -> 13473132 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613371940 -> 613339348 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 3111886 -> 3112481 (+0.02%); split: +0.02%, -0.00%
Instrs: 120723785 -> 120746991 (+0.02%); split: -0.04%, +0.06%
Cycles: 626658992 -> 626862708 (+0.03%); split: -0.05%, +0.08%
VMEM: 216330854 -> 216343196 (+0.01%); split: +0.04%, -0.04%
SMEM: 32079391 -> 32081972 (+0.01%); split: +0.05%, -0.04%
VClause: 2688784 -> 2688789 (+0.00%); split: -0.03%, +0.03%
SClause: 6554669 -> 6556251 (+0.02%); split: -0.01%, +0.03%
Copies: 5356667 -> 5353283 (-0.06%); split: -0.36%, +0.29%
Branches: 954466 -> 954716 (+0.03%); split: -0.01%, +0.04%
PreSGPRs: 9078300 -> 9081626 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 10972090 -> 10966576 (-0.05%); split: -0.06%, +0.01%
Totals from 48239 (12.08% of 399432) affected shaders:
SGPRs: 2713984 -> 2715028 (+0.04%); split: -0.16%, +0.19%
VGPRs: 1997804 -> 1994016 (-0.19%); split: -0.46%, +0.27%
CodeSize: 172094092 -> 172061500 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 337327 -> 337922 (+0.18%); split: +0.20%, -0.02%
Instrs: 33053657 -> 33076863 (+0.07%); split: -0.15%, +0.22%
Cycles: 254961228 -> 255164944 (+0.08%); split: -0.12%, +0.20%
VMEM: 15165226 -> 15177568 (+0.08%); split: +0.59%, -0.51%
SMEM: 3304938 -> 3307519 (+0.08%); split: +0.49%, -0.41%
VClause: 766225 -> 766230 (+0.00%); split: -0.12%, +0.12%
SClause: 1332645 -> 1334227 (+0.12%); split: -0.04%, +0.16%
Copies: 2040651 -> 2037267 (-0.17%); split: -0.94%, +0.77%
Branches: 743668 -> 743918 (+0.03%); split: -0.01%, +0.05%
PreSGPRs: 1697667 -> 1700993 (+0.20%); split: -0.07%, +0.27%
PreVGPRs: 1718424 -> 1712910 (-0.32%); split: -0.39%, +0.07%
Polaris
Totals:
SGPRs: 21349172 -> 21354376 (+0.02%); split: -0.02%, +0.04%
VGPRs: 13690680 -> 13686920 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613745824 -> 613704988 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 2775012 -> 2775189 (+0.01%); split: +0.01%, -0.00%
Instrs: 120735079 -> 120756209 (+0.02%); split: -0.04%, +0.06%
Cycles: 627906100 -> 628076156 (+0.03%); split: -0.05%, +0.08%
VMEM: 216623065 -> 216641838 (+0.01%); split: +0.04%, -0.04%
SMEM: 32295618 -> 32299338 (+0.01%); split: +0.05%, -0.04%
VClause: 2711025 -> 2711141 (+0.00%); split: -0.03%, +0.04%
SClause: 6545185 -> 6546769 (+0.02%); split: -0.01%, +0.03%
Copies: 5387723 -> 5383249 (-0.08%); split: -0.37%, +0.29%
Branches: 953775 -> 953954 (+0.02%); split: -0.01%, +0.03%
PreSGPRs: 9148814 -> 9153211 (+0.05%); split: -0.01%, +0.06%
PreVGPRs: 11029429 -> 11023915 (-0.05%); split: -0.06%, +0.01%
Totals from 48239 (12.00% of 402052) affected shaders:
SGPRs: 2682056 -> 2687260 (+0.19%); split: -0.16%, +0.35%
VGPRs: 1994436 -> 1990676 (-0.19%); split: -0.46%, +0.27%
CodeSize: 170857060 -> 170816224 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 295429 -> 295606 (+0.06%); split: +0.07%, -0.01%
Instrs: 32808802 -> 32829932 (+0.06%); split: -0.16%, +0.22%
Cycles: 254633252 -> 254803308 (+0.07%); split: -0.13%, +0.20%
VMEM: 14897934 -> 14916707 (+0.13%); split: +0.65%, -0.52%
SMEM: 3289726 -> 3293446 (+0.11%); split: +0.53%, -0.42%
VClause: 775318 -> 775434 (+0.01%); split: -0.11%, +0.13%
SClause: 1304867 -> 1306451 (+0.12%); split: -0.04%, +0.16%
Copies: 2026334 -> 2021860 (-0.22%); split: -0.99%, +0.77%
Branches: 742554 -> 742733 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1690887 -> 1695284 (+0.26%); split: -0.07%, +0.33%
PreVGPRs: 1717709 -> 1712195 (-0.32%); split: -0.40%, +0.07%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This prevents some fossil-db regressions in "spir-v: Mark floating point
comparisons exact".
v2: Note that the patterns and replacements produce the same value when
isnan(b). Suggested by Caio.
v3: Use C99 isfinite() instead of (obsolete) BSD finite(). Fixes
various Windows builds.
No fossil-db changes on any Intel platform, Vega, or Polaris10.
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908670 -> 20908672 (<.01%)
instructions in affected programs: 69 -> 71 (2.90%)
helped: 0
HURT: 1
total cycles in shared programs: 473515288 -> 473513940 (<.01%)
cycles in affected programs: 4942 -> 3594 (-27.28%)
helped: 2
HURT: 0
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This also prevents some fossil-db regressions in "spir-v: Mark floating
point comparisons exact".
v2: Mark the fmin / fmax in the replacement exact to prevent other
optimizations from ruining the NaN-cleansing property of the fmin / fmax.
Suggested by Rhys. Don't assume that constants are not NaN because some
components of a vector might be NaN while others are numbers. Noticed
by Rhys. This causes ~8 more shaders in Age of Wonders III (dxvk) to
regress on cycles (not instructions) by less than 1% when "spir-v: Mark
floating point comparisons exact" is applied. This difference is too
small to care.
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908668 -> 20908670 (<.01%)
instructions in affected programs: 9196 -> 9198 (0.02%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 2 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.02% max: 5.41% x̄: 2.20% x̃: 2.16%
HURT stats (abs) min: 2 max: 6 x̄: 3.20 x̃: 3
HURT stats (rel) min: 2.44% max: 16.67% x̄: 9.39% x̃: 12.50%
95% mean confidence interval for instructions value: -1.22 1.49
95% mean confidence interval for instructions %-change: -2.08% 5.41%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 473515330 -> 473515288 (<.01%)
cycles in affected programs: 67146 -> 67104 (-0.06%)
helped: 10
HURT: 7
helped stats (abs) min: 1 max: 36 x̄: 15.90 x̃: 17
helped stats (rel) min: 0.01% max: 1.29% x̄: 0.66% x̃: 0.89%
HURT stats (abs) min: 1 max: 48 x̄: 16.71 x̃: 4
HURT stats (rel) min: 0.08% max: 1.94% x̄: 0.87% x̃: 0.19%
95% mean confidence interval for cycles value: -13.88 8.94
95% mean confidence interval for cycles %-change: -0.56% 0.49%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
GLSL and SPIR-V GLSL.std.450 don't have any requirements for fsign(NaN),
and both only require that FSign(-0.0) == 0.0. OpenCL, on the other
hand, requires sign(-0.0) be exactly -0.0. It also requires that
sign(NaN) be exactly 0.0.
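A scalar sketch of the OpenCL sign() requirements stated above. Returning the input zero unchanged preserves -0.0, and NaN maps to +0.0. The result also satisfies the GLSL requirement, since -0.0 == 0.0 compares true. The function name is hypothetical.

```c
#include <math.h>

/* OpenCL-style sign(): +/-1.0 for nonzero numbers, the input zero itself
 * (so -0.0 stays -0.0), and exactly +0.0 for NaN. */
static float cl_sign(float x)
{
   if (isnan(x))
      return 0.0f;
   if (x > 0.0f)
      return 1.0f;
   if (x < 0.0f)
      return -1.0f;
   return x; /* +0.0 or -0.0, sign preserved */
}
```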
In practice, this change is difficult to test. Our GLSL frontend
already constant folds sign(NaN) to 0.0 before even getting to NIR. As
far as I can tell, glslang does the same. I don't have a good way to
run an OpenCL SPIR-V test. Maybe SPIR-V GLSL.std.450 assembly?
No shader-db or fossil-db changes on any Intel platform.
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
I originally noticed that 3b30814791 ("nir/algebraic: Optimize 1-bit
Booleans") caused this pattern to no longer be matched by incorrectly
replacing b@32 with b@1. Making that correct had no effect on
shader-db. When this pattern originally was added, it only affected 4
shaders, so it's not worth the effort to debug further.
This reverts commit f50400cc80.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
The original comment was a little terse and a little incorrect. The
rearrangements are fine w.r.t. NaN. However, they produce incorrect
results if one operand is +Inf and the other is -Inf.
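A concrete instance of the infinity problem, using "a + b >= 0.0 rearranged as a >= -b" as an illustrative rearrangement (the exact patterns in the comment may differ). With NaN inputs both forms are false, but with a = +Inf and b = -Inf the sum is NaN while the rearranged comparison is Inf >= Inf.

```c
#include <math.h> /* INFINITY and NAN for the usage below */
#include <stdbool.h>

/* Original form. */
static bool sum_ge_zero(float a, float b) { return a + b >= 0.0f; }

/* Rearranged form. */
static bool rearranged(float a, float b) { return a >= -b; }
```

For a = INFINITY and b = -INFINITY, sum_ge_zero returns false while rearranged returns true; for a NaN operand both return false.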
A later commit, "nir/algebraic: Add some compare-with-zero optimizations
that are exact", will add some more patterns here. It may be reasonable
to squash this commit (forward) into that commit.
v2: Fix some incorrect comparisons operators in the comment (<= vs >=).
Add commentary that subtraction works like addition w.r.t. NaN. Both
noticed / suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This commit only documents the current behavior, even if that behavior
is not the behavior preferred by the relevant specs.
In SPIR-V, there are two flavors of the sign instruction, and each lives
in an extended instruction set. The GLSL.std.450 FSign instruction is
defined as:
Result is 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0.
This also matches the GLSL 4.60 definition.
However, the OpenCL.ExtendedInstructionSet.100 sign instruction is
defined as:
Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if
x < 0. Returns 0.0 if x is a NaN.
There are two differences. Each treats -0.0 differently, and each also
treats NaN differently. Specifically, GLSL.std.450 FSign does not
define any specific behavior for NaN.
There has been some discussion in Khronos about the NaN behavior of
GLSL.std.450 FSign. As part of that discussion, I did some research
into how we treat NaN for nir_op_fsign, and this commit just captures
some of those notes.
v2: Document the expected behavior of nir_op_fsign more thoroughly.
Suggested by Rhys. Note that the current implementation of constant
folding does not produce the expected result for NaN. Suggested by
Caio.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
this requires that arrays of samplers be declared as single variables with
a single binding point, which is then propagated through to the descriptor
set updates
constant sampler array indexing is now un-lowered during access so we can
construct an access chain for both constant and dynamic offset paths
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8155>
The resulting point-coord origin not only depends on whether
the draw buffer is flipped but also on GL_POINT_SPRITE_COORD_ORIGIN
state. Which makes its transform differ from a transform of wpos.
On freedreno fixes:
gl-3.2-pointsprite-origin
gl-3.2-pointsprite-origin -fbo
Fixes: d934d320 "nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform."
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8200>
Not doing this for APUs because spilling is quite likely, due to
overall VRAM pressure.
Also adding a flag to disable for performance debugging.
Finally adds some memset for places where we depended on the memory
being initialized to zero, which we won't get with VRAM anymore.
(I think these places should stop depending on it since it hides
issues with executing the cmdbuffer multiple times, but this
preserves behavior)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7979>
Clang warns that the second instance overrides array entry
initialization, so remove the copy/pasted line. UNORM entries
are already initialized above (with alpha explicitly, and
NO_ALPHA used for the others), so this was just a duplicate and
had no real impact.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8248>
Clang warns that errorString is uninitialized if printBlobUtf8
is null, meaning GetBlobAsUtf8 failed, but then we go ahead and
access it (and printBlobUtf8) after the if. Expand the if to
encompass the printing.
Fixes: 2ea15cd6 ("d3d12: introduce d3d12 gallium driver")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8248>
Missed these last time through, not sure how. I couldn't find a
reason for the nested loop in d3d12_enable_fake_so_buffers to go
backwards, which would require signed, so I switched it to forward.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8248>
In case the stencil is modified, it is also enabled. That was the
behavior of the original code, which was also the correct behavior,
so reinstate the behavior.
Fixes dEQP-GLES2.functional.fragment_ops.depth_stencil.* on STM32MP1 GC400T.
Fixes: b29fe26d43 ("etnaviv: rework ZSA into a derived state")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8174>
These bring a whole lot of new coverage to these drivers, since dEQP is
bad at desktop GL feature coverage around early GL 3.x. piglit also gets
at a lot of MSAA, fast clearing, and texture layout issues that dEQP
doesn't do much with.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
I've set it up in the gitlab-runner config on all the freedreno boards.
This means that for piglit, where run.sh always uses either this
variable or, failing that, 4 threads, we'll have the right number of
parallel tasks.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
ARM64 had it for traces only, upgrade it to a full build so we can test
a630. We also add it for armhf, as we'll want it on both rpi and etnaviv.
Bumped the LAVA tag as well, since the script changes a bit and it does
impact the final image (even if we aren't pulling in full piglit there
yet). Note I also had to drop the "v" on the tarring of their rootfs, as
the verbosity on baremetal was exceeding job log size.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
v2: Use the same storage for OpenCL C sources and ILs representations
(Karol Herbst, Francisco Jerez)
v3:
* Remove `program::has_source` and instead add a value to
`program::il_type` for sources. (Francisco Jerez)
* Use `std::move()` on sources.
* Replace `CL_MAKE_VERSION(99999999u, 0u, 0u)` with
`std::numeric_limits<uint32_t>::max()` (Francisco Jerez)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Pierre Moreau <dev@pmoreau.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2078>
v2:
* Change the existing method to return a `std::vector<cl_name_version>`;
* Add a string function that uses the previous method but returns a
`std::string`.
v3:
* Remove `supported_il_versions_as_string()` (Francisco Jerez)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Pierre Moreau <dev@pmoreau.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2078>
An OpenCL implementation advertising a certain version of the API does
not have to support all existing versions: some versions are mandatory
but not all. For example, the OpenCL 2.1 Specification mentions that
conforming implementations have to support SPIR-V 1.0, but may support
higher versions.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Pierre Moreau <dev@pmoreau.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2078>
There are now more intrinsics for which nir_type_uint is forced than
where the destination type is used to find the intrinsic type, so
invert the conditional so that nir_type_uint is the default case when
nothing more specific is given.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8264>
mir_args_ssa asserted that the given number of arguments to use is
greater than or equal to the actual number, but this is not checked by
callers, so instead of crashing return false to mark failure.
Fixes the local memory atomics OpenCL tests in Piglit.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8264>
Mali GPUs have native gl_GlobalInvocationID support, so we don't want
it to be lowered.
Although we do want to lower gl_LocalInvocationIndex, the single CAP
doesn't allow for choosing what to lower. We've already told NIR to do
the lowering instead, so just disable the GLSL-level lowering.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8264>
Swizzles that access components outside of the maximum
vector size cannot be vectorized with each other.
This patch creates different hash bins for this case.
For example accesses to .x and .y are considered different variables
compared to accesses to .z and .w for 16-bit vec2.
This prevents the vectorization of things like
vec2 16 ssa_3 = iadd ssa_1.xz, ssa_2.xz
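A hypothetical sketch of the binning rule: a swizzle component's bin is its index divided by the maximum vector size, so with a 16-bit vec2 limit, .x/.y land in bin 0 and .z/.w in bin 1. A swizzle like .xz spans two bins and cannot be vectorized. The helper names are illustrative, not the pass's real functions.

```c
#include <stdbool.h>

/* Bin for one swizzle component, given the widest usable vector. */
static unsigned swizzle_bin(unsigned component, unsigned max_vec_size)
{
   return component / max_vec_size;
}

/* A source swizzle can only be vectorized if all components share a bin. */
static bool swizzle_in_one_bin(const unsigned *swizzle, unsigned count,
                               unsigned max_vec_size)
{
   unsigned bin = swizzle_bin(swizzle[0], max_vec_size);
   for (unsigned i = 1; i < count; i++) {
      if (swizzle_bin(swizzle[i], max_vec_size) != bin)
         return false;
   }
   return true;
}
```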
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6666>
nir_addition_might_overflow() expects the parent instruction to be
an alu instr but it might be a phi instr. Fix it by assuming that
the addition might overflow.
This fixes compiler crashes with Horizon Zero Dawn.
No fossil-db changes.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8268>
The bound parameter allows us to prevent allocations from crossing
particular boundaries (typically 128-bit boundaries). For 16-bit, we
don't want to cross 64-bit boundaries, in order to keep swizzles
possible to encode. We already handle this for 16-bit destinations, but
it _also_ needs to be (redundantly) handled for 16-bit sources, in case
types don't match (for example, with a vectorized size conversion
instruction).
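The boundary check can be sketched as "do the first and last byte of the allocation fall into different bound-sized chunks?". This assumes byte units for offset, size, and bound; the real register allocator may count in other units, and crosses_bound is a hypothetical name.

```c
#include <stdbool.h>

/* Does the byte range [offset, offset + size) cross a multiple of `bound`?
 * bound = 8 bytes keeps a 16-bit vector inside one 64-bit chunk, and
 * bound = 16 bytes corresponds to a 128-bit boundary. */
static bool crosses_bound(unsigned offset, unsigned size, unsigned bound)
{
   if (size == 0)
      return false;
   return offset / bound != (offset + size - 1) / bound;
}
```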
Fixes a few newer dEQP fails.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8282>
Generates bi_index from nir_alu_src, taking into account the applied
swizzle, and using (swizzle / 32-bit) portion as an offset, to be
applied later during RA. The sub 32-bit portion only applies for 8-bit
and 16-bit instructions, which need to either handle them explicitly as
a swizzle specifier, or lower to a swizzle explicitly.
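The arithmetic of that split, sketched with hypothetical names: the whole-32-bit part of the component's bit offset becomes the register offset applied during RA, and the remainder selects an 8- or 16-bit lane that the instruction must encode or lower.

```c
/* Hypothetical result of splitting one swizzle component. */
struct swizzle_split {
   unsigned word; /* which 32-bit register, applied as an offset in RA */
   unsigned lane; /* sub-32-bit element within that register */
};

static struct swizzle_split split_swizzle(unsigned component, unsigned bit_size)
{
   unsigned bit = component * bit_size;
   struct swizzle_split s = { bit / 32, (bit % 32) / bit_size };
   return s;
}
```

For example, component 3 of a 16-bit vector starts at bit 48, which is lane 1 of word 1.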
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8135>
We've got the new lookup with size+ptr, just use that one for querying
buffer size.
This means we now return 0 instead of undefined for unbound buffers, but
it also means we return 0 for a buffer view with a size larger than that
of the underlying buffer.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8276>
We were ignoring the requested size of the load in the overflow handling
and would read past the end of buffers, rather than just returning 0 as
robustness would like us to do.
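A minimal sketch of an overflow check that accounts for the requested load size, written so that offset + size cannot wrap around. It is not the TGSI exec code; robust_load is a hypothetical helper that zero-fills any load that is not fully in bounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The bounds check covers the whole requested size, not just the start
 * offset. Out-of-bounds loads return all zeros. */
static void robust_load(void *dst, const uint8_t *buf, size_t buf_size,
                        size_t offset, size_t size)
{
   if (buf == NULL || offset > buf_size || size > buf_size - offset) {
      memset(dst, 0, size);
      return;
   }
   memcpy(dst, buf + offset, size);
}
```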
Fixes valgrind complaint on softpipe in:
dEQP-GLES31.functional.shaders.builtin_functions.common.sign.float_mediump_compute
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8276>
It was only executing the first channel, ignoring the rest. I also
cleaned things up to not loop over rgba, since atomics are only ever to a
single 32-bit value per invocation.
This worked on softpipe previously because it only dispatches 1 CS
invocation per TGSI exec machine anyway, wasting the other 3 slots.
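A hypothetical sketch of executing an atomic for every active channel rather than only the first. Each active channel does a single 32-bit read-modify-write on its own address, with no rgba loop. The channels run one after another, the way a single-threaded interpreter would serialize them. Names are illustrative.

```c
#include <stdint.h>

#define NUM_CHANNELS 4

static void atomic_add_channels(uint32_t *mem,
                                const unsigned addr[NUM_CHANNELS],
                                const uint32_t val[NUM_CHANNELS],
                                unsigned exec_mask,
                                uint32_t result[NUM_CHANNELS])
{
   for (unsigned c = 0; c < NUM_CHANNELS; c++) {
      if (!(exec_mask & (1u << c)))
         continue;               /* inactive channel: no side effects */
      result[c] = mem[addr[c]];  /* atomics return the previous value */
      mem[addr[c]] += val[c];
   }
}
```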
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8276>
If you deleted your old GS and created a new one, then it would
occasionally skip binding the new GS because the token pointers were
equal. Clear the current token pointer in the machine when we're deleting
its token.
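The underlying bug is an address-reuse problem: a freshly allocated token buffer can land at the same address as one that was just freed, so a pointer comparison wrongly reports "unchanged". A minimal sketch with hypothetical `exec_machine`, `machine_bind`, and `delete_shader_tokens` names (not the real tgsi_exec API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an exec machine that caches bound tokens. */
struct exec_machine {
   const void *tokens;   /* currently bound tokens, used as a cache key */
};

/* Rebind only when the tokens differ from the cached pointer. This is
 * unsafe on its own if freed memory is reused for a new shader. */
static bool
machine_bind(struct exec_machine *m, const void *tokens)
{
   if (m->tokens == tokens)
      return false;   /* looks unchanged: skip the bind */
   m->tokens = tokens;
   return true;
}

/* The fix: forget the cached pointer before the tokens are freed, so a new
 * shader allocated at the same address is not mistaken for the old one. */
static void
delete_shader_tokens(struct exec_machine *m, void *tokens)
{
   if (m->tokens == tokens)
      m->tokens = NULL;
   free(tokens);
}
```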
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8277>
Commit 7779b1d71b disabled clear support
when copying to/from color buffers. According to the performance CI, it
falls within a range of commits that introduced a performance regression
on Bioshock Infinite with Tigerlake. Icelake isn't noticeably affected.
By analyzing a trace of the game, I found a couple cases where that
commit added new partial resolves. Update get_copy_region_aux_settings
to avoid them:
- The trace uploads to R8_UNORM textures. On TGL, these enter the
COMPRESSED_CLEAR state on the upload and are partially resolved before
every subsequent upload. Thankfully, they keep their initial clear
color of all zeroes. Since zeros can survive format reinterpretation,
allow clear support for it.
- The trace copies between RGBA16_FLOAT textures. The ones with zero
clear color are helped by the optimization above. The ones with
non-zero clear color are used as source textures. Thankfully on ICL+,
the clear color used for sampling is in pixel form and can thus be
sampled from with format reinterpretation. Allow clear support for
this case.
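The first case relies on the fact that an all-zero bit pattern decodes to zero in the common integer, normalized, and floating-point formats. A minimal sketch of that check, assuming a hypothetical `struct clear_color` holding four raw 32-bit channels (the real isl/iris types differ):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a clear color stored as four raw 32-bit channels. */
struct clear_color {
   uint32_t u32[4];
};

/* An all-zero clear color reads back as zero regardless of the view
 * format, so it survives format reinterpretation during a copy. */
static bool
clear_color_is_zero(const struct clear_color *c)
{
   for (int i = 0; i < 4; i++) {
      if (c->u32[i] != 0)
         return false;
   }
   return true;
}
```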
I haven't tested the actual performance impact of this change, but it
should be beneficial regardless.
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Reported-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8262>
Clear is done with one instanced draw call, where the layer
to clear is controlled by gl_Layer.
Same as how util_blitter_clear does this.
Fixes test:
gl-3.2-layered-rendering-clear-color-all-types 2d_multisample_array single_level
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7919>
Passes amd_vertex_shader_layer-layered-2d-texture-render
Don't enable GL_AMD_vertex_shader_layer because we do not pass
amd_vertex_shader_layer-layered-depth-texture-render due to
the assert:
emit_blit: Assertion `psurf->u.tex.first_layer == psurf->u.tex.last_layer'
However, in its current state it is still useful for clearing
arrayed framebuffers.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7919>
We have been advertising 3.1, which waffle has issues creating contexts
for, causing coverage (and performance!) issues in piglit. We should
support all the necessary features already.
Some new failures are caught by the 3.2 CTS, but they look like they're
existing issues simply not covered by the minimal GL 3.0 CTS.
Fixes: #3037
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8255>
ACO attempts to store the output of an instruction in the same register
occupied by its operands where possible. Importantly, this only works if
the operands are large enough to hold the result. The code failed to
consider subdword operands when checking for this, causing entire
register slots to be freed up even though subdword parts of them were
still in use.
In Mafia 3, this affected the following code:
v2b: %363:v[2][0:16], v2b: %362:v[2][16:32] = p_split_vector %360:v[2]
v1: %116:v[2] = v_cvt_f32_f16 %362:v[2][16:32]
v1: %117:v[2] = v_cvt_f32_f16 %363:v[2][0:16]
Here v[2] is allocated to %116 even though its original lower 16 bits
(%363) are still used by the next instruction.
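Conceptually, the corrected condition only lets a definition take over an operand's register when the operand covers the whole register. A rough sketch, assuming a hypothetical `struct reg_use` that records the byte range an operand occupies within a VGPR (ACO's real `Operand`/`PhysReg` classes are C++ and considerably richer):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical byte range of an operand inside a 32-bit VGPR:
 * v[2][16:32] is { .byte_offset = 2, .bytes = 2 }. */
struct reg_use {
   unsigned byte_offset;
   unsigned bytes;
};

/* A definition of `def_bytes` may reuse the operand's register only if the
 * operand occupies every byte of the register(s) the definition needs;
 * otherwise another subdword value in that register may still be live. */
static bool
can_reuse_operand_reg(struct reg_use op, unsigned def_bytes)
{
   assert(def_bytes > 0);
   unsigned def_regs = (def_bytes + 3) / 4;   /* round up to whole dwords */
   return op.byte_offset == 0 && op.bytes >= def_regs * 4;
}
```

With this check, %362 (v[2][16:32]) can no longer hand v[2] to the 32-bit %116 while %363 still lives in v[2][0:16].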
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3717
Fixes: 031edbc4a5
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7461>
The previous commit handling this forced geometry shader usage for all
cases, which is not ideal. Instead, there are now fragment shader variants
for both depth==1 and depth!=1, corresponding to whether gl_Layer is used
in the shader.
Fixes: 614c77772a ("st/pbo: fix pbo uploads without PIPE_CAP_TGSI_VS_LAYER_VIEWPORT")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8089>
On some devices, window resizing results in flashes of blue- and
orange-tinted versions of the current frame until resizing is
finished.
This fix ensures that the emubgra tweak used for GLES virgl hosts
has its enabled state flag set properly during resize events.
v2: removed unrelated whitespace change
Fixes: 6f68cacf61 ("virgl: Always enable emulated BGRA and swizzling unless specifically told not to")
Signed-off-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8119>
This connects printf up for NIR drivers. It lowers printf using the NIR
pass, which places the index of the strings into the output buffer.
It also sets the global buffer header for the NIR paths.
v2: remove dead function temps after lowering
v3: move to single string
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8254>
This pass creates a SSBO var for the printf buffer. It does an atomic increment
at the beginning of the buffer to determine where to write, then dumps
the args after that.
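In C terms, the reserve-then-write scheme looks roughly like the sketch below. The struct layout follows the v9 buffer format described later in this series (word 0 is the next free offset, initially 8; word 1 is the buffer length). `printf_buffer`, `printf_buffer_create`, and `printf_emit` are hypothetical names, not the NIR pass itself, which emits equivalent NIR intrinsics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: 8-byte header, then records. */
struct printf_buffer {
   _Atomic uint32_t offset;   /* next free byte offset (starts at 8) */
   uint32_t size;             /* total buffer size in bytes */
   uint8_t data[];            /* data[0] is byte 8 of the buffer */
};

static struct printf_buffer *
printf_buffer_create(uint32_t data_bytes)
{
   struct printf_buffer *buf = malloc(sizeof(*buf) + data_bytes);
   if (!buf)
      return NULL;
   atomic_init(&buf->offset, 8);   /* the initial offset is "prebaked" */
   buf->size = 8 + data_bytes;
   return buf;
}

/* One atomic add reserves room for a record (format index + argument
 * bytes); the caller then owns that slice. Returns 0 on success, or -1 if
 * the record does not fit (the offset still advances, as on a GPU). */
static int
printf_emit(struct printf_buffer *buf, uint32_t fmt_idx,
            const void *args, uint32_t args_size)
{
   assert(sizeof(*buf) == 8);
   uint32_t rec_size = sizeof(fmt_idx) + args_size;
   uint32_t start = atomic_fetch_add(&buf->offset, rec_size);
   if (start + rec_size > buf->size)
      return -1;

   uint8_t *dst = buf->data + (start - 8);
   memcpy(dst, &fmt_idx, sizeof(fmt_idx));
   memcpy(dst + sizeof(fmt_idx), args, args_size);
   return 0;
}
```

Because each invocation gets a disjoint range from the atomic add, no further synchronization is needed for the argument writes.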
v2: [airlied]
Enhanced to use an index into a set of format info that is passed
back to the caller. The format info contains the number of args,
argument sizes and the format string.
v3: move format string lowering to vtn
v4: Jason reworked it.
v5: assume buffer has initial offset prebaked in and work from there.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8254>
[airlied: rebase fixup types]
v2: add support for storing strings in a sideband storage,
just store the index in print buffer.
v3: move the format strings into the nir shader as well
v4: simplify the write constant string + explicit sizes
move printf cap definition.
v5: just parse the format string to find string specifiers
using util code.
add vtn_fail_if if we can't get the correct type.
v6: use ralloc + avoid instr handler for srcs > 5
v7: use a packed struct 4 bytes align all of it
v8: simplify constant copy
v9: rework to use a single string and common string
extract code, (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8254>
"The implementation is based on what LLVM AMD target expect.
The compiler provides an id linked to the argument desc and format used.
The runtime needs to store them to be able to parse the buffer filled by
the device during the kernel execution, i.e. an id value to find the
format, followed by the argument values"
v2: airlied
Split out the core code to a separate patch, add support for the
different global buffer formats, and move the LLVM specific code
as much as possible to the backend.
v3: handle strings differences better
llvm backend stores strings to the printf buffer
nir backend stores them to a sideband storage in NIR and stores
an index in the buffer.
v4: move specifier parsing to util code.
v5: rename buffer fmt + make printf code work
v6: handle args/specifier number mismatch support
v7: move to single string + struct
v8: use "%s" to print strings to avoid bad specifier, fix str
calcs.
v9: move to the same global buffer format as llvm, just strings
are different now. This requires changes to nir lowering.
buffer format:
[0] contains the current offset into the buffer (initially 8)
[1] contains the length of the buffer
v10: printf const clean, add warning, endian assert, print %%
at end, fix specifiers to vector
v11: minor cleanups, make sure the format string never contains
an n.
v12: validate format string
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8254>
This adds storage for printf formats encoded as number of
argument sizes + the printf format string, and storage
for sideband printf strings if the backend wants them.
It adds a flag that decides if the backend wants AMD (LLVM)
behaviour or NIR wrt the format of the global buffer and
how to decode strings.
Based on work by EdB in his printf support, but made useful
to be generic.
I'm not a huge fan of the buffer format flag, but this was
the easiest way to denote the llvm abi buffer format.
v3: rename buffer fmt
v4: use a single strings storage and one struct
v5: move printf_info into module, cleanup serialisation struct
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8254>
gl_SecondaryFragColorEXT is mapped to FRAG_RESULT_COLOR and just
has a different io.dual_source_blend_index. We don't need to replicate
the color to other render targets in case of dual source blending, so
we can just remap it to FRAG_RESULT_DATA0 + index.
Fixes piglit test:
arb_blend_func_extended-fbo-extended-blend-pattern_gles2
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8245>
It's been regressed a couple of times recently, so let's try to make sure
it doesn't happen again. The setup here is mostly like llvmpipe-quick-gl,
but using quick_gl+quick_shader together, and a few more spectacularly
long-running tests dropped. I also excluded a bunch of unsupported
extensions, to minimize the size of the skip list checked into the tree
(it's still 200k, though).
The unfortunate exclusions in here are fp64 and int64 -- most of the
piglit tests for them don't run because softpipe is still GL3.3, and it's
an egregious number of skips to add to the checked in list.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8068>
index is of type uint32_t.
Fix defect reported by Coverity Scan.
Macro compares unsigned to 0 (NO_EFFECT)
unsigned_compare: This greater-than-or-equal-to-zero comparison of
an unsigned value is always true. index >= 0U.
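For illustration, here is a hypothetical range check showing why the lower bound is redundant for an unsigned index. The actual macro touched by the patch is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For a uint32_t index, `index >= 0U` is always true, which is what
 * Coverity reports as NO_EFFECT. The upper bound alone is equivalent,
 * and it also rejects values that wrapped around from "negative". */
static bool
index_in_range(uint32_t index, uint32_t count)
{
   return index < count;
}
```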
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8231>
We may have to make a new ATI_fs variant when the texture target changes.
Fixes a regression on piglit ati_fragment_shader-render-textargets on
llvmpipe after the switch to NIR ATI_fragment_shader.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8118>
I moved QPA-to-XML conversion to the runner, so Mesa CI (and developers!)
don't need to do quite so much in bash. I also made it clean up caselist
.qpa files since nobody ever wants them and we deleted them anyway. This
cleans up a ton of the job log output.
Additionally, I added a subcommand to turn the .csv into a junit output
that we can expose to gitlab. Now, the pipeline's status page will report
the failed testcases, and the "detail" button will give you a link to the
.XML to view for the failure. (We don't report all testcases because that
would put too much load on the gitlab server.) Note that this will 404 for
the LAVA runners for now, as they don't retain artifacts in gitlab (the
plan is to eventually have them upload the artifacts to minio).
This uprev also includes a deqp output parsing fix, resulting in us
catching a couple more failures in some drivers.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8206>
If the program with a draw variant gets deleted, it could leave a dangling
pointer in st's draw module that would be dereferenced on the next state
update of a draw fallback.
Fixes a valgrind complaint in piglit's rasterpos test, which is flaky on
softpipe (but not due to this).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8207>
RA, DCE, and liveness assume that SSA and non-SSA normal indices are
indexed from 1 in a shared address space, with a maximum given by
bi_max_temp. As a stopgap, let's translate bi_index to old-style
node numbers so those passes can be updated cleanly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8215>
Rather than open-coding indices with manual bit packing flying around,
let's add a data structure corresponding to a reference to some data.
(Think nir_src, ibc_ref, etc). In particular this allows us to pack in
more metadata, like an offset, for properly supporting limited vectors
(for I/O) without bloating the IR with swizzle fields.
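A minimal sketch of what such a reference type can look like in C, compared with packing flags into the bits of a plain integer. The type `ref_index`, its fields, and their widths are hypothetical; they illustrate the idea and are not the actual bi_index definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference-to-data type; fields are illustrative only. */
typedef struct {
   uint32_t value;       /* SSA index, register number, constant, ... */
   uint32_t offset : 3;  /* word offset into a vector value (e.g. for I/O) */
   bool reg : 1;         /* non-SSA register rather than an SSA value */
   bool abs : 1;         /* modifiers travel with the reference */
   bool neg : 1;
} ref_index;

/* Refer to a later word of an existing value without a swizzle field. */
static inline ref_index
ref_word(ref_index idx, unsigned offset)
{
   assert(idx.offset + offset < 8);   /* must fit in the 3-bit field */
   idx.offset += offset;
   return idx;
}
```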
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8215>
Message-passing instructions that read/write staging registers access
either:
* a fixed number of registers
* vecsize registers (/2 for LD/ST_CVT if register_format is 16-bit)
* a computed number for TEXC
This adds the fixed counts into the XML for the first type and space to
specify the latter types.
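The three cases map naturally onto a small switch. This is a sketch with hypothetical names (`sr_count_kind`, `sr_info`, `staging_reg_count`); in particular, rounding up for odd vector sizes in the 16-bit LD/ST_CVT case is an assumption, not something stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical kinds of staging-register counts. */
enum sr_count_kind {
   SR_COUNT_FIXED,     /* fixed count annotated in the XML */
   SR_COUNT_VECSIZE,   /* one register per vector component */
   SR_COUNT_COMPUTED,  /* e.g. TEXC: computed elsewhere */
};

struct sr_info {
   enum sr_count_kind kind;
   unsigned fixed;     /* used when kind == SR_COUNT_FIXED */
   bool ld_st_cvt;     /* LD_CVT/ST_CVT instruction */
};

/* `computed` is the result of the TEXC-specific logic and is only used for
 * SR_COUNT_COMPUTED. Assumption: 16-bit LD/ST_CVT packs two components per
 * register, rounding up for odd vector sizes. */
static unsigned
staging_reg_count(const struct sr_info *info, unsigned vecsize,
                  bool reg_format_16bit, unsigned computed)
{
   switch (info->kind) {
   case SR_COUNT_FIXED:
      return info->fixed;
   case SR_COUNT_VECSIZE:
      if (info->ld_st_cvt && reg_format_16bit)
         return (vecsize + 1) / 2;
      return vecsize;
   case SR_COUNT_COMPUTED:
      return computed;
   }
   assert(!"unknown staging register count kind");
   return 0;
}
```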
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8213>
Message-passing instructions have an associated message type, which the
clause header needs to signal. Instead of open coding this, let's
annotate the XML. Instructions not otherwise marked do not generate
messages.
Three exceptions apply:
* UBO loads need to use the attribute message type.
* Tile buffer access to Z/S needs ZS message type
* LD_VAR_SPECIAL.fragz needs ZS message type
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8213>
Certain transcendental instructions are not even possible to generate
since these bits are lowered away before the Bifrost backend is touched,
as far as I know.
Job management instructions (most interestingly DOORBELL) do not
correspond to OpenGL/OpenCL/Vulkan.
Segment arithmetic seems mostly useless for real code; any actual use
case I can think of is already covered by indirect loads/stores, which
do the segment arithmetic implicitly. I've never seen this in blob
code; it's probably just a future-proofing thing.
Dropping these instructions corresponds to a 3% reduction in generated
lines of code for the printer, builder, and packer for the new IR. Not a
terrible yield for functionality we'll likely never need.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8213>
Certain instructions are highly unlikely to ever be used in the Bifrost
compiler, due to differences in the Mesa stack versus the Arm compiler,
as well as hardware features added speculatively and that never became
API visible. It doesn't make sense to include these instructions in the
IR, so let's disable them, while retaining complete disassembly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8213>
These tests were designed before having access to canonical information
about the hardware and thus had two purposes:
* Validating that our understanding of an instruction (as defined by IR
semantics) matches hardware behaviour -- obsoleted by new information.
* Validating that the IR packing code is correct -- obsoleted by
rewriting the IR and rewriting the packing.
I dislike removing tests as much as the next person, but the value of
these will be nil by the end of the series, and will prove burdensome.
Proper unit tests will be useful, however.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8213>
The first callback which uses an image's loaderPrivate data was recently
added. Prior to this, dri2_create_image_khr_texture had been setting the
unused loaderPrivate field on the image it creates. This caused a
pointer type mixup in platform_android when it started using the new
callback. Fix this by no longer unnecessarily setting loaderPrivate in
dri2_create_image_khr_texture.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4010
Fixes: a2fb87eea6 ("egl/android: implement image cleanup callback")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8211>
In gfx10_sh_query_end, a new query buffer is allocated if there are pending
shader queries. However, since emit_shader_query is called only once per draw
command, this newly allocated buffer is never used afterwards.
So even though the newly allocated buffer is treated as the last query buffer,
none of the queries actually use it. There is essentially no need to allocate
a new query buffer within the same context, i.e. draw command: the existing
query buffer can provide the answers to multiple queries.
Allocating an extra buffer makes subsequent queries wait on a query buffer whose
fence will never be signaled, since there are no subsequent draw commands to
trigger it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8083>
v2:
- Rename the ".arm64-deqp-test-vk" template as
".arm64-deqp-test-freedreno-vk" (Eric).
v3:
- Rename the ".arm64-test" template as ".freedreno-test" (Eric).
- Rename the ".arm64-deqp-test" template as
".baremetal-deqp-test" (Eric).
- Rename the ".arm64-deqp-test-freedreno-vk" template as
".baremetal-deqp-test-freedreno-vk".
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6388>
Along the way, modify the piglit run script and refactor the way
piglit jobs are generated.
v2:
- Squashed the commit to remove tracie jobs (Eric).
v3:
- Extend information in the comments about the need to use a
running X server for replaying with Vulkan (Tomeu).
- Do actually fail if the upload doesn't work (Tomeu).
v4:
- Rename *-piglit-traces jobs with *-traces.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6388>
The -S and -B flags were officially introduced in CMake 3.13.
Avoids the following warning:
"
CMake Warning:
No source or binary directory provided. Both will be assumed to be the
same as the current working directory, but note that this warning will
become a fatal error in future CMake releases.
"
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6388>
The tag update was forgotten after e384476d1e ("ci: Bump deqp to
current vulkan-cts-1.2.5.0").
Notably, this introduces 2 more failures in the panfrost-t860 job:
- dEQP-GLES3.functional.shaders.matrix.inverse.dynamic.lowp_mat2_float_vertex
- dEQP-GLES3.functional.shaders.matrix.inverse.dynamic.mediump_mat2_float_vertex,Fail
Fixes: e384476d1e ("ci: Bump deqp to current vulkan-cts-1.2.5.0")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8198>
This is super gross. SPIR-V doesn't provide any facility for doing
per-component writes, which means all components of a value must be
written every time.
To this end, we need to manually split both the src and dst composites and
do per-component access for each store in order to accurately handle both
non-sequential wrmasks (which could be handled by nir_lower_wrmasks, yes, but
we aren't using it) and partial wrmasks.
see also mesa/mesa#4006
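In C terms, the lowering behaves roughly like the loop below. `store_masked` is a hypothetical helper for illustration only; the real change emits per-component SPIR-V accesses rather than C assignments.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate a masked vector store on a target that can only store whole
 * values: split the source into components and emit one scalar store per
 * set bit, so both gaps (0b101) and partial masks (0b011) are handled.
 * Returns the number of scalar stores emitted. */
static unsigned
store_masked(float *dst, const float *src, unsigned num_components,
             uint32_t wrmask)
{
   assert(num_components <= 32);
   unsigned stores = 0;
   for (unsigned c = 0; c < num_components; c++) {
      if (!(wrmask & (1u << c)))
         continue;       /* component not written: leave dst[c] alone */
      dst[c] = src[c];   /* one per-component access */
      stores++;
   }
   return stores;
}
```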
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8152>
GL allows the pipeline to "infer" a TCS if a TES is bound, using
API-specified default values for gl_TessLevelOuter and gl_TessLevelInner,
but VK requires that both shaders be explicitly present.
To handle this, create a generic TCS which translates all VS outputs to
invocation-based arrays and copies the appropriate value to the expected TES
input array location. Also emit the default inner/outer values as push
constants so we don't have to recompile the shaders whenever the API calls
occur.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8152>
To support multipass, querying perf counters happens in the following
steps.
0) There's a scratch reg to set pass indices for perf counters query.
Prepare cmd streams to set each pass index to the reg at device
creation time. See tu_CreateDevice in tu_device.c
1) Emit command streams to read all requested perf counters at all
passes in begin/end query with CP_REG_TEST/CP_COND_REG_EXEC, which
reads the scratch reg where pass index is set.
2) Pick the right cs setting proper pass index to the reg and prepend it
to the command buffer at each submit time.
3) If the pass index in the reg is true, the command stream below
CP_COND_REG_EXEC is executed.
This would also need to be implemented for kgsl in the future.
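The selection logic can be modeled in a few lines of C. This sketch assumes, as the wording "is true" suggests, that the per-submit stream writes a one-hot bit (1u << pass) into the scratch reg and that each guarded block tests the bit for its own pass; `guarded_block` and `execute_query_stream` are hypothetical names, not turnip code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one guarded block in the begin/end query stream:
 * the counter reads for a single pass, wrapped in CP_REG_TEST and
 * CP_COND_REG_EXEC so they only run when that pass is selected. */
struct guarded_block {
   uint32_t pass;           /* pass index this block belongs to */
   uint32_t counters_read;  /* stand-in for the counter-read packets */
};

/* Returns how many counters would be read for a given scratch reg value.
 * Assumption: the prepended cs writes (1u << pass) into the scratch reg. */
static uint32_t
execute_query_stream(const struct guarded_block *blocks, unsigned count,
                     uint32_t scratch_reg)
{
   uint32_t total = 0;
   for (unsigned i = 0; i < count; i++) {
      assert(blocks[i].pass < 32);
      if (scratch_reg & (1u << blocks[i].pass))   /* condition true */
         total += blocks[i].counters_read;
   }
   return total;
}
```

The command buffer itself never changes between passes; only the small prepended stream that sets the scratch reg does.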
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6808>
Some commands are still unimplemented:
- vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR:
The following patch supports this.
- vkAcquireProfilingLockKHR / vkReleaseProfilingLockKHR
This patch supports only monitoring perf counters for each submit.
To reserve/configure counters across submits we would need a kernel
interface to be able to do that.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6808>
When introducing/removing these files, it's easy to forget to update the
yml to point to them. Instead of requiring the separate update, just have
the runner script pick the right one from a single per-gpu variable.
As a result, we now pick up the new deqp-lvp-skips.txt that was added but
not connected. This also required moving some bypass flakes from the
shared a630 flakes list to a separate list, which is a feature because now
we'd notice the introduction of flakes to the gmem path.
Fixes: ab79e6b8e3 ("ci: skip failing test on lavapipe")
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8147>
GL_MAX_VARYING_COMPONENTS is bumped to 124 since it should
not include the components of gl_Position. (Same as in blob)
GL_MAX_*_OUTPUT_COMPONENTS is bumped to 128; only
GL_MAX_GEOMETRY_INPUT_COMPONENTS is 64. (Same as in blob)
Per the GL 3.2 spec, the minimums are:
- GL_MAX_GEOMETRY_OUTPUT_COMPONENTS: 128
- GL_MAX_FRAGMENT_INPUT_COMPONENTS: 128
- all others: 64
Per ARB_tessellation_shader, the minimums are:
- GL_MAX_TESS_CONTROL_*_COMPONENTS: 128
- GL_MAX_TESS_EVALUATION_*_COMPONENTS: 128
Allows passing of:
gl-3.2-minmax
arb_tessellation_shader-minmax
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7917>
MALI_WRAP_MODE_CLAMP doesn't work fully on either GPU generation, so
use other wrap modes instead in some cases.
With nearest filtering, Midgard only clamps to the edge for two of the
edges, and uses the border colour for the other two. Using the clamp
mode on Bifrost causes broken rendering and/or GPU faults.
Fixes piglit test "texwrap" on both Midgard and Bifrost, and fixes
Chromium B.S.U. rendering on Bifrost.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8176>
SPIR-V modules can have multiple shaders (including of the same
stage), but the global variables are all declared for the whole
module. This can result in variables with same Binding but
incompatible types, so those need to be removed before use.
Previously, a similar issue but with a narrower scope was fixed by
6775665e5e ("spirv: Eliminate dead input/output variables after
translation.").
This patch depends on the previous patch that prevents variables used
only in pointer initializers from being considered dead.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3686
Fixes: 3a266a18 ("nir/spirv: Add support for declaring variables")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8133>
Between the creation of a shader (from the GLSL or SPIR-V frontends) and
the call to nir_lower_variable_initializers, variables may refer to
other variables for initialization. Those referred variables need to
be kept alive, so take that into account in the pass.
Fixes: 7acc81056f ("compiler/nir: Add support for variable initialization from a pointer")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8133>
LLVM (like NIR) requires phi instructions to be before any other
instructions in the block. ac_branch_exited() can insert non-phi
instructions before visit_block() adds phis, so visit_block() should add
phi instructions before the non-phi instructions ac_branch_exited()
inserts.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: aa757f4f8c ("ac/llvm: fix demote inside conditional branches")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8054>
This was disabled due to some depth/stencil resolve CTS failures
which are now fixed.
I found that disabling TC-compat HTILE for D32_SFLOAT+MSAA reduced
performance in Control by 11% on Vega10. In fact, the game only uses
D32_SFLOAT for depth rendering.
This gives a huge boost in Control on Navi10 (e.g. +17% with MSAA 4x).
Note that the game is still slower than PRO without MSAA on Navi10,
but as fast (or even a bit faster) on Vega10.
I think TC-compat HTILE could also be enabled for D32_SFLOAT_S8_UINT
but it needs more testing first.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8143>
imageSize() expects the last component of the return value to be the
number of layers in the texture array. In the case of cube map array,
it returns an ivec3, with the third component being the number of
layer-faces.
Fixes: dEQP-VK.image.image_size.cube_array.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8087>
The glPopAttrib optimizations incorrectly removed it.
Use GL_ALL_ATTRIB_BITS to mean "all texture parameters have changed" to
make it more efficient.
Fixes: d0e18550e2 - mesa: optimize saving/restoring bound textures for glPush/PopAttrib
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8046>
This changes the code so that program parameters no longer have to be
sorted (meaning uniforms and constants are before state variables).
Instead of checking if the parameter is a state variable for every element,
teach all functions to handle non-state parameters safely. This is better
for the most common case where parameters are sorted or semi-sorted.
The new enum STATE_NOT_STATE_VAR identifies that a parameter is not
a state variable.
Fixes: 63f7d7dd - mesa: take advantage of sorted parameters in _mesa_load_state_parameters
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3914
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8046>
We apparently don't have anything else making sure that it's flushed in
between use as a render target and use as a texture source, so bypass-mode
depth texture sampling could get stale data.
Fixes consistent (as far as I could see) failures in FD_MESA_DEBUG=nogmem
on:
dEQP-GLES31.functional.texture.multisample.samples_*.use_texture_depth_2d
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_draw
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8146>
If the only user is a trivial bcsel which in a second step
can be turned into a phi, this conversion is also worth it
even if the previous result is not undefined or constant.
Allows for some more loop unrolling or saves a few instructions.
Totals from 62 (0.04% of 139391) affected shaders (NAVI10):
SGPRs: 4976 -> 4992 (+0.32%)
VGPRs: 4408 -> 4472 (+1.45%); split: -0.45%, +1.91%
CodeSize: 453632 -> 464000 (+2.29%); split: -0.32%, +2.60%
MaxWaves: 527 -> 511 (-3.04%); split: +0.38%, -3.42%
Instrs: 84940 -> 86681 (+2.05%); split: -0.36%, +2.41%
Cycles: 11946844 -> 11783708 (-1.37%); split: -1.40%, +0.04%
VMEM: 9403 -> 10357 (+10.15%); split: +11.59%, -1.45%
SMEM: 3003 -> 3025 (+0.73%); split: +1.07%, -0.33%
VClause: 1756 -> 1997 (+13.72%); split: -0.11%, +13.84%
SClause: 2914 -> 2915 (+0.03%); split: -0.10%, +0.14%
Copies: 6426 -> 6768 (+5.32%); split: -4.14%, +9.46%
Branches: 2105 -> 2102 (-0.14%); split: -1.66%, +1.52%
PreSGPRs: 2921 -> 2909 (-0.41%); split: -0.55%, +0.14%
PreVGPRs: 4151 -> 4179 (+0.67%); split: -0.24%, +0.92%
cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8123>
Since we're requiring the branch condition to be in WQM, we have to ensure
that the block is in the worklist.
Fixes Trials Fusion hang at 4K and High settings.
fossil-db (Sienna):
Totals from 216 (0.15% of 139391) affected shaders:
SGPRs: 13392 -> 13360 (-0.24%)
CodeSize: 1321184 -> 1318592 (-0.20%)
Instrs: 255310 -> 254662 (-0.25%)
Cycles: 2178360 -> 2174652 (-0.17%)
Affected fossils in fossil-db are dirt4, nier and youngblood.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3863
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8145>
For "0" (its default) deqp-runner picks a number of jobs corresponding to
the CPU count, so set our hardware runners to use that (note that mesa's
deqp-runner.sh will pick a default of 4 if we don't specify a
DEQP_PARALLEL).
This means we'll allocate threads for the slow cores on a630 now, reducing
gles3 runtime from 6.5 minutes to around 5.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8132>
They've been doing so since the webdav results upload was added. This
means that we'll get normal truncated failures lists with the pointer to
the job artifacts, rather than filling a log file if you broke everything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8132>
According to the spec, dvec3 and dvec4 vertex attribs require 2 slots
(locations), so the shader loads have to be explicitly split to reflect this.
Helpfully, gallium already gives us the vertex element state in a split
format, so no other changes are necessary to have this work as expected.
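Put differently, each location holds four 32-bit components (16 bytes), so an attribute needs ceil(bytes / 16) locations. A small sketch of that arithmetic with a hypothetical `attrib_slots` helper:

```c
#include <assert.h>

/* Number of attribute locations for a vector of `components` elements of
 * `bytes_per_component` bytes, given 16 bytes per location. dvec2 (16
 * bytes) fits in one, while dvec3 (24) and dvec4 (32) need two. */
static unsigned
attrib_slots(unsigned components, unsigned bytes_per_component)
{
   assert(components >= 1 && components <= 4);
   return (components * bytes_per_component + 15) / 16;
}
```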
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8141>
According to the spec:
"pCounterBuffers is an optional array of buffer handles [...]
If pCounterBuffers is NULL, then transform feedback will start
capturing vertex data to byte offset zero in all bound transform
feedback buffers."
"If counterBufferCount is not 0, and pCounterBuffers is not NULL,
pCounterBuffers must be a valid pointer to an array [...]"
So counterBufferCount could be non-zero with pCounterBuffers
being NULL.
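A minimal sketch of the defensive check, with hypothetical stand-in types so it compiles without Vulkan headers (a handle value of 0 plays the role of VK_NULL_HANDLE). It also treats individual VK_NULL_HANDLE entries as "start at offset zero", which the extension spec allows as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VkBuffer; 0 acts as VK_NULL_HANDLE. */
typedef uint64_t buffer_handle;

/* Returns nonzero if the counter for transform feedback buffer i must be
 * read from a counter buffer, zero if capture starts at byte offset zero. */
static int counter_buffer_is_used(uint32_t counter_buffer_count,
                                  const buffer_handle *counter_buffers,
                                  uint32_t i)
{
    /* counterBufferCount may be non-zero while pCounterBuffers is NULL. */
    if (counter_buffers == NULL || i >= counter_buffer_count)
        return 0;
    /* A VK_NULL_HANDLE element also means "start at offset zero". */
    return counter_buffers[i] != 0;
}
```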
Fixes a crash in RenderDoc when inspecting a draw call with a tessellation
or geometry shader present.
Fixes: 98b0d900 "turnip: rework streamout state and add missing counter buffer read/writes"
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8140>
According to the spec:
"pTessellationState [...] is ignored if the pipeline does not
include a tessellation control shader stage and tessellation
evaluation shader stage."
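A minimal sketch of the guard, assuming stage bits with the same values as VkShaderStageFlagBits and a hypothetical stand-in for VkPipelineTessellationStateCreateInfo. The point is that pTessellationState is only read when both tessellation stages are present; otherwise it may be NULL or garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the corresponding VkShaderStageFlagBits. */
enum {
    STAGE_VERTEX    = 0x01,
    STAGE_TESS_CTRL = 0x02,
    STAGE_TESS_EVAL = 0x04,
    STAGE_GEOMETRY  = 0x08,
    STAGE_FRAGMENT  = 0x10,
};

/* Hypothetical stand-in for VkPipelineTessellationStateCreateInfo. */
struct tess_state { uint32_t patch_control_points; };

static uint32_t patch_control_points(uint32_t stages, const struct tess_state *ts)
{
    const uint32_t tess = STAGE_TESS_CTRL | STAGE_TESS_EVAL;
    if ((stages & tess) != tess)
        return 0;          /* no tessellation: ignore ts entirely */
    assert(ts != NULL);
    return ts->patch_control_points;
}
```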
Fixes a crash in RenderDoc when inspecting a draw call with a
geometry shader but without tessellation shaders.
Fixes: eefdca2e "turnip: Parse tess state and support PATCH primtype"
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8140>
I thought this was a bug in CTS but the Vulkan spec says:
"VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT specifies write access
to a color, resolve, or depth/stencil resolve attachment during
a render pass or via certain subpass load and store operations."
So, VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT is used to synchronize
depth/stencil resolve attachments. Yes, it's counterintuitive.
This can't actually be fixed properly for now because RADV performs
the end subpass barrier *before* resolve attachments instead of after.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8138>
When one operand was renamed and another operand came
from an incomplete phi, it could happen that the original
name was not restored.
This has no impact on the code, but ensures correct SSA
is maintained during RA.
Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8109>
EGL_EXT_protected_surface introduces EGL_PROTECTED_CONTENT_EXT,
while EGL_EXT_protected_content is about protected context.
When I implemented EGL_EXT_protected_surface I mixed up the 2
names, so this commit fixes it.
Fixes: bd182777c8 ("egl: implement EGL_EXT_protected_surface support")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8122>
Since Gallium supports 8-bit indices, this extension is a simple matter
of plumbing a value through, exposing a feature and flipping the switch
for the extension. This lets zink avoid up-converting the index buffer
before drawing.
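For illustration, the up-conversion being avoided looks roughly like the sketch below: every 8-bit index is widened into a freshly allocated 16-bit buffer before the draw. This assumes the extension in question is VK_EXT_index_type_uint8 (which lets the original buffer be bound with VK_INDEX_TYPE_UINT8_EXT); the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: copy 8-bit indices into a new 16-bit index buffer.
 * The caller owns (and must free) the returned buffer. */
static uint16_t *widen_u8_indices(const uint8_t *src, size_t count)
{
    uint16_t *dst = malloc(count * sizeof(*dst));
    if (!dst)
        return NULL;
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];   /* zero-extend each index */
    return dst;
}
```

Besides the extra allocation, this costs a full pass over the index data on every such draw, which native 8-bit index support removes.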
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8082>
Instead of checking whether the source and destination are the same,
we should check if the underlying BOs are the same, since we may
be suballocating resources from the same allocation and the kernel
will fail to execute jobs if the BO list has duplicated entries.
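A minimal sketch of the idea, using a hypothetical fixed-size BO list (not v3dv's actual structures): the check is keyed on the BO handle, so two distinct images suballocated from the same BO are only added once.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_JOB_BOS 8   /* hypothetical limit for the sketch */

/* Hypothetical per-job list of kernel BO handles. */
struct job_bo_list {
    uint32_t handles[MAX_JOB_BOS];
    unsigned count;
};

/* Add a BO unless the job already references it; duplicate entries
 * would make the kernel reject the job. */
static void job_add_bo(struct job_bo_list *list, uint32_t bo_handle)
{
    for (unsigned i = 0; i < list->count; i++) {
        if (list->handles[i] == bo_handle)
            return;
    }
    assert(list->count < MAX_JOB_BOS);
    list->handles[list->count++] = bo_handle;
}
```

Comparing the source and destination images instead would miss the case where both images are different objects backed by the same BO at different offsets.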
Fixes aborts with Unreal Engine due to failed TFU jobs.
Fixes: 30f1fc25ce ('v3dv: implement TFU blits')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8098>
This moves the parts of zink_format.c that also operate on zink_screen
into zink_screen.c. This has the benefit that we can start testing the
enum-translation code separately from the state.
This will make the next commit a bit cleaner.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7982>
We've been inconsistent between IID_PPV_ARGS,
__uuidof(var), and __uuidof(type). Since Linux doesn't
support the latter of these, they need to be changed.
While we're at it, switch all __uuidof to the more terse
IID_PPV_ARGS option.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7937>
MSVC has an extension for getting IIDs (GUIDs) from types. Other
compilers can support this extension when targeting Windows, but
don't support it when targeting Linux. Instead, winadapter.h
defines __uuidof(var) to uuidof<decltype(var)>. Then dxguids.h
provides inline specialized definitions for the known D3D types.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7937>
On some platforms, the authenticate callback may be NULL, e.g. on
surfaceless. If a client tries to send a wl_drm.authenticate request,
the handler tries to dereference the NULL pointer.
This can be reproduced with libva which unconditionally tries to use
wl_drm.authenticate even with render nodes [1]. Run a compositor with
a surfaceless context, then try to start e.g. mpv to trigger the
segfault.
[1]: https://github.com/intel/libva/pull/476
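A minimal sketch of the guard, with hypothetical stand-in types (the real handler lives in Mesa's wayland-drm code and reports the result over the protocol): a missing callback is treated as an authentication failure instead of being called through NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the callback table. */
struct drm_callbacks {
    int (*authenticate)(void *user_data, uint32_t id);
};

enum auth_result { AUTH_OK, AUTH_FAILED };

static enum auth_result handle_authenticate(const struct drm_callbacks *cb,
                                            void *user_data, uint32_t id)
{
    /* Surfaceless platforms install no callback: fail the request. */
    if (cb->authenticate == NULL)
        return AUTH_FAILED;
    return cb->authenticate(user_data, id) < 0 ? AUTH_FAILED : AUTH_OK;
}
```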
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7992>
This uses ralloc for spirv_shader and its data payload, which seems a
bit neater than having to remember to free twice. We can now also easily
piggyback on more sophisticated ralloc usage.
No need to use rzalloc here, as we'll write all memory in the struct,
and the struct isn't used as a hashmap key, so padding shouldn't matter.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8049>
Using the general layout for samplers can have terrible performance, so
let's use shader-read-only-optimal instead.
This is fairly straightforward if we use conservative bounds for the
barriers, and assume they are being used in all stages.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7655>
Quoting a comment on the bug report:
I suspect the shader is incorrect.
When a (conditional) discard is executed then control flow
becomes non-uniform, meaning that subsequent implicit
derivatives required for the texture operation are not
computed correctly.
Using glsl_correct_derivatives_after_discard fixes it. Note
that for radeonsi this requires LLVM master to work properly.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/1386
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8005>
The spec says:
When disabled, it is as if the line stipple has its default value
(the default value being all 1's)
So treat pattern=0xffff as line stippling = off.
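A minimal sketch of the check, with a hypothetical helper name: an all-ones 16-bit pattern draws every pixel regardless of the repeat factor, which is exactly the disabled state, so the stippling path can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stippling only has a visible effect when it is
 * enabled and the pattern has at least one zero bit. */
static bool line_stipple_needed(bool stipple_enabled, uint16_t pattern)
{
    return stipple_enabled && pattern != 0xffff;
}
```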
This improves performance in specviewperf13 snx lines tests.
For instance in the last test I get:
* master: 260 fps, gpu-load: ~92%
* with this commit: 280 fps, gpu-load: ~72%
(both tested with d60930c017 reverted)
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8105>
The problem was that the shader constants were based on the framebuffer
sample count and ignored the multisample enable state and the line/polygon
smoothing state, which uses MSAA rasterization that only sets SampleMaskIn
to get the coverage for alpha-blended smoothing (the PS epilog computes
the alpha channel from SampleMaskIn and blending generates the AA results).
- This is a complete rework that adds a new state for NGG cull constants.
- It fixes the same thing for the prim discard compute shader.
- It documents how VS_STATE.SMALL_PRIM_PRECISION is encoded.
It fixes blue corruption in Unigine Heaven with MSAA and Medium details
or better.
Fixes: 7648060dc0 - radeonsi: enable NGG culling by default on gfx10.3 dGPUs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8022>
According to the mali driver output, the Mali-400 GP provides space for
304 vec4 uniforms, globals and temporary variables.
The Mali-PP supports a uniform table up to size 32768 total.
However, indirect access to a uniform only supports indices up to 8192
(a 2048 vec4 array). Trying to access beyond that currently causes a pp
job timeout with both lima and the mali driver. To prevent indices
bigger than that in application uniforms, limit to 8192 for now.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8079>
I want to enable ASan runs on freedreno. It turns out it's a long road to
get there, starting with making sure we can run our unit tests with the
sanitizer enabled.
While I'm revving this container, add in valgrind too to make sure that
our build paths with valgrind enabled work.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7936>
There's no harm in checking for the extension on non-macOS, just do it.
Nor can I see any point in checking for both the layer and the
extension, since you're never going to see the extension if the layer
isn't available, so just check for the extension instead of the reduced
boolean. Simplify some variable naming while we're at it.
Acked-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8019>
According to ANDROID_get_native_client_buffer, EGL implementations must
guarantee that the lifetime of an EGLClientBuffer returned by
eglGetNativeClientBufferANDROID is at least as long as that of the
EGLImage it is bound to. Do this by acquiring a reference to the
underlying AHardwareBuffer for all ANativeWindowBuffers which are bound
to an _EGLImage.
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7805>
The small DCE of the spiller only removes the original instructions
of rematerialized variables in case they are unused. If a variable
has been renamed, it cannot match any original instruction anymore.
Thus, the lookup is then unnecessary and can be omitted.
No fossil-db changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8055>
This naming change should clarify what we are actually doing here.
We are defining/managing what data is stored in the GPU's
uniform data storage area. A shader can access this area with
the ETNA_RGROUP_UNIFORM register group.
In this uniform data area we need to store const buffer data and
our own immediate/constant data.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8062>
The code here tries to be too smart and only use a geometry shader if there are
actually multiple layers being uploaded, but the fragment shader also unconditionally
reads gl_Layer as long as the pipe cap for gs is set. This means that
when the gs is dynamically disabled due to uploading a
single-layer surface, the fs has no input to read for gl_Layer and everything breaks.
Always using a gs isn't ideal, but it's considerably more work to manage multiple
fs variants based on layer usage.
Fixes: c99f2fe70e ("st/mesa: implement PBO upload for multiple layers")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8067>
cmd_size was changed to size_t in commit 4b2445916e ("glthread:
change sizes to unsigned or size_t where needed").
Fix defect reported by Coverity Scan.
Macro compares unsigned to 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned
value is never true. cmd_size < 0UL
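A minimal illustration of why Coverity flags this: once cmd_size is a size_t, a "less than zero" test can never be true, and a negative value converted to size_t wraps to a huge number. Only an upper bound is a meaningful sanity check. MAX_CMD_SIZE and the helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMD_SIZE 4096u   /* hypothetical limit for the sketch */

static int cmd_size_is_valid(size_t cmd_size)
{
    /* "cmd_size < 0" would always be false for an unsigned type,
     * so only the upper bound is checked. */
    return cmd_size <= MAX_CMD_SIZE;
}
```

A value that was "negative" before the conversion, such as (size_t)-1, still fails the upper-bound check because it wraps to SIZE_MAX.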
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8024>
Fix defects reported by Coverity Scan.
Identical code for different branches (IDENTICAL_BRANCHES)
identical_branches: The same code is executed regardless of
whether 0 is true, because the 'then' and 'else' branches are
identical. Should one of the branches be modified, or the entire
'if' statement replaced?
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8001>
With the block's end_ip accidentally being the ip of the next instruction,
contrary to the comment, you would end up doing end-of-block freeing early
and have the value missing when it came time to emit the next instruction.
Just expand the ips to have separate ones for start and end of block --
while it means that nir_instr->index is no longer incremented by 1 per
instruction, it makes sense for use in liveness because a backend is
likely to need to do other things at block boundaries (like emit the if
statement's code), and having an ip to identify that stuff is useful.
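A minimal sketch of the numbering scheme, with hypothetical toy types rather than NIR's: each block gets its own start and end ip, with its instructions numbered strictly in between, so a block's end ip can never alias the first instruction of the next block.

```c
#include <assert.h>

/* Hypothetical toy block: only the fields needed for numbering. */
struct toy_block {
    unsigned num_instrs;
    unsigned start_ip, end_ip;
    unsigned first_instr_ip;   /* ip of the block's first instruction */
};

/* Assign ips in program order; returns the number of ips handed out. */
static unsigned number_blocks(struct toy_block *blocks, unsigned num_blocks)
{
    unsigned ip = 0;
    for (unsigned b = 0; b < num_blocks; b++) {
        blocks[b].start_ip = ip++;
        blocks[b].first_instr_ip = ip;
        ip += blocks[b].num_instrs;   /* one ip per instruction */
        blocks[b].end_ip = ip++;
    }
    return ip;
}
```

With two blocks of 2 and 3 instructions, block 0 ends at ip 3 and block 1 starts at ip 4, so end-of-block work for block 0 happens strictly before anything in block 1.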
Fixes: a206b58157 ("nir: Add a block start/end ip to live instr index metadata.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7658>
It made the nir_print_shader() for NIR_TO_TGSI_DEBUG not match up with the
instructions being emitted, confusing me. Given that I'm seeing only like
1/3 shrinking in the SSA indices, just drop the reindexing since it's not
doing much (and we don't store that much per SSA index).
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7658>
Getting the live SSA defs will do it if necessary, and that liveness is
what we use the instr index for. (We used to need to do it manually, and
cleanups for merging resulted in the index being treated as metadata).
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7658>
In 2900f82e19 I mistakenly used tc_set_resource_reference in both
tc_transfer_unmap and tc_call_transfer_unmap.
This causes a leak because tc_call_transfer_unmap clears dst before
acquiring a reference, so it must only be used when initializing
tc_payloads.
This fixes the perf drop reported by Marek in MR 7098.
Fixes: 2900f82e19 ("gallium/u_threaded: fix staging and non-staging conflicts")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8035>
Add support for the half float texel type and pixel types.
This enables the OES_texture_half_float extension.
Tested with piglit test oes_texture_float (half float and with linear
filtering) and passes all deqp half float related tests.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8020>
In targets that support half float textures but not float textures (so
without ARB_texture_float), the previous logic did not allow for
enabling half float texture support in desktop OpenGL.
OES_texture_half_float is only valid for OpenGL ES 2.0 contexts, so
include ARB_half_float_pixel in the logic to cover OpenGL too.
Remove _mesa_is_gles3 from the check since in case of a gles3 context,
OES_texture_half_float is already assumed to be enabled.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8020>
According to the spec for both vkCmd{Begin,End}TransformFeedbackEXT(),
if pCounterBufferOffsets is NULL, then it is assumed the offsets are
zero.
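A minimal sketch of the fallback, with a hypothetical helper name and uint64_t standing in for VkDeviceSize:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A NULL pCounterBufferOffsets means every offset is zero. */
static uint64_t counter_buffer_offset(const uint64_t *offsets, uint32_t i)
{
    return offsets ? offsets[i] : 0;
}
```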
Fixes a crash in dEQP-VK.transform_feedback.simple.backward_dependency_no_offset_array
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8057>
TFU can convert from these new formats, but cannot autogenerate
mipmaps from them.
Hence we need to specify the intended purpose so we know whether the
formats are supported or not.
v1:
- Use the same and shorter variable name (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8050>
When loading the depth, we want to store component X of the texel fetch
result into position.Z which can't be expressed without an extra MOV
unless the backend replicates the depth.
Stencil is always expected in the Y component, but some TGSI shaders
assume it will also be available in X, which only works if the backend
replicates the stencil value.
Let's fix those shaders so backend drivers are not forced to replicate
the depth/stencil values.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7922>
The classic OSMesa renders directly into user memory using
src/mesa/swrast, while gallium OSMesa renders using softpipe or llvmpipe
and copies out at glFlush() time. This would make gallium look like a
worse choice for OSMesa, except that swrast is:
1) Painfully slow to render compared to llvmpipe
2) Incorrect at derivatives
3) Limited to GL 2.1 instead of GL 4.6
In my survey of OSMesa users, Debian was the remaining holdout with
classic OSMesa in use on hurd and some rare non-LLVM-supported
architectures (sh4, alpha, etc.). As of today, they've switched to
softpipe-based gallium OSMesa for them.
To prevent people from running the wrong OSMesa (to the extent that
running OSMesa can ever be the right thing), delete the classic
version.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Closes: #320
Closes: #877
Closes: #2297
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1243>
Since the depth buffer starts out as a malloc, and we weren't clearing it,
you could get undefined values in your top 8 bits. This should fix
intermittent failures of the depth test.
(Sadly, valgrind wasn't catching this, presumably because the 32-bit value
there *is* written, just some bits are left undef)
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1243>
Replace Mesa's slightly different container_of() with one more closely aligned
to the Linux kernel's version, which takes a type as the 2nd param. This
avoids warnings like:
freedreno_context.c:396:44: warning: variable 'batch' is uninitialized when used within its own initialization [-Wuninitialized]
At the same time, we can add additional build-time type-checking asserts.
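A minimal sketch of a kernel-style container_of(ptr, type, member), written in portable C and not Mesa's exact definition (the build-time type checks are omitted). Because the second argument is a type, the macro never needs the variable being declared to infer it, which is what triggered the -Wuninitialized warning above.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types for the example. */
struct list_link { struct list_link *next; };

struct batch {
    int id;
    struct list_link node;
};
```

Usage looks like `struct batch *b = container_of(link, struct batch, node);`, with no reference to `b` on the right-hand side.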
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7941>
As it will fail right away if there isn't one, and that prevents CI from
running on people's branches.
$ ci-fairy check-merge-request --require-allow-collaboration --junit-xml=check-merge-request.xml
ERROR: No open merge request against mesa/mesa with sha 9f6aba4be0
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Fixes: d4151f2e ("ci: Run sanity job only in pre-merge pipelines")
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8032>
This should make more important jobs visible without scrolling on
pipeline pages.
The deploy stage jobs only depend on the sanity job or none at all, so
this has no impact on when the former can run.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7926>
It's more of a nuisance than useful for forked branches.
This means the test-docs job can no longer have a direct dependency on
sanity for forked branches, so split it up into two jobs: one for
pre-merge pipelines, one for forked branches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7926>
And use it in jobs for images using another Mesa image as their base.
Should fix the build of images which don't use another Mesa image as
their base (by no longer setting the FDO_BASE_IMAGE variable).
Fixes: 0781d9825b "ci: Append $MESA_TEMPLATES_COMMIT to image tags"
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7957>
This is more or less just compile-tested, but this seems about right to
me. I see the extension being supported when running on top of Zink,
which makes me happy enough for now ;)
v2: fixed up to copy the structs on pipeline create [airlied]
gallium doesn't support the 0 divisor case yet.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7930>
We need to pick 1u vs 1.0f based on the type of the texture, just like for
normal samples. Move the decision up to the create_sampler_view, and use
that value from both sampler paths.
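A minimal sketch of the idea, with hypothetical toy types: the "one" value is chosen once, when the sampler view is created, as integer 1 for integer formats and 1.0f otherwise, and both sampling paths then read the same stored value. Where exactly the driver consumes this value (for example a swizzle that selects ONE) is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The same 32 bits are interpreted as either float or uint. */
union texel_component {
    float f;
    uint32_t u;
};

/* Hypothetical toy sampler view. */
struct toy_sampler_view {
    bool is_integer_format;
    union texel_component one;
};

static void toy_create_sampler_view(struct toy_sampler_view *view,
                                    bool is_integer_format)
{
    view->is_integer_format = is_integer_format;
    if (is_integer_format)
        view->one.u = 1u;      /* integer textures return integer 1 */
    else
        view->one.f = 1.0f;    /* float/normalized textures return 1.0 */
}
```

Storing 1u where 1.0f is expected would read back as a tiny denormal float, and vice versa, which is why the choice must follow the texture type.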
Cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8012>
Some MSAA+fmt combinations require writeout to be split. Right now, it
only impacts blend shaders since we only support MSAA 4x, and the only
formats that could exceed the 128bit/pixel limit in MSAA 4x are
not supported by the fixed-function blend unit. We thus rely on the
blend shader to split things properly. Things will change once we add
MSAA 8x/16x to the mix, since even the blendable formats will exceed
the 128b/pixel limit in that case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7984>
Commit 64d6f56ad2 ("panfrost: Allocate syncobjs in panfrost_flush")
aimed at optimizing the fencing logic, but it looks like it also broke the
fence-based synchronization in subtle ways.
Indeed, now that the fence only waits on a single syncobj, we're not
guaranteed that all jobs queued in panfrost_flush_all_batches() will
be done when the fence is signaled, because jobs at the top level
(those stored in the batches hashmap) have no inter-dependencies.
Commit 9e397956b0 ("panfrost: signal syncobj if nothing is going to
be flushed") made this even more apparent by signaling the fence right
away if nothing was left to be drawn in the current context, thus
ignoring any of the batches left to be flushed in the ->batches map.
If we want to keep relying on the existing kernel APIs, there's clearly no
ideal solution here. We can either go back to the original fencing
mechanism where each fence contained an array of syncobjs to be tested
or serialize jobs that have no explicit dependencies so we know the last
submitted job will also be the last one to return. The original approach
has proven to add quite a significant overhead (caused by the amount of
ioctls and the time spent in kernel space to gather dma fences attached
to those syncobjs and test them). So let's go for the simple solution
where we have a single syncobj bound to the context which we update to
point to the last job out_sync every time we submit a top-level job.
This approach implies reworking the way we create fences since we
need to capture the syncobj state at the time the fence is created.
Unfortunately, there's no SYNCOBJ_CLONE ioctl, which forces us to
export/create/import a fence so we have a new object that's not
subject to changes done to the context syncobj.
If we want to further optimize the logic, we should probably explore
some of those options:
1/ Adding array based SYNCOBJ ioctls (SYNCOBJ_{CREATE,DESTROY,CLONE}_ARRAY)
so we can mitigate the cost of ioctls when we need to manipulate
arrays of syncobjs
2/ Support synchronization jobs. That is, jobs that have a NULL job chain
but an array of sync_in and a sync_out to allow creating
synchronization points
3/ Add syncobj aggregators so we only have to wait on one syncobj from
userspace. The syncobj aggregator would wait for all sub syncobjs to
be signaled before signaling the top-level one.
Fixes: 64d6f56ad2 ("panfrost: Allocate syncobjs in panfrost_flush")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7831>
This implements fast-path blit using the TLB to blit from one buffer to
another, if conditions for allowing this are met.
v1:
- Move checks in the code (Iago)
v2:
- Use function to compute tile width and height (Iago)
- Fix commit message (Iago)
- Use surface size to compute draw_tiles_{x,y} (Iago)
- Move checks (Iago)
- Fix tile draw parameters (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7816>
This implements blit operation using the TLB.
It uses a source color buffer (bbuf) which will be blitted to color
buffer 0.
It also takes into account the number of samples for the input and output
so it can perform multisample resolve.
v1:
- Fix comment (Iago)
- Removed needless brackets (Iago)
- Ensure msaa is correctly set (Iago)
- Get rid of job->resolve (Iago)
- Add rbuf as part of job's key (Iago)
- Rename rbuf/rsurf by bbuf/bsurf (Iago)
- Revert needless change (Iago)
v2:
- Remove spurious change (Iago)
- Add assert for safety reasons (Iago)
- Add brackets in condition (Iago)
- Fix commit message and title (Iago)
- Do tile blit only for version >=4.0 (Iago)
v3:
- Add assertion (Iago)
- Fix comment (Iago)
- Change commit title (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7816>
Should help some compilers/static analyzers understand this code and avoid
things like this:
../src/intel/tools/aubinator_error_decode.c:850:19: warning: "path" may be used uninitialized in this function [-Wmaybe-uninitialized]
850 | ret = asprintf(&filename, "%s/%d/i915_error_state", path, minor);
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7994>
Fix defects reported by Coverity Scan.
uninit_member: Non-static class member m_interpolate is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member m_lds_pos is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member m_mask is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7909>
This patch fixes this Meson build error.
$ meson builddir \
-Dshared-llvm=disabled \
-Ddri-drivers='' \
-Dbuild-tests=true \
-Dgallium-drivers=swrast \
-Dvulkan-drivers=''
[...]
/usr/bin/ld: src/gallium/auxiliary/libgallium.a(gallivm_lp_bld_misc.cpp.o): in function `llvm::InitializeNativeTarget()':
llvm/Support/TargetSelect.h:118: undefined reference to `LLVMInitializeX86TargetInfo'
/usr/bin/ld: llvm/Support/TargetSelect.h:119: undefined reference to `LLVMInitializeX86Target'
/usr/bin/ld: llvm/Support/TargetSelect.h:120: undefined reference to `LLVMInitializeX86TargetMC'
/usr/bin/ld: src/gallium/auxiliary/libgallium.a(gallivm_lp_bld_misc.cpp.o): in function `llvm::InitializeNativeTargetAsmPrinter()':
llvm/Support/TargetSelect.h:132: undefined reference to `LLVMInitializeX86AsmPrinter'
/usr/bin/ld: src/gallium/auxiliary/libgallium.a(gallivm_lp_bld_misc.cpp.o): in function `llvm::InitializeNativeTargetDisassembler()':
llvm/Support/TargetSelect.h:156: undefined reference to `LLVMInitializeX86Disassembler'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7777>
We've been getting spurious failures from the new VC4 CI, which I believe
are due to this set of tests (which have been showing up along with a GPU
hang report in the list of flaky tests in the failing jobs). This was a
known issue I had in vc4.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7998>
This is an issue on the cheza platform; the theory is that it's caused by an old
firmware bug that will be fixed in future platforms. Given that cheza was
a target that didn't get released and we expect future platforms to be
fixed, just detect the issue and restart.
I've noticed this error in my CI monitoring less than once a week.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7993>
Among all Android gen rules, '::' was only used here to declare dependencies.
Both the Mesa development and stable branches are worth receiving the fix.
Fixes the following build errors with Android 7:
obj/STATIC_LIBRARIES/libmesa_nir_intermediates/spirv/gl_spirv.P:184: *** target file gen/STATIC_LIBRARIES/libmesa_nir_intermediates/spirv/vtn_generator_ids.h' has both : and :: entries. Stop.
Cc: "20.3" <mesa-stable@lists.freedesktop.org>
Fixes: 1070bba19e ("android: fix SPIR-V -> NIR build")
Reported-by: youling257 <youling257@gmail.com>
This avoids a possible issue with MSAA sysmem clears, which use a 3D clear
path which assumes draw states are disabled, and are emitted in draw_cs in
BeginRenderPass.
(checking for TU_CMD_DIRTY_DRAW_STATE also allows not emitting the draw
states if they will be re-emitted on the next draw anyway. the previous
patch makes it so TU_CMD_DIRTY_DRAW_STATE is always set outside of
renderpasses)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7899>
* do the disable in EndRenderPass2 to fix the missing disable for sysmem
* we don't need a disable at the end of every tile, or between binning pass
and gmem pass (the first draw in draw_cs emits all the draw states)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7899>
If ACCESS_NON_UNIFORM is not specified, we can assume the resource is
uniform. This requires nir_lower_non_uniform_access to remove that flag.
A few Detroit: Become Human shaders use an index sourced from a fragment
input which is expected to be uniform.
shader-db (Navi):
Totals from 8 (0.01% of 127638) affected shaders:
SGPRs: 224 -> 384 (+71.43%)
VGPRs: 208 -> 112 (-46.15%)
CodeSize: 5360 -> 5344 (-0.30%); split: -1.49%, +1.19%
Instrs: 1036 -> 1028 (-0.77%); split: -1.93%, +1.16%
VMEM: 1320 -> 608 (-53.94%)
SMEM: 384 -> 336 (-12.50%); split: +14.58%, -27.08%
VClause: 24 -> 16 (-33.33%)
SClause: 48 -> 56 (+16.67%)
PreSGPRs: 124 -> 216 (+74.19%)
PreVGPRs: 168 -> 88 (-47.62%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5201>
In the Chrome WebGL Aquarium stress test, 20 instances of Chrome run
Aquarium simultaneously for over 20 hours. That causes Chrome to crash.
During the stress test, glBeginQueryIndexed is called frequently.
1. Each query only uses 32 bytes from query_buffer_uploader. After the offset
exceeds 4096, a new buffer is allocated for query_buffer_uploader->buffer
and the old buffer is released.
2. But iris_begin_query calls u_upload_alloc whenever the offset changes, and
every u_upload_alloc call increases
query_buffer_uploader->buffer->reference.count.
3. So when u_upload_release_buffer tries to release the resource of
query_buffer_uploader->buffer, its reference.count is
already 129. pipe_reference_described only decreases the reference
count to 128, so old_dst->screen->resource_destroy is never called.
4. The old resource BO is never freed, and Chrome calls mmap every time
a new resource BO is allocated.
5. The Chrome process maps too many VMAs. Its map count exceeds
sysctl_max_map_count, which the kernel sets to 65530 by default.
6. When iris_begin_query tries to allocate a new resource BO, it gets a NULL
pointer because mmap fails. Finally, Chrome crashes when it accesses this NULL
resource BO.
The fix is to decrease the reference count in iris_destroy_query.
Patch is verified by chrome webgl Aquarium test case for more than 72 hours.
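The leak in steps 1 to 4 can be modelled with a toy reference counter. This is a minimal sketch, not iris code: `buf_sketch`, `query_sketch`, `query_begin()` and `query_destroy()` are hypothetical stand-ins for the upload buffer, a query, iris_begin_query() and iris_destroy_query(). With 32-byte queries, 4096 / 32 = 128 queries each take a reference, so the count reaches 129; the uploader's final release frees the buffer only if destroying a query drops the reference it took.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf_sketch { int refcount; };

static struct buf_sketch *buf_ref(struct buf_sketch *b)
{
   b->refcount++;
   return b;
}

/* Drops one reference and frees on the last one; returns true if freed. */
static bool buf_unref(struct buf_sketch *b)
{
   if (--b->refcount == 0) {
      free(b);
      return true;
   }
   return false;
}

struct query_sketch { struct buf_sketch *bo; };

/* Like the begin path: each query takes a reference on the upload buffer. */
static void query_begin(struct query_sketch *q, struct buf_sketch *upload_buf)
{
   q->bo = buf_ref(upload_buf);
}

/* The fix: destroying a query drops the reference it took in begin. */
static void query_destroy(struct query_sketch *q)
{
   if (q->bo)
      buf_unref(q->bo);
   q->bo = NULL;
}
```

Without the `buf_unref()` in `query_destroy()`, the count would stay at 128 after the uploader's release, and the buffer (and its mapping) would never be freed.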
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Yang Shi <yang.a.shi@intel.com>
Reviewed-by: Alex Zuo <alex.zuo@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7890>
avoids errors seen when building on OpenBSD/amd64
../src/amd/compiler/aco_instruction_selection.cpp:1677:62: error: ambiguous conversion for functional-style cast from 'unsigned long' to 'aco::Operand'
bld.vop3(aco_opcode::v_mul_f64, Definition(dst), Operand(0x3FF0000000000000lu), tmp);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
glibc uses unsigned long for uint64_t on LP64 archs and unsigned long long for
uint64_t on ILP32 archs. On OpenBSD unsigned long long is used for uint64_t
on all archs.
The Operand constructors take uint8_t, uint16_t, uint32_t and uint64_t;
use UINT64_C so that the lu or llu suffix is chosen as needed.
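C has no overloading, but C11 `_Generic` rejects an argument whose type matches none of its associations, which is a rough analogue of the ambiguous Operand conversion above. The sketch below is illustrative only; `operand_kind()` and `bits_to_double()` are hypothetical helpers, not ACO code. Because `UINT64_C()` expands to a constant of type `uint_least64_t`, which in practice is the same type as `uint64_t`, the literal selects the 64-bit association regardless of whether the C library defines `uint64_t` as `unsigned long` or `unsigned long long`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the overloaded constructors: compiles only if the argument's
 * type is exactly one of these four. A bare 0x3FF0000000000000lu would fail
 * to match on a platform where uint64_t is unsigned long long. */
#define operand_kind(x) _Generic((x), \
   uint8_t: "u8", uint16_t: "u16", uint32_t: "u32", uint64_t: "u64")

/* 0x3FF0000000000000 is the IEEE-754 bit pattern of the double 1.0. */
static double bits_to_double(uint64_t bits)
{
   double d;
   memcpy(&d, &bits, sizeof d);
   return d;
}
```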
Fixes: df645fa369 ("aco: implement VK_KHR_shader_float_controls")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7944>
Merging primitives generates incorrect gl_PrimitiveID[In] values.
So make the construction of merged primitives non-destructive and fall back
to drawing with the original primitives if a program reads gl_PrimitiveID.
This commit adds _mesa_update_primitive_id_is_unused modeled after
_mesa_update_allow_draw_out_of_order to update ctx->_PrimitiveIDIsUnused
each time shaders are updated.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7078>
For (Multi)DrawArrays and (Multi)DrawElements commands, the storage size
needed is known early, so we can make sure that the prim_store/vertex_store
will be big enough to store the whole command.
This reduces the number of draw calls in the snx-03 tests. For instance, in test10:
       | Num draw calls | GPU-load       |
       | Before | After | Before | After |
-------|--------|-------|--------|-------|
test10 | 35k    | 8k    | 58%    | 80%   |
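Because the vertex count of a (Multi)DrawArrays/(Multi)DrawElements call is known before any vertex is copied, the store can be grown once up front instead of filling up and being flushed in the middle of the command. A minimal sketch of that idea, assuming a simple growable float array; `vertex_store_sketch` and `store_reserve()` are hypothetical and are not the vbo module's actual structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct vertex_store_sketch {
   float *data;
   size_t used;     /* floats already stored */
   size_t capacity; /* floats allocated */
};

/* Ensures `needed` more floats fit, doubling the capacity as required.
 * Returns false on allocation failure (the old storage stays valid). */
static bool store_reserve(struct vertex_store_sketch *s, size_t needed)
{
   if (s->used + needed <= s->capacity)
      return true;

   size_t new_cap = s->capacity ? s->capacity : 64;
   while (new_cap < s->used + needed)
      new_cap *= 2;

   float *p = realloc(s->data, new_cap * sizeof(*p));
   if (!p)
      return false;
   s->data = p;
   s->capacity = new_cap;
   return true;
}
```

For a DrawArrays call, `needed` would be `count` times the vertex size in floats, reserved once before the copy loop.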
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7078>
- xxhash is faster than sha1.
- remove superfluous calls to strlen
Using the first SPECviewperf13 snx-03 subtest and "perf -e cycles -g", perf report says:
Before  | After  | Function
--------|--------|--------------------------------
47.39%  | 47.36% | _mesa_CallList
5.00%   | 3.03%  | _mesa_program_resource_location
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7078>
Merge consecutive primitives using the same mode while constructing the index buffer.
This greatly improves performance (3x to 10x) in the SPECviewperf13 snx-03 test by reducing the
number of draw calls per frame.
Here are some numbers for 4 of the tests:
       | Num draw calls | GPU-load       |
       | Before | After | Before | After |
-------|--------|-------|--------|-------|
test1  | 390k   | 16k   | 68%    | 90%   |
test2  | 370k   | 16k   | 40%    | 90%   |
test3  | 1.2M   | 35k   | 38%    | 78%   |
test10 | 3.5M   | 35k   | 36%    | 58%   |
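A minimal sketch of the merging step, assuming each primitive is described by a mode plus a start/count range in the index buffer being built. `prim_sketch` and `merge_prims()` are hypothetical; the mode values mirror GL_LINES (1), GL_LINE_STRIP (3) and GL_TRIANGLES (4). Only list modes are merged here, since concatenating strips would need primitive restart.

```c
#include <assert.h>
#include <stdbool.h>

enum { SKETCH_GL_LINES = 1, SKETCH_GL_LINE_STRIP = 3, SKETCH_GL_TRIANGLES = 4 };

struct prim_sketch { unsigned mode, start, count; };

/* List modes can be concatenated without changing what gets drawn. */
static bool mergeable_mode(unsigned mode)
{
   return mode == SKETCH_GL_LINES || mode == SKETCH_GL_TRIANGLES;
}

/* Merges, in place, each primitive into its predecessor when both use the
 * same mergeable mode and their index ranges are contiguous.
 * Returns the new number of primitives. */
static unsigned merge_prims(struct prim_sketch *prims, unsigned n)
{
   if (n == 0)
      return 0;

   unsigned out = 0;
   for (unsigned i = 1; i < n; i++) {
      struct prim_sketch *last = &prims[out];
      if (prims[i].mode == last->mode && mergeable_mode(last->mode) &&
          prims[i].start == last->start + last->count)
         last->count += prims[i].count;
      else
         prims[++out] = prims[i];
   }
   return out + 1;
}
```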
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7078>
Fewer primitive modes allow for better primitive merging.
Lines are always used (instead of, for instance, dynamically picking lines
or line strips) because:
- they don't need primitive restarts to be merged
- they perform better (at least on radeonsi) - SPECviewperf13 snx subtests
with lines (like 4 or 10) are 1.5x-2x faster.
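Rewriting a line strip as an explicit list of segments is what lets it be concatenated with other line draws without primitive restart. A sketch, assuming unsigned indices into a vertex array; `line_strip_to_lines()` is a hypothetical helper. A strip of n vertices has n - 1 segments, so it expands to 2 * (n - 1) indices.

```c
#include <assert.h>

/* Expands a line strip of `n` vertices starting at vertex `first` into
 * GL_LINES-style index pairs (first, first+1), (first+1, first+2), ...
 * `out` must hold 2 * (n - 1) entries. Returns the number written. */
static unsigned line_strip_to_lines(unsigned first, unsigned n, unsigned *out)
{
   if (n < 2)
      return 0;

   unsigned k = 0;
   for (unsigned i = 0; i + 1 < n; i++) {
      out[k++] = first + i;
      out[k++] = first + i + 1;
   }
   return k;
}
```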
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7078>
external/mesa3d/src/mesa/math/m_matrix.c:1403:13: error: address of array 'mat->inv' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
if (mat->inv && (mat->flags & MAT_DIRTY_INVERSE)) {
~~~~~^~~ ~~
Fixes: 3175b63a0d ("mesa: don't allocate matrices with malloc")
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7861>
PIPE_MAX_CONSTANT_BUFFERS is 32; however, many Vulkan implementations
have a maxPerStageDescriptorUniformBuffers that exceeds it, for example:
- radv: 8388606
- anv: 64
- nvidia: 1048580 for RTX 2000 and up.
Together with the current zink logic, this means the returned value
will exceed the maximum allowed value for the cap.
This causes cso_destroy_context to pass big values back to zink
(via zink_set_constant_buffer), resulting in access beyond end of
allocated buffer for all UBOs.
Cap the cap to PIPE_MAX_CONSTANT_BUFFERS (32), not INT_MAX.
Add an assert to verify future drivers.
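A minimal sketch of the clamp, assuming the Vulkan limit has already been queried. `clamp_const_buffers()` is hypothetical, and `SKETCH_PIPE_MAX_CONSTANT_BUFFERS` simply mirrors the value 32 quoted above rather than including Mesa's header.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PIPE_MAX_CONSTANT_BUFFERS 32

/* Clamps the Vulkan per-stage UBO limit to what gallium can track. */
static int clamp_const_buffers(uint32_t max_per_stage_uniform_buffers)
{
   uint32_t v = max_per_stage_uniform_buffers;
   if (v > SKETCH_PIPE_MAX_CONSTANT_BUFFERS)
      v = SKETCH_PIPE_MAX_CONSTANT_BUFFERS;
   return (int)v;
}
```

Clamping to INT_MAX would only guard against overflow of the int return value; clamping to the gallium limit is what keeps later loops over bound buffers inside their arrays.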
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: daaf5f1d18 ("gallium: Fix leak of currently bound UBOs at CSO context destruction.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7976>
This looks like a typo. Packed Vulkan formats should always map to the
inverse order of the corresponding gallium notation. Besides, it makes
no sense that unsigned and signed formats have different ordering.
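To make the ordering rule concrete: a packed 10/10/10/2 value with R in the lowest bits is written `R10G10B10A2` in Gallium, which names packed components starting from the least significant bit, but `A2B10G10R10_..._PACK32` in Vulkan, which names them starting from the most significant bit. The helper below is a hypothetical illustration of that bit layout, not zink code.

```c
#include <assert.h>
#include <stdint.h>

/* R in bits 0..9, G in 10..19, B in 20..29, A in 30..31:
 * Gallium R10G10B10A2 == Vulkan A2B10G10R10_*_PACK32. */
static uint32_t pack_rgb10a2(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
{
   return (r & 0x3ff) | ((g & 0x3ff) << 10) | ((b & 0x3ff) << 20) | ((a & 0x3) << 30);
}
```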
Fixes: cdfb1d925f ("zink: add last few format maps for ARB_vertex_type_2_10_10_10_rev")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7964>
In particular, if we have an index or bindless handle we were passing
the original handle which, technically, is uniform within the context of
the if. However, we can save the back-end compiler some effort if we
pass it the result of the read_first_invocation().
(Rebased by Kenneth Graunke and Rhys Perry.)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7592>
There's no good reason why drivers that don't grok geometry,
tessellation or compute shaders need to deal with them.
This fixes a crash on a lot of Piglit tests for Zink.
Fixes: daaf5f1d18 ("gallium: Fix leak of currently bound UBOs at CSO context destruction.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7971>
Fixes the following building error:
FAILED: ninja: 'external/mesa/src/gallium/drivers/freedreno/freedreno_log.c',
needed by 'out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_pipe_freedreno_intermediates/freedreno_log.o',
missing and no known rule to make it
Fixes: 03e7c93b82 ("freedreno: Remove fd_log()")
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7942>
Changelog:
- add freedreno_tracepoints.{c,h} gen rules for Android using $(MESA_PYTHON3)
- update Makefile.sources with the required generated sources
Fixes the following building errors:
external/mesa/src/gallium/drivers/freedreno/freedreno_gmem.c:35:10:
fatal error: 'u_tracepoints.h' file not found
^~~~~~~~~~~~~~~~~
1 error generated.
FAILED: out/target/product/x86_64/obj/SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so
...
ld.lld: error: undefined symbol: __trace_end_clear_restore
>>> referenced by freedreno_tracepoints.h:38 (out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_freedreno_intermediates/freedreno_tracepoints.h:38)
...
ld.lld: error: undefined symbol: __trace_start_vsc_overflow_test
>>> referenced by freedreno_tracepoints.h:272 (out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_freedreno_intermediates/freedreno_tracepoints.h:272)
ld.lld: error: too many errors emitted, stopping now
Fixes: a02dcb970f ("freedreno: Add GPU tracepoints")
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7942>
Android rules to build u_trace sources and u_tracepoints generated sources
Changelog:
- add util/u_tracepoints.{c,h} gen rules for Android using $(MESA_PYTHON3)
- update Makefile.sources with the required sources and generated sources
Fixes: 3471af9c6c ("gallium/aux: Add GPU tracepoint mechanism")
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7942>
This is in preparation for additional generated-source rules for Android,
which will require ad hoc rules, so the old ones need to be replaced.
NOTE: the pre-existing gen rules based on the $(transform-generated-source)
macro are obsolete, and their use of the '%' pattern rule is incompatible
with ad hoc python commands for different targets.
Changelog:
- remove util/u_format_srgb.c target
- replace obsolete indices/{u_indices,unfilled}_gen.c 'common' gen rules
with 'per target' gen rules using $(MESA_PYTHON3) as per meson gen rules
Fixes: 3471af9c6c ("gallium/aux: Add GPU tracepoint mechanism")
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7942>
MoltenVK does not export the vkGetPhysical*2() functions, even in Vulkan 1.2.154.0 where the instance version moves from 1.0 to 1.1.
If the extension is present and enabled, the KHR versions of the functions can be used instead.
According to the spec, the vkGetPhysicalDevice*2() functions should be available from Vk 1.1 loaders and devices, which implies MoltenVK might be misbehaving.
This change allows the extension to be used, if present, before the Vk 1.1 version check.
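The ordering change can be sketched as a small decision function. Everything here is hypothetical: `pick_props2_source()` stands in for zink's setup code, and `SKETCH_MAKE_VERSION` only reproduces the bit layout of Vulkan's VK_MAKE_VERSION. The point is that the VK_KHR_get_physical_device_properties2 check comes first, so an instance that reports 1.1 but does not export the core entry points still takes the KHR path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as VK_MAKE_VERSION(major, minor, patch). */
#define SKETCH_MAKE_VERSION(major, minor, patch) \
   (((uint32_t)(major) << 22) | ((uint32_t)(minor) << 12) | (uint32_t)(patch))

enum props2_source { PROPS2_NONE, PROPS2_KHR, PROPS2_CORE };

static enum props2_source pick_props2_source(bool has_khr_ext, uint32_t instance_version)
{
   if (has_khr_ext)                                        /* checked first */
      return PROPS2_KHR;
   if (instance_version >= SKETCH_MAKE_VERSION(1, 1, 0))  /* core fallback */
      return PROPS2_CORE;
   return PROPS2_NONE;
}
```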
Fixes: 752f6d80 ("zink: setup version dependent VkPhysicalDeviceVulkan*Features and VkPhysicalDeviceVulkan*Properties.")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7960>
This ensures all images get rebuilt when we update to a newer
ci-templates commit.
v2:
* Append to WINDOWS(_UPSTREAM)_IMAGE instead of WINDOWS_TAG. The latter
failed; apparently variables are not expanded recursively on the
Windows runners.
* Use separate MESA_IMAGE_TAG/MESA_BASE_IMAGE variables instead of
appending to each FDO_DISTRIBUTION_TAG/FDO_BASE_IMAGE separately for
Linux jobs, to prevent accidentally dropping the suffix.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7902>
Since the job which creates the cache tarball starts from the previous
cache, the cache kept accumulating cruft and growing bigger.
This cuts the size of the tarball in half (from almost 600M to under
300M), which can translate to significant time savings when downloading
it on some runners.
v2:
* Use git gc --aggressive (Eric Anholt)
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7920>
We were returning a use-after-free pointer to the depth buffer, not
updating it after subsequent rendering, and also not y-flipping it. A
little refactor to mostly reuse the color buffer's path makes it easy to
do it all right.
Adds a unit test to check for these bugs.
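Y-flipping a readback amounts to copying rows in reverse order. A minimal sketch, assuming a buffer with a fixed row stride in bytes; `copy_y_flipped()` is hypothetical and not the function used by the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies `height` rows of `stride` bytes from src to dst, so that the
 * first row of dst is the last row of src. */
static void copy_y_flipped(unsigned char *dst, const unsigned char *src,
                           size_t stride, size_t height)
{
   for (size_t y = 0; y < height; y++)
      memcpy(dst + y * stride, src + (height - 1 - y) * stride, stride);
}
```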
Closes: #885
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7886>
This tests the OpenGL ES 2.0 CTS suite with the VC4 driver on bare-metal
Raspberry Pi 3 devices.
The devices are connected to a switch that supports Power over Ethernet
(PoE), so the devices can be started/stopped through the switch, and
also to a host that runs the GitLab runner through serial-to-USB cables,
to monitor the devices to know when the testing finishes.
The Raspberries use network boot, via NFS and TFTP. For the root
filesystem, they use the one created in the armhf container. The
kernel/modules are handled externally. Currently they are the same
kernel/modules that come with Raspberry Pi OS. In the future we
could build them in the same armhf container.
At the moment we only test the armhf architecture, as this is the default
one suggested by the Raspberry Pi Foundation. In the future we could also
add testing for the arm64 architecture.
Finally, for the very rare occasions where the Raspberry Pi 3 device
boots but no data is received, the testing is retried a second
time, powering the device off and on in the process.
v2:
- Remove commit that exists capture devcoredump (Eric)
- Squash remaining commits in one (Andres)
v3:
- Add missing boot timeout check (Juan)
v4:
- Use locks when running the PoE on/off script (Eric)
- Use a timeout for serial read (Eric)
v5:
- Rename stage to "raspberrypi" (Eric)
- Bump up arm64_test tag (Eric)
v6:
- Make serial buffer timeout optional (Juan)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7628>
If we don't do that, the line_stride might be wrong. We also need
to create a new BO if the previous one is too small to hold the
linear version, which can happen with the tile alignment done on
linear+renderable resources.
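The size check can be sketched as follows, assuming the linear layout's width and height are padded to some tile-sized alignment. `compute_linear_layout()`, `needs_new_bo()` and the alignment parameters are hypothetical; panfrost's real layout code has its own rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Rounds v up to a multiple of a (a > 0). */
static size_t align_up(size_t v, size_t a)
{
   return (v + a - 1) / a * a;
}

struct linear_layout_sketch { size_t line_stride; size_t size; };

/* width/height in pixels, bpp in bytes per pixel, alignments in pixels. */
static struct linear_layout_sketch
compute_linear_layout(size_t width, size_t height, size_t bpp,
                      size_t align_w, size_t align_h)
{
   struct linear_layout_sketch l;
   l.line_stride = align_up(width, align_w) * bpp;
   l.size = l.line_stride * align_up(height, align_h);
   return l;
}

/* The existing (tiled) BO may be too small for the padded linear layout. */
static bool needs_new_bo(size_t bo_size, struct linear_layout_sketch l)
{
   return bo_size < l.size;
}
```

For example, a 100x50 resource at 4 bytes per pixel with 16-pixel alignment gets a 448-byte line stride and needs 28672 bytes.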
Suggested-by: Icecream95
Fixes: d4f662a252 ("panfrost: Update the resource layout when doing a tile -> linear conversion")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7821>
Fix defects reported by Coverity Scan.
uninit_member: Non-static class member m_maxBaseAlign is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member m_maxMetaBaseAlign is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7768>
Add a helper to get debug options that specify a file path, with
additional checking for suid to prevent unintended file access via
Mesa's debug features.
Unlike other DEBUG_GET_ONCE_*, this returns a new file ptr each time
it is called (although it only does the lookup of the path once).
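A minimal sketch of the pattern described above, assuming a POSIX system. The option name `SKETCH_DEBUG_FILE`, the function `debug_get_file_sketch()` and the exact setuid test are assumptions for illustration, not Mesa's actual helper: the path is read once and cached, each call returns a freshly opened `FILE *`, and nothing is opened when the real and effective IDs differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_DEBUG_FILE_ENV "SKETCH_DEBUG_FILE" /* hypothetical option */

/* Looks the path up only once (copying it, since getenv()'s result may be
 * overwritten later), but returns a new FILE * on every call; the caller
 * must fclose() it. Not thread-safe as written. */
static FILE *debug_get_file_sketch(const char *mode)
{
   static char *path;
   static int looked_up;

   if (!looked_up) {
      const char *env = getenv(SKETCH_DEBUG_FILE_ENV);
      path = env ? strdup(env) : NULL;
      looked_up = 1;
   }
   /* Refuse in setuid/setgid processes, so an unprivileged user's
    * environment cannot make a privileged process open arbitrary files. */
   if (!path || geteuid() != getuid() || getegid() != getgid())
      return NULL;
   return fopen(path, mode);
}
```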
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7818>
This is completely untested!
In commit 3175b63a0d, Marek stopped
allocating the GLmatrix::inv field with malloc, instead embedding
it directly in the structure. So, we need to drop a level of
indirection here and use (matrix pointer + MATRIX_INV) as the
inverse matrix array directly, rather than reading a pointer at
that offset and chasing it.
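In C terms, the fix changes what is done with the address (matrix pointer + MATRIX_INV). A minimal sketch with a hypothetical `matrix_sketch` layout (not the real GLmatrix definition), where `offsetof` plays the role of the assembly's MATRIX_INV offset: after the change that address is itself the inverse array, whereas the old code loaded a pointer from it and followed it, which now reinterprets float data as a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the inverse is embedded, not separately allocated. */
struct matrix_sketch {
   float m[16];
   float inv[16];
   unsigned flags;
};

#define SKETCH_MATRIX_INV offsetof(struct matrix_sketch, inv)

/* (matrix pointer + MATRIX_INV) is used directly as the inverse array. */
static const float *inverse_ptr(const struct matrix_sketch *mat)
{
   return (const float *)((const char *)mat + SKETCH_MATRIX_INV);
}
```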
Fixes: 3175b63a0d ("mesa: don't allocate matrices with malloc")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7889>
In commit 3175b63a0d, Marek stopped
allocating the GLmatrix::inv field with malloc, instead embedding
it directly in the structure. So, we need to drop a level of
indirection here and use (matrix pointer + MATRIX_INV) as the
inverse matrix array directly, rather than reading a pointer at
that offset and chasing it.
Fixes: 3175b63a0d ("mesa: don't allocate matrices with malloc")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7889>
Z scaling case without nearest filter needs a 3D texture, so add a 3D
texture path and use it to cover all scaling/mirroring cases.
The "rotation" argument for the clear/blit "setup" function is replaced
with a more generic "blit_param", which has a different meaning for the
3D blit path. (to avoid having too many arguments)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7781>
Some flex/bison installs on Windows include yacc and lex
as bash scripts that call the bison/flex binaries. That creates
an extra layer of dependencies, because those scripts won't work from
plain cmd.exe/PowerShell. Let's switch the lookup order so that
by default we pick up the vanilla binaries instead of the scripts.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7806>
Unlike the other drivers, the D3D12 driver is hardware accelerated, so
it's going to be a more reasonable choice. So let's prefer it.
This only matters for people who build with the D3D12 driver. And they
can set the GALLIUM_DRIVER environment variable as appropriate to
override it.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7534>
In the following sequence:
- transfer_map(buffer, DISCARD) // use staging upload
- memcpy(...)
- transfer_unmap
- draw
- transfer_map(buffer, UNSYNCHRONIZED) // non-staging upload
- memcpy(...)
- draw
Currently the order of operations is:
- map#1 - staging buffer
- memcpy to staging buffer
- map#2
- memcpy to buffer
- staging buffer copy to real buffer
- draw#1
- draw#2
When the 2nd map operation doesn't use UNSYNCHRONIZED, the tc_sync_msg() call
will make sure that the bo is unused before mapping it.
But if it does use UNSYNCHRONIZED and the mapped intervals overlap, this commit
clears the UNSYNCHRONIZED flag to make sure ordering is maintained.
This will affect performance, but correct output is better than fast output.
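The overlap rule can be sketched as below. This is a simplified model, not threaded_context code: `range_sketch`, `fixup_map_flags()` and the `SKETCH_MAP_*` bits are hypothetical stand-ins for the range covered by a pending staging copy and the PIPE_MAP_* usage flags. Ranges are half-open, so ranges that merely touch do not overlap.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAP_WRITE          0x1u
#define SKETCH_MAP_UNSYNCHRONIZED 0x4u

/* Half-open byte range [start, end). */
struct range_sketch { unsigned start, end; };

static bool ranges_overlap(struct range_sketch a, struct range_sketch b)
{
   return a.start < b.end && b.start < a.end;
}

/* If an unsynchronized map overlaps a staging copy that has not executed
 * yet, drop UNSYNCHRONIZED so the map is ordered after that copy instead
 * of being overwritten by it later. */
static unsigned fixup_map_flags(unsigned flags, struct range_sketch map,
                                struct range_sketch pending_copy)
{
   if ((flags & SKETCH_MAP_UNSYNCHRONIZED) && ranges_overlap(map, pending_copy))
      flags &= ~SKETCH_MAP_UNSYNCHRONIZED;
   return flags;
}
```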
See https://gitlab.freedesktop.org/mesa/mesa/-/issues/3611.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7098>
This introduces a new flag in shader_info to know if a fragment
shader uses sample shading, even if there are no inputs.
During NIR linking, constant varyings are optimized and the
per-sample interpolation info (i.e. the sample qualifier) might
be removed if nir_shader_gather_info() is called again.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7876>
It's a little unclear from the GLX_ARB_create_context spec whether the
list of supported extensions means what the client supports at all, or
what it knows an indirect GLX encoding for. You'd think it could only
really matter for indirect, since the only way the server would know
about GL commands (as opposed to GLX commands) is if the context was
indirect. And indeed for Xorg's GLX it doesn't matter, because it
doesn't check this, assuming that anything a direct client says works,
works, and clamping the GL version based on the protocol it has code
for.
But if you're NVIDIA, apparently, you check this even for direct
contexts. And since drisw creates a nominally "direct" context, this
means llvmpipe and friends get clamped to 3.0 for desktop GL (since
that's as far as the protocol is defined) and can't do GLES at all.
So, whatever, just go ahead and claim to support everything. The wire
representation of the supported versions is strange (see comments in the
code) but it matches what NVIDIA does.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7369>
This is Sort Of handled by nerfing GL_VERSION in __indirect_glGetString,
but that doesn't cover GLES contexts which we also don't have any
indirect support for. Xorg's GLX would reject this for us since it has
the same limitation, but NVIDIA's GLX seems to interpret a request for
ES 2.0 as desktop, despite having the ES2 profile bit set, leading to a
very confusing GL_VERSION string and probably not the ES2-compatible
context you were hoping for.
Since we may now return NULL from indirect_create_context_attribs for
reasons other than malloc failure, we need to reasonably handle the case
where gc == NULL by the time we get to the XCB call. We rely on the
server to generate correct return values in this case, but if it
succeeds despite our client-side failure we just throw GLXBadFBConfig
(chosen to keep piglit/glx-create-context-core-profile happy, since
nothing else seems to hit it).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7369>
If you dlclose your driver, the leak reports look like:
#0 0xffff9c7e5e7c in malloc (/lib/aarch64-linux-gnu/libasan.so.6+0x9ee7c)
#1 0xffff94aaaa48 (<unknown module>)
#2 0xffff94aa5ff4 (<unknown module>)
#3 0xffff94d1867c (<unknown module>)
#4 0xffff94d184f0 (<unknown module>)
#5 0xffff94c9a990 (<unknown module>)
#6 0xffff94c92e30 (<unknown module>)
#7 0xffff94c91d48 (<unknown module>)
#8 0xffff946eb800 (/home/anholt/src/mesa/build-aarch64-asan/src/egl/libEGL.so.1.0.0+0xfe800)
#9 0xffff94c72874 (<unknown module>)
#10 0xffff946ede68 (/home/anholt/src/mesa/build-aarch64-asan/src/egl/libEGL.so.1.0.0+0x100e68)
#11 0xffff94bf7134 (<unknown module>)
#12 0xffff9c686450 in dri2_create_screen ../src/egl/drivers/dri2/egl_dri2.c:1079
which is not terribly useful. Probe if we're building with asan and just
skip closing the driver in the happy path (which seems to be the standard
practice for loadable modules with this tool).
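One way to probe for ASan at compile time, assuming the compiler's predefined macros are used (the actual change may instead detect it through the build system): GCC defines `__SANITIZE_ADDRESS__` under `-fsanitize=address`, and Clang reports it through `__has_feature(address_sanitizer)`. `driver_should_dlclose()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__SANITIZE_ADDRESS__)
#  define SKETCH_BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_BUILT_WITH_ASAN 1
#  endif
#endif
#ifndef SKETCH_BUILT_WITH_ASAN
#  define SKETCH_BUILT_WITH_ASAN 0
#endif

/* Keep the driver mapped under ASan so leak reports can still be
 * symbolized; dlclose() it normally otherwise. */
static bool driver_should_dlclose(void)
{
   return !SKETCH_BUILT_WITH_ASAN;
}
```

The nested `#if` keeps compilers that lack `__has_feature` from ever evaluating it.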
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7695>
This is only required for the DRI-path. For the swrast code-path, we
don't need this.
We also don't need to explicitly test for it in the DRI-path, because we
test for KHR_external_memory_fd, which depends on KHR_external_memory. So
no implementation will expose the former without the latter.
Fixes: f1432fd3e2 ("zink: generate extension infrastructure using a python script")
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7882>
Three things:
1. MSVC dislikes mismatching declaration/definition of __declspec(dllexport).
Since CL headers don't have the declspec, the implementations shouldn't either.
2. An unnamed brace-initialization gets deduced as an initializer list, instead
of a brace-constructed string. Just add the type name.
3. posix_memalign doesn't exist on Windows.
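Item 3 can be handled with a small wrapper. A sketch, assuming MSVC's `_aligned_malloc()`/`_aligned_free()` on Windows and `posix_memalign()` elsewhere; `aligned_alloc_sketch()` and `aligned_free_sketch()` are hypothetical names. Memory from `_aligned_malloc()` must be released with `_aligned_free()`, not `free()`, which is why the wrapper needs a matching free function.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

/* `alignment` must be a power of two and a multiple of sizeof(void *). */
static void *aligned_alloc_sketch(size_t alignment, size_t size)
{
#ifdef _WIN32
   return _aligned_malloc(size, alignment); /* note the swapped argument order */
#else
   void *p = NULL;
   if (posix_memalign(&p, alignment, size) != 0)
      return NULL;
   return p;
#endif
}

static void aligned_free_sketch(void *p)
{
#ifdef _WIN32
   _aligned_free(p);
#else
   free(p);
#endif
}
```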
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7680>
Three things:
1. While instantiating a template where clover::llvm and ::llvm are
both resolvable for unscoped llvm, MSVC complains about ambiguity.
Resolve by not using namespace clover, leaving only ::llvm as a
valid namespace.
2. LLVM headers (specifically Allocator.h) use __declspec(restrict),
but Mesa's util headers #define restrict to __restrict for C++.
Since __declspec(__restrict) is invalid, make sure we always include
Allocator.h first before the util header.
3. Change a uint/int to uint64_t to match the type returned from LLVM.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7680>
When DEBUG is not defined, no error reporting is done; the error is
just returned. The current definition causes a couple of warnings in
anv_formats.c. First, when the return value is intentionally ignored:
../src/intel/vulkan/anv_formats.c:989:48: warning: statement with no effect [-Wunused-value]
989 | vk_errorfi(instance, physical_device, VK_ERROR_FORMAT_NOT_SUPPORTED,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/intel/vulkan/anv_private.h:486:55: note: in definition of macro ‘vk_errorfi’
486 | #define vk_errorfi(instance, obj, error, format, ...) error
| ^~~~~
and also when a variable is only used as an argument to the macro:
../src/intel/vulkan/anv_formats.c:908:25: warning: unused variable ‘instance’ [-Wunused-variable]
908 | struct anv_instance *instance = physical_device->instance;
| ^~~~~~~~
../src/intel/vulkan/anv_formats.c: In function ‘anv_GetPhysicalDeviceImageFormatProperties2’:
../src/intel/vulkan/anv_formats.c:1231:25: warning: unused variable ‘instance’ [-Wunused-variable]
1231 | struct anv_instance *instance = physical_device->instance;
| ^~~~~~~~
To avoid both issues, use a static inline function that just returns
its argument but can consume other input. Ignoring the return value
of a function is OK, and the extra input can be tagged as UNUSED,
getting rid of both warnings.
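A minimal sketch of that approach, assuming GCC/Clang attribute syntax. `SKETCH_UNUSED` stands in for Mesa's UNUSED macro, `result_sketch` for VkResult, and `vk_errorfi_sketch()` is a hypothetical name rather than anv's actual definition. Because it is a function, discarding its result is not a "statement with no effect", and passing `instance` to it counts as a use.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_UNUSED __attribute__((unused))

typedef int result_sketch; /* stand-in for VkResult */

/* Non-DEBUG variant: ignores everything except the error code. */
static inline result_sketch
vk_errorfi_sketch(SKETCH_UNUSED const void *instance, SKETCH_UNUSED const void *obj,
                  result_sketch error, SKETCH_UNUSED const char *format, ...)
{
   return error;
}
```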
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7860>
When the server doesn't support indirect contexts it will generate a
BadValue error, since the CreateContext request's isDirect field will
have specified an unsupported value of False. We attempt to verify that
context creation succeeded by asking whether the context's XID is direct
or not after we create it. Due to the details of XCB error handling, if
the context wasn't successfully created, the GLXBadContext error from
the GLXIsDirect request will get raised first, hiding the BadValue from
the application.
To fix this, we change the behavior of __glXIsDirect based on the
`error` outparameter. If it is NULL we still raise the error generated
from the GLXIsDirect request, but if it is non-NULL we now just inform
the caller that the request failed and silently eat the error. By doing
this the BadValue (or whatever else) from the CreateContext request will
bubble up to the application as expected.
This is admittedly a bit subtle but it's the simplest way to get to the
fix here. A better solution would be to convert all of CreateContext to
XCB, but XCB doesn't have protocol for GLX_SGIX_fbconfig yet so we'd
lose glXCreateContextWithConfigSGIX.
Fixes: mesa/mesa#3907
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7859>
The test "clc_compiler_test" is kinda nasty in packing too many things
into a single test, making it awkwardly long. We should really consider
splitting it up into multiple tests instead.
But right now, it's sometimes timing out on CI, which is bad, so here's
a quick band-aid to prevent this from happening.
The previous timeout of two minutes seems to not always be sufficient
under various loads, so let's add another minute just to be sure.
Here's an example of a failure with the current timeout:
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/5918980#L1589
Fixes: ff05da7f8d ("microsoft: Add CLC frontend and kernel/compute support to DXIL converter")
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7872>
We include git_sha1.h in clc_compiler.c, so we should also make sure we
depend on the header being generated in time. This fixes a spurious
build error when compiling with many cores, like we do on CI.
Fixes: ff05da7f8d ("microsoft: Add CLC frontend and kernel/compute support to DXIL converter")
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7869>
GitLab CI doesn't allow us to store artifacts from outside the
build-directory, so let's create an install-directory and install there
instead.
To do this properly, we need to expand a variable inside the
command-line, so we need to change to a double-quoted string.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7857>
This means the pass has to walk all the instructions but it was doing
that in a bunch of cases anyway when it didn't have a HALT_TARGET.
However, removing HALT_TARGET frees up the scheduler a bit because
HALT_TARGET is considered a scheduling barrier. The shader-db results
are kind-of a wash but we're about to add HALT_TARGET unconditionally so
we want to be able to get rid of it.
Shader-db results on Ice Lake:
total instructions in shared programs: 19935623 -> 19935623 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 976758472 -> 976766135 (<.01%)
cycles in affected programs: 11097707 -> 11105370 (0.07%)
helped: 1750
HURT: 875
helped stats (abs) min: 1 max: 866 x̄: 26.39 x̃: 4
helped stats (rel) min: <.01% max: 39.24% x̄: 1.25% x̃: 0.46%
HURT stats (abs) min: 1 max: 1678 x̄: 61.54 x̃: 10
HURT stats (rel) min: <.01% max: 65.69% x̄: 1.86% x̃: 0.42%
95% mean confidence interval for cycles value: -2.48 8.32
95% mean confidence interval for cycles %-change: -0.40% -0.03%
Inconclusive result (value mean confidence interval includes 0).
LOST: 62
GAINED: 46
All of the lost/gained programs are SIMD32 fragment shaders.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
invalidate_draw_sh_constants should invalidate only SGPRs.
invalidate_draw_constants invalidates SGPRs and NUM_INSTANCES.
u_blitter called invalidate_draw_sh_constants, which previously
invalidated NUM_INSTANCES as well. This commit fixes that.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
If you did:
si_pm4_set_reg(pm4, reg, val0);
si_pm4_cmd_add(pm4, val1);
si_pm4_set_reg(pm4, reg + 4, val2);
it wrote val0 to reg, val1 to reg + 4, and val2 to reg + 8.
This fixes it by clearing last_opcode in si_pm4_cmd_add, so that
si_pm4_set_reg doesn't try to combine set_reg calls across si_pm4_cmd_add.
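The combining logic and the fix can be modelled with a toy command buffer. This sketch is hypothetical: the header encoding (opcode in the high 16 bits, value count in the low 16 bits, followed by the start register) is invented for illustration and is not the real PM4 SET_*_REG format. Without the reset in `pm4_cmd_add()`, the third call in the example above would extend the first packet even though a raw dword now sits inside it.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_NONE = 0, OP_SET_REG = 1 };

struct pm4_sketch {
   uint32_t dw[64];
   unsigned ndw;
   unsigned last_opcode; /* OP_SET_REG if the last packet may be extended */
   unsigned last_reg;    /* last register written by that packet */
   unsigned last_header; /* index of that packet's header dword */
};

/* Appends a raw dword. The fix: a raw dword closes any open SET_REG packet. */
static void pm4_cmd_add(struct pm4_sketch *pm4, uint32_t dw)
{
   pm4->dw[pm4->ndw++] = dw;
   pm4->last_opcode = OP_NONE;
}

/* Writes `val` to `reg`, extending the previous SET_REG packet when `reg`
 * directly follows the last register it wrote. */
static void pm4_set_reg(struct pm4_sketch *pm4, unsigned reg, uint32_t val)
{
   if (pm4->last_opcode == OP_SET_REG && reg == pm4->last_reg + 4) {
      pm4->dw[pm4->last_header]++; /* bump the value count */
   } else {
      pm4->last_header = pm4->ndw;
      pm4->dw[pm4->ndw++] = (OP_SET_REG << 16) | 1; /* opcode, count = 1 */
      pm4->dw[pm4->ndw++] = reg;
   }
   pm4->dw[pm4->ndw++] = val;
   pm4->last_opcode = OP_SET_REG;
   pm4->last_reg = reg;
}
```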
Fixes: da78d50bc8 - radeonsi: make si_pm4_cmd_begin/end static and simplify all usages
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
The idea of having a single file containing the ToC is not really how
things are done in Sphinx, and kinda makes it harder to structure
documentation more naturally. This was just something I did to mirror
what we used to do for the old HTML-only version of the docs, to ease
the transition and to de-clutter index.rst.
Now that the transition is far behind us, and index.rst is much cleaner,
we can finally start inlining this.
In the long run, I expect most of these to be moved to separate "chapter
articles" that summarize what these topics are, and thus disappear from
here.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7709>
The rST code here is much more to the point and easy to read if we
define the links as external link-references instead of inlining them.
This will make the next few patches much easier to grok.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7709>
The features added in each major version are also unlikely to be the first
things someone wants to know about Mesa. So let's move this into the
versions.rst article.
This documentation is severely out of date anyway, and as it doesn't
seem like anyone is interested in documenting this any more, we should
probably consider dropping versions.rst entirely in the longer run.
But for now, this makes the front-page much more approachable.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7709>
It turns out, the load_ubo to load_ubo_vec4 implementation isn't quite
enough for us, for a few reasons:
1. We use a single array of uvec4s for our UBOs, and to handle 64-bit
values in UBOs, we need further lowering.
2. The whole vec4 stuff seems a bit hard to reconcile with glsl 4.3
packing as well as PackedDriverUniformStorage.
In addition to the above, this fixes several piglit tests that *aren't*
part of quick_gl, which is what I've been running. So this doesn't even
work correctly right now.
So let's go back to what we had before instead.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3643
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7858>
This used to not be a problem, because these mutexes were the first
members of this array, meaning that we ended up trying to lock/unlock
NULL mutexes. But this isn't guaranteed to be allowed, so we were
relying on luck here.
Recently, this changed. We introduced asserts for NULL-pointers, and
changed the behavior in a way that leads to crashes in release-builds.
This means we can't rely on luck any longer.
Fixes: e317103753 ("c11/threads: Remove Win32 null checks")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3903
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7853>
Most of these are adding 'static', for functions that are local
to a translation unit but weren't declared static.
There's one instance of a missing include for bringing the prototype
into the translation unit, one function missing a return type (default-int),
and one which added inline to avoid it being considered unused in some sources.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7780>
Move some stuff from d3d12_context.h to d3d12_compiler.h, and
fix d3d12_compiler.h to not include d3d12_context.h.
This serves two purposes:
1. Putting declarations and definitions where they really belong.
2. Making it so only C++ code needs d3d12.h simplifies the helpers
we need to add to support d3d12.h for Linux.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7780>
This wasn't implemented yet, because we hadn't encountered it yet. But
now it seems we can trigger this, thanks to the nv_copy_depth_to_color
piglit tests.
This makes the test go from crash to fail, which isn't perfect, but it's
better than nothing.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7855>
I don't see why this is needed and it's not used anywhere.
As long as apps don't call glDeleteTextures, nothing will release them.
And even if they do, we don't use the saved textures anywhere.
Also, BindTexture will fail for deleted textures anyway, so they can't be
popped. The existing code already binds the Name that was saved, not
the texture object that was saved.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6946>
The old path copies state parameters into the parameter list, and then
the driver copies them into a buffer.
The optional new path loads state parameters into a buffer directly.
This increases performance by 5% in one subtest of viewperf.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6946>
Disabled because of CI failures.
Instead of separate state vars for each row and invoking fetch_state 4x:
state.matrix.modelview.row[0]
state.matrix.modelview.row[1]
state.matrix.modelview.row[2]
state.matrix.modelview.row[3]
The rows are now merged and fetch_state is invoked once:
state.matrix.modelview.row[0..3]
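As a rough illustration of merging the four row fetches into one, here is a sketch with a hypothetical `fetch_matrix_rows()` that copies a contiguous range of rows in a single call. It is not Mesa's fetch_state(); the call counter only makes the reduction from four calls to one visible.

```c
#include <assert.h>
#include <string.h>

/* Counts invocations so the saving from merging rows is visible. */
static unsigned fetch_calls;

/* Hypothetical stand-in for fetching rows [first, last] of a row-major
 * 4x4 matrix (e.g. state.matrix.modelview.row[0..3]) in one call,
 * writing 4 floats per row to `out`. */
static void fetch_matrix_rows(const float m[4][4], int first, int last, float *out)
{
   assert(first >= 0 && first <= last && last < 4);
   fetch_calls++;
   for (int r = first; r <= last; r++)
      memcpy(out + 4 * (r - first), m[r], sizeof(m[r]));
}
```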
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6946>
This moves state vars to the end of the parameter list, so that state vars
can be loaded directly into a buffer instead of loaded into the parameter list.
Also, state vars don't need to be searched in the parameter list anymore,
because we will know their index range. (this will make gallium faster)
This commit just wraps a for loop around the existing code.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6946>
This moves state vars to the end of the parameter list, so that state vars
can be loaded directly into a buffer instead of loaded into the parameter list.
Also, state vars don't need to be searched in the parameter list anymore,
because we will know their index range. (this will make gallium faster)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6946>
Viewperf does a lot of redundant uniform updates - 60-80% in some tests.
Those are sometimes the only state changes between draw calls.
This improves performance by 33% in one viewperf subtest.
If you are worried about CPU overhead in the non-redundant case,
glthread is the solution.
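The redundant-update check can be sketched as a byte comparison against the stored value before anything is flagged dirty. `struct uniform_slot` and `set_uniform4fv()` are hypothetical names, and `dirty_count` merely stands in for whatever flushing or dirty-state flagging the driver would otherwise do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical uniform slot; Mesa's real uniform storage differs. */
struct uniform_slot {
   float value[4];
   unsigned dirty_count; /* stands in for flushing/flagging driver state */
};

/* Store a vec4 uniform, skipping the update when the new bytes are identical
 * to the stored ones. Returns true if state had to be flagged dirty. */
static bool set_uniform4fv(struct uniform_slot *u, const float v[4])
{
   if (memcmp(u->value, v, sizeof(u->value)) == 0)
      return false; /* redundant update: nothing to flush */
   memcpy(u->value, v, sizeof(u->value));
   u->dirty_count++;
   return true;
}
```

Comparing bytes rather than using float `==` is the conservative choice here: -0.0 and +0.0 count as different, and an identical NaN bit pattern counts as unchanged.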
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6946>
The TFU path only activates for blits that are really copies
(no linear filtering, no scaling, same pixel format, etc.), and
we do it slice by slice, so we can easily handle mirroring of the
Z coordinate for 3D images by reversing the order of the layers
as we copy them.
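Because the copy is done slice by slice, mirroring Z reduces to picking source layers in reverse order. In this minimal sketch each `int` stands for one 2D layer and each loop iteration for one per-slice TFU job; `copy_layers()` is a hypothetical name, not a v3dv function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each int stands for one 2D layer of a 3D image,
 * and each loop iteration stands for one per-slice TFU job. */
static void copy_layers(const int *src, int *dst, int depth, bool mirror_z)
{
   for (int i = 0; i < depth; i++) {
      /* Mirroring Z is just reading the source layers in reverse order. */
      int src_z = mirror_z ? depth - 1 - i : i;
      dst[i] = src[src_z];
   }
}
```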
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7845>
Same as with other TFU paths, we only handle exact copies without
conversion, so we can rewrite the format to use a compatible TFU format
based on its texel size, which allows us to use this path with more
formats.
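For an exact copy without conversion only the texel size matters, so the format can be rewritten to any format of the same size. The sketch below shows that mapping with a hypothetical `enum copy_format`; it does not claim which of these sizes the TFU actually supports, nor which formats v3dv picks.

```c
#include <assert.h>

/* Hypothetical format enum; v3dv's real VkFormat/TFU tables differ. */
enum copy_format {
   COPY_FMT_UNSUPPORTED = 0,
   COPY_FMT_R8_UINT,
   COPY_FMT_R16_UINT,
   COPY_FMT_R32_UINT,
   COPY_FMT_R32G32_UINT,
   COPY_FMT_R32G32B32A32_UINT,
};

/* For an exact copy without conversion, only the texel size matters,
 * so any format can be rewritten to an integer format of the same size. */
static enum copy_format compatible_copy_format(unsigned texel_bytes)
{
   switch (texel_bytes) {
   case 1:  return COPY_FMT_R8_UINT;
   case 2:  return COPY_FMT_R16_UINT;
   case 4:  return COPY_FMT_R32_UINT;
   case 8:  return COPY_FMT_R32G32_UINT;
   case 16: return COPY_FMT_R32G32B32A32_UINT;
   default: return COPY_FMT_UNSUPPORTED;
   }
}
```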
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7845>
v2: Fixup comment about bits in nir_intrinsics.py
v3: Use varying for primitive shading rate builtin (samuel)
v4: Reorder switch alphabetically
Make divergence of frag_shading_rate an option
v5: Remove stage check for frag_shading_rate in divergence (Samuel)
v6: s/frag_shading_rate_per_subgroup/single_frag_shading_rate_per_subgroup/ (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7795>
In commit 00b28a50b2, Marek extended
a number of optimizations that had been 32-bit specific to work on
other bit-sizes.
Most optimizations preserve the data type across the transformation.
In other words, an optimization which generates e.g. fp64 operations
only does so when the source expression also contains fp64 operations.
These transformations are fine with respect to lowering, because we
will lower away all expressions that would trigger the search portion
of the expression, and so we'd never apply those rules.
However, a few of the rules create new operations that run afoul of
lowering passes. For example,
('bcsel', a, 1.0, 0.0) => ('b2f', a)
where the result is a double would simply be a selection between two
different 64-bit constants. The replacement expression, on the other
hand, involves a nir_op_b2f64 ALU operation. If we're run after
nir_lower_doubles, then it may not be legal to generate such an
expression anymore (at least without running lowering again, which we
don't do today).
Regressions due to this are blocking the 20.3 release, so for now, we
take the easy route and simply disallow those few rules when doing full
softfp64 lowering, which fixes the immediate problem. But it doesn't
solve the long-term problem in an extensible manner.
In the future, we may want to add a `lowered_alu_ops` bitfield to the
NIR shader, and as lowering passes are run, mark them as taboo. Then,
we could have each algebraic transformation track which operations it
creates in the replacement expression. With both of those in place,
nir_replace_instr could compare the transformation's list of ALU ops
against `lowered_alu_ops` and implicitly skip rules that generate
forbidden ALU operations.
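The proposed `lowered_alu_ops` check could look roughly like the sketch below. This is not something Mesa implements; the op enum, `OP_BIT`, `struct algebraic_rule` and `rule_allowed()` are all hypothetical, and a real version would need a wider bitset since NIR has far more than 64 ALU ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical op identifiers; a real version would need a larger bitset
 * because NIR has many more ALU ops than fit in 64 bits. */
enum alu_op { OP_B2F64, OP_FADD64, OP_FMUL64, OP_BCSEL };

#define OP_BIT(op) (UINT64_C(1) << (op))

struct algebraic_rule {
   const char *name;
   uint64_t creates_ops; /* ops that appear in the replacement expression */
};

/* A rule may fire only if it creates none of the ops that lowering passes
 * have already removed from the shader. */
static bool rule_allowed(uint64_t lowered_alu_ops, const struct algebraic_rule *r)
{
   return (r->creates_ops & lowered_alu_ops) == 0;
}
```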
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3504
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7841>
Environment variables aren't the easiest thing to use on android. So
add a fallback to android's property mechanism for os_get_option().
This is slightly complicated by the assumption that the return value
of os_get_option() need not be freed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7763>
We will be using the new parameter in an upcoming change. The TFU
unit has a limited list of supported formats, so for cases where
we don't want to do any pixel format conversions and we are just
copying raw image data, we want to be able to rewrite the underlying
image format to use a compatible format. We will be using this
in a follow-up patch that adds a TFU path for image copies. For this
purpose, we also move the function definition up in the file
so it is available for that upcoming TFU path without having to
put its prototype earlier in the file.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7809>
Without this, the test-docs job could end up waiting for manual action
after the sanity job failed, which prevented the pipeline as a whole
from having failed status.
(This means the test-docs job will no longer exist in one corner case
where it did before, when pushing directly to a non-master branch of
the main repository. That should be fine, since the docs are only
deployed from master.)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7676>
This is possible now that it uses the external ci-fairy docker image.
This allows dropping the "check mr" job from needs: of other jobs, the
container stage jobs will only become available once the sanity stage
has passed.
This also allows simplifying the "check mr" job rules and script, since
the job only needs to exist in pre-merge pipelines for MRs anymore.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7676>
This is unreachable, and in release mode it should also indicate that
the function will not return anything useful here. Also add a default
return value, just in case a compiler doesn't support "unreachable".
Thanks Dieter Nützel for pointing this error out.
Fixes: b6c17e2965621a46eb07ba2605d9f9e221a400b ("r600/sfn: lower IO for FS inputs and handle interpolation accordingly")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7820>
MSVC C++ can't do designated initializers without /std:c++latest. These
helpers will likely be removed soon anyway, so just don't use the
intrinsic builders here.
This should also fix the GCC7 build, which doesn't implement non-trivial
designated initializers.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: c9bcad2573 ("nir: add generated intrinsic builders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7808>
When copying multiple regions that have the same image subresource we are
effectively copying various rects across the same layer range, so we can
batch together all the rects to copy for each layer in a single job.
This allows us to significantly reduce CPU overhead when recording the
command, as we need to produce fewer jobs and allocate fewer descriptor
sets. It also offers smaller gains in execution time due to the reduced
job count.
In a stress test where we copy 10 subrects of an image in a loop 100 times,
choosing regions that will involve the texel buffer path, we get these
results:
        | Recording Time | Execution Time |
--------|----------------|----------------|
master  | 3.021s         | 0.112s         |
patch   | 0.163s         | 0.080s         |
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7782>
This workaround fixes a hang while loading a renderdoc trace for me.
Since the workload does 1 mip per cmdbuffer it is quite hard to confirm
what exactly the conditions for the hang are but this is the most
restrictive set I found and it corresponds to a workaround in AMDVLK as
well.
CC: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7210>
Quoting the spec:
"When a pool is destroyed, all descriptor sets allocated from the
pool are implicitly freed and become invalid. Descriptor sets
allocated from a given pool do not need to be freed before
destroying that descriptor pool."
This implies we might leak nodes allocated in the vma object.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0a6d2593b8 ("anv: Allocate descriptor buffers from the BO cache")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7796>
It fixes the following valgrind issue:
==141996== Invalid read of size 4
==141996== at 0x61F8806: gl_nir_link_uniforms (gl_nir_link_uniforms.c:1788)
==141996== by 0x60F17AA: gl_nir_link_glsl (gl_nir_linker.c:672)
==141996== by 0x5C1AEDF: st_link_nir (st_glsl_to_nir.cpp:739)
==141996== by 0x5C15574: st_link_shader (st_glsl_to_ir.cpp:172)
==141996== by 0x5C673B0: _mesa_glsl_link_shader (ir_to_mesa.cpp:3117)
==141996== by 0x5E7B61C: link_program (shaderapi.c:1311)
==141996== by 0x5E7B61C: link_program_error (shaderapi.c:1419)
==141996== by 0x5E7CF8A: _mesa_LinkProgram (shaderapi.c:1911)
==141996== by 0x4923D13: stub_glLinkProgram (piglit-dispatch-gen.c:33956)
==141996== by 0x1142C0: link_and_use_shaders (shader_runner.c:1636)
==141996== by 0x1205A6: init_test (shader_runner.c:5347)
==141996== by 0x121555: piglit_init (shader_runner.c:5725)
==141996== by 0x4991C84: run_test (piglit_fbo_framework.c:50)
It can be reproduced on `iris` using the following piglit test:
instance-matching-shader-storage-blocks-align-qualifier-mismatch.shader_test
Closes: #3818
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 47c35823 ("glsl: fix up location setting for variables pointing to a UBO's base")
Signed-off-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7376>
This patch fixes these SCons build errors.
/usr/bin/ld: build/linux-x86_64-debug/gallium/auxiliary/libgallium.a(lp_bld_misc.os): in function `llvm::InitializeNativeTarget()':
llvm/Support/TargetSelect.h:118: undefined reference to `LLVMInitializeX86TargetInfo'
/usr/bin/ld: llvm/Support/TargetSelect.h:119: undefined reference to `LLVMInitializeX86Target'
/usr/bin/ld: llvm/Support/TargetSelect.h:120: undefined reference to `LLVMInitializeX86TargetMC'
/usr/bin/ld: build/linux-x86_64-debug/gallium/auxiliary/libgallium.a(lp_bld_misc.os): in function `llvm::InitializeNativeTargetAsmPrinter()':
llvm/Support/TargetSelect.h:132: undefined reference to `LLVMInitializeX86AsmPrinter'
/usr/bin/ld: build/linux-x86_64-debug/gallium/auxiliary/libgallium.a(lp_bld_misc.os): in function `llvm::InitializeNativeTargetDisassembler()':
llvm/Support/TargetSelect.h:156: undefined reference to `LLVMInitializeX86Disassembler'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7722>
with a sequence like this:
glClear(STENCIL)
glBeginTransformFeedback()
...
glEndTransformFeedback()
glClear(STENCIL)
The second clear may sometimes produce an unexpected result.
Calling si_flush_gfx_cs() when doing ngg -> legacy transition seems to be a
valid workaround (both for the synthetic reproducer and the real Blender bug).
Using flush flags or events (BOTTOM_OF_PIPE_TS, RESET_TO_LOWEST_VGT) didn't help.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2941
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7750>
Avoid small possibility of reading torn write on 32-bit platforms.
If frequency caching is desired, it's probably better to initialize from
C++ and extern "C" instead. It's not a tremendous optimization though.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7138>
The code works but is a bit fragile if we ever add a case that has a
less strict requirement (a smaller gen) than the case above. To avoid
having to reason about this, refactor code to use a variable to
indicate whether the SFID is supported or not.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7742>
They're not really "push" anymore but that's because there is no such
thing as push constants in bindless shaders on Intel. They should be
fast enough, though. There is some room for debate here as to whether
we want to do the pull in NIR or push it into the back-end. The
advantage of doing it in the back-end is that it'd be easier to use
MOV_INDIRECT for indirect push constant access rather than falling back
to a dataport message.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
For triangle geometry, the hit attributes are always two floats which
contain the barycentric coordinates of the hit. For procedural
geometry, they're an arbitrary blob of data passed from the intersection
shader to the hit shaders. In our implementation, we stash that data
right after the HW RayQuery in the ray stack.
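For triangle hits, the two stored floats (u, v) expand to the barycentric weights (1 - u - v, u, v) for vertices 0, 1 and 2, which is how a hit shader typically interpolates per-vertex data. The sketch below shows only that arithmetic; `struct vec3` and `interpolate_hit()` are illustrative names and have nothing to do with how the driver stores the attributes.

```c
#include <assert.h>

struct vec3 { float x, y, z; };

/* Interpolate a per-vertex attribute at a triangle hit point from the two
 * barycentric floats stored as the hit attributes: weights are
 * (1 - u - v, u, v) for vertices 0, 1 and 2. */
static struct vec3 interpolate_hit(struct vec3 a0, struct vec3 a1, struct vec3 a2,
                                   float u, float v)
{
   float w = 1.0f - u - v;
   struct vec3 r = {
      w * a0.x + u * a1.x + v * a2.x,
      w * a0.y + u * a1.y + v * a2.y,
      w * a0.z + u * a1.z + v * a2.z,
   };
   return r;
}
```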
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Unlike graphics and compute pipelines, Vulkan ray-tracing pipelines do
not have a single entrypoint. Instead, the raygen shader is specified
as a one-element shader binding table in the vkCmdTraceRay call. This
means that raygen shaders have to be bindless shaders just like any
other ray tracing shader. To launch them, we have a tiny compute shader
that acts as a trampoline and sets up the hotzone and uses btd_spawn to
fire off the raygen shader.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Both traceRay() and executeCallable() take a payload parameter which
gets passed from the caller to the callee and which the callee can write
to pass data back to the caller. We implement these by having the
caller pass a pointer to its payload data structure to the callee as the
second QWord on the callee's stack. Coming out of spirv_to_nir, the incoming call
payloads get the nir_var_shader_call_data variable mode allowing us to
easily identify them. Outgoing call payloads get assigned the
nir_var_shader_temp mode and will have been turned into function_temp by
nir_lower_global_vars_to_local. All we have to do is crawl the shader
looking for references to the nir_var_shader_call_data variable and
rewrite those to use the passed in pointer. nir_lower_explicit_io will
do the rest for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These are required for ray-tracing. There are many cases where the
ray-tracing hardware may decide to execute some but not all of our
shaders. In these cases, it needs a shader to execute at the end which
will pop the stack back to the shader which called traceRay().
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Each callable ray-tracing shader stage has to perform a return
operation at the end. In the case of raygen shaders, it retires the
bindless thread because the raygen shader is always the root of the call
tree. In the case of any-hit shaders, the default action is to accept the
hit. For callable, miss, and closest-hit shaders, it does a return
operation. The assumption is that the calling shader has placed a
BINDLESS_SHADER_RECORD address for the return in the first QWord of the
callee's scratch space. The return operation simply loads this value
and calls a btd_spawn intrinsic to jump to it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
In ray-tracing shader stages, we have a real call stack and so we can't
use the normal scratch mechanism. Instead, the invocation's stack lives
in a memory region of the RT scratch buffer that sits after the HW ray
stacks. We handle this by asking nir_lower_io to lower local variables
to 64-bit global memory access. Unlike nir_lower_io for 32-bit offset
scratch, when 64-bit global access is requested, nir_lower_io generates
an address calculation which starts from a load_scratch_base_ptr. We
then lower this intrinsic to the appropriate address calculation in
brw_nir_lower_rt_intrinsics.
When a COMPUTE_WALKER command is sent to the hardware with the BTD Mode
bit set to true, the hardware generates a set of stack IDs, one for each
invocation. These then get passed along from one shader invocation to
the next as we trace the ray. We can use those stack IDs to figure out
which stack our invocation needs to access. Because we may not be the
first shader in the stack, there's a per-stack offset that gets stored
in the "hotzone".
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These will eventually contain per-stage lowering for various ray-tracing
things. This is separate from brw_nir_lower_rt_intrinsics because, for
reasons that will become apparent later, brw_nir_lower_rt_intrinsics has
to be run very late in the compile process, right before brw_compile_bs.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The new intrinsics we added for doing address calculations are all
things we fetch from the RT_DISPATCH_GLOBALS struct. We could emit an
RT_DISPATCH_GLOBALS load at every point we want it and trust NIR to CSE
it for us but it's easier to use intermediate intrinsics.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The Intel bindless thread dispatch model is very simple. When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs. These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned. Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke. When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers. The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The cloned version is the one that has updated start and end bits
fields. We're about to start passing those through to a new
__gen_address function and we need the correct start/end in order to do
that reliably.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
This is the first of the HW data structures added for ray-tracing.
These are added to their own file because they're not really associated
with any hardware we've enabled in Mesa just yet. Eventually, these
will likely get folded into the appropriate genX.xml file as they are
hardware data structures and need to be tracked as such.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Halt is like a return for the entire shader or exit() if you prefer to
think of it that way. Once an invocation hits a halt, it's 100% dead.
Any writes to output variables which happened before the halt do,
however, still apply.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
If valgrind is installed, these components need to find valgrind.h.
Fixes: 53f7d539cd ("util: Add helgrind support for simple_mtx")
Closes: #3876
Acked-by: Rob Clark <robclark@freedesktop.org>
A fairly common pattern for debug envvars is something like:
   static int should_print = -1;
   if (should_print < 0)
      should_print = env_var_as_unsigned("NIR_PRINT", 0);
Unfortunately helgrind doesn't realize that we expect to always get the
same return value, so we don't actually care about the race condition
here.
Add a helper get_once() and do_once macros, with extra locking to make
helgrind/drd happy. Note that other than the nir usages (which are
limited to debug builds), other usages are not in hot-paths.
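The idea of adding locking so race detectors stay quiet can be sketched with a plain pthread mutex around the lazily initialized value. This is not Mesa's get_once()/do_once implementation: `get_should_print()` is a hypothetical helper, and getenv() plus strtoul() stand in for env_var_as_unsigned().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper in the spirit of the do-once pattern: the lazily
 * initialized value is read and written only under a lock, so race
 * detectors such as helgrind/drd see no unsynchronized access. */
static pthread_mutex_t should_print_lock = PTHREAD_MUTEX_INITIALIZER;
static int should_print = -1;

static int get_should_print(void)
{
   pthread_mutex_lock(&should_print_lock);
   if (should_print < 0) {
      const char *s = getenv("NIR_PRINT");
      should_print = s ? (int)strtoul(s, NULL, 0) : 0;
   }
   int value = should_print;
   pthread_mutex_unlock(&should_print_lock);
   return value;
}
```

The lock costs a little on every call, which matches the note above that these paths are either debug-only or not hot.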
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7644>
The old NV version (and the provisional KHR version) specified the data
payload via an integer location. This was quite annoying for the parser
and potentially error-prone. The final KHR version of the SPIR-V
ray-tracing spec replaces these integers with actual pointers. We don't
really need to implement the NV versions but we have the code and
someone might want to parse some NV ray-tracing shaders.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7734>
For these intrinsics, the NV version and the provisional KHR version
have the same enum value and semantics but the final KHR version is
different on both counts. Rename them to NV before we update the
header so the header update isn't a functional change.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7734>
Vulkan can't scale while resolving using vkCmdResolveImage. For this we
need to use util_blitter.
The reason this wasn't a problem in the past was that glBlitFramebuffer
always set pipe_blit_info::render_condition_enable, and we always used
that to bail out to util_blitter. When the latter changed, this broke.
Fixes: 19906022e2 ("zink: more accurately track supported blits")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7745>
This optimizes v_add(c, v_lshlrev(a, b)) to v_mad_u32_u24(b, 1<<a, c)
if 'a' is a constant (less than or equal to 6 to avoid creating
literals) and 'b' is known to be a 16-bit or a 24-bit value.
On GFX9+, this is already optimized to v_lshl_add_u32.
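Why the rewrite is valid can be checked with scalar models of the two forms: v_lshlrev_b32 shifts its second operand, and v_mad_u32_u24 multiplies the low 24 bits of its first two sources before adding the third. The function names below are illustrative only. With 'b' fitting in 24 bits and a <= 6, both forms agree; if 'b' has bits above bit 23, they don't.

```c
#include <assert.h>
#include <stdint.h>

/* v_add(c, v_lshlrev(a, b)): v_lshlrev_b32 shifts its second operand. */
static uint32_t add_of_lshl(uint32_t a, uint32_t b, uint32_t c)
{
   return (b << a) + c;
}

/* v_mad_u32_u24(s0, s1, s2): multiply the low 24 bits of s0 and s1,
 * add s2, keep the low 32 bits. */
static uint32_t mad_u32_u24(uint32_t s0, uint32_t s1, uint32_t s2)
{
   uint64_t product = (uint64_t)(s0 & 0xffffffu) * (s1 & 0xffffffu);
   return (uint32_t)product + s2;
}
```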
No fossil-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7673>
The warning is a bit misleading about where it shows up: it complains
about the shader key, due to shader key being calculated from (among
other things) stream_output state that had some uninitialized garbage
in the padding.
==84572== Uninitialised byte(s) found during client check request
==84572== at 0x60548E8: blob_write_bytes (blob.c:163)
==84572== by 0x6534EF7: compute_variant_key (ir3_disk_cache.c:111)
==84572== by 0x6535143: ir3_disk_cache_retrieve (ir3_disk_cache.c:171)
==84572== by 0x654D82F: create_variant (ir3_shader.c:251)
==84572== by 0x654DA2B: ir3_shader_get_variant (ir3_shader.c:301)
==84572== by 0x645B2CB: ir3_shader_variant (ir3_gallium.c:113)
==84572== by 0x645B7EB: ir3_shader_create (ir3_gallium.c:219)
==84572== by 0x645BAA7: ir3_shader_state_create (ir3_gallium.c:285)
==84572== by 0x6506003: fd6_shader_state_create (fd6_program.c:1136)
==84572== by 0x64676C7: assemble_tgsi (freedreno_program.c:105)
==84572== by 0x64679DF: fd_prog_init (freedreno_program.c:188)
==84572== by 0x6506157: fd6_prog_init (fd6_program.c:1172)
==84572== Address 0xeff1588 is 424 bytes inside a block of size 480 alloc'd
==84572== at 0x4866FA4: malloc (vg_replace_malloc.c:307)
==84572== by 0x605D46F: ralloc_size (ralloc.c:133)
==84572== by 0x605D52F: rzalloc_size (ralloc.c:166)
==84572== by 0x654DFF7: ir3_shader_from_nir (ir3_shader.c:473)
==84572== by 0x645B6C7: ir3_shader_create (ir3_gallium.c:182)
==84572== by 0x645BAA7: ir3_shader_state_create (ir3_gallium.c:285)
==84572== by 0x6506003: fd6_shader_state_create (fd6_program.c:1136)
==84572== by 0x64676C7: assemble_tgsi (freedreno_program.c:105)
==84572== by 0x64679DF: fd_prog_init (freedreno_program.c:188)
==84572== by 0x6506157: fd6_prog_init (fd6_program.c:1172)
==84572== by 0x64CB36F: fd6_context_create (fd6_context.c:154)
==84572== by 0x59D93BB: st_api_create_context (st_manager.c:917)
Somehow this was showing up with dEQP-GLES31.info.vendor but not other
things.
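The general pattern behind this kind of fix is to zero a struct (padding included) before filling it in whenever it will be hashed or serialized byte by byte. The sketch uses a hypothetical `struct variant_key` with implicit padding on common ABIs, plus a plain FNV-1a hash. It is not the ir3 key or its disk-cache hashing, and it relies on member stores leaving padding bytes untouched, which is what common compilers do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key with implicit padding after `reg` on common ABIs. */
struct variant_key {
   uint8_t reg;
   uint32_t mask;
};

/* Zero the whole struct first so padding bytes are deterministic before the
 * key is hashed or serialized byte by byte. */
static void variant_key_init(struct variant_key *k, uint8_t reg, uint32_t mask)
{
   memset(k, 0, sizeof(*k));
   k->reg = reg;
   k->mask = mask;
}

/* FNV-1a over the raw bytes, padding included. */
static uint32_t variant_key_hash(const struct variant_key *k)
{
   const unsigned char *p = (const unsigned char *)k;
   uint32_t h = 2166136261u;
   for (size_t i = 0; i < sizeof(*k); i++) {
      h ^= p[i];
      h *= 16777619u;
   }
   return h;
}
```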
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7717>
1) Propagate the change to only emit markers in debug builds (and add
the WFI that ensures they are synchronized with the GPU). We could
consider dropping them entirely, since the GPU devcoredump support
in newer kernels is more useful. But it is still an occasionally
useful fallback.
2) Use p_atomic_inc_return() to placate helgrind
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7717>
Assigning an array reg removes IR3_REG_ARRAY, which means that
definitions and uses can't be tracked back to the array register's name
and liveness for the components of the array isn't correctly
calculated. To fix this we delay assigning array registers until the
scalar pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7711>
It caused CmdPushConstants to return early incorrectly in some tests
that call that method several times. While we are here, we also
replace the (void *) cast in the memcpy below.
Fixes: e1c8041cde ("v3dv: try harder to skip emission of redundant state")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7718>
It can only be done if a TCS input is accessed without indirect indexing and
with gl_InvocationID as the vertex index, and the number of VS and TCS threads
is the same.
This eliminates LDS stores and loads for VS->TCS IO, reducing shader lifetime
and LDS traffic.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7623>
Previously it was 16 and bigger patches would always trim the patch count
needlessly.
There are 2 variables to consider:
- lane occupancy
- LDS usage (limiting wave occupancy)
If LDS size is 32 KB (max limit per CU) for 3 waves and we can't maximize
occupancy, it's better to leave some lanes unoccupied because using 2
waves would decrease the LDS size to 21 KB, which is not enough to fit
another workgroup on the CU.
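The LDS arithmetic in the paragraph above can be checked with a small sketch. It takes the message's 32 KB limit at face value and assumes that LDS usage scales linearly with the number of waves. Both helper names are hypothetical.

```c
/* Limit stated in the message above; an input to the sketch, not a
 * verified hardware constant. */
#define LDS_LIMIT_BYTES (32u * 1024u)

/* How many workgroups of the given LDS size fit within the limit. */
static unsigned
workgroups_per_lds_limit(unsigned lds_bytes_per_workgroup)
{
   return LDS_LIMIT_BYTES / lds_bytes_per_workgroup;
}

/* LDS needed when a workgroup that used `lds_bytes` with `waves` waves is
 * shrunk to `new_waves` waves (linear scaling assumption). */
static unsigned
scaled_lds_bytes(unsigned lds_bytes, unsigned waves, unsigned new_waves)
{
   return lds_bytes * new_waves / waves;
}
```

With the message's numbers, scaled_lds_bytes(32768, 3, 2) is 21845 bytes (about 21 KB), and workgroups_per_lds_limit() returns 1 for both 32768 and 21845. Dropping to 2 waves would therefore cost lane occupancy without letting a second workgroup fit.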
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7623>
Fix defects reported by Coverity Scan.
uninit_member: Non-static class member tlsSize is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member driver is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member driver_out is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7703>
This removes a lot of locking from the driver thread.
If multiple contexts sharing buffers submit GL calls from multiple threads,
they will be serialized by this mutex. I can add a driconf option to turn
off this optimization if needed, but I don't currently anticipate seeing
GL apps that use multiple shared contexts in different threads
simultaneously.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7053>
Issue: Encoding parameters not updated after changing FrameRate
Root Cause:
In rvce_begin_frame, need_rate_control was enabled only if target_bitrate,
quant_i_frames, quant_p_frames, quant_b_frames or rate_ctrl_method
changed. As a result, rate_control() did not update the encoder
parameters with the new framerate, peak_bits_per_picture_integer and
avg_target_bits_per_picture.
Fix:
Also check whether those other parameters changed and, if so, enable
need_rate_control, so that the encoder parameters are updated with the
new framerate and bitrate.
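A minimal sketch of the widened check follows. The struct below is a hypothetical subset of the rate-control parameters and does not match the real radeon VCE structures. The point it shows is that a frame-rate change alone now triggers reprogramming.

```c
#include <stdbool.h>

/* Hypothetical subset of the encoder rate-control parameters. */
struct rate_ctrl_params {
   unsigned rate_ctrl_method;
   unsigned target_bitrate;
   unsigned peak_bitrate;
   unsigned frame_rate_num;
   unsigned frame_rate_den;
   unsigned quant_i_frames;
   unsigned quant_p_frames;
   unsigned quant_b_frames;
};

/* Rate control must be reprogrammed whenever any parameter it depends on
 * changes, including the frame rate, not just bitrate/QP/method.
 * Fields are compared one by one to avoid comparing struct padding. */
static bool
need_rate_control(const struct rate_ctrl_params *cur,
                  const struct rate_ctrl_params *next)
{
   return cur->rate_ctrl_method != next->rate_ctrl_method ||
          cur->target_bitrate != next->target_bitrate ||
          cur->peak_bitrate != next->peak_bitrate ||
          cur->frame_rate_num != next->frame_rate_num ||
          cur->frame_rate_den != next->frame_rate_den ||
          cur->quant_i_frames != next->quant_i_frames ||
          cur->quant_p_frames != next->quant_p_frames ||
          cur->quant_b_frames != next->quant_b_frames;
}
```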
Signed-off-by: Krunal Patel <krunalkumarmukeshkumar.patel@amd.corp-partner.google.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7696>
Using more blend targets than specified by maxFragmentDualSrcAttachments
is invalid per the Vulkan spec.
I'm usually not a fan of working around game bugs inside the driver, but
it's really easy for us to ignore MRT1+ in the driver and that
prevents wrong behaviour.
Cc: 20.2, 20.3
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7684>
Writing uniform streams is performance sensitive so we should try our
best to avoid writing new uniforms if they have not changed. Particularly,
if only the vertex buffers have changed, we should not write new uniforms.
This improves performance in vkQuake2 by about 11.15%.
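The idea of skipping uniform writes when only unrelated state changed can be sketched with dirty bits. The flag names and the set of state that uniforms depend on are hypothetical and chosen only for illustration. The actual v3dv tracking is more detailed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dirty bits. */
enum {
   DIRTY_VERTEX_BUFFER   = 1u << 0,
   DIRTY_PUSH_CONSTANTS  = 1u << 1,
   DIRTY_DESCRIPTOR_SETS = 1u << 2,
   DIRTY_PIPELINE        = 1u << 3,
};

/* State that the uniform stream is assumed to read. Vertex buffers are not
 * included, so a draw that only rebinds vertex buffers keeps reusing the
 * previous uniform stream. */
#define UNIFORM_DEPENDENCIES \
   (DIRTY_PUSH_CONSTANTS | DIRTY_DESCRIPTOR_SETS | DIRTY_PIPELINE)

static bool
needs_new_uniform_stream(uint32_t dirty)
{
   return (dirty & UNIFORM_DEPENDENCIES) != 0;
}
```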
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7683>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member alu_temp_gprs is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member max_fetch is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member has_trans is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member vtx_src_num is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member num_slots is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member uses_mova_gpr is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member r6xx_gpr_index_workaround is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member stack_workaround_8xx is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member stack_workaround_9xx is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member wavefront_size is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member stack_entry_size is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7413>
Fix defect reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member chip_class is not initialized in
this constructor nor in any functions that it calls.
uninit_member: Non-static class member scratch_size is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7648>
Allocate enough space and then program the registers correctly. We
currently allocate scratch memory as part of the pipeline, because the
alternative of trying to share it across pipelines is a bit trickier due
to the need for the configs to exactly match whenever we reuse the same
buffer for different shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7386>
We have to allocate backing storage big enough to hold all the private
memory for all threads that can possibly be in flight, which means that
we have to start filling in some more model-specific information as the
sizes will be different for models with different core counts/ALU
counts.
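The sizing rule above (storage for every thread that can be in flight) amounts to a product of per-thread size and model-specific limits. The struct fields are hypothetical and any values plugged in are illustrative, not real hardware data.

```c
#include <stdint.h>

/* Hypothetical per-model description. */
struct gpu_model_info {
   unsigned core_count;           /* shader cores */
   unsigned max_threads_per_core; /* threads that can be resident per core */
};

/* Private memory must cover every thread that can possibly be in flight,
 * so the backing size scales with the model's core and thread counts.
 * 64-bit arithmetic avoids overflow for large configurations. */
static uint64_t
private_memory_size(const struct gpu_model_info *model,
                    uint32_t bytes_per_thread)
{
   return (uint64_t)bytes_per_thread *
          model->max_threads_per_core *
          model->core_count;
}
```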
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7386>
It seems the src_offset and dst_offset are unused for these, and the
offset is expected to be an immediate register. Also we forgot to add a
dummy dst for the store instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7386>
They seem to be broadly similar to the a3xx ones, albeit with some
things shuffled around and with different units, and the extra layout
mode bits.
We also document the FIRST_EXEC_OFFSET registers, so that we can start
properly setting them all to 0 in freedreno and turnip in later commits.
I discovered the compute one when playing with function support in the
blob CL driver, and added the other registers via analogy (the blob
Vulkan driver sets FIRST_EXEC_OFFSET and the shader VA together in one
packet for all stages, so it seems to really be in the same place for
all stages).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7386>
In bison's commit 72c9fa4510eb (skeletons: use "end of file" instead of
"$end") in bison-3.6, '$end' was changed to 'end of file' in error
messages. Since our glcpp test cases contain the expected output text,
they rely on the particular messages printed by bison. The test case
084-unbalanced-parentheses fails when Mesa is built with bison-3.6 due
to this change.
To allow the test to pass on all supported versions of bison, we:
1. Change '$end' -> 'end of file' in the .expected file, and
2. Normalize the error generated by the test case with the same
replacement
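The normalization step (2) can be illustrated with a string replacement in C. This is a sketch of the transformation only. The real glcpp test harness may implement it differently, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, replacing every "$end" with "end of file", so that
 * output from bison < 3.6 matches the updated .expected file.
 * Returns 0 on success, -1 if `out` is too small (including the NUL). */
static int
normalize_bison_eof(const char *in, char *out, size_t out_size)
{
   static const char old_tok[] = "$end";
   static const char new_tok[] = "end of file";
   const size_t old_len = sizeof(old_tok) - 1;
   const size_t new_len = sizeof(new_tok) - 1;
   size_t used = 0;

   if (out_size == 0)
      return -1;

   while (*in) {
      const char *piece;
      size_t piece_len;

      if (strncmp(in, old_tok, old_len) == 0) {
         piece = new_tok;
         piece_len = new_len;
         in += old_len;
      } else {
         piece = in;
         piece_len = 1;
         in++;
      }

      if (used + piece_len >= out_size)   /* keep room for the NUL */
         return -1;
      memcpy(out + used, piece, piece_len);
      used += piece_len;
   }
   out[used] = '\0';
   return 0;
}
```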
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3181
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7659>
strstr returns a pointer to the needle sub-string within the haystack
string if the latter contains the former, or NULL otherwise. So this
essentially always set info->is_pro_graphics = true, since probably no
marketing name ever contains all of these sub-strings.
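The correct use of strstr() is to test for a non-NULL result and to accept a match on any marker. A sketch follows. The marker strings are illustrative placeholders, not the exact list used by the ac code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A substring is present iff strstr() returns non-NULL. A name counts as
 * Pro graphics if any one of the markers is found. */
static bool
is_pro_graphics(const char *marketing_name)
{
   static const char *const pro_markers[] = { "Pro", "PRO", "Frontier" };

   for (size_t i = 0; i < sizeof(pro_markers) / sizeof(pro_markers[0]); i++) {
      if (strstr(marketing_name, pro_markers[i]) != NULL)
         return true;
   }
   return false;
}
```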
Fixes: b635dff256 "ac: fix detection of Pro graphics"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7675>
We are already ensuring that we only copy the appropriate pixel
rect via the scissor and viewport state, so there is no need to
do this check in the shader.
Using a stress test with 100 buffer-to-image copies of a single-layer
image with 10 mip levels recorded into a command buffer, and measuring
the time it takes to execute the command buffer, we get these results:
| Version | Execution Time |
|---------|----------------|
| master  | 0.142s         |
| patch   | 0.071s         |
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7671>
Icelake's sampler message header introduces a field in m0.3 bit 0
which controls whether the sampler state pointer should be relative
to bindless sampler state base address or dynamic state base address.
g0.3 bit 0 is part of the per-thread scratch space field. On older
hardware, we were able to copy that along because the sampler ignored
bits 4:0. Now, however, we need to mask them out.
Fixes various textureGatherOffsets piglit tests when forcing the FS
to run with 2048 bytes of per-thread scratch space (which is a
per-thread scratch space encoding of 1, meaning bit 0 will be set).
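In C terms, the masking described above amounts to clearing bits 4:0 when copying g0.3 into the header. This is a sketch only: the function name is hypothetical, and the real brw backend emits this as EU instructions rather than C.

```c
#include <stdint.h>

/* The sampler used to ignore bits 4:0 of m0.3, so g0.3 could be copied
 * verbatim. On Icelake, bit 0 selects the sampler state base address, and
 * bit 0 of g0.3 is part of the per-thread scratch space field, so the low
 * five bits are cleared before the copy. */
static uint32_t
sampler_header_dw3_from_g0_3(uint32_t g0_3)
{
   return g0_3 & ~UINT32_C(0x1f);
}
```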
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6735>
In our source languages, interpolateAtOffset() takes a floating point
offset in the range [-0.5, +0.5]. However, the hardware takes integer
valued offsets in the range [-8, 7], in units of 1/16th of a pixel.
So, we need to multiply and clamp the coordinates. We were doing this
in the FS backend, but with the advent of IBC, I'd like to avoid doing
it twice. This patch instead moves the lowering to NIR so we can reuse
it across both backends.
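The per-component conversion described above can be sketched in plain C. Rounding toward negative infinity is an assumption made for this example, since the NIR lowering may round differently. The function name is hypothetical, and the code avoids libm so it stays self-contained.

```c
/* Convert one interpolateAtOffset() component from the API range
 * [-0.5, +0.5] pixels to the hardware's signed 1/16-pixel units in [-8, 7]. */
static int
offset_to_hw_units(float offset)
{
   float scaled = offset * 16.0f;

   if (!(scaled >= -8.0f))   /* also maps NaN to -8 */
      scaled = -8.0f;
   if (scaled > 7.0f)
      scaled = 7.0f;

   int units = (int)scaled;   /* truncates toward zero */
   if ((float)units > scaled) /* adjust to floor for negative values */
      units--;
   return units;
}
```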
v2: Use nir_shader_instructions_pass (suggested by Eric Anholt).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6193>
This function has a number of problems:
1. It performs 24-bit quantization on a format which shouldn't be
quantized at all (PIPE_FORMAT_Z32_FLOAT_S8X24_UINT).
2. The algorithm seems to create a different pixel than HW would in the
absence of this SW conversion. This can cause issues with depth
testing.
Instead of adding more code to deal with these issues, delete the
quantization code.
This code originated from i965 (0ae9ce0f29) and helped to avoid a
regression in Lightsmark 2008. This change continues to avoid that
regression because any new clear value is now cast from double to
float before checking if the resource's clear value has changed.
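The cast-before-compare rule can be shown with a minimal sketch. The struct is a hypothetical stand-in for the resource's stored clear value, assumed here to be a float. Doubles that round to the same float are treated as unchanged.

```c
#include <stdbool.h>

/* Hypothetical clear-value bookkeeping for a depth resource. */
struct depth_clear_state {
   float clear_depth;
};

/* Compare in float precision: without the cast, 0.5f (promoted to double)
 * would differ from 0.5 + 1e-12 and trigger a needless clear-value update. */
static bool
depth_clear_value_changed(const struct depth_clear_state *res, double depth)
{
   return res->clear_depth != (float)depth;
}
```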
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3783
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7665>
These functions aren't implemented as plain functions in the MSVC
runtime, so trying to take their function pointers directly causes
compilation errors.
Instead, let's call them from a wrapper function, and use a
pre-processor define to replace the usage in this case. This makes them
build fine on MSVC.
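The wrapper pattern can be illustrated with isnan(), which standard C defines as a macro, so a portable program cannot take its address. isnan() is only a stand-in here, because the message does not say which functions are wrapped. The companion pre-processor define is omitted.

```c
#include <math.h>
#include <stdbool.h>

/* A real function whose address can be taken, wrapping the macro. */
static bool
isnan_wrapper(double x)
{
   return isnan(x);
}

typedef bool (*double_predicate)(double);

/* Hypothetical table that needs function pointers. */
static const double_predicate predicates[] = { isnan_wrapper };
```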
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7549>
We still need a fallback for the case where the application makes
WSI allocations without a surface (Zink), but for the general case,
this is the right way to do this, as it ensures that we use
the same display connection that was used to create the surface.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7631>
This patch fixes this build error.
../src/microsoft/compiler/dxil_nir.c: In function 'extract_comps_from_vec32':
../src/microsoft/compiler/dxil_nir.c:52:10: error: a label can only be part of a statement and a declaration is not a statement
52 | unsigned dst_offs = i * comps_per32b;
| ^~~~~~~~
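The error above arises because, before C23, a label (including case and default) must be followed by a statement, and a declaration is not a statement. One common fix wraps the labeled code in braces. The function and parameter names below are hypothetical and only loosely modeled on the error message.

```c
static unsigned
first_dst_offset(unsigned kind, unsigned i, unsigned comps_per32b)
{
   switch (kind) {
   case 0: {
      /* OK: the declaration is inside a compound statement. */
      unsigned dst_offs = i * comps_per32b;
      return dst_offs;
   }
   default:
      return 0;
   }
}
```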
Fixes: b9c61379ab ("microsoft/compiler: translate nir to dxil")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7563>
This patch fixes this build error.
In file included from ../src/microsoft/compiler/dxil_enums.c:24:
../src/microsoft/compiler/dxil_enums.h:323:58: warning: 'struct glsl_type' declared inside parameter list will not be visible outside of this definition or declaration
323 | enum dxil_component_type dxil_get_comp_type(const struct glsl_type *type);
| ^~~~~~~~~
../src/microsoft/compiler/dxil_enums.h:325:71: warning: 'struct glsl_type' declared inside parameter list will not be visible outside of this definition or declaration
325 | enum dxil_prog_sig_comp_type dxil_get_prog_sig_comp_type(const struct glsl_type *type);
| ^~~~~~~~~
../src/microsoft/compiler/dxil_enums.h:327:61: warning: 'struct glsl_type' declared inside parameter list will not be visible outside of this definition or declaration
327 | enum dxil_resource_kind dxil_get_resource_kind(const struct glsl_type *type);
| ^~~~~~~~~
../src/microsoft/compiler/dxil_enums.c:31:30: error: conflicting types for 'dxil_get_prog_sig_comp_type'
31 | enum dxil_prog_sig_comp_type dxil_get_prog_sig_comp_type(const struct glsl_type *type)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/microsoft/compiler/dxil_enums.c:24:
../src/microsoft/compiler/dxil_enums.h:325:30: note: previous declaration of 'dxil_get_prog_sig_comp_type' was here
325 | enum dxil_prog_sig_comp_type dxil_get_prog_sig_comp_type(const struct glsl_type *type);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
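These warnings mean that `struct glsl_type` first appears inside a prototype's parameter list. It therefore gets prototype scope, so the header's declaration and the .c definition name two different types. A file-scope forward declaration before the prototypes is one way to fix it. Only the glsl_type name comes from the message; the enum, function, and struct body are illustrative.

```c
/* Forward declaration at file scope: later prototypes refer to this type. */
struct glsl_type;

enum comp_type_sketch { COMP_TYPE_INVALID, COMP_TYPE_F32 };

/* Header-style prototype. */
enum comp_type_sketch
get_comp_type_sketch(const struct glsl_type *type);

/* Illustrative definition (normally elsewhere): same struct type. */
struct glsl_type { int base_type; };

/* .c-style definition: matches the prototype, so no conflicting types. */
enum comp_type_sketch
get_comp_type_sketch(const struct glsl_type *type)
{
   return type->base_type == 1 ? COMP_TYPE_F32 : COMP_TYPE_INVALID;
}
```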
Fixes: b9c61379ab ("microsoft/compiler: translate nir to dxil")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7562>
This patch fixes this build error.
In file included from ../src/microsoft/compiler/dxil_container.c:24:
../src/microsoft/compiler/dxil_container.h:98:42: warning: ‘struct dxil_features’ declared inside parameter list will not be visible outside of this definition or declaration
98 | const struct dxil_features *features);
| ^~~~~~~~~~~~~
../src/microsoft/compiler/dxil_container.c:72:1: error: conflicting types for ‘dxil_container_add_features’
72 | dxil_container_add_features(struct dxil_container *c,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/microsoft/compiler/dxil_container.c:24:
../src/microsoft/compiler/dxil_container.h:97:1: note: previous declaration of ‘dxil_container_add_features’ was here
97 | dxil_container_add_features(struct dxil_container *c,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: b9c61379ab ("microsoft/compiler: translate nir to dxil")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7560>
Fix defect reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member nodes is not initialized in this
constructor nor in any functions that it calls.
uninit_member: Non-static class member nodeCount is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7522>
Uses Hyujun's commit 5d3fdbc52b, which does the same for turnip, as a
reference.
This commit also replaces alloc with zalloc in several cases, and adds
NULL checks to more Destroy methods. Most of them were needed to avoid
crashes/weird behaviour due to trying to use uninitialized data. Note
that vk_object_free now iterates over an array, which makes it more
sensitive to uninitialized or NULL data.
Additionally, by using zalloc we can also remove some memsets to 0. In
fact, we needed to remove them: otherwise they would overwrite the
vk_object_base object with 0 (the alternative would be doing a memset
with a computed pointer offset, but that is not needed as we can just
use zalloc).
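Why a whole-struct memset breaks the object header, and what the two alternatives look like, can be shown with stand-ins. The structs and allocator below are hypothetical. Mesa's real vk_object_base and vk_object_zalloc() carry more state.

```c
#include <stdlib.h>
#include <string.h>

struct vk_object_base_sketch {
   int type;   /* set by the allocator, must survive any later reset */
};

struct descriptor_pool_sketch {
   struct vk_object_base_sketch base;   /* must stay first */
   unsigned max_sets;
   unsigned used_sets;
};

/* zalloc-style: zero everything once, then fill in the header. */
static void *
object_zalloc_sketch(size_t size, int type)
{
   struct vk_object_base_sketch *base = calloc(1, size);
   if (base)
      base->type = type;
   return base;
}

/* The pointer-offset alternative: clear only what follows the header.
 * memset(pool, 0, sizeof(*pool)) would wipe base.type as well. */
static void
reset_pool_payload(struct descriptor_pool_sketch *pool)
{
   memset((char *)pool + sizeof(pool->base), 0,
          sizeof(*pool) - sizeof(pool->base));
}
```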
v2:
* Call memset(0) on reused descriptor sets when calling
ResetDescriptorPool, not when reallocating them (Iago)
* Add null check when calling DestroyImageView (detected by a full CTS run)
v3: Fixed rebase conflicts after last meta copy/clear changes
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7627>
Simple refactor. No intended change in behavior.
Replace each derivation of aux address with anv_image_get_aux_addr().
The function will soon do more in support of
VK_EXT_image_drm_format_modifier, where the image bo and aux bo may be
disjoint.
v2:
- Replace param 'aspect' with 'plane'.
v3:
- Workaround for stencil ccs. If no aux surface, then return
ANV_NULL_ADDRESS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Pre-patch, we checked the offsets once per aspect after adding all
surfaces for the aspect. The additional checks will make it easier to
diagnose layout bugs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Pure refactor. No intended change in behavior.
This makes the code infinitely easier to understand. And it uncovers
a potential bug (marked with XXX comment).
v2: Fix narrowing conversions on 32-bit arch. s/size_t/uintmax_t/.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Months ago, make_surface() added *all* surfaces required for the given
aspect. It was a monster monolithic function, and difficult to reason
about its correctness. In commit c652ff8c (2020-03-06), I split the code
for aux surfaces into its own function, add_aux_surface_if_supported().
This patch continues the splitting, therefore making bugs easier to
identify.
Code changes:
- Move the code that adds the shadow surface from make_surface() to
a new function add_shadow_surface(), called from
add_all_surfaces().
- Move the call to add_aux_surface_if_supported() from make_surface()
to add_all_surfaces().
- To preserve correctness of the assertions on image layout in
make_surface(), move them to the loop in add_all_surfaces() after
all the aspect's surfaces have been added.
- Rename make_surface() to add_primary_surface() because now that's
what it does.
Pure refactor, no intended change in behavior.
v2:
- Rebase onto "anv: Fix isl_surf_usage_flags for stencil images".
- Sanitize the image's extent earlier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
In anv_get_image_format_properties(), the special-case code for
VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT is tiny. It is mostly a detached
'case' in the 'switch' block for VkImageType. So move the special-case
code to immediately follow the 'switch' block.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The check in anv_get_image_format_properties() is already handled in
anv_get_image_format_features().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When filling VkImageFormatProperties, anv_get_image_format_properties()
checks the requested VkImageUsageFlags and VkImageCreateFlags against
the VkFormatFeatureFlags available to the queried VkFormat. However, we
neglected to consider whether any formats given in
VkImageFormatListCreateInfo further restricted the available
VkFormatFeatureFlags.
The image view formats are more likely to introduce additional
restrictions when DRM format modifiers are present.
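One plausible way to account for the view formats is to intersect the feature masks of the image format and every listed view format. The sketch below does that. The callback stands in for anv_get_image_format_features() and is hypothetical, as are the example format numbers and masks.

```c
#include <stdint.h>

/* VkFormatFeatureFlags is a 32-bit mask; plain uint32_t keeps this
 * self-contained. */
typedef uint32_t (*format_features_fn)(int format);

/* Features usable by an image that may be viewed with any listed format. */
static uint32_t
effective_format_features(int image_format,
                          const int *view_formats, unsigned view_format_count,
                          format_features_fn get_features)
{
   uint32_t features = get_features(image_format);

   for (unsigned i = 0; i < view_format_count; i++)
      features &= get_features(view_formats[i]);

   return features;
}

/* Example table with made-up formats and feature bits. */
static uint32_t
example_features(int format)
{
   switch (format) {
   case 0:  return 0x7;
   case 1:  return 0x3;
   default: return 0x5;
   }
}
```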
v2:
- Do not drop anv_formats_ccs_e_compatible().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
If anv_get_image_format_features reports that the inputs are
unsupported, fail immediately.
Without the early fail, I have less confidence in the function's
correctness when a DRM format modifier is present.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The code in anv_get_image_format_properties() that set sampleCounts
appears correct, but weirdly inconsistent. Clean the code to
consistently set sampleCounts in the same location as
maxExtent/maxMipLevels/maxArraySize.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rename it to get_drm_format_modifier_properties_list() because it is now
independent of WSI.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In vkGetPhysicalDeviceImageFormatProperties2, we advertised support for
VK_IMAGE_TILING_LINEAR and VK_IMAGE_TILING_OPTIMAL for all memory
handles.
However, when importing or exporting an image, there must exist a method
that enables the app and driver to agree on the image's memory layout.
If no method exists, then we should reject image creation.
v2:
- Reduce copy-paste for Lionel.
v3:
- Treat tiling LINEAR and DRM_FORMAT_MODIFIER as identical when
determining compatible memory handles.
- Improve comments.
v4:
- Remove DMA_BUF from opaque_fd_only_props.
v5:
- Minor changes to code style for `if`. (for jekstrand)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v4)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)
The code asserted that we supported no more than 4 formats with
modifiers: /VK_FORMAT_B8G8R8(A8)?_(SRGB|UNORM)/.
Strangely, 2 of the 4 were non-power-of-two formats, which were rejected
elsewhere.
The assertion's comment suggested that we use a hard-coded list of
formats because the driver was not yet able to determine if a given
format was compatible with a given modifier. Therefore, the list only
contained formats that were compatible with *all* modifiers. That code
deficiency no longer exists: anv_get_image_format_features() can check
format/modifier compatibility.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Refactor in get_wsi_format_modifier_properties_list().
Instead of iterating over a function-local hard-coded list, iterate over
all modifiers in isl_drm.c.
This will improve agreement in behavior between
VkDrmFormatModifierPropertiesListEXT and
VkPhysicalDeviceImageDrmFormatModifierInfoEXT.
The future disagreement this patch attempts to prevent is the
combination of:
a. VkDrmFormatModifierPropertiesListEXT neglects to return a valid
modifier because its hard-coded list of modifiers drifts
out-of-sync with hard-coded lists elsewhere in the code. (Already
today, the list in get_wsi_format_modifier_properties_list() does
not match the list in isl_drm.c; though, this has produced no bug
yet).
b. vkGetPhysicalDeviceImageFormatProperties2 accepts, via
VkPhysicalDeviceImageDrmFormatModifierInfoEXT, the modifier
overlooked in (a), because it does not use the same hard-coded
list in get_wsi_format_modifier_properties_list(). (Recall that
the spec requires vkGetPhysicalDeviceImageFormatProperties2 to
correctly accept/reject any int that the app provides, even when
the int is an invalid modifier).
c. The Bug. The driver told the app in (b) that it can legally
create an image with format+modifier, but the app cannot query
the VkFormatFeatureFlags of the format+modifier due to (a).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This allows Vulkan and GL to iterate over the full list of modifiers
instead of hard-coding in various places the "same" list as isl.
(Anvil's list has already diverged from isl's list. It omits Gen12
modifiers).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fill VkDrmFormatModifierPropertiesEXT::drmFormatModifierTilingFeatures
with anv_get_image_format_features().
anv_formats.c:get_wsi_format_modifier_properties_list() incorrectly left
it uninitialized.
v2: Increment drmFormatModifierPlaneCount if the modifier supports aux.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Because anv_get_image_format_features() now understands modifiers, also
relocate most of the modifier compatibility checks from
anv_get_format_plane() into anv_get_image_format_features() in order to
avoid duplication.
The new signature forces some code movement in
anv_get_image_format_properties().
v2:
- Reject VK_FORMAT_B4G4R4A4_UNORM_PACK16 with modifiers on HSW.
v3:
- Revert the v2 change.
- Query isl_format_layout instead of pipe_format. (for jekstrand)
- Drop misguided comments. (for jekstrand)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
If each format channel has the same base type (such as unorm), then that
is the format's "uniform channel type".
Calculating the field at build time is probably better than looping over
all channels at runtime each time we wish to query it.
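The runtime loop that the build-time field replaces looks roughly like the sketch below. The enum and layout struct are hypothetical simplifications of isl's tables, and skipping absent channels is an assumption about how "each format channel" is counted.

```c
enum chan_type { CHAN_VOID, CHAN_UNORM, CHAN_SNORM, CHAN_UINT, CHAN_SINT, CHAN_SFLOAT };

struct format_layout_sketch {
   enum chan_type channels[4];   /* CHAN_VOID for absent channels */
};

/* Returns the shared base type if every present channel has the same one,
 * otherwise CHAN_VOID. */
static enum chan_type
uniform_channel_type(const struct format_layout_sketch *fmt)
{
   enum chan_type uniform = CHAN_VOID;

   for (unsigned i = 0; i < 4; i++) {
      enum chan_type t = fmt->channels[i];
      if (t == CHAN_VOID)
         continue;
      if (uniform == CHAN_VOID)
         uniform = t;
      else if (uniform != t)
         return CHAN_VOID;
   }
   return uniform;
}
```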
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Return the modifier's score, which indicates the driver's preference for the
modifier relative to others. A higher score is better. Zero means
unsupported.
Intended to assist selection of a modifier from an externally provided list,
such as VkImageDrmFormatModifierListCreateInfoEXT.
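Selecting from an app-provided list with such a score reduces to keeping the highest non-zero score. In the sketch below, the callback stands in for isl_drm_modifier_get_score(). The example scoring table uses made-up modifier values and scores, except that 0 is the linear modifier.

```c
#include <stdbool.h>
#include <stdint.h>

typedef unsigned (*modifier_score_fn)(uint64_t modifier);

/* Pick the highest-scoring modifier; a score of zero means unsupported.
 * Returns false if no listed modifier is supported. */
static bool
choose_modifier(const uint64_t *modifiers, unsigned count,
                modifier_score_fn score_fn, uint64_t *out)
{
   unsigned best_score = 0;

   for (unsigned i = 0; i < count; i++) {
      unsigned score = score_fn(modifiers[i]);
      if (score > best_score) {
         best_score = score;
         *out = modifiers[i];
      }
   }
   return best_score > 0;
}

/* Example scoring table with made-up values. */
static unsigned
example_score(uint64_t modifier)
{
   switch (modifier) {
   case 0x0: return 1;   /* linear: supported, least preferred */
   case 0x1: return 3;
   case 0x2: return 2;
   default:  return 0;   /* unsupported */
   }
}
```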
v2:
- Rename anv_drm_format_mod_score to isl_drm_modifier_get_score.
- Squash all incremental changes to anv_drm_format_mod_score.
v3:
- Drop redundant 'unlikely'. (for nchery)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
The code did not return error when VK_IMAGE_CREATE_DISJOINT_BIT was
incompatible with the other input params.
If the Vulkan spec forbids a set of input params for vkCreateImage,
but permits them for vkGetPhysicalDeviceImageFormatProperties2,
then vkGetPhysicalDeviceImageFormatProperties2 must reject those input
params with failure.
- v2: Clearer commit message.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The general layout can be used for transfers, so we need to make sure
the vulkan driver knows. This will help the driver know when it needs to
flush caches.
While we're at it, also add shader-read, which is another access we use.
We should stop using that one ASAP, but for now this seems like the
right thing to do.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7652>
This avoids redundant per-layer operations that are the same across
layers or that only need to be done once. Namely:
- The sampler for the blit source is the same for all layers.
- The decision about whether we need to load TLB contents or not only
needs to be done once.
- Some command buffer state such as the pipeline, the viewport and the
scissor is the same for all layers and should only be bound once.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
This is much faster than the blit fallback (which requires uploading
the linear buffer to a tiled image) and the CPU path.
A simple stress test involving 100 buffer-to-image copies of a
single-layer image with 10 mipmap levels provides the following
results:
| Path         | Recording Time | Execution Time |
|--------------|----------------|----------------|
| Texel Buffer | 2.954s         | 0.137s         |
| Blit         | 10.732s        | 0.148s         |
| CPU          | 0.002s         | 1.453s         |
So, generally speaking, this texel buffer copy path is the fastest
of the paths that can do partial copies. However, the CPU path might
provide better results in cases where command buffer recording time is
important to overall performance. This is probably the reason why
the CPU path seems to provide slightly better results for vkQuake2.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
src0 and src1 were mixed up, leading to invalid varying indices. In order to
fix that properly, we first extend load_vary to pass the immediate index
through a dedicated field and add a special boolean. This way, we don't
have to make sure src0 always contains the index, and can instead match
the src numbering defined in ISA.xml.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
mir_estimate_pressure often underestimates the register pressure,
letting too many registers be used for uniforms, causing RA to fail.
Mitigate this by demoting some uniforms back to explicit loads to free
up work registers if register allocation fails.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7616>
We now have NIR opt_large_constants support in place, so we can flip the
switch and get better optimization before lowering to a constant buffer,
but also avoid having constant data mixed in with the shader's uniforms,
which should lower CPU overhead on affected shaders.
Only a few shaders are affected (<.01% impact across shader-db), but for
those the impact is pretty big:
instructions in affected programs: 748 -> 639 (-14.57%)
nops in affected programs: 364 -> 284 (-21.98%)
non-nops in affected programs: 384 -> 355 (-7.55%)
mov in affected programs: 47 -> 27 (-42.55%)
cov in affected programs: 9 -> 6 (-33.33%)
dwords in affected programs: 932 -> 836 (-10.30%)
full in affected programs: 13 -> 14 (7.69%)
constlen in affected programs: 140 -> 64 (-54.29%)
(ss) in affected programs: 14 -> 15 (7.14%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
Right now if the shader indirects on some large constant array, we see NIR
load_consts (usually from the const file) of its contents into general
registers, then indirection on the GPRs. This often results in register
allocation failures, as it's easy to go beyond the ~256 dwords of
registers per invocation.
By moving the large constants to a UBO, we can load an arbitrary number of
them. They also can be theoretically moved to the constant reg file (~2k
dwords), though you're unlikely to hit this path without an indirect load
on your large constant, and we don't yet let UBO indirect loads get moved
to constant regs.
This possibly won't work out right if we have 16-bit load_constants, but
without other MRs in flight we won't see 16-bit temps to be lowered to
this.
This allows 2 kerbal-space-program shaders to compile that previously
would fail, and fixes the new dEQP-VK and -GLES2 tests I wrote that
dynamically index a 40-element temporary array of float/vec2/vec3/vec4
with constant element initializers.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2789
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
If you're loading a 32b word from the const file and doing a cov.u32u16
split to two 16bit values, we can't turn that into a reference of a 16-bit
float value directly from the constbuf, because the
CONSTANT_DEMOTION_ENABLE results in a f2f16 operation on the 32-bit value
that we didn't want.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
cffdump looks at the following 4 instructions to decide if the shader has
*really* ended, so if we pack data after that (such as turnip's next
stage's shader), it might decode instructions that aren't really part of
the shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
It's supposed to be ralloced -- there's not even a shader variant destroy
function for freeing, just ralloc_free() on the ir3_shader_variant or the
parent ir3_shader when you're done!
Fixes: f97acb4bb4 ("freedreno/ir3: disk-cache support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
A missing copy of the pipe_sampler_state into the etna_sampler_state object
led texture_use_int_filter() to always see a max_anisotropy of 0, so
the INT filter wasn't disabled when necessary. Also, state emission should
never change the state objects, as this might also lead to stale information
being kept around in the state object.
Fixes: 89a41dae77 (etnaviv: do not use int filter when
anisotropic filtering is used)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7638>
The screen is shared among contexts, so one context might already be using
the vtbl while another initializes it again.
==45872== Possible data race during write of size 8 at 0x5DDAE78 by thread #549
==45872== Locks held: 1, at address 0x5D1B6F8
==45872== at 0x6D66D91: gen9_init_state (iris_state.c:7816)
==45872== by 0x6BA0A31: iris_create_context (iris_context.c:342)
==45872== by 0x621F390: st_api_create_context (st_manager.c:917)
==45872== by 0x620E6F9: dri_create_context (dri_context.c:163)
==45872== by 0x6A40DB1: driCreateContextAttribs (dri_util.c:480)
==45872== by 0x540B963: dri2_create_context (egl_dri2.c:1583)
==45872== by 0x53FB84E: eglCreateContext (eglapi.c:821)
==45872==
==45872== This conflicts with a previous read of size 8 by thread #544
==45872== Locks held: 1, at address 0x5F6E0E0
==45872== at 0x6CB779E: blorp_alloc_binding_table (iris_blorp.c:167)
==45872== by 0x6CAEF70: blorp_emit_surface_states (blorp_genX_exec.h:1540)
==45872== by 0x6CB67F9: blorp_exec (blorp_genX_exec.h:2016)
==45872== by 0x6CB7AFE: iris_blorp_exec (iris_blorp.c:307)
==45872== by 0x70F5916: try_blorp_blit (blorp_blit.c:2145)
==45872== by 0x70F5FCA: do_blorp_blit (blorp_blit.c:2273)
==45872== by 0x70F778F: blorp_copy (blorp_blit.c:2803)
==45872== by 0x6BB9EB6: iris_copy_region (iris_blit.c:725)
v2: move as genX(init_screen_state) (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7544>
All other functions calling _eglLookupImage hold the display lock.
==16659== Possible data race during write of size 8 at 0x5D1BCF0 by thread #2668
==16659== Locks held: 1, at address 0x5D1B6F8
==16659== at 0x5405DDF: _eglLinkResource (egldisplay.c:454)
==16659== by 0x53F9189: _eglLinkImage (eglimage.h:138)
==16659== by 0x53FE2CA: _eglCreateImageCommon (eglapi.c:1740)
==16659== by 0x53FE39A: eglCreateImageKHR (eglapi.c:1751)
==16659==
==16659== This conflicts with a previous read of size 8 by thread #2664
==16659== Locks held: 1, at address 0x5308D00
==16659== at 0x5405C06: _eglCheckResource (egldisplay.c:387)
==16659== by 0x5408C92: _eglLookupImage (eglimage.h:162)
==16659== by 0x5409E96: dri2_lookup_egl_image (egl_dri2.c:688)
==16659== by 0x6210AAF: dri2_lookup_egl_image (dri_helpers.c:250)
==16659== by 0x6212843: dri_get_egl_image (dri_screen.c:470)
==16659== by 0x625F7CC: st_get_egl_image (st_cb_eglimage.c:152)
==16659== by 0x625FE7D: st_egl_image_target_texture_2d (st_cb_eglimage.c:354)
==16659== by 0x6501C05: egl_image_target_texture (teximage.c:3446)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7544>
In applications using clip planes, set_clip_state is expected to be
implemented in the backend. If it is not defined, it may cause the
application to segfault.
glClipPlane is not part of GLES 2, so it is not trivial to reverse
engineer whether something needs to be done in lima.
Other drivers just define a placeholder implementation for
set_clip_state, so for now let's just define one for lima too.
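A minimal sketch of such a placeholder follows. It assumes simplified stand-ins for gallium's pipe_context and pipe_clip_state (the real definitions live in Mesa's gallium headers), and lima_init_clip_functions() is a hypothetical name. The only point is that the callback is non-NULL, so a caller invoking it no longer jumps through a NULL function pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the gallium structs (assumption). */
struct pipe_clip_state {
   float ucp[8][4];
};

struct pipe_context {
   void (*set_clip_state)(struct pipe_context *ctx,
                          const struct pipe_clip_state *clip);
};

/* Placeholder: accept the user clip planes and ignore them. */
static void
lima_set_clip_state(struct pipe_context *ctx,
                    const struct pipe_clip_state *clip)
{
   (void)ctx;
   (void)clip;
}

/* Hypothetical init hook that installs the placeholder. */
static void
lima_init_clip_functions(struct pipe_context *ctx)
{
   ctx->set_clip_state = lima_set_clip_state;
}
```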
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7088>
By default we are using 32bit output type for texture operations,
16bit for shadow.
With this commit we also use the precision info from the sampler (which
is assigned if SPIR-V uses the RelaxedPrecision decoration) in order to
use 16bit.
This is a first step, as it only takes into account the precision of
the deref_vars used in the texture operation.
But the decoration can also be applied in other cases, like the result
of the operation. That means that there are other ways to infer that the
texture operation can operate at relaxed precision. Those cases will
be handled in follow-up patches.
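The selection rule described above can be sketched as follows. choose_return_size() is a hypothetical helper, not the v3dv code. It models only the two inputs this commit considers: shadow comparison, and RelaxedPrecision on the sampler/image deref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: texture return size in bits for one
 * descriptor-map entry. 32 is the default; 16 is used for shadow
 * samplers and for RelaxedPrecision samplers/images. */
static uint8_t
choose_return_size(bool is_shadow, bool relaxed_precision)
{
   if (is_shadow || relaxed_precision)
      return 16;
   return 32;
}
```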
v2:
* Add directly the return_size on the descriptor_map, instead of
shadow/relaxed_precision.
* Check relaxed precision for images too (Iago)
* Handle the return size for the default sampler
v3:
* Handle different output size for the case of not having a sampler.
* Comment fixes (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
Now that the v3d compiler has support for separate texture and
sampler indices, we can stop combining them. Again, that's what
Vulkan allows after all.
As we are doing this, we can no longer use the texture format (coming
from the texture) to choose the return size (which is a sampling
parameter). We default to 32, and just go to 16 for shadow. We plan
to use SPIR-V RelaxedPrecision to use 16 bit in more cases. We will
do that in follow-up patches.
v2 (from Iago feedback):
* Fix typos/bad grammar on comments.
* Move tex/sampler number assert to before the loop that fills
tex/sampler info.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
So far the v3d compiler had them combined, as for OpenGL both are the
same. This change is intended to make the v3d compiler fit better with
Vulkan, where they are separate concepts.
Note that NIR has had them separate for a long time, both in nir_variable
and in some NIR lowerings.
v2: (from Iago feedback)
* Use key->num_tex/sampler_used to iterate through the array
* Fill up num_samplers_used on v3d, and assert that it is the same as
num_tex_used if possible.
v3: (Iago)
* Assert num_tex/samplers_used is smaller than the tex/sampler array size.
v4: Update assert mentioned on v3 to use <= instead of < (detected by CI)
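The split key and the v4 assert can be illustrated with a simplified sketch. struct sketch_key, fill_key() and the MAX_* sizes are hypothetical and do not match the real v3d_key layout. The assert uses <= because filling every slot of the array is legitimate:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TEX 16       /* hypothetical array sizes */
#define MAX_SAMPLERS 16

/* Texture and sampler info live in separate arrays, as in Vulkan. */
struct tex_info { uint8_t swizzle[4]; };
struct sampler_info { uint8_t return_size; };

struct sketch_key {
   struct tex_info tex[MAX_TEX];
   uint8_t num_tex_used;
   struct sampler_info sampler[MAX_SAMPLERS];
   uint8_t num_samplers_used;
};

static void
fill_key(struct sketch_key *key, uint8_t num_tex, uint8_t num_samplers)
{
   /* Checked before the fill loops; using all slots is valid, hence <=. */
   assert(num_tex <= MAX_TEX);
   assert(num_samplers <= MAX_SAMPLERS);
   key->num_tex_used = num_tex;
   key->num_samplers_used = num_samplers;
   for (uint8_t i = 0; i < num_tex; i++)
      for (int c = 0; c < 4; c++)
         key->tex[i].swizzle[c] = (uint8_t)c; /* identity swizzle */
   for (uint8_t i = 0; i < num_samplers; i++)
      key->sampler[i].return_size = 32;       /* default return size */
}
```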
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
squash! broadcom/compiler: separate texture/sampler info from v3d_key
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
In preparation for the changes that will make them unnecessary.
It is worth noting that it is likely (we have some ideas in mind)
that we will need to bring back pre-generated variants in the
future. The approach is slightly different in v3dv_pipeline vs
v3dv_cmd_buffer:
* v3dv_pipeline: even after the clean-up, we had code for all the
functions they have, even if they were doing fewer things
(specifically, a second shader variant), so they still make sense
on their own, and serve as a template for adding support for multiple
pre-generated shader variants in the future.
* v3dv_cmd_buffer: as we really don't need to fill up the key with
some after-pipeline data, we would end up with some functions empty
(specifically cmd_buffer_populate_v3d_key). Even as a placeholder,
that would be odd. Additionally, the current code has a lot of
boilerplate (the functions to fill up the vs, cs and fs keys are
basically the same), and we already plan to refactor them. So
it is better to remove all of them, instead of keeping
around some code we would not be happy with. If in the future we
pregenerate more than one variant, hopefully the new code to choose
between them will be better.
v2: clarify the commit message, and fix typos on the comments (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
During a blorp_copy between two color surfaces, the source and
destination formats are re-interpreted to UINT (if possible) to avoid
losing bits.
If either surface has CCS_E, then extra steps are taken to support
fast-cleared blocks with this format re-interpretation. Each clear value
is packed in the original format, then unpacked in the new UINT format.
This is then placed into the surface state object for some platforms.
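Why the clear value must be converted can be shown with a small example. The same 32 bits mean different things under each format: a float 1.0 packed as R8G8B8A8_UNORM reads back as 255 under R8G8B8A8_UINT. pack_rgba8_unorm() and unpack_rgba8_uint() are hypothetical helpers, not BLORP/ISL functions. They assume little-endian channel order with R in the low byte:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a float clear color as R8G8B8A8_UNORM: clamp to [0, 1],
 * scale to 0..255 and round to nearest. */
static uint32_t
pack_rgba8_unorm(const float c[4])
{
   uint32_t packed = 0;
   for (int i = 0; i < 4; i++) {
      float v = c[i] < 0.0f ? 0.0f : (c[i] > 1.0f ? 1.0f : c[i]);
      uint32_t byte = (uint32_t)(v * 255.0f + 0.5f);
      packed |= byte << (8 * i);
   }
   return packed;
}

/* Reinterpret the same bits as R8G8B8A8_UINT: each channel is the
 * raw byte value. */
static void
unpack_rgba8_uint(uint32_t packed, uint32_t out[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = (packed >> (8 * i)) & 0xff;
}
```

If the copy kept using the unconverted float clear color under the UINT view, it would read a different value for fast-cleared blocks.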
There are a couple of problems here:
1. This is only being done for CCS_E, but MCS also supports fast-clears.
2. These steps aren't enough for fast-clears on gen11. On gen11, the
clear color isn't part of the surface state object that BLORP
creates. Instead, it's stored in a separate BO that the surface state
object references. Since that BO doesn't get updated during
blorp_copy, the incorrect/unconverted clear color is used for the copy
operation.
I didn't measure any performance gain from this code, so this patch
simply disables the feature.
Makes i965 pass the nv_copy_image-simple piglit test on gen11.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5388>
During a blorp_copy between two color surfaces, the source and
destination formats are re-interpreted to UINT (if possible) to avoid
losing bits.
If either surface has CCS_E, then extra steps are taken to support
fast-cleared blocks with this format re-interpretation. Each clear value
is packed in the original format, then unpacked in the new UINT format.
This is then placed into the surface state object for some platforms.
There are a couple of problems here:
1. This is only being done for CCS_E, but MCS also supports fast-clears.
2. These steps aren't enough for fast-clears on gen11+. On gen11+, the
clear color isn't part of the surface state object that BLORP
creates. Instead, it's stored in a separate BO that the surface state
object references. Since that BO doesn't get updated during
blorp_copy, the incorrect/unconverted clear color is used for the copy
operation.
I didn't measure any performance gain from this code, so this patch
simply disables the feature.
Makes iris pass the nv_copy_image-simple piglit test on gen11+.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5388>
Until recently, the depth value from glClearBufferfv wasn't clamped.
Before then, this patch enabled the driver to fail the clearbuffer-depth
piglit test with INTEL_DEBUG=nofc. This is because convert_depth_value
relies on the assumption that the depth value is clamped.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7410>
OpenGL 3.0 spec, section 4.2.3 "Clearing the Buffers":
depth and stencil are the values to clear the depth and stencil
buffers to, respectively. Clamping and type conversion for
fixed-point depth buffers are performed in the same fashion as for
ClearDepth.
Enables iris to pass the clearbuffer-depth-stencil piglit test.
Cc: mesa-stable
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7410>
OpenGL 3.0 spec, section 4.2.3 "Clearing the Buffers":
If buffer is DEPTH, drawbuffer must be zero, and value points to the
single depth value to clear the depth buffer to. Clamping and type
conversion for fixed-point depth buffers are performed in the same
fashion as for ClearDepth.
Enables iris to pass the clearbuffer-depth piglit test.
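The clamping rule from the spec text, together with the v3 exception for floating-point formats, can be sketched as below. clear_depth_value() is a hypothetical helper, not the iris code:

```c
#include <assert.h>
#include <stdbool.h>

/* Clamp a glClearBufferfv depth value to [0, 1] for fixed-point depth
 * formats, as ClearDepth does; floating-point depth formats keep the
 * value unclamped. */
static float
clear_depth_value(float depth, bool is_float_format)
{
   if (is_float_format)
      return depth;
   if (depth < 0.0f)
      return 0.0f;
   if (depth > 1.0f)
      return 1.0f;
   return depth;
}
```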
v2. Add spec citation. (Eric Anholt)
v3. Don't clamp floating point formats. (Eric Anholt)
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7410>
Without this header file, we can't build the driver. So let's verify
that it exists and can be used by the C++ compiler.
This should make it a bit clearer what's wrong if someone attempts to
build this using MinGW or on Linux.
Fixes: 2ea15cd661 ("d3d12: introduce d3d12 gallium driver")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7575>
The first operand of v_bcnt should always be a VGPR, because if it were
an SGPR, isel would select s_bcnt1 instead. I added a sanity check anyway
to prevent any problems.
fossils-db (Vega10):
Totals from 23 (0.02% of 139517) affected shaders:
CodeSize: 106828 -> 106664 (-0.15%)
Instrs: 20242 -> 20201 (-0.20%)
Cycles: 213112 -> 211352 (-0.83%)
VMEM: 3200 -> 3184 (-0.50%)
SMEM: 928 -> 927 (-0.11%)
Helps Control, Assassin's Creed Origins and Youngblood.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7568>
Now that we introduced the generic glx_extension_override option,
we can remove the glx_disable_oml_sync_control,
glx_disable_sgi_video_sync, and glx_disable_ext_buffer_age ones.
It seems like the only user for them was vmwgfx, and only for
GNOME and Compiz, which are covered by the default Mesa driconf. This
means that it is unlikely for a user to have these options set in
their local driconf file.
Suggested-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Martin Peres <martin.peres@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7252>
This adds support for multiple DRM planes for a single format plane
and uses that to enable DCC support with modifiers.
With the implicit flush patches we can also enable displayable DCC
both with and without DCC as the X server and compositors know not
to do frontbuffer rendering onto images with multiple DRM planes.
For now we require that the extra planes are essentially fixed though.
We require that the offset/stride are the same as ac_surface computes
and that all planes are in the same buffer. This is mainly for
simplicity and could be somewhat more relaxed in the future given
a strong use case.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6176>
We do flushing on glFlush etc., so we don't need explicit flush,
but we still need to avoid frontbuffer rendering.
For modifiers, logic was put in apps that basically prevents
frontbuffer rendering if multiple planes are involved.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6176>
This primarily tests that:
- multiple GPUs with the same GPU modifier parameters result
in the same tiling layout.
- The size & alignment calculations don't change for a given
modifier & image parameters.
It does this primarily based on addrlib. Radeonsi has used addrlib
for the retiling of displayable DCC for a while already, so the
DCC tiling should be pretty reliable.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6176>
This adds modifiers for GFX9+ AMD GPUs.
As the modifiers need a lot of parameters, I split things out into
getters and setters.
- Advantage: simplifies the code a lot
- Disadvantage: Makes it harder to check that you're setting all
the required fields.
The tiling modes seem to change every generation, but the structure
of what each tiling mode is good for stays really similar. As such
the core of the modifier is
- the tiling mode
- a version. Not explicitly a GPU generation, but splitting out
a new set of tiling equations.
Sometimes one or two tiling modes stay the same and for those we
specify a canonical version.
Then we have a bunch of parameters on how the compression works.
Different HW units have different requirements for these and we
actually have some conflicts here.
e.g. the render backends need a specific alignment but the display
unit only works with unaligned compression surfaces. To work around
that we have a DCC_RETILE option where both an aligned and unaligned
compression surface are allocated and a writer has to sync the
aligned surface to the unaligned surface on handoff.
Finally there are some GPU parameters that participate in the tiling
equations. These are constant for each GPU on the rendering/texturing
side. The display unit is very flexible however and supports all
of them :|
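The getter/setter approach for a packed 64-bit modifier can be sketched as follows. The field positions and widths here are illustrative only. The real AMD modifier layout is defined in drm_fourcc.h and differs from this sketch, and the MOD_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only, not the real AMD modifier bits. */
#define MOD_TILE_VERSION_SHIFT 0
#define MOD_TILE_VERSION_MASK  0xffull
#define MOD_TILE_SHIFT         8
#define MOD_TILE_MASK          0x1full
#define MOD_DCC_SHIFT          13
#define MOD_DCC_MASK           0x1ull
#define MOD_DCC_RETILE_SHIFT   14
#define MOD_DCC_RETILE_MASK    0x1ull
#define MOD_PIPE_XOR_SHIFT     15
#define MOD_PIPE_XOR_MASK      0x7ull

/* Setter: clear the field, then insert the (masked) new value. */
#define MOD_SET(mod, field, value) \
   (((mod) & ~(MOD_##field##_MASK << MOD_##field##_SHIFT)) | \
    (((uint64_t)(value) & MOD_##field##_MASK) << MOD_##field##_SHIFT))

/* Getter: shift the field down and mask it off. */
#define MOD_GET(mod, field) \
   (((mod) >> MOD_##field##_SHIFT) & MOD_##field##_MASK)
```

Nothing in this sketch forces every field to be set. That is the disadvantage mentioned above: a forgotten setter silently leaves the field at 0.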
Some estimates:
- Single GPU, render+texture: ~10 modifiers
- All possible configs in a gen, display: ~1000 modifiers
- Configs of actually existing GPUs in a gen: ~100 modifiers
For formats with a single plane everything gets put in a separate
DRM plane. However, this doesn't fit for some YUV formats, so if
the format has >1 plane, we let the driver pack the surfaces into
1 DRM plane per format plane.
This way we avoid X11 rendering onto the frontbuffer with DCC, but
still fit into 4 DRM planes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6176>
Some architectures, like aarch64 and ppc64el, have char = unsigned char.
This breaks meta equation generation for DCC coords, as addrlib tries
to filter all the Z bits > -1 which ends up being all the Z bits > 255.
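This class of bug can be reproduced portably. A "-1" sentinel stored in a char becomes 255 when char is unsigned, so a "> sentinel" filter stops matching anything. sentinel_as_int() and count_greater() are hypothetical helpers, not addrlib code. They use signed char and unsigned char explicitly, so the result does not depend on the host ABI:

```c
#include <assert.h>

/* What "char sentinel = -1" evaluates to on each kind of ABI. */
static int
sentinel_as_int(int plain_char_is_signed)
{
   if (plain_char_is_signed) {
      signed char s = -1;
      return s;                       /* -1 (e.g. x86) */
   } else {
      unsigned char u = (unsigned char)-1;
      return u;                       /* 255 (e.g. aarch64, ppc64el) */
   }
}

/* Count entries strictly greater than threshold. */
static int
count_greater(const int *vals, int n, int threshold)
{
   int count = 0;
   for (int i = 0; i < n; i++)
      if (vals[i] > threshold)
         count++;
   return count;
}
```

Declaring such variables as signed char (or int8_t) instead of plain char removes the ambiguity.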
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7593>
This enables GL applications to be written without any involvement of
Xlib.
EGL X11 platform is actually already xcb-only underneath, so this commit
just adds the necessary interface changes so that an EGLDisplay can be created
from an xcb_connection_t.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6474>
So far, we have only been supporting X11, so we assumed that we were running
inside X11 and would always try to get an authenticated fd from Xorg during
device initialization. While this works for desktop Raspbian, it is not
really correct and it is not what we want to do when we start considering
other WSIs.
Initially, one could think we can still do this by guarding the WSI code
under the proper instance extension check. This, however, doesn't work
reliably, as the Vulkan loader can call vkEnumeratePhysicalDevices without
enabling surface extensions on the instance. That can lead to us not
initializing any display_fd and failing with VK_ERROR_INITIALIZATION_FAILED,
which is not correct. So while we can try to acquire the display_fd here,
it might not always work, and we should definitely not fail initialization
of the physical device for that.
Instead, with this change we move acquisition of display_fd to swapchain
creation time, when the required extensions must be enabled on the instance.
This was also suggested by Daniel Stone during review of a work-in-progress
implementation for the Wayland WSI.
There is a special case to consider though: applications like Zink that
don't use Vulkan's swapchains at all but still allocate images that they
intend to use for WSI. We need to handle these by checking that we have
indeed acquired a display_fd before doing any memory allocation for WSI,
and acquiring one at that time if that's not the case.
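The lazy acquisition described above can be sketched as follows. The struct, acquire_display_fd() and ensure_display_fd() are hypothetical simplifications, not v3dv code, and the placeholder fd value 42 stands in for asking the display server. The pattern is to acquire once on first need (swapchain creation or WSI memory allocation) and report failure to that caller, rather than failing physical-device initialization:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical physical-device state; -1 means "not acquired yet". */
struct sketch_physical_device {
   int render_fd;
   int display_fd;
   int acquire_calls; /* for illustration only */
};

/* Stand-in for obtaining an authenticated fd from the display server.
 * Returns -1 on failure; here it always "succeeds". */
static int
acquire_display_fd(struct sketch_physical_device *pdev)
{
   pdev->acquire_calls++;
   return 42; /* placeholder fd */
}

/* Call before swapchain creation or WSI memory allocation. */
static bool
ensure_display_fd(struct sketch_physical_device *pdev)
{
   if (pdev->display_fd < 0)
      pdev->display_fd = acquire_display_fd(pdev);
   return pdev->display_fd >= 0;
}
```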
This change also removes the render_fd and display_fd fields from the
logical device (which we were copying from the physical device), because
now there is no guarantee that we have acquired a display_fd at the
time we create a logical device. Instead, we now put a reference to the
physical device on the logical device from which we can access these.
Finally, this also fixes a regression introduced with VK_KHR_display, where
if that extension is enabled but we are running inside a compositor, we would
acquire a display_fd that is not authenticated and try to use that instead
of acquiring an authenticated display_fd from the display server.
Fixes: b1188c9451 (v3dv: VK_KHR_display extension support)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7546>
Commit eda3e4e055 moved the creation of
s->info.name to shader creation time, rather than after the compile.
A few lines after creating the shader, prog_to_nir clobbers s->info
entirely, losing the name.
This dropped the "ARB" indicator that iris uses to switch math to the
legacy non-IEEE mode used by ARB_vertex_program/fragment_program.
Revert that hunk and go back to doing things the way they were.
Fixes: eda3e4e055 ("nir/builder: Add a name format arg to nir_builder_init_simple_shader().")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3777
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7564>
In commit eda3e4e055, Eric added names
to various programs. In that patch, he also renamed our passthrough
TCS shader from "passthrough" to "passthrough TCS". The passthrough
TCS directly supplies the VUE headers rather than doing the whole
"patch parameters are in backwards order" reswizzling dance.
We failed to detect this and started trying to supply vec4s starting
at component 3, leading to a stack smash on an array of 7 sources,
not to mention the values were being put in the wrong place.
Easy fix: update the code for the new name.
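Why a rename can silently break this kind of detection is illustrated below. is_passthrough_tcs() is hypothetical, and the sketch assumes the check is an exact string comparison on the NIR shader name. The old "passthrough" name no longer matches, so the special path is skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: detect the driver-generated passthrough TCS by name.
 * The name changed from "passthrough" to "passthrough TCS". */
static bool
is_passthrough_tcs(const char *shader_name)
{
   return shader_name != NULL &&
          strcmp(shader_name, "passthrough TCS") == 0;
}
```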
Fixes: eda3e4e055 ("nir/builder: Add a name format arg to nir_builder_init_simple_shader().")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3777
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7564>
It seems some sort of Linux change (bugfix?) resulted in the db410cs
selecting USB device mode due to the micro-USB cable being
plugged in (fastboot runs them in device mode), so we weren't finding
the network and getting artifacts out.
Closes: #3728
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
I'm surprised these were listed as flaky instead of xfails, since I would
have expected them to always fail given my experience on freedreno and
broadcom. But let's try turning them back on and see if they're actually
flaky now that the test has been fixed.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
I want the new version to show the fix in the fd-largeconsts branch (and
make sure the pass keeps working, and make sure other drivers get around
to fixing the issue). While I'm here, cherry-pick in the VK test along
with the GLES one, and also the fix for clip_three on ARMs.
Since the VK and GL test lists were changing, I took the opportunity to
reset freedreno xfails lists to just the tests that are being run with the
CTS uprev, and increase its coverage to 1/10th of the CTS across two
boards (since we just freed up a bunch of runtime with the grouped gles
"other" job).
For panfrost, I didn't spend the time characterizing the t720 fragment_ops
flakes like I did for the deqp-runner change. Given that the random
behavior changes between CTS versions, it doesn't seem to be worth the
time to do so.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
The recent change to install kernel modules for AMD included a sed job to
disable kernel modules in the defconfig. This somehow broke booting on
a307, except the commit failed to bump the arm64_test tag so it wasn't
noticed until the next uprev. (I didn't notice when landing the next
change to that container to add the deqp runner, because I didn't get a
git conflict on rebasing my tag bump so I didn't bump the tag again to
pull in the kernel changes and catch the fail).
I've spent a while trying to debug what's happened (including what
*should* be a replication of the kernel build on my local db410c) and come
up empty. Just punt and disable the AMD kernel module changes on
baremetal to fix it. Bump every container using lava_build.sh to make
sure we don't screw anything up with the script changes.
Fixes: 60c5729d16 ("ci: Distribute ADMGPU driver to LAVA as a module")
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
This cleans up a bunch of gross sprintfs and keeps the caller from needing
to remember to ralloc_strdup. I added a couple of '"%s", name ? name :
""' to radv where I didn't fully trace through whether a non-null name was
being passed in.
I also took the liberty of adding a basic name to a few shaders (pan_blit,
unit tests)
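A printf-style name argument replaces the caller-side sprintf + strdup dance. format_shader_name() below is a hypothetical stand-in. Mesa allocates these names with ralloc, while this sketch uses malloc and vsnprintf to stay self-contained, so the caller must free() the result:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a shader name; returns a heap string the caller frees. */
static char *
format_shader_name(const char *fmt, ...)
{
   va_list args;

   /* First pass: measure the formatted length. */
   va_start(args, fmt);
   int len = vsnprintf(NULL, 0, fmt, args);
   va_end(args);
   if (len < 0)
      return NULL;

   char *name = malloc((size_t)len + 1);
   if (!name)
      return NULL;

   /* Second pass: write the string (va_list restarted). */
   va_start(args, fmt);
   vsnprintf(name, (size_t)len + 1, fmt, args);
   va_end(args);
   return name;
}
```

The `"%s", name ? name : ""` pattern mentioned above keeps a possibly-NULL name from reaching %s.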
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>
These two consumers were the only ones out of the ~65 calls to
init_simple_shader, so there's a pretty clear consensus on how to allocate
simple shaders. I suspect that actually these would be just fine with
b.shader being the mem_ctx, but that would take a bit more rework.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>
Rather than hard-code a list of all the format
modifiers supported by any gallium driver and the
number of aux planes they require in the dri state
tracker, add a screen proc that queries the number
of planes required for a given modifier+format
pair.
Since the only format modifiers that require
auxiliary planes currently are the iris driver's
I915_FORMAT_MOD_Y_TILED_CCS,
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, and
I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, the absence
of the screen proc implies zero aux planes for all
of the screen's supported modifiers. Hence, when
a driver does not expose the proc, derive the
number of planes directly from the format.
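The fallback logic can be sketched with simplified stand-ins. struct sketch_screen, its get_modifier_planes hook, format_num_planes() and SKETCH_MOD_CCS are all hypothetical; the real hook lives on gallium's pipe_screen. Format id 1 stands in for a two-plane YUV format such as NV12:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MOD_CCS 0x100u /* hypothetical modifier with one aux plane */

struct sketch_screen {
   /* Optional: memory planes needed for a modifier+format pair. */
   unsigned (*get_modifier_planes)(struct sketch_screen *screen,
                                   uint64_t modifier, unsigned format);
};

/* Planes of the format itself: format 1 pretends to be NV12. */
static unsigned
format_num_planes(unsigned format)
{
   return format == 1 ? 2 : 1;
}

/* Example driver hook: CCS-style modifiers add one aux plane. */
static unsigned
ccs_driver_planes(struct sketch_screen *screen, uint64_t modifier,
                  unsigned format)
{
   (void)screen;
   return format_num_planes(format) + (modifier == SKETCH_MOD_CCS ? 1 : 0);
}

static unsigned
modifier_num_planes(struct sketch_screen *screen, uint64_t modifier,
                    unsigned format)
{
   if (screen->get_modifier_planes)
      return screen->get_modifier_planes(screen, modifier, format);
   /* No hook: the driver's modifiers need no aux planes. */
   return format_num_planes(format);
}
```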
Signed-off-by: James Jones <jajones@nvidia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3723>
Add a "do you support this modifier?" query to all
drivers which support format modifiers. This will
be used in a subsequent change to fully
encapsulate modifier validation and auxiliary plane
count calculation logic behind the driver
abstraction, which will in turn simplify the
addition of device-class-specific format modifiers
in the nouveau driver.
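A driver-side "do you support this modifier?" query can be as simple as a table lookup, as sketched below. The table and sketch_is_modifier_supported() are hypothetical. A real driver hook would likely also depend on the format, which this sketch omits. The value 0 corresponds to DRM_FORMAT_MOD_LINEAR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list of modifiers a driver supports. */
static const uint64_t supported_modifiers[] = {
   0x0ull,   /* DRM_FORMAT_MOD_LINEAR */
   0x100ull, /* hypothetical tiled layout */
};

static bool
sketch_is_modifier_supported(uint64_t modifier)
{
   size_t n = sizeof(supported_modifiers) / sizeof(supported_modifiers[0]);
   for (size_t i = 0; i < n; i++)
      if (supported_modifiers[i] == modifier)
         return true;
   return false;
}
```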
Signed-off-by: James Jones <jajones@nvidia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3723>
dEQP-VK.pipeline.blend.dual_source.format.r16g16b16a16_snorm.states.color_1msc_1ms1a_add_alpha_1mdc_1msa_sub-color_dc_1ms1c_rsub_alpha_z_1mdc_sub-color_ca_1ms1c_min_alpha_sas_ca_rsub-color_1ms1c_s1c_add_alpha_z_1mda_add,Fail
dEQP-VK.pipeline.blend.dual_source.format.r8g8_snorm.states.color_z_sc_add_alpha_1ms1c_sa_min-color_dc_1mca_add_alpha_z_1mca_max-color_1ms1c_sa_max_alpha_1mcc_sc_sub-color_s1c_1mda_add_alpha_s1c_1mda_add,Fail
dEQP-VK.pipeline.blend.dual_source.format.r8g8b8a8_snorm.states.color_1msc_1ms1a_add_alpha_1mdc_1msa_sub-color_dc_1ms1c_rsub_alpha_z_1mdc_sub-color_ca_1ms1c_min_alpha_sas_ca_rsub-color_1ms1c_s1c_add_alpha_z_1mda_add,Fail
dEQP-VK.pipeline.blend.dual_source.format.r8g8b8a8_snorm.states.color_z_sc_add_alpha_1ms1c_sa_min-color_dc_1mca_add_alpha_z_1mca_max-color_1ms1c_sa_max_alpha_1mcc_sc_sub-color_s1c_1mda_add_alpha_s1c_1mda_add,Fail
dEQP-VK.pipeline.blend.format.r16g16b16a16_snorm.states.color_ca_1mca_rsub_alpha_1mda_z_sub-color_sc_sc_add_alpha_1mca_sa_max-color_sa_1msa_min_alpha_1msc_sa_sub-color_dc_sc_add_alpha_1mdc_1mca_add,Fail
dEQP-VK.pipeline.blend.format.r8g8b8a8_snorm.states.color_ca_1mca_rsub_alpha_1mda_z_sub-color_sc_sc_add_alpha_1mca_sa_max-color_sa_1msa_min_alpha_1msc_sa_sub-color_dc_sc_add_alpha_1mdc_1mca_add,Fail
All fail due to the 1 - mdc or 1 - mca alpha channel in the last quadrant.
Cc: 20.3 <mesa-stable>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7499>
If we have cbufs but they are all empty, default
to returning the fb->samples.
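The fallback can be sketched with simplified framebuffer structs. The struct names and framebuffer_samples() are hypothetical, not the gallium utility. The first non-NULL color buffer decides, then depth/stencil, and fb->samples is used when every attachment slot is empty:

```c
#include <assert.h>
#include <stddef.h>

struct sketch_surface { unsigned nr_samples; };

struct sketch_framebuffer {
   unsigned nr_cbufs;
   struct sketch_surface *cbufs[8];
   struct sketch_surface *zsbuf;
   unsigned samples; /* framebuffer-level sample count */
};

static unsigned
framebuffer_samples(const struct sketch_framebuffer *fb)
{
   for (unsigned i = 0; i < fb->nr_cbufs; i++) {
      if (fb->cbufs[i])
         return fb->cbufs[i]->nr_samples;
   }
   if (fb->zsbuf)
      return fb->zsbuf->nr_samples;
   /* cbufs present but all empty, and no depth/stencil: fall back. */
   return fb->samples;
}
```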
Fixes:
dEQP-VK.pipeline.multisample.mixed_count.1_4_unused
on lavapipe
v2:
drop unneeded chunk (Roland)
Cc: 20.3 <mesa-stable>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7499>
Some drivers emit warnings about seeing these structs in the
pNext chain and not handling them. This change makes it so we
only include the structs with Vulkan drivers that are known to
require them for proper behavior (v3dv only for now), to avoid the
warnings.
It should be noted that here we are only suppressing the messages
from Zink. Since the Mesa Vulkan WSI code will include these structs,
when native Vulkan Mesa drivers are used without Zink they might
still dump these messages.
Requested by Mike Blumenkrantz.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7523>
We can use num_blocks (if it's been initialized by some pass indexing
blocks) to pre-size our table, which helps on validating shaders with many
blocks which would otherwise reallocate the set several times.
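Pre-sizing avoids rehashes, as a toy model shows. The growth policy here is hypothetical and simpler than Mesa's set: power-of-two capacities starting at 16, doubling whenever the load factor would exceed 1/2. presize_capacity() and count_rehashes() are illustrative names:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power-of-two capacity (>= 16) that holds `entries` at a
 * load factor of at most 1/2. */
static uint32_t
presize_capacity(uint32_t entries)
{
   uint32_t cap = 16;
   while (cap < entries * 2u)
      cap *= 2;
   return cap;
}

/* Number of grow-and-rehash steps when inserting n entries one by
 * one into a table of the given initial capacity. */
static unsigned
count_rehashes(uint32_t capacity, uint32_t n)
{
   unsigned rehashes = 0;
   for (uint32_t size = 1; size <= n; size++) {
      if (size * 2u > capacity) {
         capacity *= 2;
         rehashes++;
      }
   }
   return rehashes;
}
```

With 100 blocks, starting at the default size costs 4 rehashes in this model, while pre-sizing from num_blocks costs none.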
No statistically significant performance difference on softpipe
KHR-GL33.texture_swizzle.functional runtime (n=15). A previous, similar
variant of this patch cut .3% of instructions in softpipe shader-db ./run
shaders/closed/steam/borderlands-2/35* (an arbitrary set of shaders that
completed in reasonable amount of time) according to callgrind.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7244>
This implementation was broken and should have just been the same as the
hash_table_clear() one, which I copied over here. It was setting all
formerly-present entries to deleted, yet also setting deleted_entries to
0. This meant that all new searches or additions after clearing would
have to reprobe the whole table until a rehash happened, and that rehash
would be delayed because we violated the deleted_entries invariant.
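The cost of the broken clear shows up in a toy open-addressing set. The names and layout below are hypothetical, not util/set. When every slot is a tombstone, a lookup for a missing key must probe the whole table. A correct clear empties every slot, so the same lookup stops at the first probe:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SET_SIZE 8
#define SLOT_EMPTY 0u            /* never-used slot */
#define SLOT_DELETED UINT32_MAX  /* tombstone */

/* Toy set of nonzero uint32 keys with linear probing. */
struct sketch_set {
   uint32_t table[SET_SIZE];
   unsigned entries;
   unsigned deleted_entries;
};

/* The broken variant: live slots become tombstones, yet the counter
 * claims there are no deleted entries. */
static void
set_clear_buggy(struct sketch_set *set)
{
   for (unsigned i = 0; i < SET_SIZE; i++)
      if (set->table[i] != SLOT_EMPTY)
         set->table[i] = SLOT_DELETED;
   set->entries = 0;
   set->deleted_entries = 0;
}

/* Correct: every slot is truly empty and both counters are 0. */
static void
set_clear(struct sketch_set *set)
{
   memset(set->table, 0, sizeof(set->table));
   set->entries = 0;
   set->deleted_entries = 0;
}

/* Slots probed before a lookup of `key` terminates. */
static unsigned
probes_for_lookup(const struct sketch_set *set, uint32_t key)
{
   unsigned probes = 0;
   for (unsigned i = 0; i < SET_SIZE; i++) {
      uint32_t slot = set->table[(key + i) % SET_SIZE];
      probes++;
      if (slot == SLOT_EMPTY || slot == key)
         break;
   }
   return probes;
}
```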
No statistically significant performance difference on softpipe
KHR-GL33.texture_swizzle.functional runtime (n=18)
Fixes: 5c075b0855 ("util/set: add a set_clear function")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7244>
Use the entry_is_present() helper to clarify what's going on with
deletion, and then we can remove the special continue for NULL since we're
just writing NULL anyway (which the CPU cache will elide for us).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7244>
With the image_read_write OpenCL CTS we can get a stack overflow handling
all the events, as the application itself never flushes.
We need to address this in two ways:
1. Flush the queue once an arbitrary number of events have piled up.
2. Drop event deps once they get a fence assigned.
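The two mitigations can be sketched in C, although clover itself is C++. The structs, FLUSH_THRESHOLD and the helper names are hypothetical. The queue flushes itself after a fixed number of unflushed events. Assigning a fence drops the dependency pointer, so dependency chains, and any recursive walk over them, stay short:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLUSH_THRESHOLD 1000 /* arbitrary, hypothetical value */

struct sketch_event {
   bool has_fence;
   struct sketch_event *dep; /* single dependency for simplicity */
};

struct sketch_queue {
   unsigned pending;
   unsigned flushes;
};

static void
queue_flush(struct sketch_queue *q)
{
   q->pending = 0;
   q->flushes++;
}

/* Enqueue one event; flush if too many have piled up. */
static void
queue_push(struct sketch_queue *q)
{
   if (++q->pending >= FLUSH_THRESHOLD)
      queue_flush(q);
}

/* Once fenced, the event no longer needs its dependency. */
static void
event_set_fence(struct sketch_event *ev)
{
   ev->has_fence = true;
   ev->dep = NULL;
}
```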
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7460>
Just statically initialize the dummy/incomplete framebuffer/renderbuffer
to avoid re-initializing their mutex.
==328537== Mutex reinitialization: mutex 0x1281bd28, recursion count 0, owner 0.
==328537== at 0x486FD34: pthread_mutex_init_intercept (drd_pthread_intercepts.c:826)
==328537== by 0x486FD34: pthread_mutex_init (drd_pthread_intercepts.c:835)
==328537== by 0x118F9727: mtx_init (threads_posix.h:207)
==328537== by 0x118F983B: simple_mtx_init (simple_mtx.h:132)
==328537== by 0x118FA087: _mesa_init_fbobjects (fbobject.c:93)
==328537== by 0x117E8CB7: init_attrib_groups (context.c:849)
==328537== by 0x117E942F: _mesa_initialize_context (context.c:1225)
==328537== by 0x1173C323: st_create_context (st_context.c:1019)
==328537== by 0x11720A9F: st_api_create_context (st_manager.c:930)
==328537== by 0x1170E2CF: dri_create_context (dri_context.c:163)
==328537== by 0x11FB9DC3: driCreateContextAttribs (dri_util.c:480)
==328537== by 0x8E9D3DF: dri3_create_context_attribs (dri3_glx.c:316)
==328537== by 0x8E9D49B: dri3_create_context (dri3_glx.c:347)
==328537== mutex 0x1281bd28 was first observed at:
==328537== at 0x486FD34: pthread_mutex_init_intercept (drd_pthread_intercepts.c:826)
==328537== by 0x486FD34: pthread_mutex_init (drd_pthread_intercepts.c:835)
==328537== by 0x118F9727: mtx_init (threads_posix.h:207)
==328537== by 0x118F983B: simple_mtx_init (simple_mtx.h:132)
==328537== by 0x118FA087: _mesa_init_fbobjects (fbobject.c:93)
==328537== by 0x117E8CB7: init_attrib_groups (context.c:849)
==328537== by 0x117E942F: _mesa_initialize_context (context.c:1225)
==328537== by 0x1173C323: st_create_context (st_context.c:1019)
==328537== by 0x11720A9F: st_api_create_context (st_manager.c:930)
==328537== by 0x1170E2CF: dri_create_context (dri_context.c:163)
==328537== by 0x11FB9DC3: driCreateContextAttribs (dri_util.c:480)
==328537== by 0x8E9D3DF: dri3_create_context_attribs (dri3_glx.c:316)
==328537== by 0x8E9D49B: dri3_create_context (dri3_glx.c:347)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7517>
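The pattern can be sketched with POSIX `pthread_mutex_t` and `PTHREAD_MUTEX_INITIALIZER` standing in for Mesa's `simple_mtx`; `struct dummy_fb` and `context_get_dummy_fb` are hypothetical names. Because the shared object is initialized at compile time, creating another context only takes a reference and never runs an init function on a mutex that already exists, which is exactly what DRD flagged above.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared placeholder object, e.g. a dummy framebuffer. */
struct dummy_fb {
   pthread_mutex_t mutex;
   int refcount;
};

/* Initialized once, at compile time: no init call runs per context, so a
 * second context cannot re-initialize a mutex that may already be in use. */
static struct dummy_fb dummy_framebuffer = {
   .mutex = PTHREAD_MUTEX_INITIALIZER,
   .refcount = 1,
};

/* What each context creation does instead of calling an init function. */
static struct dummy_fb *context_get_dummy_fb(void)
{
   struct dummy_fb *fb = &dummy_framebuffer;
   pthread_mutex_lock(&fb->mutex);
   fb->refcount++;
   pthread_mutex_unlock(&fb->mutex);
   return fb;
}
```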
In general, rings are not shared across contexts/threads. But this
can happen with texture stateobjs, which can be invalidated by other
contexts.
And while we're here, let's convert the rest of freedreno/drm to
u_atomic.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7342>
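When a ring can be released from another context's thread, its reference count has to be updated atomically. A minimal sketch of such a refcount, using C11 `<stdatomic.h>` rather than Mesa's `u_atomic` helpers; `struct ring`, `ring_new`, `ring_ref` and `ring_unref` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct ring {
   atomic_int refcnt;
   /* ... ring state ... */
};

static struct ring *ring_new(void)
{
   struct ring *r = malloc(sizeof(*r));
   if (r)
      atomic_init(&r->refcnt, 1);
   return r;
}

static struct ring *ring_ref(struct ring *r)
{
   /* Taking a reference needs no ordering: the caller already holds one. */
   atomic_fetch_add_explicit(&r->refcnt, 1, memory_order_relaxed);
   return r;
}

/* Returns 1 if this call dropped the last reference and freed the ring. */
static int ring_unref(struct ring *r)
{
   /* acq_rel so that all threads' accesses to the ring happen-before the free. */
   if (atomic_fetch_sub_explicit(&r->refcnt, 1, memory_order_acq_rel) == 1) {
      free(r);
      return 1;
   }
   return 0;
}
```

Only the thread that observes the count going from 1 to 0 frees the object, so two contexts dropping their references concurrently cannot double-free it.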
These were actually just wrappers for the screen->lock, left over from
moving things around a long time ago. Let's drop them to make things
more explicit (that we are locking the screen, not the context).
Involves a bit of shuffling things around to untangle header deps, but
no functional change.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7342>
This fixes a regression that happened after rebasing on master, where we
end up not writing all components of the clip-distance array, which the
DXIL validation code in the D3D12 runtime treats as an error.
To ensure we don't end up overwriting a previous write, enable
nir_shader_compiler_options::lower_all_io_to_temps as well.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7477>
This is support-code to emit the DirectX Intermediate Language, which is
a dialect of LLVM 3.7 bitcode. Because modern versions of LLVM don't
support emitting bitcode for older versions, and we can't rely on an old
LLVM version because we need the OpenCL support from Clang later on, we
instead implement our own LLVM bitcode encoder as part of this work.
See the official DXIL documentation for more details on DXIL:
https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/DXIL.rst
This comes as a separate library because we're also using
this code as the basis for an OpenCL C compiler, which will follow as a
separate merge-request later.
This is the combination of more than 230 commits from our development
branch, including the work from several authors.
Co-authored-by: Bill Kristiansen <billkris@microsoft.com>
Co-authored-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-authored-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Gert Wollny <gert.wollny@collabora.com>
Co-authored-by: Jesse Natalie <jenatali@microsoft.com>
Co-authored-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7477>
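LLVM bitcode is a bitstream: fields are packed least-significant-bit first into 32-bit words, either at a fixed width or as variable bit-rate (VBR) chunks in which the top bit of each chunk signals that another chunk follows. A minimal sketch of those two primitives follows; `struct bitwriter`, `emit_bits`, `emit_vbr` and `align32` are hypothetical names, not the API of the DXIL module, and a real encoder also needs blocks, abbreviations and the DXIL container around the bitcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 64

struct bitwriter {
   uint32_t words[MAX_WORDS]; /* completed 32-bit words */
   size_t num_words;
   uint64_t cur;              /* pending bits, least-significant bit first */
   unsigned cur_bits;         /* number of valid bits in cur (< 32) */
};

/* Append the low `width` bits of `value` as a fixed-width field. */
static void emit_bits(struct bitwriter *w, uint32_t value, unsigned width)
{
   assert(width > 0 && width <= 32);
   if (width < 32)
      value &= (UINT32_C(1) << width) - 1;
   w->cur |= (uint64_t)value << w->cur_bits;
   w->cur_bits += width;
   if (w->cur_bits >= 32) {
      assert(w->num_words < MAX_WORDS);
      w->words[w->num_words++] = (uint32_t)w->cur;
      w->cur >>= 32;
      w->cur_bits -= 32;
   }
}

/* Emit `value` as VBR-`width`: (width - 1) payload bits per chunk, with the
 * top bit of the chunk set when another chunk follows. */
static void emit_vbr(struct bitwriter *w, uint64_t value, unsigned width)
{
   assert(width >= 2 && width <= 32);
   const uint64_t cont = UINT64_C(1) << (width - 1);
   while (value >= cont) {
      emit_bits(w, (uint32_t)((value & (cont - 1)) | cont), width);
      value >>= width - 1;
   }
   emit_bits(w, (uint32_t)value, width);
}

/* Pad the pending bits with zeros up to the next 32-bit boundary. */
static void align32(struct bitwriter *w)
{
   if (w->cur_bits > 0) {
      assert(w->num_words < MAX_WORDS);
      w->words[w->num_words++] = (uint32_t)w->cur;
      w->cur = 0;
      w->cur_bits = 0;
   }
}
```

For example, emitting the bitcode magic ('B', 'C', then the nibbles 0x0, 0xC, 0xE, 0xD) produces the word 0xDEC04342, and VBR6 encodes 100 as the two 6-bit chunks 36 (payload 4 plus the continuation bit) and 3.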
Adds template support to zink_device_info.py for setting up the VkPhysicalDeviceVulkan* version Features and Properties structures.
When the next Vulkan version with newer structures is released, only a single line should need to be added.
Note that the 11 structures were not added until Vk 1.2, so that is not a typo.
This code does not prevent the use of conflicting extensions or other VkPhysicalDevice*Features structures together with VkPhysicalDeviceVulkan*Features structures when calling vkCreateDevice().
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7496>
pipeline cannot be NULL since pipeline->layout->num_sets was just
checked.
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking pipeline suggests that it may be
null, but it has already been dereferenced on all paths leading to
the check.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7521>
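The defect pattern is easy to reproduce in isolation. In this sketch, `struct pipeline`, `struct layout` and both functions are hypothetical stand-ins, not the driver's real code: the first function dereferences `pipeline` and only then tests it for NULL, which is what Coverity reports as REVERSE_INULL; the fix drops the check, since the dereference has already established that `pipeline` is non-NULL.

```c
#include <assert.h>
#include <stddef.h>

struct layout { unsigned num_sets; };
struct pipeline { const struct layout *layout; };

/* Before: the NULL check comes after pipeline has been dereferenced, so it
 * either never fires or fires too late (Coverity: REVERSE_INULL). */
static unsigned bound_sets_before(const struct pipeline *pipeline)
{
   unsigned num_sets = pipeline->layout->num_sets;
   if (pipeline == NULL)
      return 0;
   return num_sets;
}

/* After: the redundant check is removed; callers must pass a valid pipeline. */
static unsigned bound_sets_after(const struct pipeline *pipeline)
{
   return pipeline->layout->num_sets;
}
```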
# Some blue rings are missing on the top-left corner, also the penguin watermark
checksum: 0020a77e25003e4e8db1ce929eed8914
# Flaky since the introduction of parallel trace replay
#- path: gputest/gimark.trace
# expectations:
# - device: gl-panfrost-t860
## Some blue rings are missing on the top-left corner, also the penguin watermark
# checksum: 0020a77e25003e4e8db1ce929eed8914
- path: gputest/pixmark-julia-fp32.trace
expectations:
- device: gl-panfrost-t860
@@ -202,7 +203,7 @@ traces:
- path: humus/CelShading.trace
expectations:
- device: gl-panfrost-t860
checksum: e44a7ac7442e82d85de583f2cdd68fdf
checksum: 521ca6a236b8400cf692e6817b91c739
- path: humus/DynamicBranching3.trace
expectations:
- device: gl-panfrost-t860