In 76e2f390f9, when Topi switched num_samples from 0 to 1 for
single-sampled, he accidentally switched the last parameter in the call
to miptree_create_for_teximage from 0 to 1 thinking it was num_samples
when it was actually layout_flags. Switching from 0 to 1 added the
MIPTREE_LAYOUT_ACCELERATED_UPLOAD flag which causes us to allocate a
busy BO instead of an idle one. This caused the subsequent CPU upload
to consistently stall. The end result was a 15% performance drop in the
SynMark v7 DrvRes microbenchmark. This restores the old behavior and
fixes the performance regression.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes: 76e2f390f9
Bugzilla: https://bugs.freedesktop.org/102260
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit f24cf82d6d)
This little optimization improves the performance of SynMark v7
TexFilterTri by almost 10% on Sky Lake GT4 among other improvements.
We've been doing it for some time but somehow it got dropped during
the miptree refactoring.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/102258
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 61d2f3f1c2)
Looking at NewDriverState is not safe in general. The state atom system
is set up to ensure that new bits that get added to NewDriverState get
accumulated into the set of bits used when emitting atoms but it doesn't
go the other way. If we read NewDriverState, we may not get the full
picture because the per-pipeline state (3D or compute) does not get
added to NewDriverState before state emit is done. It's especially
dangerous to do this from BLORP (either explicitly or implicitly when
BLORP calls gen7_upload_urb) because that does not happen during one of
the normal state upload paths.
This commit solves the problem by whacking all of the per-shader-stage
URB sizes to zero whenever we change the total URB size. We still have
to flag BRW_NEW_URB_SIZE to ensure that the gen7_urb atom triggers but
the actual decision in gen7_upload_urb can now be based entirely on URB
sizes rather than on state atoms. This also makes BLORP correct because
it just asks for a new URB config whenever the vsize is too small and so
any change to the total URB size will trigger blorp to re-emit as well
because 0 < vs_entry_size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/102289
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d5e217dbfd)
EGLimages are shared with external users, and we don't know what they're
going to do with them. They might scan them out. They might access
them in a way that doesn't work with our explicit clflushing.
It's safest to simply mark them non-coherent.
Chris Wilson caught this problem and wrote a similar (though less
aggressive) patch to solve it; the miptree code has since undergone
a lot of refactoring so I had to rewrite it.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit bc56dfbf3f)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/intel_mipmap_tree.c
The only one of the three remaining flags that has anything whatsoever
to do with layout is TILING_NONE. This commit renames them to
MIPTREE_CREATE_*, documents the meaning of each flag, and makes the
create functions take an actual enum type so GDB will print them nicely.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 24a0da338f)
The only force tiling flag we really care about is LAYOUT_TILING_NONE.
The others don't actually do anything but add confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 55116839d9)
The flag hasn't affected actual surface layout for some time. The only
purpose it served was to set bo->cache_coherent = false on the BO used
to create the miptree. This is fairly silly because we can just set
that directly from the caller where it makes much more sense.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a5a673dfa7)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/intel_mipmap_tree.c
The one caller of is_mcs_supported passes 0 in as the layout_flags
unconditionally.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 0e4d9a4b37)
For stable the regressions were addressed by reverting the offending
commits - see previous commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This reverts commit fcbb93e860 and
also commit 7c5b204e38 to avoid
compilation errors.
Basically, the parameter indexes look wrong when a non-bindless
sampler is declared inside a nested struct (because it is skipped).
I think it's safer to just restore the previous behaviour which is
here since ages and also because the initial attempt is only a
little performance improvement.
This fixes a regression with
ES2-CTS.functional.shaders.struct.uniform.sampler_nested*.
Note: this patch is only for 17.2 since the fixes in master are rather
invasive. Namely:
365d34540f mesa: correctly calculate the storage offset for i915
de0e62e106 st/mesa: correctly calculate the storage offset
Fixes: fcbb93e860 ("mesa: stop assigning unused storage for non-bindless
opaque types")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101983
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The make_shareable function deletes the aux buffer and then whacks
aux_usage to ISL_AUX_USAGE_NONE but not unsetting supports_fast_clear.
Since we only look at supports_fast_clear to decide whether or not to do
fast clears, this was causing assertion failures.
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101925
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit e7a52cc381)
If you don't pass this, the compiler refuses to compile the assembly for
pre-v7 CPUs. This also keeps us from building identical, non-NEON code on
aarch64 and x86.
Fixes: a373f77662 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.")
v2: Fix Android build by just appending NEON_C_SOURCES when
ARCH_ARM_HAVE_NEON.
Tested-by: Rob Herring <robh@kernel.org>
(cherry picked from commit bd5efbd70b)
[Emil Velikov: address libvc4_la_LIBADD conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/vc4/Makefile.am
I've been trying to get away without these conditionals in vc4's NEON
code, but it meant compiling extra unused code on x86, and build failing
on ARMv6.
v2: Use the _arm/_arm64 flags to simplify detection (suggested by Rob),
but hide the _arm version under ARCH_ARM_HAVE_NEON to keep from trying
to build this stuff for armv5te.
Tested-by: Rob Herring <robh@kernel.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit ba8533b6ea)
We need to link librt for u_thread.h's clock_gettime() call.
Fixes: b822d9dd67 ("gallium/util: move u_queue.{c,h} to src/util")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b94ddc181b)
BLEND_STATE packing was modified to be variable-length in:
9670124e31 genxml: Make BLEND_STATE command support variable length array.
The initial gen10.xml still had the old, fixed-length style
definition for BLEND_STATE. So gen10_upload_blend_state would
overwrite the packed BLEND_STATE_ENTRYs with its own fixed array
of all-zero entries when packing BLEND_STATE. This caused
BLEND_STATE upload to not work at all.
Fixes: aa416f515a ("i965/genxml: Add gen10.xml")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit d6539608a4)
LLC platforms are magic in that reads from the CPU are always cache
coherent, or rather GPU writes that bypass LLC do still invalidate the
appropriate cache line.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 49eda75df6)
This affects which inputs are marked as used. In a situation where only
the texture instruction uses an input, it might have been ignored as
unused due to input masks.
Affects subtests of KHR-GL45.texture_cube_map_array.sampling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 054c54d1be)
We continue in the code to do some more things with the rhs, including
setting a constant initializer. If the type is wrong, this causes some
confusion down the line, leading to assertions. This makes sure that the
rhs processing continues to flow as-if the type was correct to start
with (even though the state has been marked as an error state).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101766
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 978c4c597a)
intel_miptree_texture_aux_usage() takes an isl_format, but we are
passing a mesa_format. clang warns:
brw_blorp.c:305:52: warning: implicit conversion from enumeration
type 'mesa_format' to different enumeration type
'enum isl_format' [-Wenum-conversion]
intel_miptree_texture_aux_usage(brw, src_mt, src_format);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~
Fixes: fc1639e46d ("i965/blorp: Use texture/render_aux_usage for blits")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit f7dfc44c61)
Since we don't iterate to a fixed point, we can end up in situations
where we have a SAT instruction + a long immediate. This is not legal.
However since it's immediately computable, just run unary straight away
to handle the situation.
Fixes: 24a799ad35 ("nv50/ir: fix ConstantFolding with saturation")
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 165e18dd21)
This seems like a workaround, but we don't see the bug on CIK/VI.
On SI with the dEQP-VK.memory.pipeline_barrier.host_read_transfer_dst.*
tests, when one tests complete, the first flush at the start of the next
test causes a VM fault as we've destroyed the VM, but we end up flushing
the compute shader then, and it must still be in the process of doing
something.
Could also be a kernel difference between SI and CIK.
v2: hit this with a bigger hammer. This fixes a bunch of hangs
in the vk cts with the robustness tests.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101334
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 82ba384c10)
This ports the workaround from radeonsi, that was missing in radv.
This fixes Talos rendering when MSAA is enabled on my Tahiti card.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8bf3930751)
This just copies the code from the -pro shaders,
and fixes the tests on CIK.
With this CIK passes the same set of conformance
tests as VI.
Fixes: 83e58b03 (radv: flush f32->f16 conversion denormals to zero. (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3f389f75b6)
The argument here is a bitmask, so the old code selected .xy, which
got silently truncated to .x when constructing the vec4 from components,
instead of using .w.
Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit acba3a3151)
It justs works with the fragment shader resolve, so no need to do
a custom conversion. In fact with SRGB dest, it actually gives
wrong results.
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 15e5a7a683)
These seem to store very bogus results. Luckily there is some code
that converts srgb->linear already, so just making the descriptor
format UNORM should work.
Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8286c3a49f)
This fixes corrupted shadows in Unigine Valley.
The corruption disappeared when I stopped setting IMG_DATA_FORMAT_24_8
for depth.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 27fef5d52d)
Also, silence an obnoxious finishme that started occurring for all
GL applications which use stencil after the i965 ISL conversion.
v2: Check against 3DSTATE_STENCIL_BUFFER's pitch bits when using
separate stencil, and 3DSTATE_DEPTH_BUFFER's bits when using
combined depth-stencil.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 5563872dbf)
If we have an invalid display fed into the functions, the display lookup
will return NULL. Thus as we attempt to get the platform type, we'll
deref. it leading to a crash.
Keep in mind that this will not happen if Mesa is built without X11 or
when the legacy eglCreate*Surface codepaths are used.
A similar check was added with earlier commit 5e97b8f5ce ("egl: Fix
crashes in eglCreate*Surface), although it was only applicable when the
surfaceless platform is built.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 26fbb9eacd)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/egl/main/eglapi.c