The previous fix 0cae8d372e is the right way to proceed, but it
should also apply when index_size is non-zero.
This change was tested on palm and cayman. It fixes the following test:
spec/arb_multi_draw_indirect/arb_draw_elements_base_vertex-multidrawelements -indirect: fail pass
Fixes: 0cae8d372e ("r600: don't set an index_bias for indirect draw calls")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34726>
(cherry picked from commit a640b7233c)
The ISA docs don't mention this, but instead of always truncating
like other integer conversions, this opcode actually uses the single
precision rounding mode.
We could continue to use the opcode and set the rounding mode to rtz
in lower_to_hw_instrs, but I think I should just concede that f2u8
isn't worth the effort.
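
To illustrate why the rounding mode matters, here is a minimal sketch in
plain C comparing a truncating float-to-u8 conversion with one that rounds
to nearest even. It assumes round-to-nearest-even is the active single
precision rounding mode, and the clamping to [0, 255] is also an assumption;
the function names are hypothetical and this is not the ACO code.

```c
#include <stdint.h>

/* Truncating conversion: what other float-to-integer conversions do.
 * Clamping to [0, 255] is an assumption made for this sketch. */
static uint8_t f2u8_trunc(float x)
{
    if (!(x > 0.0f))            /* negative values and NaN map to 0 */
        return 0;
    if (x >= 255.0f)
        return 255;
    return (uint8_t)x;          /* C casts truncate toward zero */
}

/* Round-to-nearest-even conversion: assumed behaviour of an opcode that
 * follows the rounding mode instead of truncating. */
static uint8_t f2u8_rne(float x)
{
    if (!(x > 0.0f))
        return 0;
    if (x >= 255.0f)
        return 255;

    uint32_t t = (uint32_t)x;   /* integer part */
    float frac = x - (float)t;  /* fractional part in [0, 1) */

    /* Round up above one half, and on exact ties only if t is odd. */
    if (frac > 0.5f || (frac == 0.5f && (t & 1u)))
        t++;
    return (uint8_t)t;
}
```

For an input such as 1.7 the two functions disagree (1 versus 2), which is
the kind of mismatch a lowering that expects truncation would run into.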
Fixes: 9bb10b58 ("aco: use v_cvt_pk_u8_f32 for f2u8")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
(cherry picked from commit d95e90ab5f)
We were allocating these, but never freeing the actual CSOs here.
Let's wire things up so we delete the data when we destroy the
hash-table. Because we don't have access to the context in that
callback, we can't call the pipe-level function to delete a CSO,
but luckily we don't actually need the context for the
driver-logic. So let's add an internal helper for that.
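
A rough sketch of the pattern, using a toy table instead of the real
hash-table and CSO types: the destroy callback only receives the entry, so
the actual freeing is done by a helper that needs no context. All names here
are hypothetical and do not match the panfrost code.

```c
#include <stdlib.h>

struct cso { int id; };              /* stand-in for a compiled shader CSO */
struct entry { struct cso *data; };  /* stand-in for a hash-table entry */
struct table { struct entry *entries; size_t count; };

static int freed_csos;               /* only here to observe the sketch */

/* Internal helper: frees a CSO without needing any context pointer. */
static void cso_destroy_internal(struct cso *c)
{
    free(c);
    freed_csos++;
}

/* Per-entry callback: it gets the entry only, never the context. */
static void entry_delete_cb(struct entry *e)
{
    cso_destroy_internal(e->data);
    e->data = NULL;
}

static struct table table_create(size_t n)
{
    struct table t = { calloc(n, sizeof(struct entry)), 0 };
    if (!t.entries)
        return t;
    t.count = n;
    for (size_t i = 0; i < n; i++)
        t.entries[i].data = malloc(sizeof(struct cso));
    return t;
}

/* Destroying the table runs the callback on every entry first. */
static void table_destroy(struct table *t, void (*delete_cb)(struct entry *))
{
    for (size_t i = 0; i < t->count; i++)
        if (delete_cb)
            delete_cb(&t->entries[i]);
    free(t->entries);
    t->entries = NULL;
    t->count = 0;
}
```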
Fixes: ae3fb3089f ("panfrost: Add infrastructure for internal AFBC compute shaders")
Fixes: f39194cdd3 ("panfrost: support MTK 16L32S detiling")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35336>
(cherry picked from commit fb0a422be2)
At draw time, if the number of BOs is bigger than 2048, the current
job submission is forced.
The 2048 limit has been validated to be big enough to not be reached
in most of the scenarios. Only a couple of CTS tests get over this
threshold.
So the new V3D_JOB_MAX_BO_HANDLE_COUNT is defined as 2048 and
V3D_JOB_MAX_BO_REFERENCED_SIZE is defined as 768MB.
This forced submission is useful to handle scenarios where the client
application is not calling glFlush() or where SwapBuffers() is a NOP
because of not having a window surface. In this case, the CLE can
grow indefinitely until the system runs out of memory resources.
Other drivers follow a similar approach, forcing a flush of the CL
when it reaches a defined size because of HW limitations.
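
The check described above could look roughly like the sketch below. The two
limit names and values come from this message; the struct, the function
name, and the use of a strict "greater than" comparison for the referenced
size are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define V3D_JOB_MAX_BO_HANDLE_COUNT    2048
/* 768MB, interpreted here as 768 * 1024 * 1024 bytes (an assumption). */
#define V3D_JOB_MAX_BO_REFERENCED_SIZE (768ull * 1024 * 1024)

/* Hypothetical per-job accounting, not the driver's real job struct. */
struct job_usage {
    uint32_t bo_handle_count;   /* BOs referenced by the job */
    uint64_t referenced_size;   /* total bytes of those BOs */
};

/* Returns true when the current job should be submitted before more
 * draws are recorded into it. */
static bool job_needs_forced_submit(const struct job_usage *u)
{
    return u->bo_handle_count > V3D_JOB_MAX_BO_HANDLE_COUNT ||
           u->referenced_size > V3D_JOB_MAX_BO_REFERENCED_SIZE;
}
```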
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227
Cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35042>
(cherry picked from commit ed16884bfa)
Mark when at least one job for the current active FBO has already been
submitted since the last framebuffer state update.
With this we can apply TLB load invalidation only to the first job
submitted to the same FBO, and skip it on follow-up jobs targeting the
same framebuffer state.
This avoids incorrect invalidations when we force a job submission for
a reason unrelated to a new framebuffer bind.
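
As a minimal sketch of the bookkeeping, assuming a single flag that is
reset on every framebuffer state update (the struct and function names are
hypothetical, not the driver's):

```c
#include <stdbool.h>

/* Hypothetical tracking state kept alongside the framebuffer state. */
struct fb_tracking {
    bool job_submitted;   /* a job for the current FBO was already submitted */
};

/* Called whenever the framebuffer state is updated. */
static void fb_state_updated(struct fb_tracking *t)
{
    t->job_submitted = false;
}

/* Called at job submission. Returns whether TLB load invalidation may be
 * applied to this job: only for the first one since the last update. */
static bool submit_job(struct fb_tracking *t)
{
    bool invalidate = !t->job_submitted;
    t->job_submitted = true;
    return invalidate;
}
```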
Cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35042>
(cherry picked from commit 6ff509593c)
Before we move a UBO load to a previous location in the block we take a
reference to the instruction after it so we can continue the loop from
there. However, if the load we just moved was already the last instruction
in the block, there is no next instruction and we just want to break the
loop right there.
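
The loop shape can be sketched with a toy doubly linked list: the next
pointer is saved before the move, and the loop stops when the moved
instruction had no successor. The list type, the integer key and the
sorting policy are assumptions for illustration and not the compiler's
actual data structures.

```c
#include <stddef.h>

struct instr {
    int key;                      /* stand-in for (UBO index, offset) */
    struct instr *prev, *next;
};

struct block { struct instr *head, *tail; };

static void block_append(struct block *b, struct instr *i)
{
    i->prev = b->tail;
    i->next = NULL;
    if (b->tail) b->tail->next = i; else b->head = i;
    b->tail = i;
}

static void instr_unlink(struct block *b, struct instr *i)
{
    if (i->prev) i->prev->next = i->next; else b->head = i->next;
    if (i->next) i->next->prev = i->prev; else b->tail = i->prev;
}

static void instr_insert_before(struct block *b, struct instr *pos,
                                struct instr *i)
{
    i->next = pos;
    i->prev = pos->prev;
    if (pos->prev) pos->prev->next = i; else b->head = i;
    pos->prev = i;
}

/* Move each instruction before the first earlier one with a larger key. */
static void sort_block(struct block *b)
{
    struct instr *cur = b->head;
    if (!cur)
        return;

    for (;;) {
        /* Take the reference to the following instruction before moving. */
        struct instr *next = cur->next;

        struct instr *pos = NULL;
        for (struct instr *p = b->head; p != cur; p = p->next) {
            if (p->key > cur->key) { pos = p; break; }
        }
        if (pos) {
            instr_unlink(b, cur);
            instr_insert_before(b, pos, cur);
        }

        /* cur was the last instruction: nothing to continue from. */
        if (!next)
            break;
        cur = next;
    }
}
```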
Fixes crashes with shaders from http://flightradar24.com
Fixes: 8998666de7 ("broadcom/compiler: sort constant UBO loads by index and offset")
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35333>
(cherry picked from commit c059c721fb)
0e3e5146cf ("intel/brw: Use correct instruction for value change check
when coalescing") enabled some new cases that exposed a pre-existing
bug that would turn something like this:
mul.sat(16) %789:F, %787:F, %788:F
mov.g.f0.0(16) %790:F, %789:F
(+f0.0) sel(16) %800:UD, %790:UD, 0u
into this:
mul.sat(16) %790:F, %787:F, %788:F
mov.g.f0.0(16) null:F, null<8,8,1>:F
(+f0.0) sel(16) %800:UD, %790:UD, 0u
The mov[] array can contain the same instruction more than once, because
an entry is stored for each REG_SIZE written and a SIMD16 instruction
writes 2 REG_SIZE.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0e3e5146cf ("intel/brw: Use correct instruction for value change check when coalescing")
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35276>
(cherry picked from commit a51d061c00)
In a scenario like:
* begin_rendering(cbuf1:store=DONTCARE, cbuf2)
* draw
* remap(cbuf2, NULL)
* draw
* end_rendering
cbuf1 will be poisoned at the end of the renderpass, but the corresponding
clear call that triggers the poisoning will not be able to detect that this
texture is being written by an async fs, causing a write hazard.
Unremapping the fb here ensures that all attachments are fb-referenced
as expected, which guarantees thread sync before memory is poisoned.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35319>
(cherry picked from commit d8a6ec5985)
When binning with a GS, both VS and GS are active. This means that we
could have to use the safe-const variant for the GS. However we only
emitted VPC state for the binning case with the "normal" GS variant.
Emit the VPC state with the safe-const variant too, and select between
the state variants at link time.
This fixes a few tests like
dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.32struct_to_8struct.uniform_uint_geom
with TU_DEBUG=gmem,forcebin.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35294>
(cherry picked from commit 723a1fabac)
If the render area is restricted to a section of the framebuffer, there
is no need to consider the whole framebuffer size when configuring the
supertiles, as only the supertile coordinates of the affected area will
be submitted.
This allows creating supertiles smaller than the ones needed when
considering the full screen, reducing the number of tiles that need to
be processed.
This also fixes https://gitlab.freedesktop.org/mesa/mesa/-/issues/13218.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35257>
(cherry picked from commit d30a6f8102)
So far the driver was configuring the number of supertiles to be strictly
less than 256, but there can actually be up to 256.
There is one restriction though: the frame width or height in supertiles
must be less than 256.
It also moves this limit to the limits file, which is shared by v3d and
v3dv.
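
A minimal sketch of the resulting constraint, assuming the supertile size
is grown one tile at a time until both conditions hold. The growth
strategy, the constant name and the struct are assumptions for
illustration; tiles_x and tiles_y stand for the tile counts of the area
being configured.

```c
#include <stdint.h>

#define MAX_SUPERTILES 256   /* hypothetical name for the shared limit */

struct supertile_cfg {
    uint32_t width, height;      /* supertile size, in tiles */
    uint32_t frame_w, frame_h;   /* frame size, in supertiles */
};

static uint32_t div_round_up(uint32_t a, uint32_t b)
{
    return (a + b - 1) / b;
}

static struct supertile_cfg configure_supertiles(uint32_t tiles_x,
                                                 uint32_t tiles_y)
{
    struct supertile_cfg cfg = { 1, 1, 0, 0 };

    for (;;) {
        cfg.frame_w = div_round_up(tiles_x, cfg.width);
        cfg.frame_h = div_round_up(tiles_y, cfg.height);
        uint64_t count = (uint64_t)cfg.frame_w * cfg.frame_h;

        /* Up to 256 supertiles are allowed, but the frame width and
         * height in supertiles must each stay below 256. */
        if (count <= MAX_SUPERTILES &&
            cfg.frame_w < MAX_SUPERTILES &&
            cfg.frame_h < MAX_SUPERTILES)
            break;

        /* Grow the smaller dimension first to keep supertiles squarish. */
        if (cfg.width < cfg.height)
            cfg.width++;
        else
            cfg.height++;
    }
    return cfg;
}
```

With 16x16 tiles this keeps 1x1 supertiles (exactly 256 of them), whereas
a strict "less than 256" check would have forced larger supertiles.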
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35257>
(cherry picked from commit 2cac70558d)
This mirrors AMDVLK. 128-byte alignment is possible, but DOOM: The Dark
Ages screws up scratch allocation with alignments <256 bytes.
Fixes hangs in DOOM: The Dark Ages.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35152>
(cherry picked from commit dac6f09451)