Previously, we would only offset register ids for LValues that are
directly used in a merge/split instruction, but this is incorrect.
We instead need to apply the offset to all LValues that compMask
has been propagated to. By calculating this from compMask instead
of figuring it out a second time, we fix that issue and also manage
to simplify the code a bit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24541>
This code previously stored two rather different masks in compMask:
1. from merge/splits (calculated in makeCompound), and
2. in the join root for whatever register was assigned
Since we were already calculating the second type as intfMask where it
is used in checkInterference(), change that function to unconditionally
use intfMask and only use compMask for the first type.
This is functionally equivalent and keeps the types of masks separate.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24541>
It's non-trivial to drop the private binding or transfer ownership to
the bound memory. Instead, for dedicated allocations we track the image
in the device memory so that a wsi image alias can find the original
wsi image from the wsi memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36095>
Lowering IO to temps leads to problems with RA with piglit
spec@glsl-1.50@execution@geometry@max-input-component
Not doing so results in an assertion failure with piglit
spec@glsl-1.50@execution@geometry@dynamic_input_array_index
because not all indirect IO access is lowered. Using
nir_lower_indirect_derefs works around this limitation.
v2: Fix formatting (Patrick Lerda)
Fixes: 1186c73c6 (r600: implement gs indirect load_per_vertex_input)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36051>
I've overlooked that unconditional lowering of indirect VS
inputs had been dropped. Since VS inputs are stored in
consecutive registers, one can implement the indirect access
without additional lowering; it just needs a proper declaration
of the registers forming the array.
v2: - Fix formatting (Patrick Lerda)
- Use allocator for std::map to avoid memory leak
(Patrick Lerda)
Fixes: a43bfffe1e (r600: Correct nir_indirect_supported_mask)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36051>
The AFBC-P payload layout is currently retrieved in 2 steps starting
with the payload sizes retrieval using a CS job on the GPU followed by
a CPU pass to set the payload offsets. This commit proposes to do both
steps on the CPU at once using a new utility function
pan_afbc_payload_layout_packed().
A new utility function pan_afbc_payload_uncompressed_size() is added
to help retrieve the uncompressed size from a pipe_format and
modifier. Both the CPU and GPU versions use it now.
A new AFBC-P driconf option "pan_afbcp_gpu_payload_sizes" is added to
fall back to the original payload sizes retrieval on the GPU.
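For illustration only, the single CPU pass can be pictured as a running
sum over per-payload sizes. This is a minimal sketch, not the code in
pan_afbc_payload_layout_packed(): the names example_payload_extent and
example_layout_packed are hypothetical, and it assumes the sizes are
already known and that payloads are packed back to back with no extra
alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a payload extent: where a payload starts and
 * how many bytes it occupies. */
struct example_payload_extent {
   uint32_t offset;
   uint32_t size;
};

/* Fill in the offsets of `count` payloads whose sizes are already set,
 * packing them back to back starting at `base`. Returns the offset just
 * past the last payload, i.e. `base` plus the total packed size. */
static uint32_t
example_layout_packed(struct example_payload_extent *extents, size_t count,
                      uint32_t base)
{
   uint32_t offset = base;

   for (size_t i = 0; i < count; i++) {
      extents[i].offset = offset;
      offset += extents[i].size;
   }

   return offset;
}
```

Doing this in one loop means the sizes never have to round-trip through
a GPU job before the offsets can be assigned.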
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Add an AFBC header block structure pan_afbc_headerblock to improve
readability when accessing header blocks. get_superblock_size(), which
will be used for AFBC packing in the next commits, has been moved to
pan_afbc.h and renamed to pan_afbc_payload_size() so that it can be
tested. Other utility functions pan_afbc_header_subblock_size() and
pan_afbc_header_subblock_uncompressed_size() have been added to help
retrieve the compressed or uncompressed size of a subblock from a
header. This commit also fixes a few issues like arch handling.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Pack AFBC resources asynchronously in order to prevent stalls at
texture upload waiting for 1) the AFBC staging blit (sparse encoding)
to complete and 2) the AFBC payload sizes retrieval.
After a texture upload, an AFBC resource is now progressively packed
at each read access once it has been read a certain number of
consecutive times without an intervening write access. This prevents
most stalls by making AFBC packing a progressive async background
process.
A useful side effect is that consecutive glTexSubImage*() calls on the
same texture (for texture atlases for instance) don't uselessly
respawn packing.
A new AFBC-P driconf option "pan_afbcp_reads_threshold" is added to
tweak the consecutive reads threshold.
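The consecutive-reads heuristic can be sketched roughly as follows.
This is an assumption-laden illustration, not the driver code: the
struct and function names are hypothetical, and the progressive packing
is collapsed into a single trigger for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-resource tracking state. */
struct example_afbcp_state {
   uint32_t consecutive_reads; /* reads seen since the last write */
   bool packed;                /* true once packing has been triggered */
};

/* A write discards any progress: the counter restarts and the resource
 * becomes a packing candidate again. */
static void
example_on_write(struct example_afbcp_state *s)
{
   s->consecutive_reads = 0;
   s->packed = false;
}

/* Returns true only for the read that reaches the threshold, which is
 * the point where packing would be kicked off. */
static bool
example_on_read(struct example_afbcp_state *s, uint32_t reads_threshold)
{
   if (s->packed)
      return false;

   if (++s->consecutive_reads < reads_threshold)
      return false;

   s->packed = true;
   return true;
}
```

Because every write resets the counter, a burst of uploads to the same
texture never reaches the threshold until the uploads stop.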
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
The pan_afbc_block_info structure describes the extent (offset and
size) of the payload data (compressed data) for a superblock, so use
the pan_afbc_payload_extent structure name instead in order to be more
precise and improve readability. This also makes it possible to
differentiate superblocks from payload data, which will be useful later
in this series when new helpers are added to pan_afbc.h.
A set of payload extents describes the layout of various payloads, so
use the term "layout" instead of the generic term "metadata" to
describe it.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Preventing the use of the AFBC tiled layout could be useful to further
optimise memory usage when using AFBC packing. This commit introduces
a new option to disable it through a driconf option.
This is exposed as a new AFBC pan_afbc_tiled option (not tied to
pan_force_afbc_packing) because it would otherwise imply a useless
performance hit for the tiled to untiled conversion at packing time:
there's no need to detile if the resource is created untiled in the
first place. This could also be useful to compare the performance of
the AFBC tiled and untiled layouts.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Layout refactoring commits broke AFBC packing while removing several
fields to simplify the logic. The stride and height are now derived
when necessary at packing time based on the resource modifier. The
problem is that the code assumes that the source and destination
headers are the same although the source and destination modifiers
might differ and create size mismatches when passed to the AFBC
utilities in pan_afbc.h. The destination modifier is set as the source
modifier without the AFBC_FORMAT_MOD_SPARSE and AFBC_FORMAT_MOD_TILED
flags. While the AFBC_FORMAT_MOD_SPARSE flag doesn't have any impact
on these utilities, the AFBC_FORMAT_MOD_TILED flag does.
This commit fixes the issue by keeping the same header block layout
(linear or tiled header layout) when packing a resource. This makes it
possible to simply parse header blocks linearly without having to bother
with the internal layout (Morton order). The tiled packed resource might
also benefit from better cache accesses.
Fixes: a2e9ce39e9 pan/layout: Drop pan_image_slice_layout::afbc::{stride_sb,nr_sblocks}
Fixes: 01d325ba63 pan/layout: Interleave header/body in AFBC(3D)
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Implement offset lowering by using the explicit LOD value with nearest-integer
rounding (floor(lod + 0.5)) and reusing the coordinate calculation helper.
Passes dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.* on GC7000.
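As a rough illustration of the rounding rule only (not the NIR lowering
itself), floor(lod + 0.5) picks the nearest integer level and sends
exact halves upward. The helper name example_round_lod is hypothetical,
and the floor is spelled out by hand so the snippet builds without libm;
it assumes LOD values small enough to fit in an int.

```c
/* Nearest-integer rounding as floor(lod + 0.5). Halfway cases always go
 * up (2.5 -> 3), unlike round-half-to-even. */
static int
example_round_lod(float lod)
{
   float shifted = lod + 0.5f;
   int level = (int)shifted; /* truncates toward zero */

   /* Truncation rounds negative non-integers up; step down to get a
    * true floor. */
   if ((float)level > shifted)
      level--;

   return level;
}
```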
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
Implement offset lowering by calculating implicit LOD using coordinate derivatives (ddx/ddy)
and doing some deep floating point wizardry matching the binary blob behaviour.
Adds helper functions for coordinate calculation and LOD clamping that will be
reused by subsequent offset lowering passes.
Passes dEQP-GLES3.functional.shaders.texture_functions.textureoffset.* without explicit bias on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
This will enable large code removal.
shader->config.lds_size is now always computed the same as ACO except for
compute shaders.
We have to add a new 8-bit user SGPR bitfield called
GS_STATE_GS_OUT_LDS_OFFSET_256B, which contains the offset
that was previously set by the relocation.
Since the offset must be a multiple of 256, we have to add padding
to the LDS size computation to make sure the alignment to 256 for the ESGS
LDS size doesn't cause us to exceed the maximum LDS size.
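The arithmetic behind the 256-byte granularity can be sketched as
below. All names are hypothetical, and the assumption that the GS
output region starts right after the aligned ESGS region is made for
illustration; it is not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_LDS_OFFSET_GRANULARITY 256u

/* Round `size` up to the next multiple of 256 bytes. */
static uint32_t
example_align_256(uint32_t size)
{
   return (size + EXAMPLE_LDS_OFFSET_GRANULARITY - 1) &
          ~(EXAMPLE_LDS_OFFSET_GRANULARITY - 1);
}

/* Encode a 256-byte-aligned offset into an 8-bit field, in units of
 * 256 bytes. */
static uint8_t
example_encode_offset_256b(uint32_t offset)
{
   assert(offset % EXAMPLE_LDS_OFFSET_GRANULARITY == 0);
   assert(offset / EXAMPLE_LDS_OFFSET_GRANULARITY <= UINT8_MAX);
   return (uint8_t)(offset / EXAMPLE_LDS_OFFSET_GRANULARITY);
}

/* Total LDS size, assuming the GS output region follows the ESGS region
 * once the latter is padded to 256 bytes. The padding is what can push
 * the total past the hardware limit if it is not budgeted for. */
static uint32_t
example_total_lds_size(uint32_t esgs_size, uint32_t gs_out_size)
{
   return example_align_256(esgs_size) + gs_out_size;
}
```

With esgs_size = 1000 the region is padded to 1024 bytes, so the encoded
offset is 4 and up to 255 bytes of padding must be accounted for when
checking against the maximum LDS size.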
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>