Make sure that both per-vertex and per-primitive attribute
ring stores are finished before position or primitive export
instructions are executed.
This is necessary because we need to ensure that mesh shader
waves work correctly when they have either vertex-only or
primitive-only waves.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 93b4f200de)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25157>
Cleanup the code that generates the two channels of the
primitive export instruction, and move storing the built-in
per-primitive outputs out to match how vertex attributes work.
Prepares the mesh shader lowering for a workaround that
affect export instructions.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 0721784b78)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25157>
This is a HW bug workaround for some (all?) GFX11 chips.
On these chips, rasterization can start before the attribute ring
stores are finished, which can cause issues.
As a workaround, wait for attribute ring stores to finish
before doing the position export.
Mesh shaders will be taken care of in another commit.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit edd51655f0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25157>
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.
Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 4eddeea7bf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
The compaction introduced in a252123363 ("intel/compiler/mesh: compactify MUE layout")
is not suitable for the case where graphics pipeline libraries are fast
linked, as the fragment shader won't receive the mue_map to know where
to locate its inputs.
For that case, keep doing what we did before and lay things down in the
order varyings are defined, which is also how it works for the non-mesh
case.
Fixes dEQP-VK.fragment_shading_rate.*fast_linked_library*.ms
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit b200e5765c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
Move mesh URB allocations together with the other stages.
This fixes a hang that started happening with mesh enabled after
419531c5d9 ("intel/blorp: add a new flag to communicate PSS sync need")
Bspec 45352 says:
L3 Space allocation can only be changed when the GPU pipeline is
completely flushed.
It's likely that the PIPE_CONTROL added in that commit was breaking that
assumption and the URB allocation happening afterwards at the end of the
pipeline emission would then hang. And before that, we were probably
just getting lucky.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit bcde58ea86)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
Both per-primitive and per-vertex space is allocated in MUE in 8 dword
chunks and those 8-dword chunks (granularity of
3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength)
are passed to fragment shaders as inputs (either non-interpolated
for per-primitive and flat vertex attributes or interpolated
for non-flat vertex attributes).
Some attributes have a special meaning and must be placed in separate
8/16-dword slot called Primitive Header or Vertex Header.
Primitive Header contains 4 such attributes (Cull Primitive,
ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword
slot) potentially unused.
Vertex Header is similar - it starts with 3 unused dwords, 1 dword for
Point Size (but if we declare that shader doesn't produce Point Size
then we can reuse it), followed by 4 dwords for Position and optionally
8 dwords for clip distances.
This means we have an interesting optimization problem - we can put
some user attributes into holes in Primitive and Vertex Headers, which
may lead to smaller MUE size and potentially more mesh threads running
in parallel, but we have to be careful to use those holes only when
we need it, otherwise we could force HW to pass too much data to
fragment shader.
Example 1:
Let's assume that Primitive Header is enabled and user defined
12 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of
MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to
fragment shader.
16/16 is better than 24/16, so packing makes sense.
Example 2:
Now let's assume that Primitive Header is enabled and user defined
16 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of
MUE space and pass ALIGN(16, 16) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to
fragment shader.
24/24 is worse than 24/16, so packing doesn't make sense.
This change doesn't affect vk_meshlet_cadscene in default configuration,
but it speeds it up by up to 25% with "-extraattributes N", where
N is some small value divisible by 2 (by default N == 1) and we
are bound by URB size.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit c1685f08dd)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.
There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
float[N] requires N 1-dword slots, but
i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
if possible is beneficial
This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.
Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit a252123363)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
The vulkan spec says all conversions are correctly rounded, so if the input
is larger than the largest fp16 value, we need to return MAX_FLOAT/inf
instead of cutting off the msbs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24826>
(cherry picked from commit 6d949e18fd)
For z surfaces, flags.texture should be based on
RADEON_SURF_TC_COMPATIBLE_HTILE alone. Otherwise, addrlib could pick a
_X/_T swizzle mode for a MSAA depth texture, which is said to be broken:
When _X/_T swizzle mode was used for MSAA depth texture, TC will get zplane
equation from wrong address within memory range a tile covered and use the
garbage data for compressed Z reading which finally leads to corruption.
Fixes: de0885cdb8 ("amd/surface: add RADEON_SURF_NO_TEXTURE flag")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24767>
(cherry picked from commit e74c3dbb70)
It uses a poll function that waits for a second hoping for another thread
to catch up, which is not a reliable way to do synchronization. The test
has been spuriously failing merges on a regular basis recently.
This is issue #9222, which I'm leaving open until the author can fix the test.
Fixes: 3b69b67545 ("util/fossilize_db: add runtime RO foz db loading via FOZ_DBS_DYNAMIC_LIST")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24755>
(cherry picked from commit 4dfd306454)
It's unnecessary because earlier parts of the pass will ensure that a
mov of undef is turned into an undef. It's also wrong because
nir_op_mov has different semantics from nir_op_vecN when it comes to how
sources map to destination components.
Fixes: 5f26c21e62 ("nir: Expand opt_undef to handle undef channels in a store intrinsic")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24704>
(cherry picked from commit 408929289a)
These work in some circumstances (dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_16_to_64.scalar9_tessc),
but I'm not sure if they work in all, blending certainly doesn't work and
this probably wasn't intended in the first place.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 01bd012edd ("amd: fix 64-bit integer color image clears")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24400>
(cherry picked from commit 405f3bf990)
Adding GEM handles to the global list is necessary to allow
maintaining a single reference count for handles that are shared
between multiple buffer objects.
Since exported handles can end up being shared with other buffer
objects, as in the case that drmPrimeHandleToFD() and gbm_bo_import()
are called externally to Mesa, they too must be added to the global
list.
Unfortunately, doing this properly requires a new libdrm API. Use
the best possible option for now.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9552
Signed-off-by: Dor Askayo <dor.askayo@gmail.com>
Acked-by: Karol Herbst <git@karolherbst.de>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24648>
(cherry picked from commit daa1f789b5)
Partial revert of e516a0a94f ("egl: disable partial redraw when gallium
hud is active").
We shouldn't change the behavior of the application when the hud is
enabled, doing so could make it harder do diagnose issues. Instead, now
we warn and ask the user to manually disable the extension if he
considers it to be worth it.
Fixes: e516a0a94f ("egl: disable partial redraw when gallium hud is active")
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23456>
(cherry picked from commit 2edf222abd)
The buffer data is not directly accessible to application and it's
internally used to only store VACodedBufferSegment struct.
Ignore the size requested by application and instead allocate
sizeof(VACodedBufferSegment). Use calloc to zero out the struct.
This can save significant amount of memory, for example FFmpeg
will request up to tens of MB for single buffer.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6462
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24410>
(cherry picked from commit 7bcbfae87c)
The presentation timing extension is used for doing WaitForPresent
properly, but we accidentally bind it after an early return intended to
stop us from binding dmabuf when software rendering.
Remove the early return.
cc: mesa-stable
Signed-off-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24588>
(cherry picked from commit 5ba5bcf2b6)
vkQueuePresentKHR might return VK_SUBOPTIMAL_KHR which is not VK_SUCCESS
but presentation succeeded anyway. We should capture a trace even if
VK_SUBOPTIMAL_KHR is returned.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24052>
(cherry picked from commit b8edd19358)
as in the producer case, big io needs to reserve the appropriate number
of slots
fixes:
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-float-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-float-index-wr-before-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec2-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec2-index-wr-before-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec3-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec3-index-wr-before-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec4-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec4-index-wr-before-barrier,Fail
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24568>
(cherry picked from commit ee6ba2bb57)
in this scenario, sample counting must happen before a2c, as a2c may eliminate
coverage if alpha is zero, leading to a sample count of zero
dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_alpha_to_coverage_samples_4_maintenance5
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24589>
(cherry picked from commit ce09458917)
When replaying a RT pipeline, RADEON_FLAG_REPLAYABLE should be set.
The idea is that for capture, RADEON_FLAG_REPLAYABLE should be passed
when allocating a BO (ie. replay_va would be 0), and then for replay
the VA would be non-zero but the flag is also required.
Fixes
dEQP-VK.ray_tracing_pipeline.pipeline_library.configurations.multithreaded_compilation.*.
Fixes: 744357477e ("radv: Add utilities to serialize and deserialize shader allocation info")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24543>
(cherry picked from commit 1b66ebf09a)
We don't support any ASTC formats yet, but the textureCompressionASTC_LDR
feature was incorrectly set to true. Fix this by setting it to false and
don't advertise ASTC support for texture compression.
Fixes dEQP-VK.api.info.format_properties.compressed_formats
Fixes: 8991e646 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24448>
(cherry picked from commit c5a6e88c4e)
this otherwise underflows the array and provides a (probably huge) garbage
value for the binding id, which then causes the driver to massively overallocate
both the layout and set/pool/buffer
the main result of this is that on radv any simple test that should be near-instant
takes 2-3 seconds to execute, which somehow nobody noticed
Fixes: e3b746e3a3 ("zink: use GPL to handle (simple) separate shader objects")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24501>
(cherry picked from commit 652e87bc5d)
When we switch the channels by re-creating vec4 values we have to
take into account that the source values may be used in an ALU op,
and with that we have to take read-port limitations into account.
Fixes: 18a8d148d8
r600/sfn: Cleanup copy-prop into vec4 source values
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24519>
(cherry picked from commit 807c0d6bb7)
This fixes some CACHE_ERROR caused by proper multi-threading support. The
bug is a bit older though, just never triggered because there was only one
push buffer to begin with.
Without this change the compute initialization stayed unpushed in the
screen push buffer causing random issues.
Fixes: ff72440b40 ("nv50: implement a basic compute support")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24496>
(cherry picked from commit a9a30a7e09)
This reverts commit f9860a84b3. It's a
bit annoying having this scattered around but it's 100% a GLSL thing and
there's no reason why it should go in glsl_types.h. The fact that
glsl_print_type() even uses it is a bit sketchy.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24491>
(cherry picked from commit bf6d6a0934)
This reverts commit 1b836a52ea. This
patch, while claiming to decouple things, actually increases coupling
because it leaks two OpenGL state tracker limits and an OpenGL state
tracker fixed binding enum into the entire compiler. Nothing wants to
know these outside the OpenGL state tracker and the GL-specific compiler
passes. Put them back where they were.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24491>
(cherry picked from commit b89a48e00d)
It was not taken into account that without Offset decoration
the output is not written into XFB.
Aside from eliminating more outputs this change prevents gl_PerVertex
builtins generated by glslang from being kept alive in case when XFB
is enabled. Keeping such outputs alive may upset a driver.
VUID-StandaloneSpirv-Offset-04716:
"Only variables or block members in the output interface decorated
with Offset can be captured for transform feedback, and those
variables or block members must also be decorated with XfbBuffer
and XfbStride, or inherit XfbBuffer and XfbStride decorations from
a block containing them"
Additional info about glslang behavior could be found at:
https://github.com/KhronosGroup/glslang/issues/1526
Fixes: e95531e101
("radv: fix gathering XFB info if there is dead outputs")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24318>
(cherry picked from commit 81407797b9)
For TXQ we know make sure that we at least add one source. If the nir
instruction however didn't had any sources, we inserted a fake 0 source
ending up with two 0s for TXQ.
It's unclear to me if we have other ops where this would be necessary.
Fixes: 85a31fa1fc ("nv50/ir/nir: fix txq emission on MS textures")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Acked-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24373>
(cherry picked from commit 8d7f682bdb)
In GL and a lot of Vulkan if we end up with either a lod or an ms index.
Sadly in Vulkan we can end up with both and have to choose properly. For
TXQ we have to emit a zero LOD. For TXF we have to emit the ms index.
Fixes: bb032d8b62 ("nv50/ir/nir: implement nir_instr_type_tex")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24343>
(cherry picked from commit 85a31fa1fc)
Currently we don't properly support using he two IDX registers in the
same ALU CF, so work around this by enforcing a new CF if both indices
are used.
Fixes: d21054b4bc
r600/sfn: Add pass to split addess and index register loads
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24297>
(cherry picked from commit 1d4dd664e0)
nir_const_value_for_int asserts signed bounds on the input, but we pass in an
unsigned value that would be out-of-bounds for 32-bit channels, causing the
assert to fail for 32-bit channel formats.
Fixes dEQP-VK.pipeline.monolithic.logic_op.r32_uint.* on AGXV (and probably
PanVK).
Fixes: dbd0615e7a ("nir/lower_blend: Avoid useless iand with logic ops")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24252>
(cherry picked from commit 9c0740211d)
Consider the snippet of NIR:
div 32 %447 = @load_reg (%442) (base=0, legacy_fabs=0, legacy_fneg=0)
div 32 %463 = @load_reg (%442) (base=0, legacy_fabs=0, legacy_fneg=0)
con 32 %409 = iadd %17 (0x3), %447
@store_output (%182 (0x601), %463) (base=0, wrmask=x, component=0, src_type=invalid...
@store_reg (%409, %442) (base=0, wrmask=x, legacy_fsat=0)
The load_reg's are trivial, so the %442 read will get folded into store_output.
But under the old definition, the store_reg is also trivial so it gets folded
into the iadd... causing a read-after-write hazard and invalid code generation.
The fix is to amend our definition of store_reg triviality to account for loads
getting folded in. It's not good enough that there's no intervening load_reg,
there can also be no intervening source that gets chased to a load_reg. Handle
that case as well.
Identified in dEQP-VK.geometry.input.basic_primitive.triangles_adjacency on
V3DV.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reported-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit 0655bada4b)
In order for a register load to be trivial, it cannot be used in any
block other than the one in which it is loaded. We're not currently
explicitly doing anything to ensure this invariant holds. It may be
that it holds regardless but I couldn't find any documented reason why
it should so let's explicitly handle that case. Worst case, the newly
added code does nothing.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit f8b69abbd4)
Because this pass is intended to be run after out-of-SSA and directly
before injesting the NIR into the back-end, it may come after divergence
analysis and needs to preserve the divergence information. Fortunately,
since all we ever do is insert nir_op_mov, this is easy.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit f1f05cc7cf)
This commit makes three changes:
1. Default all newly created registers divergent because this is the
safer default.
2. Make divergence analysis do something sane with register divergence.
It's not perfect because divergence analysis isn't able to prove
registers divergent based on stores but at least if someone uses
registers a bit they'll end up with safe defaults. This matches
what they'd get with nir_ssa_def_init().
3. Make the load_reg() helper automatically propagate divergence from
the register. Because the defaults for both nir_ssa_def_init() and
nir_decl_reg() are to mark everything divergent, this only means
that nir_load_reg() of a uniform reg is now uniform.
Putting all these together, nir_from_ssa should now be producing
load_reg intrinsics with the proper uniform information.
Fixes: 7229bffcb1 ("nir: Add intrinsics for register access")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit 4fd257d20f)
ISL's state-machine of CCS_D describes full resolves as leaving the aux
buffer in the pass-through state. Hardware doesn't behave this way on
gfx8 however. On that platform, full resolves transition the aux buffer
to the resolved state. This was verified by dumping the CCS before and
after a full resolve on BDW (gfx7 is simply assumed to behave the same).
Ambiguate after resolving to match driver expectations.
Prevents iris from failing piglit's fcc-write-after-clear on BDW with a
future patch which relies on fast-clear encodings being removed after a
resolve. The avoided failure is:
Testing implicit read of partial block UNORM -> SNORM
Probe color at (0,1,0)
Expected: 1.000000 1.000000 1.000000 1.000000
Observed: 0.000000 0.000000 0.000000 0.000000
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>
(cherry picked from commit 1d12b29b3f)
In the case of:
halt
// succs: b9
if %618 {
block b3:// preds:
break
// succs: b6
} else {
block b4: // preds: , succs: b5
}
block b5: // preds: b4
32 %556 = iadd %617, %2 (0x1)
opt_constant_if() doesn't work because stitch_blocks() can't join blocks if the
before ends in a jump and the after isn't empty.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24235>
(cherry picked from commit 21f0aca948)
Same cause as for other R8G8 formats - msaa resolve via
blit event causes gpu fault.
Fixes:
dEQP-VK.api.image_clearing.*.clear_color_attachment.*.r8g8_srgb_*
Fixes: 029919f3c8
("tu: allow using resolve engine for SRGB MSAA resolves")
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24277>
(cherry picked from commit eeb1fd90fc)
Lavapipe has switched to layer push descriptor support atop descriptor
updates internally since 12a7fc51c7, so
it must skip retrieving immutable samplers from the write info even if
the update call itself is blessed by the spec to not hit that case.
Fixes: 12a7fc51c7 ("lavapipe: Rework descriptor handling")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24263>
(cherry picked from commit 8cb7bab341)
Before this change, anv_get_image_format_features2 reported support for
ASTC formats with any modifier (even those not supported by anv). But,
we didn't intend to support that compressed image format with modifiers.
With this change, the format feature function reports no support for
modifiers on ASTC-formatted images.
This prevents the next patch from causing assertion failures due to
unsupported modifiers.
Fixes: 355f318843 ("anv: Allow transfer-only linear ASTC images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
(cherry picked from commit e50af52e3d)
Even on Valhall, vertex_id is zero-based in a transform feedback program. Lower
that for transform feedback programs properly since it wouldn't happen
automatically on Valhall. Fixes assertion fails.
Fixes: 91ffd10351 ("pan/bi: Lower gl_VertexID in NIR")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24198>
(cherry picked from commit 64ff2b3ed6)
Without SP_FS_CTRL_REG0.LODPIXMASK quad ops don't get values from
helper invocations, but from the current one.
Fixes:
dEQP-VK.glsl.derivate.dfdxsubgroup.*
dEQP-VK.glsl.derivate.dfdysubgroup.*
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24211>
(cherry picked from commit a0d426370d)
That's the "real" name of the field.
It enables ALL helper invocations in a quad, which is necessary for
fine derivatives and quad subgroup ops.
While PIXLODENABLE by itself enables only 3 out 4 fragments in a quad.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24211>
(cherry picked from commit 696f37f5c3)
Some Wayland compositors, notably Exo, do not always release buffers
fast enough, and not in sync with their frame callbacks, to guarantee
that a free buffer is available the next time a client calls
`eglSwapBuffers()`.
This currently leads to a crash in `dri2_wl_swrast_get_backbuffer_data()`
with the swrast backend. To avoid this, simply block until the
compositor releases a buffer eventually.
While arguably compositors should release buffers they don't need any
more for the next frame, this can be quite complex depending on
the architecture - notably multi-process/IPC in case of Exo.
cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24091>
(cherry picked from commit 74451ed3f0)
Do not try to determine the shader stage from the compiled shader
variant, which may be NULL after compile failure. Instead, get it
from the NIR shader.
Fixes a segfault when trying to evaluate etna_shader_stage(NULL)
after compile failure.
Suggested-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Fixes: 3d49619071 ("etnaviv: add support for performance warnings")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24178>
(cherry picked from commit f626605cbf)
From OpenGL ES 3.0 spec, page 56:
"Binding more than one attribute name to the same location
is referred to as aliasing, and is not permitted in OpenGL
ES Shading Language 3.00 vertex shaders. LinkProgram will
fail when this condition exists. However, aliasing is
possible in OpenGL ES Shading Language 1.00 vertex shaders.
This will only work if only one of the aliased attributes
is active in the executable program, or if no path through
the shader consumes more than one attribute of a set of
attributes aliased to the same location. A link error can
occur if the linker determines that every path through the
shader consumes multiple aliased attributes, but implemen-
tations are not required to generate an error in this case."
So here we make sure to allow the optimisations before validation
for earlier ES shader versions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: 80c001013c ("glsl: do vs attribute validation in NIR linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9342
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24205>
(cherry picked from commit c64ad299e4)
It turns out the hardware doesn't save the whole state on a context
switch, as the kernel expects when it creates the golden context.
For some HW units, only the state that was explicitly programmed will be
part of it, so we need to make sure mesh shading is disabled on context
creation, or we risk being context switched with an application that
uses mesh, and when ours gets to run again, the mesh state won't be
reset, and submitting a legacy 3D pipeline while the HW thinks mesh is
enabled causes us to hang.
Cc: 23.2 <mesa-stable>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24150>
(cherry picked from commit 7b0ded0b23)
It turns out the hardware doesn't save the whole state on a context
switch, as the kernel expects when it creates the golden context.
For some HW units, only the state that was explicitly programmed will be
part of it, so we need to make sure mesh shading is disabled on context
creation, or we risk being context switched with an application that
uses mesh, and when ours gets to run again, the mesh state won't be
reset, and submitting a legacy 3D pipeline while the HW thinks mesh is
enabled causes us to hang.
Cc: 23.2 <mesa-stable>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24150>
(cherry picked from commit 50d68f74b5)
It happened because glCallList was restoring varying_vp_inputs, which
caused every glCallList to process the state change again.
This loosely reverts commit 3a294ff01f
"mesa: move the _mesa_set_varying_vp_inputs call to where the state changes".
Fixes: 3a294ff01f - "mesa: move the _mesa_set_varying_vp_inputs call to where the state changes"
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24165>
(cherry picked from commit c97961a855)
nir_opt_mov and nir_op_vecN are only the same if the mov is only a
single component. Otherwise the vec loop will try to access src[c]
where c > 0 which breaks for nir_op_mov. It's uncommon but scalar
back-ends can see vector movs so we need to handle this correctly.
Fixes: 6513c675ad ("nv50/ir/nir: implement nir_alu_instr handling")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24167>
(cherry picked from commit 259ba104f7)
This was missed in 0bf6dcb785
There is a loop which iterates over a temp array. NIR optimization
moves the real work out of the loop and what remains are just ALU ops
with undefs. So after converting undefs to zero, the ALU ops are
optimized out and DCE kills the loop. This is a good thing in
general and we don't fail the linking due to the loop presence.
However than we hit the shader constants and ALU limits later :-(
So from dEQP POW we go from NotSupported to Fail.
Fixes: 0bf6dcb785
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24134>
(cherry picked from commit 0a2a7bfd19)
The 1st sync scope of vkCmdCopyQueryPoolResults is not sufficient to
cover transfer writes against query feedback buffer. We must ensure
ordering against prior query reset cmd where the feedback buffer fill
gets injected.
Fixes: de4593faa1 ("venus: add query pool feedback cmds")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24130>
(cherry picked from commit ed79b30639)
We don't use draw states for dispatches, so the bound pipeline
could be overwritten by reg stomping in a renderpass or blit.
The solution is to re-emit pipeline's IB on every dispatch if
reg stomping is used.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23881>
Could be used for knowing which regs to stomp and to verify that
only appropriate regs are emitted.
Each register that is actually being used by driver should have "usage"
defined, currently there are following usages:
- "cmd" - the register is used outside of renderpass and blits,
roughly corresponds to registers used in ib1 for Freedreno
- "rp_blit" - the register is used inside renderpass or blits
(ib2 for Freedreno)
It is expected that register with "cmd" usage may be written into only at
the start of the command buffer (ib1), while "rp_blit" usage indicates that
register is either overwritten by renderpass/blit (ib2) or not used if not
overwritten by a particular renderpass/blit.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23881>
Each time addvariant was added it was added to the end of ctx->vars
list, without previous variant being removed. While the check for
variant tests only the first one that has expected enum name.
Fix this by updating `variant` instead of appending new one if variant
with such enum already exists.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23881>
There was some confusion from users as to whether disabling this option
disables OpenGL ES as well, so let's remove the confusing "all versions"
note and specify this affects "desktop OpenGL" only.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24113>
Before adding new copies of all of these for swapping start by
refactoring into macro templated code.
I avoided using inline functions because I want to test with
opts turned down, and this will kill perf.
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24066>
Switch to register intrinsics, using the helpers. Since our backend copyprop
chokes on non-SSA moves, we get better coalescing with this approach, hence the
small improvements to instruction count / cycle count in shader-db. Changes to
register pressure seem to be noise from iteration order. I'm not too worried.
total instructions in shared programs: 1508444 -> 1508193 (-0.02%)
instructions in affected programs: 42581 -> 42330 (-0.59%)
helped: 482
HURT: 41
Inconclusive result (value mean confidence interval includes 0).
total bundles in shared programs: 643023 -> 643136 (0.02%)
bundles in affected programs: 16318 -> 16431 (0.69%)
helped: 230
HURT: 85
Inconclusive result (value mean confidence interval includes 0).
total quadwords in shared programs: 1125992 -> 1125600 (-0.03%)
quadwords in affected programs: 125366 -> 124974 (-0.31%)
helped: 507
HURT: 351
Quadwords are helped.
total registers in shared programs: 90632 -> 90554 (-0.09%)
registers in affected programs: 669 -> 591 (-11.66%)
helped: 114
HURT: 31
Registers are helped.
total threads in shared programs: 55607 -> 55600 (-0.01%)
threads in affected programs: 20 -> 13 (-35.00%)
helped: 1
HURT: 7
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 1371 -> 1437 (4.81%)
spills in affected programs: 44 -> 110 (150.00%)
helped: 0
HURT: 2
total fills in shared programs: 5133 -> 5273 (2.73%)
fills in affected programs: 84 -> 224 (166.67%)
helped: 0
HURT: 2
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
This is pretty straightforward, since we don't try to "coalesce" register access
the way a GPU backend would. In the old path, we generated register load/store
instructions internally when hitting register sources/destinations. In the new
path, we just translate the register load/store intrinsics to the LLVM
loads/stores and we're back where we started. It's a bit more code, but it's
more straightforward.
Notably, although this continues to use registers, this does NOT use the chasing
helpers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
A number of passes lower SSA partially to registers, do work that would be
invalid in SSA, and then go back into SSA with nir_lower_regs_to_ssa. As a step
towards replacing nir_register with intrinsics,
the nir_lower_{phis,ssa_defs}_to_regs passes are changed to produce intrinsics
instead of nir_registers, and their callers are updated to call
nir_lower_reg_intrinsics_to_ssa instead of nir_lower_regs_to_ssa to compensate.
Jointly authored with Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
Note the writemask handling is chosen for consistency with the rest of NIR. In
every other instance, writemask=w requires a vec4 source. This is hardcoded into
nir_validate and nir_print as what it means to have a writemask.
More importantly, consistency with how register writemasks currently work.
nir_print hides it, but r0.w = fneg ssa_1.x is actually a vec4 instruction with
source ssa_1.xxxx. As a silly example nir_dest_num_components(that) = 4 in the
old model. I realize this is quite strange coming from a scalar ISA, but it's
perfectly natural for the class of vec4 hardware for which this was designed. In
that hardware, conceptually all instructions are vec4`, so the sequence "fneg
ssa_1 and write to channel w" is implemented as "fneg a vec4 with ssa_1.x in the
last component and write that vec4 out but mask to write only the w channel".
Isn't this inefficient? It can be. To save power, Midgard has scalar ALUs in
addition to vec4 ALUs. Those details are confined to the backend VLIW scheduler;
the instruction selection is still done as vec4. This mechanism has little in
common with AMD's SALUs. Midgard has a wave size of 1, with special hacks for
derivatives.
As a result, all backends consuming register writemasks are expecting this
pattern of code. Changing the store to take a vec1 instead of a vec4 would
require changing every backend to reswizzle the sources to resurrect the vec4. I
started typing a branch to do this yesterday, but it made a mess of both Midgard
and nir-to-tgsi. Without any good reason to think it'd actually help
performance, I abandoned the idea. Getting all 15 backends converted to the
helpers is enough of a challenge without forcing 10 backends to reswizzle their
sources too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
That file has become a bit of the new `.gitlab-ci.yml` with just about
everything in there, but a lot of its content doesn't need to be in the
same file anymore now that `!reference` exists.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24090>
Previously, nir_opt_dead_cf could skip dead CF nodes because overwriting
cur after dead_cf_block is not enough to cover the whole CF list.
foreach_list_typed would select the next node, skipping the node that
previously made progress:
block 1
if (true) {}
block 2
if (true) {}
block 3
if (true) {}
Would turn into:
block 1, then, block 2
if (true) { }
block 3, then
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22064>
utrace submits can either have a batch or not.
When there is a batch, the utrace vk_sync is signaled by the utrace
batch (because utrace does a timestamp buffer copy using its own
batch). When there is no batch, the utrace vk_sync should be signaled
by the application batch (no timestamp copy required, utrace can read
the timestamps when the application batch has completed).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fdea48df5e ("anv: Implement Xe version of anv_queue_exec_locked() and queue_exec_trace()")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24085>
Since the mesa state tracker can promote RGB texture formats
to RGBA texture formats (among other formats) without exposing
any of that information to a driver, it is more desirable to
have the behaviour of `PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND`
be the default. This avoids rendering bugs where an application
sets `DST_ALPHA` blending on a format where there is no alpha
channel, that has been promoted to a format that actually has an
alpha channel. The driver can instead rely on the common code
in the state tracker to convert the blending parameter to one
that reflects the limitations of the application requested format,
as long as `PIPE_CAP_INDEP_BLEND_FUNC` is supported.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24044>
If there is a FS input but no VS output the behavior is undefined
but okay. Use a register 0 (position) for such cases.
glsl-routing test triggers it with e.g. the following subtest.
Test: VS(C0 -- T0 -- T2 -- T4 T5)
FS(C0 C1 T0 T1 T2 T3 T4 T5)
This will now end with following linker debug output:
link result:
vs -> fs comps use pa_attr
t1 -> t1 xyzw 0,0,0,0 0x000002f1
t2 -> t2 xyzw 0,0,0,0 0x000002f1
t0 -> t3 xyzw 0,0,0,0 0x000002f1
t3 -> t4 xyzw 0,0,0,0 0x000002f1
t0 -> t5 xyzw 0,0,0,0 0x000002f1
t4 -> t6 xyzw 0,0,0,0 0x000002f1
t5 -> t7 xyzw 0,0,0,0 0x000002f1
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24030>
submit_count is used to disambiguate a batch_id based on the generation
id of a given batch: this value is incremented once on submit and once on
reset such that the diff of the values is > 1 any time the batch does not
represent the fence it was last submitted with
in the case of a batch's first use, however, this value was being incorrectly
incremented such that the first submit would cause disambiguation checks
to erroneously determine that the batch had already completed, breaking synchronization
fixes#9313
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24016>
From Bspec 46959, a programming note applicable to Gfx12+:
"Since HZ_OP has to be sent twice (first time set the clear/resolve
state and 2nd time to clear the state), and HW internally flushes the
depth cache on HZ_OP, there is no need to explicitly send a Depth
Cache flush after Clear or Resolve."
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24027>
From Bspec 46959, a programming note applicable to Gfx12+:
"Since HZ_OP has to be sent twice (first time set the clear/resolve
state and 2nd time to clear the state), and HW internally flushes the
depth cache on HZ_OP, there is no need to explicitly send a Depth
Cache flush after Clear or Resolve."
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24027>
`y_vu` will be used to convert NV21 to RGB.
`yv_yu` will be used to convert YVYU and VYUY to RGB when the
subsampling formats PIPE_FORMAT_R8B8_R8G8 and PIPE_FORMAT_B8R8_G8R8
are supported.
`yx_xvxu` and `xy_vxux` will be used to convert YVYU and VYUY to RGB
when those subsampling formats are not supported.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21219>
util_clear_texture implements clear_texture through a memset.
This patch implements u_default_clear_texture, which tries to clear the
given texture using a surface plus clear_render_target or
clear_depth_stencil.
In case this path fails, either because the formats are non-renderable
or for some other reason, we fallback to `util_clear_texture`, which is
guaranteed to work.
This will allow us to make ARB_clear_texture available to every driver,
as well as provide HW acceleration for the clear_texture operation.
If some hardware doesn't want to use it, such as llvmpipe, it can always
just directly point to the software version using pipe->clear_texture.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23735>
Currently, the rogue compiler does not support control flow upstream.
Imagination's plan is to implement an SSA-based register allocation (I wish them
well in this endeavour). As such they won't be needing convert_from_ssa. remove
the commented call so nobody is tempted to put it back in. This takes care of
the rogue portion of #9051.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24015>
This issue is mainly a consequence of a call to util_blitter_clear()
with unnecessary blitter states, these states are never freed.
This change is inspired from radeonsi and r600.
Note: PAN_SAVE_FRAGMENT_STATE is added and always enabled
at this stage.
For instance, this issue is triggered on Mali-T720 with
"piglit/bin/fcc-read-after-clear sample tex -auto -fbo", "piglit/bin/cubemap -auto"
and "piglit/bin/fbo-srgb -auto" or on Mali-T820 with "piglit/bin/longprim -auto -fbo"
and "piglit/bin/ext_render_snorm-render -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22522>
Kopper requires that any depth/stencil images are created through winsys which
was not taken into account by the WGL frontend causing it to hit an assert:
'Assertion failed: ctx->fb_state.zsbuf->texture->bind & PIPE_BIND_DISPLAY_TARGET'
fixes#9256
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24055>
This code was written with 80-char lines in mind; changing that to 100
with the common .clang-format unnecessarily reflows a lot of code that
looks fine like this.
`AlwaysBreakAfterReturnType: All` might be something that we want
everywhere, but for now only set it here, to match existing code.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23398>
Instead of manually indexing a single-dimensional array as 2-dimensional
(and using the wrong stride for the outer array) just actually make it
a 2-dimensional array.
Fixes: 7edae456 ("d3d12: Track up to 16 contexts worth of batch references locally in bos")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24041>
- To keep things organized, create a base hidden jobs for alpine images,
as we have 2 now
- This image will have an SSH client ran by LAVA dispatchers
(x86_64-only) who will serve as a bypass channel of output and dmesg
and an alternative for UART.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23534>
Our LAVA farm is currently experiencing issues with running and pulling
docker. LAVA has been detecting (with a low rate) timeouts during these
commands, causing some jobs to fail with infrastructure errors.
Increasing the failure_retry will make the job retry run the container
when LAVA detects the failure without losing its place in the job queue.
We are currently investigating why docker times out. But, when LAVA
fails to detect it, we cancel the job on our side and resubmit it to the
job queue. For more information, please refer to following dashboard:
https://ci-stats-grafana.freedesktop.org/goto/VjZvaA_4z?orgId=1
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23534>
When blitting between two multisampled resources, the blit region needs
to be scaled by the number of samples. Currently this is only done
correctly for downsampling blits where src samples > dst samples. Fix
it to also work as expected when src samples == dst samples.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23595>
If we try to re-use the AR register after a new CF was started
we will get into trouble, because it is no longer loaded, so
clear the AR register handle when we hit a non-ALU instruction
while scanning the shader to split AR loads. With that
dependencies will be set accodingly.
Fixes: d21054b4bc
r600/sfn: Add pass to split address and index register loads
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24036>
We're interested in a Vulkan-only stack in Chrome OS, where Android's GLES
would be provided by ANGLE-over-Venus-over-RADV. Let's get some testing
covering ANGLE-on-RADV first.
This is structured as a single partial job pre-merge to catch most
regressions, and 2 longer manual jobs to do full coverage for when you
need to update the xfails list.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20163>
We're interested in a Vulkan-only stack in Chrome OS, where Android's GLES
would be provided by ANGLE-over-Venus-over-ANV. Let's get some testing
covering ANGLE-on-ANV first.
This is structured as a single partial job pre-merge to catch most
regressions, and a longer manual job to do full coverage for when you need
to update the xfails list.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20163>
There's no functional changes:
1. remove unused function arg and use snake case
2. do early return for direct recording (avoid dup feedback checks)
3. use vk_alloc instead of vk_zalloc if applicable
4. move local struct closer to usage, and use assignment
5. convert secondary cmd in_render_pass condition check to assert
6. avoid redundant list_del upon freeing up
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24009>
I should've copy/pasted it back into my terminal to double-check it.
The `-o` is incorrect (it splits each char that matches into its own
line) and there's a missing `^` to remove lines that start with a hash
even if they contain anything else.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24028>
Skips are regexes, which means the `*` would've needed to be escaped. As
is, they can't match any existing test.
Since these lines are also all in `-fails.txt` as `Crash`es, let's just
remove them from the skips.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24022>
this uses geometry updates from events when possible in order to avoid
roundtripping during vkGetPhysicalDeviceSurfaceCapabilitiesKHR, which
significantly improves wsi performance in severely bottlenecked scenarios
now that roundtripping is completely eliminated from acquires in most scenarios,
this improves acquire perf by 10%+
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23835>
All this code does is reinitialise the values to what the original
ir_variable() call already set them too. This code is very old dating
to the initial glsl compiler support, it has probably been unrequired
for a long time now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22846>
All this does is compilcate things such as forcing us to set
var->data.always_active_io in the glsl linker. Just let NIR clean
these up for us instead.
A Zink test hits a new assert but this is not a regression it just
uncovers an existing mesa bug.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22846>
These are currently removed by the GLSL IR DCE pass but we will
drop that in a following patch. Also there are scenarios where these
might not be detected as unused until the NIR optimisations have
been run so we really need to do it here too anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22846>
Move all these lowering calls into the linker where they belong. This
makes future changes to the linker more flexible and is needed to
allow some following patches as we need to call things in a specific
order.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22846>
Just like we do for everything else here, since we are going to realloc
them again right below. Notice this is not exactly a memory leak, since
all these arrays are allocated with ralloc using v3d_compile as context,
so all allocations will be eventually freed when the context is destroyed.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24001>
Indeed, util_blitter_clear() requires a call to
util_blitter_save_fragment_constant_buffer_slot(),
but most other blitter functions do not.
For instance, this issue is triggered with:
"piglit/bin/object-namespace-pollution glDrawPixels buffer -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: 03bc7503d4 ("radeonsi: save the fs constant buffer to the util blitter context")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23856>
Dead writes can lead to problems with regalloc, so add a safety assert
to catch such cases in the vertex shaders at least in the meantime.
Additionally we could think there are no readers due to some shortcoming
of out dataflow analysis or some other bug which we would also like to
know about.
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23927>
We have much better regalloc in the backend, and additionally having
a close to ssa form means some optimizations can be more effective.
RV370:
total instructions in shared programs: 82500 -> 81645 (-1.04%)
instructions in affected programs: 32147 -> 31292 (-2.66%)
helped: 396
HURT: 1
total temps in shared programs: 12355 -> 12465 (0.89%)
temps in affected programs: 368 -> 478 (29.89%)
helped: 5
HURT: 96
GAINED: shaders/trine/vp-237.shader_test VS
GAINED: shaders/trine/vp-79.shader_test VS
RV530:
total instructions in shared programs: 130706 -> 129684 (-0.78%)
instructions in affected programs: 40902 -> 39880 (-2.50%)
helped: 428
HURT: 1
total temps in shared programs: 16811 -> 16920 (0.65%)
temps in affected programs: 421 -> 530 (25.89%)
helped: 7
HURT: 89
The instruction decrease is from the channel merging pass which can be
much more agressive when we have ssa-like form.
The temp regressions are cases where we merge something like
3: MAD output[1].xy, const[8].xy__, input[1].ww__, temp[0].xy__;
....
12: MOV output[1].zw, none.__00;
We always merge the first instruction into the second one, which means
the liverange for temp[0] will be unnecessarily extended here.
This can be fixed with the following draft MR
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19790
however if we ever get a VS pair scheduling support this will be solved
as well as a consequence, so let it be for now.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7693
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23927>
This is now done in NIR. The remaining one for ADD + 0 to MOV is kept
until we move some remaining part of FS lowering to NIR.
There single regressions is in one d3d->glsl shader from Wine.
Wine sets invariant for glPosition which translates to exact bit for all
calculations leading to it (or the TGSI PRECISE flag). r300 backend
ignores is completelly, so removing the backend optimizations should
even make us more correct in this regards.
RV530:
total instructions in shared programs: 130705 -> 130706 (<.01%)
instructions in affected programs: 16 -> 17 (6.25%)
helped: 0
HURT: 1
RV370: no change
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23927>
This lowers some of the bool-producing comparisons and following bcsels
if the bool comparison results is only used in the bcsel.
This is a temporary solution before we can fork ntt and optimize
the pass sequence there. Right now if we have something like
bcsel(a,b,0.0) we lower it to flrp in nir_lower_bool_to_float. The
flrp goes to backend where it will be lowered to 2 MADs. However in this
case with one of the arguments being a constant one MAD is enough. The
backend can figure this out in the constant folding pass, however this
is actually one of the last things we need it for. So if we do early
translation of the bcsels, than the algebraic pass can clean it up and
we can remove more backend code in the next patch.
no significant change with RV370 shader-db:
total instructions in shared programs: 82497 -> 82496 (<.01%)
instructions in affected programs: 1029 -> 1028 (-0.10%)
helped: 4
HURT: 3
total temps in shared programs: 12351 -> 12355 (0.03%)
temps in affected programs: 10 -> 14 (40.00%)
helped: 0
HURT: 4
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23927>
They will get translated to read from random register otherwise, which is
not problematic per se, but they will not be regalloced and if the
initial register index was too high, we can fail the shader compilation
because we think we run out of registers.
Almost no effect with shader-db on RV530:
total instructions in shared programs: 130707 -> 130705 (<.01%)
instructions in affected programs: 1012 -> 1010 (-0.20%)
helped: 2
HURT: 1
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23927>
For stages where the capture/replay handle is only known after compiling
and uploading the shader, the shader needs to be relocated to the VA
corresponding to the capture/replay address.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23516>
radv_shader_nir_to_asm actually had 3 functions: compiling the NIR to
asm, uploading the shaders and generating debug info for them.
This reduces the functionality of radv_shader_nir_to_asm to only compile
NIR to asm. Uploading the shader and generating debug info is split into
separate functions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23516>
The dumb buffer backing the renderonly_scanout is only destroyed if the
refcount reaches zero. If a driver does not correctly initialize the
refcount, the refcount may be negative and the buffer will never be
freed.
Add an assert to ensure that drivers correctly initialize the refcount.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23743>
If the GEM is closed before setting the BO in the sparse array to zero,
a newly allocated GEM may be associated with a stale BO that is left in
the cache reusing an old BO.
Zero the BO before closing the GEM to make sure that the BO is removed
from the cache and won't be associated with a different GEM.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23744>
Account for the possibility that the scissor is outside the render area. Fixes
the usual assertion fail:
glcts: ../src/gallium/drivers/asahi/agx_state.c:1015:
agx_upload_viewport_scissor: Assertion `maxx > minx && maxy > miny' failed.
on the following dEQP tests with my conformance build:
dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line
dEQP-GLES3.functional.fragment_ops.scissor.outside_render_point
dEQP-GLES3.functional.fragment_ops.scissor.outside_render_tri
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
The GPU ABI requires varyings to be grouped as follows:
- Position
- Smooth shaded fp32
- Flat shaded fp32
- Linear shaded fp32
- Smooth shaded fp16
- Flat shaded fp16
- Linear shaded fp16
- Point size
Use the flat shaded mask info we now have in the vertex shader key to
sort things properly, and pass the counts to the hardware.
FP16 is still TODO.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
We need to propagate shading model metadata from the FS to the VS in
order to correctly lay out the uniforms in the right order. This means
we need VS variants depending on this data.
We could use the existing shader info structure, but that applies to
compiled shaders which would introduce a dependency from the VS compile
to the FS compile. This information does not change with FS variants, so
we can introduce an agx_uncompiled_shader_info structure and gather it
early at precompilation time.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
An offset may be negative, indexing backwards from the array base.
When we right shift an offset by the format shift, we need to use a
signed shift to ensure that the resulting offset is still negative.
Fixes Nautilus faults/pink crashes.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
In 97a1bbeaf26 ("agx: Fix discards"), we made our discard lowering very simple,
since we had just discovered the underlying instruction behaviour and needed a
hotfix for misrendering in the wild. Now that we understand the behaviour, we
can do better. There are two potential performance issues with the lowering in
that commit:
1. It generates extra sample_mask instructions. For a shader that has a single
discard_if at root level, it would generate two instructions
sample_mask foo, 0
sample_mask ~0, ~0
rather than a single
sample_mask ~0, ~foo
2. It runs depth/stencil testing/updates at the end of the shader, even when it
could be run immediately after the discard. This might cause pipeline stalls.
The solution is to insert the "trigger testing" sample_mask instruction as soon
after the "discard" instruction as possible, fusing them if they would be next
to each other. There are two cases:
1. The last discard is executed unconditionally. In this case, we can test
immediately after, unconditionally, and fuse together.
2. The last discard is executed conditionally. In this case, we test in the
first unconditional block after the discard. Example shader:
...
loop {
if .. {
loop {
discard_if <-- discard here
...
}
..
}
...
}
<---- we test here
...
store_output
Together this covers all the usual patterns for single-sampled discard. We could
still do better with multisampling, but whatever.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
When lowering discards, it will be convenient to generate the pattern:
(cond ? 255 : 0) ^ 255
Add rules to optimize that to
(cond ? 0 : 255)
This is not part of the main algebraic optimizer since this lowering happens
late.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
anv_vma_alloc() returns a canonical address, but explicit_address is a
regular address. This mismatch can potentially cause issues.
So here making bo->offset as always canonical address by converting it
in the explicit case and fixing the only caller that was caling
anv_bo_vma_alloc_or_close() with a canonical address.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23977>
The Vulkan CTS started generating the list of valid versions the
driver can report as conformant against based on the active branches,
and the branch we were reporting up to now is no longer valid.
Fixes dEQP-VK.api.driver_properties.conformance_version
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23980>
It seems NIR is tracking this for us now so we can stop doing this
in the backend.
Also, new CTS tests seem to add the requirement where in the presence of
some builtin's like gl_SampleID in a shader, even if unused, sample shading
is expected to be enabled, which is something we can't track in the backend
since the variable may have been dropped by then.
Fixes 2 failures in:
dEQP-VK.draw.renderpass.implicit_sample_shading.sample*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23984>
There is no mention in spec about subtract one of the number of
threads, also Iris and blorp code don't subtract.
Alchemist PRMs: Volume 2a: Command Reference: Instructions: CFE_STATE: Maximum Number of Threads:
Normally set to the maximum number of threads: (# EUs) * (# threads/EU)
Cc: mesa-stable
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23973>
- For SSA destination, padding is applied before `%`.
- For Reg destination, pad to the SSA size (to align div/con),
then remaining padding is applied before `r`.
- For instructions without destination, padding is applied so
they start right after the ` = ` of the cases above.
If the block doesn't have any destinations, there's no padding
is applied to the instructions without destinations in that
block.
For now registers with array access will be unaligned.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23564>
If a then/else block ends in a jump, the phi nodes do not necessarily
have to reference the always taken branch because they are dead code.
Avoid crashing in this case by only rewriting phis, if the block does
not end in a jump.
cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23150>
The common state code expects you to use a different struct for state in
graphics pipelines and in pipeline libraries. This means we need to
copy the approach radv uses in order to be compatible. This also allows
us to shrink the structs a bit by moving compute-only things to the
compute pipeline and library-only things to the library pipeline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
This field also affects triangle strip and triangle fan ordering, so we
would get the incorrect (D3D) order with tessellation and geometry
shaders both enabled. Instead flip clockwise/counterclockwise when
the domain origin is upper-left, as radv does.
Because the register is only emitted when tessellation is active which
forces sysmem, it shouldn't regress performance to emit it directly
instead of using a draw state. We're already very tight on draw states.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
This was relying on cb being NULL instead of just gracefully handling
it, and it will stop being NULL once we start tracking attachment count
as state. Moreover is was broken in the case where only the blend enable
is dynamic.
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
Previously, drivers have either not supported some dynamic state (like
vertex input or sample locations) at all or it's been always dynamic. In
order to be able to set dynamic state sometimes and other times leave it
up to driver-specific state packets, we need a few helpers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
On turnip we support dynamic vertex input, but static vertex input is
precompiled and so we will copy from a source without VI to a
destination with VI and it's valid in this case to do nothing. On the
other hand, it should never be valid if VI state is set but the pointer
isn't there, which the code previously silently skipped over. There's a
similar issue with sample locations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
This is a mirror of vi::bindings_valid, but we can track it and set it
properly even when vertex input state is precompiled, because it is also
needed on turnip for knowing the size of the vertex buffer and vertex
stride state packets even when vertex input state is precompiled.
Previously drivers that could pre-bake vertex input state were expected
to handle this themselves, but this would've been complicated for turnip
because we can handle both pre-baked and dynamic vertex input state. Now
we have the one field which is correctly set in all circumstances and we
never have to setup space for vertex input state in the pipeline.
Reviewed-by: Faith Ekstrand <faith.ekstrand@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
On turnip, there are two cases for feedback loops:
- For feedback loops that involve input attachments, everything works as
normal in GMEM mode but have to do a workaround in sysmem.
- For feedback loops that may involve any texture, GMEM mode is
impossible and we have to disable it.
Currently we track this through a special flag on the pipeline, but this
won't be practical in the future. Add a flag to the common renderpass
state struct to patch this info through when using our own renderpass.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
It only has a subset of the renderpass state, whereas with turnip we
need to use pretty much all of it at one point or another. Just allow
the driver to pass in the entire vk_render_pass_state if it's using its
own renderpass implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301>
This cleans up a lot and helps to generate much better code. There
are only benefits on GPUs without inline immediate support.
shader-db results on GC2000:
total instructions in shared programs: 237168 -> 235101 (-0.87%)
instructions in affected programs: 17297 -> 15230 (-11.95%)
helped: 758
HURT: 0
helped stats (abs) min: 1 max: 24 x̄: 2.73 x̃: 2
helped stats (rel) min: 7.14% max: 29.41% x̄: 14.47% x̃: 14.29%
95% mean confidence interval for instructions value: -2.94 -2.51
95% mean confidence interval for instructions %-change: -14.84% -14.09%
Instructions are helped.
total temps in shared programs: 85553 -> 84969 (-0.68%)
temps in affected programs: 2879 -> 2295 (-20.28%)
helped: 584
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 5.00% max: 25.00% x̄: 21.48% x̃: 20.00%
95% mean confidence interval for temps value: -1.00 -1.00
95% mean confidence interval for temps %-change: -21.76% -21.21%
Temps are helped.
total immediates in shared programs: 154800 -> 154800 (0.00%)
immediates in affected programs: 0 -> 0
helped: 0
HURT: 0
total loops in shared programs: 0 -> 0
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 0
GAINED: 0
No changes on GC3000 and GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23947>
It really isn't that hard. This drops the roundmode optimization but otherwise
should be at parity to what there was before, and it's massively more competent
at it anyway.
total instructions in shared programs: 1514477 -> 1508444 (-0.40%)
instructions in affected programs: 645848 -> 639815 (-0.93%)
helped: 2712
HURT: 187
Instructions are helped.
total bundles in shared programs: 645069 -> 642999 (-0.32%)
bundles in affected programs: 136233 -> 134163 (-1.52%)
helped: 1242
HURT: 319
Bundles are helped.
total quadwords in shared programs: 1130469 -> 1125969 (-0.40%)
quadwords in affected programs: 379780 -> 375280 (-1.18%)
helped: 1878
HURT: 376
Quadwords are helped.
total registers in shared programs: 90577 -> 90633 (0.06%)
registers in affected programs: 5627 -> 5683 (1.00%)
helped: 309
HURT: 294
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 55594 -> 55607 (0.02%)
threads in affected programs: 118 -> 131 (11.02%)
helped: 43
HURT: 33
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 1399 -> 1371 (-2.00%)
spills in affected programs: 345 -> 317 (-8.12%)
helped: 10
HURT: 4
total fills in shared programs: 5273 -> 5133 (-2.66%)
fills in affected programs: 1035 -> 895 (-13.53%)
helped: 12
HURT: 4
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
Some instructions are not able to swizzle their sources, so we conservatively
refused to propagate moves into them to avoid needing a swizzle on the source.
This is too conservative: we only need to do this if the move swizzles. If there
is only an identity swizzle on the move, we can propagate it without issue. This
will mitigate some instruction count regression from the later modifier
propagation, which will leave lots of moves that need to be propagated.
total instructions in shared programs: 1514834 -> 1514477 (-0.02%)
instructions in affected programs: 132297 -> 131940 (-0.27%)
helped: 349
HURT: 3
Instructions are helped.
total bundles in shared programs: 645093 -> 645069 (<.01%)
bundles in affected programs: 9650 -> 9626 (-0.25%)
helped: 42
HURT: 23
Bundles are helped.
total quadwords in shared programs: 1130751 -> 1130469 (-0.02%)
quadwords in affected programs: 78790 -> 78508 (-0.36%)
helped: 269
HURT: 21
Quadwords are helped.
total registers in shared programs: 90563 -> 90577 (0.02%)
registers in affected programs: 163 -> 177 (8.59%)
helped: 4
HURT: 16
Registers are HURT.
total spills in shared programs: 1400 -> 1399 (-0.07%)
spills in affected programs: 2 -> 1 (-50.00%)
helped: 1
HURT: 0
total fills in shared programs: 5276 -> 5273 (-0.06%)
fills in affected programs: 151 -> 148 (-1.99%)
helped: 1
HURT: 3
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
If we need to insert a mov in order to schedule a branch, we do not schedule
anything writer to the source of that mov in the same bundle to avoid a
data race between the read and the write. That's too conservative, though: it is
legitimate to write in the first part of the ALU word (VMUL/SADD stages) and
then read from the second part (VADD/SMUL/VLUT stages). Reset the
predicate.exclude when going from scheduling the latter stages to the former, to
allow a sequence of code like:
FCMP.vector 0.xyzw, ...
branch 0.x
to be scheduled as
vmul.FCMP.vector 0.xyzw
smul r31.w, 0.x
branch 0.x
rather than getting split up into two bundles.
This mitigates a cycle count regression from the copyprop change.
total instructions in shared programs: 1514856 -> 1514834 (<.01%)
instructions in affected programs: 3087 -> 3065 (-0.71%)
helped: 5
HURT: 1
Inconclusive result (value mean confidence interval includes 0).
total bundles in shared programs: 645327 -> 645093 (-0.04%)
bundles in affected programs: 40498 -> 40264 (-0.58%)
helped: 230
HURT: 68
Bundles are helped.
total quadwords in shared programs: 1130554 -> 1130751 (0.02%)
quadwords in affected programs: 75323 -> 75520 (0.26%)
helped: 49
HURT: 231
Quadwords are HURT.
total registers in shared programs: 90559 -> 90563 (<.01%)
registers in affected programs: 119 -> 123 (3.36%)
helped: 5
HURT: 8
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 55590 -> 55594 (<.01%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
Threads are helped.
total spills in shared programs: 1402 -> 1400 (-0.14%)
spills in affected programs: 289 -> 287 (-0.69%)
helped: 1
HURT: 1
total fills in shared programs: 5285 -> 5276 (-0.17%)
fills in affected programs: 448 -> 439 (-2.01%)
helped: 2
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
If we have multiple reads of the same SSA def in the same block, we don't need
to emit multiple copies for it, we can just reuse a copy (OR'ing in the mask,
knowing the source is already fully written since it's SSA). This will prevent
some regressions in moves from the copyprop patch.
There is a bit of a tradeoff here between increased pressure and reduced
instruction count but I'm not too worried. The affect on pressure seems all over
the place -- register use decreases overall, threads increase (great!) but a few
shaders that were *already spilling*, spill a bit worse. I'm not terribly
worried there.
total instructions in shared programs: 1518289 -> 1514856 (-0.23%)
instructions in affected programs: 292854 -> 289421 (-1.17%)
helped: 1557
HURT: 232
Instructions are helped.
total bundles in shared programs: 646903 -> 645327 (-0.24%)
bundles in affected programs: 91872 -> 90296 (-1.72%)
helped: 910
HURT: 256
Bundles are helped.
total quadwords in shared programs: 1133728 -> 1130554 (-0.28%)
quadwords in affected programs: 187170 -> 183996 (-1.70%)
helped: 1399
HURT: 44
Quadwords are helped.
total registers in shared programs: 90640 -> 90559 (-0.09%)
registers in affected programs: 2676 -> 2595 (-3.03%)
helped: 202
HURT: 124
Inconclusive result (%-change mean confidence interval includes 0).
total threads in shared programs: 55561 -> 55590 (0.05%)
threads in affected programs: 50 -> 79 (58.00%)
helped: 23
HURT: 6
Threads are helped.
total spills in shared programs: 1386 -> 1402 (1.15%)
spills in affected programs: 231 -> 247 (6.93%)
helped: 2
HURT: 13
total fills in shared programs: 5159 -> 5285 (2.44%)
fills in affected programs: 1282 -> 1408 (9.83%)
helped: 11
HURT: 16
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
1. Always calculate when asked. This is the sort of optimization that just
introduces bugs. Like one I hit when shuffling register indices around with
the register access changes.
2. Ask before using in RA.
3. Account for precoloured blend inputs.
Small shader-db hit, didn't investigate too much.
total instructions in shared programs: 1518017 -> 1518168 (<.01%)
instructions in affected programs: 2895 -> 3046 (5.22%)
helped: 0
HURT: 24
Instructions are HURT.
total bundles in shared programs: 646756 -> 646782 (<.01%)
bundles in affected programs: 1119 -> 1145 (2.32%)
helped: 1
HURT: 19
Bundles are HURT.
total quadwords in shared programs: 1133694 -> 1133728 (<.01%)
quadwords in affected programs: 1736 -> 1770 (1.96%)
helped: 0
HURT: 20
Quadwords are HURT.
total registers in shared programs: 90596 -> 90612 (0.02%)
registers in affected programs: 108 -> 124 (14.81%)
helped: 0
HURT: 16
Registers are HURT.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: mesa-stable
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
mir_prev_op will point to the last instruction of the block in that case because
the block instruction list is circular. That would cause an invald
write-after-read relationship between the move we insert with the constants and
the CSEL reading them, which DCE "helpfully" optimizes out, leaving a read from
an undefined def. That ends up getting RA'd to an invalid register.
All in all, pretty bad.
Identified due to a new assert fail after the proper temp_count fix.
Affects dEQP-GLES31.functional.separate_shader.random.12.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
If we start with unscheduled IR:
0 = comparison
csel 1, 2, 0
the old code will schedule this as
r31.w = comparison
csel 1, 2, 0
leaving 0 as a dangling source, which can confuse the rest of the compiler.
Instead rewrite this to
r31.w = comparison
csel 1, 2, r31.w
Note the swizzle as already taken care of (i.e. turned to .x for scalar
conditions) by the time we get to scheduling so we can force to .w.
This keeps register allocation from doing stupid things.
total instructions in shared programs: 1518138 -> 1518017 (<.01%)
instructions in affected programs: 37714 -> 37593 (-0.32%)
helped: 48
HURT: 42
Instructions are helped.
total bundles in shared programs: 646877 -> 646756 (-0.02%)
bundles in affected programs: 17024 -> 16903 (-0.71%)
helped: 48
HURT: 42
Bundles are helped.
total registers in shared programs: 90624 -> 90596 (-0.03%)
registers in affected programs: 361 -> 333 (-7.76%)
helped: 31
HURT: 5
Registers are helped.
total threads in shared programs: 55561 -> 55566 (<.01%)
threads in affected programs: 5 -> 10 (100.00%)
helped: 4
HURT: 0
Threads are helped.
total spills in shared programs: 1386 -> 1383 (-0.22%)
spills in affected programs: 19 -> 16 (-15.79%)
helped: 3
HURT: 0
total fills in shared programs: 5159 -> 5077 (-1.59%)
fills in affected programs: 1305 -> 1223 (-6.28%)
helped: 20
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
As an off-shoot of trying to delete modifiers (and nir_register) from NIR, I'd
like to get rid of some of the modifier NIR silliness that Midgard is doing.
The CSEL type selection heuristic at NIR->MIR time is peak backend silly, so
replace it with nir_gather_ssa_types.
Small win on shader-db. I didn't investigate much, but this matches my intution
for how this patch would perform: very small instruction/cycle count
improvements due to slightly better decisions around modifiers, more substantial
space savings due to more float constants getting inlined.
total instructions in shared programs: 1518422 -> 1518414 (<.01%)
instructions in affected programs: 1914 -> 1906 (-0.42%)
helped: 8
HURT: 0
Instructions are helped.
total bundles in shared programs: 646941 -> 646937 (<.01%)
bundles in affected programs: 344 -> 340 (-1.16%)
helped: 4
HURT: 0
Bundles are helped.
total quadwords in shared programs: 1134727 -> 1134324 (-0.04%)
quadwords in affected programs: 66752 -> 66349 (-0.60%)
helped: 351
HURT: 54
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
This is a generic algebraic optimization. Use the generic algebraic optimizer.
The only reason we can't rely on nir_opt_algebraic to do this is because we
generate inot's late in order to optimize some comparisons. But we already have
a pass to clean that up (midgard_nir_clean_inot), it just needs to be extended
to handle more cases.
shader-db is noise:
total bundles in shared programs: 646941 -> 646942 (<.01%)
bundles in affected programs: 100 -> 101 (1.00%)
helped: 0
HURT: 1
total quadwords in shared programs: 1134727 -> 1134726 (<.01%)
quadwords in affected programs: 318 -> 317 (-0.31%)
helped: 2
HURT: 1
total registers in shared programs: 90619 -> 90618 (<.01%)
registers in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
Create a feedback buffer for each query pool and retrieve the query
results from the buffer instead of a roundtrip call in
vkGetQueryPoolResults.
VK_QUERY_RESULT_WAIT_BIT queries will poll until the queries are
available in the feedback buffer.
Query results in the feedback buffer are always VK_QUERY_RESULT_64_BIT
and if needed converted to what the app requests at
vkGetQueryPoolResults time.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
vkCmdCopyQueryPoolResults cannot be called within a render pass so batch
and defer the query feedback copies until after the render pass.
Secondary command buffers inside render passes also have their query
feedback copies batched when recorded. When the secondary command buffer
is recorded via vkCmdExecuteCommands, it's batch is merged into the
primary command buffer's batch and is defered until the render pass ends.
If multiview is enabled, vkCmdCopyQueryPoolResults needs to copy
additional queries matching the number of bits set in viewMask.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
vkCmdCopyQueryPoolResults cannot be called within a render pass/or
while the render pass is suspended so track when commands are inside
a render pass. Also track whether a secondary command buffer is
considered to be entirely inside a render pass.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
Per spec 1.3.255: "If queries are used while executing a render pass
instance that has multiview enabled, the query uses N consecutive query
indices in the query pool (starting at query) where N is the number of
bits set in the view mask in the subpass the query is used in."
track viewMask so query feedback can copy the correct amount of queries
when multiview is enabled.
viewMask is passed in for vkCmdBeginRendering but for legacy
vkCmdBeginRenderPass/2 they are set by vkCreateRenderPass for each
subpass.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
Add feedback commands to write query results into a coherent buffer to
optimize out roundtrip vkGetQueryPoolResults that poll until a result
is available.
Queries are available after vkCmdEndQuery or vkCmdWriteTimeStamp, so
append a vkCmdCopyQueryPoolResults to copy to query results to our
coherent buffer.
The coherent buffer also needs to be cleared after vkCmdResetQueryPool
so append vkCmdFillBuffer.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
I've been betting support requests by people confused as to why their
builds broke because this option was removed. We can add the option back
with the deprecated flag set so that Meson will give a nice warning, but
builds will continue to work.
fixes: 1dd1147408
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23893>
this fixes the refcount for the separate shader program to not have a leaked ref
and then fixes the owned program to have the expected number of refs
this happened to work some of the time before because there was an arbitrary unref
in replace_separable_prog(), but this shouldn't have been necessary
Fixes: e3b746e3a3 ("zink: use GPL to handle (simple) separate shader objects")
fixes#9274
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23888>
fails:
dEQP-VK.descriptor_indexing.storage_texel_buffer
dEQP-VK.descriptor_indexing.storage_texel_buffer_after_bind
dEQP-VK.descriptor_indexing.storage_texel_buffer_after_bind_in_loop
dEQP-VK.descriptor_indexing.storage_texel_buffer_in_loop
dEQP-VK.descriptor_indexing.storage_texel_buffer_minNonUniform
They seem to be vertex-shader related.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22828>
Since we must not access the pipe_context concurrently, it makes sense
to add a lock for all kinds of quere related operations. This way, we
can safely create pipe resources inside Vulkan entry points that can be
used concurrently.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22828>
I've recently learned that this is necessary to get
correct results from divergence analysis.
No Fossil DB stats changes on GFX10.3.
Note, when backporting this patch to stable, make sure
the call to nir_convert_to_lcssa is before nir_divergence_analysis,
which may be located in a different place in the stable branch.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23946>
in `a << b` with gcc 13 the shift count c is masked by the
bit count, and a value larger than 32 will result in shifts
by (c & 0x1f), which will add empty instructions if all
color outputs are written and this will eventually
result in an OOM error.
Fixes: 201b46e487
r600/sfn: on R600/R700 write a dummy pixel output if there is a gap
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23945>
When we allocate a new primary sub-command of type
PVR_SUB_CMD_TYPE_TRANSFER we need to make sure the list backing
transfer sub-commands can be shared and managed by both the
secondary and primary sub-command. Do this by always using a
pointer to maintain the list.
Found with:
dEQP-VK.memory.pipeline_barrier.host_write_transfer_src.8192
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reported-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23918>
Depth/stencil surfaces cannot be linear but they can be 1D, so they
end up being tile64 when sparse (as we force every sparse resource to
be either tile64 or linear).
According to the "1D surfaces" page from BSpec, our driver treats 1D
surfaces as 2D surfaces with a height of 1 texel, since we don't
enable the corresponding bit from HAS_SLICE_CHICKEN7. And since we
support 2D surfaces, we should also support 1D.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22974>
Just like we avoid it for Tile4, avoid it for Tile64.
We can't easily notice this problem since Tile4 is preferred over
Tile64, but if we patch isl_surf_choose_tiling() to choose Tile64 over
Tile4, then we start getting more than 1600 failures in CI.
These are the two most common error messages:
../src/gallium/drivers/iris/iris_resource.c:2168: get_image_offset_el: Assertion `z0_el == 0 && a0_el == 0' failed.
../src/intel/isl/isl_tiled_memcpy.c:857: linear_to_tiled: Assertion `!"" "unsupported tiling"' failed.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22974>
Add a pass for bounds checking UBOs, SSBOs, and images to implement robustness.
This pass is based on v3d_nir_lower_robust_access.c, with significant
modifications to be appropriate for common code. Notably:
* v3d-isms are removed.
* Stop generating invalid imageSize() instructions for cube maps, this
blows up nir_validate with asahi's lowerings.
* Logic to wrap an intrinsic in an if-statement is extracted in anticipation of
future robustness2 support that will reuse that code path for buffers.
* Misc cleanups to follow modern NIR best practice. This pass is noticeably
shorter than the original v3d version.
For future support of robustness2, I envision the booleans turning into tristate
enums.
There's a few more knobs added for Asahi's benefit. Apple hardware can do
imageLoad and imageStore to non-buffer images (only). There is no support for
image atomics. To handle, Asahi implements software lowering for buffer images
and for image atomics. While the hardware is robust, the software paths are not.
So we would like to use this pass to lower robustness for the software paths but
not the hardware paths.
Or maybe we want a filter callback?
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23895>
Here we were aliasing the full compressed image with an uncompressed
format that we would then use for sampling during the blit copy. This
had 2 issues:
1. Uncompressed image views would have smaller dimensions than the
compressed image, and thus, would also have less mip levels.
2. When sampling from smaller mip levels, the hw internally computes
the size of the mip level from the size of level 0, which then uses
to interpret the texture coordinates, but for some texture sizes
this size would not be an exact match for compressed and uncompressed
views.
To fix this, we modify the aliasing technique to only alias the
miplevel selected in the copy as a level 0 image and we ensure the
slice 0 for that image matches exactly the slice description of the
aliased mip level in the original image.
Fixes all test failures in
dEQP-VK.api.copy_and_blit.core.image_to_buffer.*
for compressed formats when we forcefully disable the TLB path.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23919>
On DG2 I ran into a case where the surface state was not being decoded
with INTEL_DEBUG=bat. This is because the surface states are not part
of a state pool there anymore. Instead BO are allocate manually and
placed in vma heap.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 96c33fb027 ("anv: enable direct descriptors on platforms with extended bindless offset")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23891>
In addition to the hexadecimal and float (when applicable), print the
signed and unsigned representations. Representations may be omitted based
on information about the value:
- If gather types has unambiguous information, we use it;
- Float is omitted for 8 bit values;
- Signed decimal is omitted for positive values;
- Unsigned decimal is omitted for small values (representation is same as hex);
Note for now the "terse form" that appear in SSA uses is unchaged.
Based on a patch by Mike Blumenkrantz.
Examples:
```
// Just used as float. Omitted decimals.
vec4 32 ssa_81 = load_const (0x3f800000, 0x3f800000, 0x3e4ccccd, 0x3f800000) = (1.000000, 1.000000, 0.200000, 1.000000)
vec1 32 ssa_28 = load_const (0x3e4ccccd = 0.200000)
// Just a small integer. Omitted float and decimal.
vec1 32 ssa_45 = load_const (0x00000001)
// Larger positive integers. Omitted float.
vec1 32 ssa_39 = load_const (0x00002000 = 8192)
vec1 32 ssa_30 = load_const (0x000000ff = 255)
vec1 32 ssa_28 = load_const (0x00000010 = 16)
// Integers with negative values.
load_const (0xff = -1 = 255)
load_const (0xff80 = -128 = 65408)
load_const (0xffff = -1 = 65535)
// Same value, in the first case we know is used as an integer.
load_const (0xffffffe0 = -32 = 4294967264)
load_const (0xffffffe0 = -nan = -32 = 4294967264)
```
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23562>
If the src_type is not available, untie by looking at the results from
nir_gather_ssa_types(). If that is ambiguous, just pick uint.
Now in print_const_from_load() when the type is invalid, print the full
constant form (with both padded hex and float); when the passed type
is valid, print the terse form based on it.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23562>
The files included are extremely large and hurt compile time of
everything that inludes half_float.h directly or indirectly.
Compile time of a fresh RADV build:
before 32.477s 32.661s 32.625s
after 25.116s 24.928s 25.114s
v2: Include xmmintrin instead (Marek Olšák)
after 25.552s 25.811s 25.678s
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23871>
There's no point in bloating the vertex_info struct everywhere with
information that's only used by i915 in a single place. Let's explicitly
store the hwinfo when needed, instead of piggy-backing on vertex_info.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23851>
Previously the code assumed that you could only have depth-stencil
attachments so no stencil only or depth only, for ZLS load/stores.
This isn't true as we can have stencil only attachments so the
ZLS depth and stencil store/load enable have to be set separately.
Other ZLSCTL setup has also been adjusted for separate depth-stencil.
E.g. the z{load,store}format, and {load,store}twiddled.
Co-Authored-By: Soroush Kashani <soroush.kashani@imgtec.com>
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Signed-off-by: Soroush Kashani <soroush.kashani@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23830>
Called copy_image_to_buffer_texel_buffer, that reuses
copy_image_linear_texel_buffer, by setting up a image destination from
the buffer destination.
This fixes new ycbcr tests added recently (1.3.6.0) like:
dEQP-VK.ycbcr.copy.*.*.*buffer*
that were failing due lack of a codepath handling them.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23864>
In order to have common code to create a image from a buffer, that we
plan to use later on a new codepath.
This refactor adds three new methods:
* One that gathers all the info required to create the structures and
implement the operation
* One that creates the image from the buffer, based on that info
* One that creates a BlitRegion from that info
This seems like too much splitting, but we needed to do it in this
way, because we can't ensure that future uses of this common code
would use a BlitRegion.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23864>
Most users of nir_foreach_function actually want the nir_function_impl, not the
nir_function, and want to skip empty functions (though some graphics-specific
passes sometimes fail to do that part). Add a nir_foreach_function_impl macro
to make that case more ergonomic.
nir_foreach_function_impl(impl, shader) {
...
foo(impl)
}
is equivalent to:
nir_foreach_function(func, shader) {
if (func->impl) {
...
foo(func->impl);
}
}
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23807>
git://anongit.freedesktop.org/drm/drm 2222dcb0775d36de28992f56455ab3967b30d380
The motivation for this change in to get the uapi changes from:
commit 81b1b599dfd71c958418dad586fa72c8d30d1065
Author: Fei Yang <fei.yang@intel.com>
Date: Tue Jun 6 12:00:42 2023 +0200
drm/i915: Allow user to set cache at BO creation
Specifically, the I915_GEM_CREATE_EXT_SET_PAT extension.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878>
The Vulkan specification indicates that if memory types have
properties which are a strict subset of another type's, then they
should appear before that memory type. Otherwise the specification
does not require a specific ordering of memory types.
But, it appears that Aztec Ruins and the Vulkan CTS make an assumption
that the first host-accessible memory type is host-coherent and select
it when they expect data written by the CPU to become visible without
calling vkFlushMappedMemoryRanges(), even though flushing is required
by the spec, which leads to misrendering and hangs on MTL platforms.
We found that other drivers also put a host-coherent, but not cached
memory type as the first host-accessible memory type, so let's do the
same in order to match the expectations of such broken applications.
Host-coherent uncached memory types are currently implemented with a
WC CPU map on non-LLC platforms, so there shouldn't be a huge
performance penalty from this: If an application intends to do heavy
R/W CPU access on a memory range it's expected to loop over the
available memory types and select one marked as host-cached -- If an
application fails to do that and simply selects the first available
type it seems more robust to stay on the safe side and give them a
host-coherent type rather than a cached one.
Rework:
* Jordan: Add initial explanation to body of commmit message.
* Curro: Add additional comments to commit message.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878>
If the application requests a coherent resource, we should honor that.
We technically don't need to ensure coherency for persistent mappings,
but we would have to handle PIPE_BARRIER_MAPPED_BUFFER to ensure that
data became visible at the right times. Instead, we just opt for the
easy plan and mark them coherent too.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878>
Although the following is based on this observations for OpenGL, we
probably need this for Vulkan as well.
KHR-GL46.texture_buffer.texture_buffer_operations_ssbo_writes writes
to an SSBO in a compute program, then issues a memory-barrier, which
causes us to add a DC-flush. Then a second compute program samples
from the SSBO written by the first compute program.
Although we expected the DC-flush to make the writes available to the
second compute program, on MTL this wasn't the case. Adding the
"Untyped Data-Port Cache Flush" fixes this.
The PRM indicates that compute programs must set "Untyped Data-Port
Cache Flush" to flush some LSC writes when flushing HDC. Although we
are setting DC-flush, and not HDC-flush, it does appear that the
following reference might also apply to DC-flush.
In the Intel(R) Arc(tm) A-Series Graphics and Intel Data Center GPU
Flex Series Open-Source Programmer's Reference Manual, Vol 2a: Command
Reference: Instructions, PIPE_CONTROL, HDC Pipeline Flush (DWord 0,
Bit 9), there is a programming note:
> When the "Pipeline Select" mode is set to "GPGPU", the LSC Untyped
> L1 cache flush is controlled by "Untyped Data-Port Cache Flush" bit
> in the PIPE_CONTROL command.
Ref: a8108f1d44 ("anv: Add missing untyped data port flush on PIPELINE_SELECT")
Ref: bd8e8d204d ("iris: Add missing untyped data port flush on PIPELINE_SELECT")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23176>
In the Intel(R) Arc(tm) A-Series Graphics and Intel Data Center GPU
Flex Series Open-Source Programmer's Reference Manual, Vol 2a: Command
Reference: Instructions, PIPE_CONTROL, HDC Pipeline Flush (DWord 0,
Bit 9), there is a programming note:
> When the "Pipeline Select" mode is set to "GPGPU", the LSC Untyped
> L1 cache flush is controlled by "Untyped Data-Port Cache Flush" bit
> in the PIPE_CONTROL command.
Ref: a8108f1d44 ("anv: Add missing untyped data port flush on PIPELINE_SELECT")
Ref: bd8e8d204d ("iris: Add missing untyped data port flush on PIPELINE_SELECT")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23176>
KHR-GL46.texture_buffer.texture_buffer_operations_ssbo_writes writes
to an SSBO in a compute program, then issues a memory-barrier, which
causes us to add a DC-flush. Then a second compute program samples
from the SSBO written by the first compute program.
Although we expected the DC-flush to make the writes available to the
second compute program, on MTL this wasn't the case. Adding the
"Untyped Data-Port Cache Flush" fixes this.
The PRM indicates that compute programs must set "Untyped Data-Port
Cache Flush" to flush some LSC writes when flushing HDC. Although we
are setting DC-flush, and not HDC-flush, it does appear that the
following reference might also apply to DC-flush.
In the Intel(R) Arc(tm) A-Series Graphics and Intel Data Center GPU
Flex Series Open-Source Programmer's Reference Manual, Vol 2a: Command
Reference: Instructions, PIPE_CONTROL, HDC Pipeline Flush (DWord 0,
Bit 9), there is a programming note:
> When the "Pipeline Select" mode is set to "GPGPU", the LSC Untyped
> L1 cache flush is controlled by "Untyped Data-Port Cache Flush" bit
> in the PIPE_CONTROL command.
Ref: bd8e8d204d ("iris: Add missing untyped data port flush on PIPELINE_SELECT")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23176>
In the Intel(R) Arc(tm) A-Series Graphics and Intel Data Center GPU
Flex Series Open-Source Programmer's Reference Manual, Vol 2a: Command
Reference: Instructions, PIPE_CONTROL, HDC Pipeline Flush (DWord 0,
Bit 9), there is a programming note:
> When the "Pipeline Select" mode is set to "GPGPU", the LSC Untyped
> L1 cache flush is controlled by "Untyped Data-Port Cache Flush" bit
> in the PIPE_CONTROL command.
Ref: bd8e8d204d ("iris: Add missing untyped data port flush on PIPELINE_SELECT")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23176>
This patch decreases the size of nir_builder_opcodes.h from 14292 loc to
13763 loc.
nir_build_ versions are still needed if the nir_ is a custom helper.
Intrinsics which need such a helper have to be added to
build_prefixed_intrinsics.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23858>
While debugging KHR-GLES31.core.draw_buffers_indexed.color_masks, the noise from
piles of store_output(load_output) instructions got in the way. Optimize it out.
This does not fix the test, but if this case ever happened in a real app it
would improve performance. This is only load bearing on Asahi (and PanVK?),
since Panfrost wouldn't call nir_lower_blend at all in this case.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23836>
These generated sources uses older, less portable types such as ubyte,
ushort and uint. But we have stdint.h everywhere now, so let's use those
types instead.
To stay consistent, let's talk about UINT8 etc instead of UBYTE for the
entirety of the u_indices infrastructure.
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23853>
There are cases where there is a chain to an unused nir variable that get removed
by nir_opt_dce. This breaks our current linker as the variable can still be accessed
via nir_foreach_shader_in_variable(..) macro.
So lets call nir_remove_dead_variables(..) just before we setup our linking.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23673>
With TLB paths we are always storing full tiles, so we can't use it
if the regions we store are not a multiple of the tile size (or the
full image).
Unfortunately, at the point we call this we don't usually have the
tile size yet so for now we skip the path if we are not copying
full mip levels.
Fixes various CTS fails in:
dEQP-VK.ycbcr.copy.*.optimal*buffer_optimal*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23739>
Compute pipelines only have one shader, which was not handled correctly
in the case of ray tracing pipelines. Adding radv_shader as an argument
allows us to handle the ray tracing prolog. The original loop is inlined
into its only user (radv_pipeline_graphics.c).
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23812>
This hardware hang workaround (PAL waMiscPopsMissedOverlap) is needed only
on some Vega chips, and only for 8 or more samples per pixel. It has a
significant performance cost (around 1.5x-2x in
nvpro-samples/vk_order_independent_transparency), so it should be precisely
configured when setting up Primitive Ordered Pixel Shading.
It was added in 47b780be21, when POPS was not
used in Mesa, with the change being described as "this may not be needed
yet, but let's set it now".
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
This hardware hang workaround (PAL waMiscPopsMissedOverlap) is needed only
on some Vega chips, and only for 8 or more samples per pixel. It has a
significant performance cost (around 1.5x-2x in
nvpro-samples/vk_order_independent_transparency), so it should be precisely
configured when setting up Primitive Ordered Pixel Shading.
It was added in 47b780be21, when POPS was not
used in Mesa, with the change being described as "this may not be needed
yet, but let's set it now".
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
When letting the overlapping waves enter their ordered sections, there must
be no memory accesses to resources which need primitive-ordered access that
are still pending, or there would be a race between the current wave and
the overlapping waves.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
If the wave has set the Primitive Ordered Pixel Shading packer ID hardware
register, it must send MSG_ORDERED_PS_DONE once before the program ends.
It's also safe to send the message if the packer ID register hasn't been
set yet, therefore the message may be sent conservatively. For simplicity,
to ensure that it's sent on all execution paths after setting the packer ID
register, always sending it from a top-level block. This is required for
GFX9-10.3 POPS.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
Implementing the acquire/release semantics of fragment shader interlock
ordered section in Vulkan, and preventing reordering of memory accesses
requiring primitive ordering out of the ordered section.
Also, the ordered section should be as short as possible, so not reordering
the instructions awaiting overlapped waves upwards, and the exit from the
ordered section downwards.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
Waits are needed for early exits from inside a Primitive Ordered Pixel
Shading ordered section, but that code doesn't insert them reliably anyway
because it doesn't obtain the counters for the exact locations of the
jumps, which may be anywhere inside the predecessor blocks.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
These run nir_validate in debug builds, which will avoid bugs slipping in. It's
not enough that llvmpipe doesn't mind illegal NIR, these passes are well within
their rights to fail spectacularly if the NIR wouldn't validate. So validate so
we catch issues early.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23804>
GLSL booleans (and hence bool derefs) may be translated either as 1-bit or
32-bit NIR registers, depending whether the backend uses nir_lower_bool_to_int32
or not. Add a knob for this and choose the right type for different backends.
Fixes nir_validate failure on
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_bvec3 run under
lavapipe. That test indexes into a bvec3 array, and gallivm first lowers bools
and then lowers derefs to registers, resulting in random 1-bit booleans mixed in
with 32-bit bools.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23804>
Wire up the CL_DEVICE_PROFILING_TIMER_RESOLUTION from the PIPE_CAP.
While here, also set CL_PLATFORM_HOST_TIMER_RESOLUTION to 1;
that's bogus since we're using the same value as for device, but
at this point we don't have a device to ask.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23639>
Use the get_timestamp as both the device_timestamp in
get_device_and_host_timer and host_timestamp in that
and get_host_timer.
Having eliminited most other clock sources, discussions
on previous versions have concluded it's best to use the
same timer as the 'host_timestamp' since the main requirements
are that it must be one that's a time seen by the device and
that it's very closely coupled.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23639>
In these cases we actually want uint32_t, because we're doing 32-bit
things to them.
The hwinfo-bit is only being used by i915, and should probably be
moved to i915 instead. But it shoukd *also* be converted, so let's do
that now.
While we're at it, fixup the bit-setting as well.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23833>
This function is added for create strong relationship between
nir_function_impl and nir_function.
So that nir_function->impl->function == nir_function is always true when
(nir_function->impl != NULL && nir_function->impl != NIR_SERIALIZE_FUNC_HAS_IMPL)
And indeed this invariant is already done in functions validate_function and validate_function_impl
of nir_validate
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23820>
The problem is when we have DP2 or DP3 instruction that writes a w
channel like here:
DP3 temp[148].w, -temp[147].xyz_, temp[57].xyz_;
will get pair-converted to
src0.xyz = temp[147], src1.xyz = temp[57]
DP3, -src0.xyz, src1.xyz
DP3 temp[148].w, -src0._, src0._
where the alpha instruction is a basically just a replicate of the
result from the RGB sub intruction. However the destination register
index in the RBG slot is also 148. Now we pair-schedule and regalloc
src0.xyz = temp[13], src1.xyz = temp[3]
DP3, -src0.xyz, src1.xyz
DP3 temp[3].w, -src0._, src0._
We properly regalloc the alpha channel, but we obviously skip the rgb,
because the writemask is empty there. However when we emit the shader
later, we actually check the number of used regs based on the maximum
used register index and we don't consider the writemasks, so we would
think we use 149 temps. AFAIK the shader would be still completelly OK.
But we would think it hits the HW limits and used a dummy one instead.
Fix this by checking for empty writemasks when marking the registers as
used.
GAINED: shaders/glmark/1-22.shader_test FS
This is also needed to prevent another lost Trine shader from
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23838>
On Alchemist, the FF_MODE2 documentation says that we must set the
FF_MODE2 timer values for GS and HS to 224. The hardware performance
tuning guide also recommends setting the TDS timer to 4.
On Tigerlake, i915 applies workarounds to set the GS timer to 224
(failing to do so can cause HS/DS unit hangs), and the TDS timer to 4
(for performance). It doesn't currently apply a HS timer there, and
I'm not sure if it's strictly necessary, but given that Alchemist
needed it, and the other two settings matched, let's assume that it
ought to match as well.
Unfortunately, there has been a bug in the i915 workarounds
infrastructure for non-masked context registers where writing one
field of the register zeroes out all the others. So, I believe the
Tigerlake TDS timer value of 4 isn't being applied correctly there,
though the register is also not readable on that platform which
makes it hard to verify. So, this may also speed up tessellation.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9233
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23839>
This enables L3 partial write merging for a number of cases that seem
to be getting accidentally disabled by the kernel, which was causing a
serious performance bottleneck on DG2 and MTL platforms. The
"Compressible Partial Write Merge Enable", "Coherent Partial Write
Merge Enable" and "Cross-Tile Partial Write Merge Enable" bits in
L3SQCREG5 were expected to be enabled by default (and confusingly,
they even read off as enabled if you ran 'intel_reg read 0xb158' on an
idle system), but they are getting clobbered during 3D context
initialization by an i915 workaround.
Enabling L3 partial write merging of compressible surfaces in
particular seems to increase rendering fillrate by over 3x in some
cases (e.g. the
"VulkanFillRate/FillRateGPU/resolution:1[0-3]/format:*/blend:0"
fillrate-bound microbenchmarks). Significant improvements can also be
reproduced in most real-world workloads we've tested so far,
e.g. Counter Strike GO improves by ~11%, Shadow Of the Tomb Raider
improves by ~5.5%, and AztecRuins-VK improves by ~6.5% on DG2-512 --
Thanks a lot to Caleb Callaway for these figures. No regressions have
been observed so far.
Even though this patch might strike as surprisingly simple for such a
large payoff, it's the result of Felix DeGrood and I trying to
root-cause the rendering performance gap of DG2 on Linux vs Windows on
and off during the last year, and some of the OA statistics captured
by Felix early this month were greatly helpful for me to connect the
last few dots, so Felix deserves a big chunk of the credit for this
work.
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23783>
The SSA killer feature is that, under an "optimal" allocator, the number of
registers used (register demand) is *equal* to the number of registers required
(register pressure, the maximum number of variables simultaneously live at any
point in the program). I put "optimal" in scare quotes, because we don't need to
use the exact minimum number of registers as long as we don't sacrifice thread
count or introduce spilling, and using a few extra registers when possible can
help coalesce moves. Details-shmetails.
The problem is that, prior to this commit, our register allocator was not
well-behaved in certain circumstances, and would require an arbitrarily large
number of registers. In particular, since different variables have different
sizes and require contiguous allocation, in large programs the register file may
become fragmented, causing the RA to use arbitrarily many registers despite
having lots of registers free.
The solution is vector live range splitting. First, we calculate the register
pressure (the minimum number of registers that it is theoretically possible to
allocate successfully), and round up to the maximum number of registers we will
actually use (to give some wiggle room to coalesce moves). Then, we will treat
this maximum as a *bound*, requiring that we don't use more registers than
chosen. In the event that register file fragmentation prevents us from finding a
contiguous sequence of registers to allocate a variable, rather than giving up
or using registers we don't have, we shuffle the register file around
(defragmenting it) to make room for the new variable. That lets us use a
few moves to avoid sacrificing thread count or introducing spilling, which is
usually a great choice.
Android GLES3.1 shader-db results are as expected: some noise / small
regressions for instruction count, but a bunch of shaders with improved thread
count. The massive increase in register demand may seem weird, but this is the
RA doing exactly what it's supposed to: using more registers if and only if they
would not hurt thread count. Notice that no programs whatsoever are hurt for
thread count, which is the salient part.
total instructions in shared programs: 1781473 -> 1781574 (<.01%)
instructions in affected programs: 276268 -> 276369 (0.04%)
helped: 1074
HURT: 463
Inconclusive result (value mean confidence interval includes 0).
total bytes in shared programs: 12196640 -> 12201670 (0.04%)
bytes in affected programs: 1987322 -> 1992352 (0.25%)
helped: 1060
HURT: 513
Bytes are HURT.
total halfregs in shared programs: 488755 -> 529651 (8.37%)
halfregs in affected programs: 295651 -> 336547 (13.83%)
helped: 358
HURT: 9737
Halfregs are HURT.
total threads in shared programs: 18875008 -> 18885440 (0.06%)
threads in affected programs: 64576 -> 75008 (16.15%)
helped: 82
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
This makes PIPE_CAP_SURFACE_SAMPLE_COUNT do something, namely, explode
with lots of fireworks. We'll have to figure out what's wrong, but at
least now we aren't just not trying at all. Should not break anything as
long as PIPE_CAP_SURFACE_SAMPLE_COUNT is not flipped on.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
This reverts commit 9e67d3f237.
We do not, in fact, implement texture barriers. Texture barriers are
supposed to allow non-overlapping rendering feedback loops. We cannot
support that at non-tile boundaries when texture compression is enabled
without some kind of downgrade path or other special handling.
Fixes Emacs corruption on X/Glamor.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
Switch our frontends from generating sample_mask_agx to discard_agx, and
switching from legalizing sample_mask_agx to lowering discard_agx to
sample_mask_agx. This is a much easier problem and is done here in a way that is
simple (and inefficient) but obviously correct.
This should fix corruption in Darwinia.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
sample_mask_agx corresponds directly to the hardware's 2-source instruction, but
it's hard to use correctly and even harder to legalize after the fact, since
it's responsible for not only discard but also late depth/stencil testing. For
our various high-level lowering passes, it's easier to use a one-source discard
(where we don't have to worry about sample masks), which the compiler will
internally lower to the two-source instruction. Introduce such an instruction.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
The preprocess buffer is the buffer used to generate the cmdbuf. It
was aligned to 256 bytes but the correct alignment is actually
ac_gpu_info::ib_alignment.
Otherwise, if a DGC IB is executed like a IB1, this hits an assertion
in radv_amdgpu_cs_submit() because the alignment is incorrect.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23764>
Shaders are the largest thing we hash now, so they benefit from a faster
hash.
Change the field name from `sha1` to `hash` to avoid tying the definition
to a particular algorithm. This doubles down as a precaution against
callers still assuming a 20-byte hash (in which case the compilation will
error out).
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22571>
- Fix null deref. VkPipelineLayoutCreateInfo::pSetLayouts is allowed to
contain VK_NULL_HANDLE.
- The loop 'break' was misplaced.
Fixes crash in
dEQP-VK.pipeline.pipeline_library.graphics_library.fast.0_00_11_11 after
VK_EXT_graphics_pipeline_library is enabled in a later patch.
Fixes: 91966f2eff ("venus: extend lifetime of push descriptor set layout")
Signed-off-by: Lina Versace <linyaa@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Dawn Han <dawnhan@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23810>
Now that Intel and R600 both do their own modifier propagation, the only
backends that still lower modifiers in NIR are:
* nir-to-tgsi
* lima
* etnaviv
* a2xx
The latter 3 backends do not support integers, and certainly do not support
fp64. So they don't use these.
TGSI in theory supports integer negate modifiers but NTT doesn't use them, so
they're unused there too.
Since they're unused, we remove NIR support for integer and 64-bit modifiers,
leaving only 16/32-bit float modifiers. This will reduce the scope needed for a
replacement to NIR modifiers, being pursued in !23089.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23782>
Venus sync feedback design relies on concurrent host device resource
access. To avoid device flush overwriting host writes, we must
suballocate the slots with a minimum size of the buffer alignment.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23633>
VE_COND_MUX_GTE4 is a nice match for the TGSI CMP opcode, however
there is a big limitation due to the general shortcoming of the
vertex shader engine that any instruction can read only two different
temporary registers. So we still have to lower in some cases.
Shader-db RV530:
total instructions in shared programs: 130872 -> 130333 (-0.41%)
instructions in affected programs: 29854 -> 29315 (-1.81%)
helped: 294
HURT: 83
total temps in shared programs: 16747 -> 16775 (0.17%)
temps in affected programs: 407 -> 435 (6.88%)
helped: 10
HURT: 38
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23691>
I have been running ci stress tests during the last few days
and nights and this is what I needed to get a pass rate > 80%.
There are still many flakes but I think this is a good starting
point to make better use of the ci.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23797>
error is:
../src/mapi/glapi/tests/check_table.cpp:563:19: error: non-constant-expression cannot be narrowed from type 'unsigned long' to 'unsigned int' in initializer list [-Wc++11-narrowing]
{ "glNewList", _O(NewList) },
This is just a test and only with clang, and can be disabled by compiler option, so there is no need to back ported
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23793>
The compile error when compiled with "-Dglx=xlib -D shared-glapi=disabled":
check_table.cpp:1133:37: error: ‘struct _glapi_table’ has no member named ‘DrawArraysInstancedARB’; did you mean ‘DrawArraysInstanced’?
1133 | { "glDrawArraysInstancedARB", _O(DrawArraysInstancedARB) },
Fixes: 5679ef99b8 ("glapi: remove EXT and ARB suffixes from Draw functions")
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23793>
Hidden behind `RUSTICL_ENABLE=fp16` for now as the OpenCL CTS doesn't have
enough fp16 tests at the moment. There has been a lot of work on it though,
so hopefully we can enable and verify it soon.
Additionally libclc also misses a bunch of fp16 functionality, so most of
the tests would also just crash.
However this flag is useful for development as it already wires up most of
the code needed.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23788>
tcs_out_lds_layout is basically renamed to tes_offchip_addr in TCS, using
the same variable as TES and also using the same bit layout. The only
difference in the bit layout was that TCS had to mask out the low bits,
which this also removes.
The enums are renamed to *_SGPR_TCS_OFFCHIP_ADDR so as not to conflict
with *_SGPR_TES_OFFCHIP_ADDR, which are in different user data SGPRs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23517>
We are trying to re-enable the valve CI... but doing so runs all the
jobs, including the manual ones.
Since some can take over an hour to run, let's disable them, and
re-enable them in another MR by reverting this commit.
Sorry for the noise!
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23789>
Because a plain mov with the fsat modifier doesn't do a proper 64 bit fsat
we either have to propagate the op as modifier to the instruction that
creates the value, or we add a fake op that applies the fsat op, i.e. we
implement the mov as an add_64 with zero as the second value.
Fixes: 0ff3c4bef2
r600/sfn: drop use of nir source mods
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23754>
The Vulkan CTS started generating the list of valid versions the driver
can report as conformant against based on the active branches, and the
1.3.0 branch we were reporting up to now is no longer valid.
Fixes dEQP-VK.api.driver_properties.conformance_version
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23784>
This was effectively happening *anyway* because WSI init was calling
functions that needed a D3D12 device around to be able to answer.
Just remove the whole song and dance of maybe not having a device.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23658>
Previously we reserved space for a stream link and whenever we ran
out of space in the current bo, allocated a new one, and emitted a
link to it. This is problematic as stream links can only be emitted
at state update boundaries so the handling could have produced a
corrupted control stream.
That's fixed by using a `relocation_mark` set by the driver to
indicate where a state update was last started, so csb can relocate
the whole update into the new bo and link to it.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23520>
Code already exists to convert SHADER_OPCODE_SHUFFLE into a simple MOV
when either source is constant. However... the constants have to
actually get into those sources!
On a shader that I'm working on that multiplies very large matrices using
lots of subgroup operations,
-SIMD8 shader: 1378 instructions. 3 loops. 793896 cycles. 0:0 spills:fills, 23 sends, scheduled with mode non-lifo. Promoted 0 constants. Compacted 22048 to 21664 bytes (2%)
+SIMD8 shader: 346 instructions. 3 loops. 61742 cycles. 0:0 spills:fills, 23 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 5536 to 5216 bytes (6%)
No changes in shader-db or fossil-db on any Intel platform.
v2: Merge a bunch of identical cases. Suggested by Ken.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23609>
Previously the set of dynamic offsets were being reused per each
binding. That's now fixed, by using an offset to determine where
each binding's dynamic offsets reside.
Tests fixed:
dEQP-VK.binding_model.descriptor_copy.{compute,graphics}
.{uniform,storage}_buffer_dynamic_0
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Fixes: aa791961a8 ("pvr: Add support for dynamic buffers descriptors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23587>
The driver can merge subpasses within a render pass into a single
hw render. While doing so it makes the assumption that the subpasses
in an hw render will all be submitted in a single job.
On vkCmdPipelineBarrier() the driver was previously incorrectly
inserting an event sub-cmd on a merged subpass, breaking that
assumption leading to incorrect values for input attachments.
Signed-off-by: Soroush Kashani <soroush.kashani@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Fixes: 6d672e0336 ("pvr: Add initial vkCmdPipelineBarrier skeleton.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23693>
Vulkan allows empty descriptor sets to be created. When we setup
the descriptor set addresses table we fill in the address of the
`bo` for each valid/currently bound desc set. For empty desc sets
there is no `bo` which was causing a seg fault. Now skip them,
leaving their address set to `~0`.
Reported-by: Simon Perretta <simon.perretta@imgtec.com>
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Fixes: ce67f5ac94 ("pvr: Write descriptor set addrs table dev addr into shareds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23692>
Move most of the DB_SHADER_CONTROL fields from the pipeline to the pixel
shader for preparation for shader objects.
Also, the GFX11 export conflict bug workaround doesn't need to be enabled
for non-1x sample counts or if blending is not enabled, so make the
application of DB_SHADER_CONTROL consider the current sample count and
blending state even if they're dynamic.
Having access to the exact sample count in DB_SHADER_CONTROL setup is also
necessary for good performance in SampleInterlock execution modes of
fragment shader interlock, for configuration of POPS_OVERLAP_NUM_SAMPLES
(GFX9-10.3) or OVERRIDE_INTRINSIC_RATE (GFX11), as PixelInterlock is
massively slower with multisampling due to overlap between adjacent
polygons sharing covered pixels among the common edge.
The name of the dynamic state controlling DB_SHADER_CONTROL is now
unambiguous - previously line rasterization mode had effect on attachment
feedback loop state emission.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23474>
We have a number of users reporting surface creation issues with
modifiers etc...
This makes Anv & Iris printout the reason of the failure with
INTEL_DEBUG=isl Failure example in Iris :
MESA: debug: ISL surface failed: ../src/intel/isl/isl.c:1729: requested row pitch (42B) less than minimum alignment requirement (1024B) extent=160x160x1 dim=2d msaa=1x levels=1 rpitch=42 fmt=B8G8R8X8_UNORM usage=+rt+tex+disp
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14039>
there are only 2 ubos that can be emitted, except the emitted ubos
can start at an offset based on the first-used ubo, which means this
has to support the full range of ubo indices
fixes oob access in game Beyond All Reason
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23729>
Indeed, the vertex state was restored using a specific
condition at the util_blitter_restore_vertex_states()
level. This change ensures that the condition is the
same when the vertex state is saved.
The function util_blitter_clear_buffer() is only called
by the r600 driver on pre-evergreen gpus.
This issue is triggered on a rv770 gpu with "piglit/bin/fbo-1d -auto -fbo"
or "piglit/bin/draw_buffers_gles2 -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: 5f566faa46 ("radeonsi: don't save and restore vertex buffers and elements for u_blitter")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23721>
For historical reasons, nir and vtn were compiled together,
and a bunch of vtn specific targets were defined in
src/compiler/meson.build.
Now that we can, make src/compiler/spirv produce an internal
library that depends on NIR, and is used by the drivers/tools.
Also move the vtn specific targets into that directory's
meson.build.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23668>
In the following sequence :
- write buffer B with a shader
- barrier on buffer from shader-write to transfer
- vkCmdCopyQueryPoolResults to buffer B
The barrier should take care of ordering things between the shader
writes and vkCmdCopyQueryPoolResults.
The problem is that vkCmdCopyQueryPoolResults runs on the command
streamer and that is not coherent or synchronized in the same way as
shaders.
This change marks the barrier has potentially containing pending
buffer writes for queries so that we can insert the necessary flush
for vkCmdCopyQueryPoolResults later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9013
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23675>
src/freedreno/vulkan/tu_pipeline.c:4723:25: error: passing argument 1 of ‘pthread_mutex_unlock’ from incompatible pointer type [-Werror=incompatible-pointer-types]
4723 | pthread_mutex_unlock(&dev->pipeline_mutex);
| ^~~~~~~~~~~~~~~~~~~~
| |
| mtx_t *
In file included from ../../src/freedreno/vulkan/tu_common.h:14,
from ../../src/freedreno/vulkan/tu_pipeline.h:13,
from ../../src/freedreno/vulkan/tu_pipeline.c:10:
/usr/include/pthread.h:835:51: note: expected ‘pthread_mutex_t *’ but argument is of type ‘mtx_t *’
835 | extern int pthread_mutex_unlock (pthread_mutex_t *__mutex)
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23733>
This pass uses a safe iterator so it can't lower new instructions that are
injected as part of the lowering, which is exactly what lower_tg4_offsets
does, and if lower_tg4_broadcom_swizzle is also set then we need to lower
these new instructions. Handle this by running the pass twice when both
are set: the first pass will only handle lower_tg4_offsets and
the second pass (which will see the new tg4 instructions produced with
lower_tg4_offsets) will process the remaining options, including
lower_tg4_broadcom_swizzle.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23616>
The way it's used is unacceptable. Here's the change it suggested:
diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 05b9cb0a3f1..e58abfa885f 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -783,8 +783,7 @@ radv_CreateDevice(VkPhysicalDevice physicalDevice, const VkDeviceCreateInfo *pCr
device->overallocation_disallowed = overallocation_disallowed;
mtx_init(&device->overallocation_mutex, mtx_plain);
- if (physical_device->rad_info.register_shadowing_required ||
- device->instance->debug_flags & RADV_DEBUG_SHADOW_REGS)
+ if (physical_device->rad_info.register_shadowing_required || device->instance->debug_flags & RADV_DEBUG_SHADOW_REGS)
device->uses_shadow_regs = true;
/* Create one context per queue priority. */
This is the dumbest reason to prevent merging a MR. I don't want to see this
in projects I work on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23687>
We don't implement them and we don't advertise the extension, but DPCPP
queries them regardless. We ultimately plan to implement the intel USM
extension. However until we do, just return 0 for those queries.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23705>
The spec does not have such requirement, but anv requires it for
validating the offset. However, for DRM_FORMAT_YVU420, chroma channels
can be swapped upon import to match B/R channel order of
VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM.
This fixes some sw codec path in Instagram when interop with gpu.
v2: fix image memory requirement for re-ordered explicit import
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net> (v1)
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23643>
We have a ubfe_imm helper that creates ubfe ops. Not all drivers support ubfe,
however, as it requires SM5 semantics. A few drivers support oly
ubitfield_extract. They should still get the convenience of an _imm helper, so
add a symmetric helper.
It might be nice to unify these helpers into a single helper that asserts its
inputs do not overflow (such that the two ops become equivalent) and emits
either ubfe or ubitfield_extract depending on the underlying driver. That is
left for future work as it's unclear exactly what naming/semantics we want.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23351>
In function ‘lower_load_kernel_input’,
inlined from ‘clc_nir_lower_kernel_input_loads’ at ../src/microsoft/clc/clc_nir.c:205:28:
../src/microsoft/clc/clc_nir.c:169:7: warning: ‘base_type’ may be used uninitialized [-Wmaybe-uninitialized]
169 | glsl_vector_type(base_type, nir_dest_num_components(intr->dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/microsoft/clc/clc_nir.c: In function ‘clc_nir_lower_kernel_input_loads’:
../src/microsoft/clc/clc_nir.c:151:24: note: ‘base_type’ was declared here
151 | enum glsl_base_type base_type;
| ^~~~~~~~~
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23666>
Also remove unused backend CEIL lowering.
Single regressed gnome-shell shader due to fceil followed by f2i32
where before nir_lower_int_to_float would recognize that we already
have integer and emit mov instead of trunc for the f2i32. We can
clean this up easily once we move ntt to the backend.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23642>
The render target flush would have been needed if it was possible to:
1) pollute the render cache and write to the data port in one draw
call.
2) perform a subsequent operation that assumed the render cache was
up-to-date.
However, this is not possible for the two glMemoryBarrier barrier bits
that get translated to this pipe barrier:
* GL_TEXTURE_FETCH_BARRIER_BIT is only used for sampling operations.
It's possible to pollute the render cache and data cache with writes
to a texture in one draw call (1). However, the GL spec states that
apps cannot assume that any existing render caches are up-to-date for
sampling the written locations immediately afterwards. Apps are
required to use glTextureBarrier before the sampling operation, so
requirement #2 is not satisfied.
* GL_PIXEL_BUFFER_BARRIER_BIT could be used for a PBO upload (2), but
it's not possible to pollute the render cache and data cache with a
PBO access in one draw call. PBOs cannot be bound to framebuffers
for rendering, so requirement #1 is not satisfied.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18725>
For 32bpc formats, the ICL+ sampler fetches the raw clear color dwords
used for rendering instead of the converted pixel dwords typically used
for sampling. The CLEAR_COLOR struct page documents this for 128bpp
formats, but not for 32bpp and 64bpp formats.
In blorp_copy, map R11G11B10_FLOAT to R8G8B8A8_UINT instead of R32_UINT.
This will cause the sampler to fetch the clear color pixel, allowing
drivers to keep clear color support enabled during copies.
If iris is forced to convert blits to copies, this patch fixes the
following test on gfx12:
dEQP-GLES3.functional.fbo.color.repeated_clear.blit.rbo.r11f_g11f_b10f
At the moment, both iris and anv won't hit this issue outside of
blorp_copy. This is due to the read/write access restrictions they
currently place on texture views that reinterpret the surface format.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8964
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23604>
isl will assert out otherwise. Hit this with intel_stub_gpu on arm64, but it is
a legitimate bug since someone might plug a DG2 card into a workstation-grade
arm64 or ppc64 supporting PCIe (it exists).
This forward ports the logic from crocus, which checks for both SSE at a
compile-time level as well as in the CPU caps. This might be excessive since DG2
cards apparently wouldn't work properly on old non-SSE x86 boxes anyway? I just
crocus-and-pasted.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23608>
These are similar to the nir_{cmp}_imm variants we already have, except
they negate the condition (apart from equality) and flip the arguments.
The reason we need this, is that we don't have all comparison directions
that would be required to always pass the immediate in the second
argument.
This allows us to create any comparison with an immediate without
having to manually create the immediate value.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23460>
When DEBUG_FP is set I see the following compiler errors:
../../src/gitlab_mesa/src/mesa/program/arbprogparse.c: In function '_mesa_parse_arb_fragment_program':
../../src/gitlab_mesa/src/mesa/program/arbprogparse.c:133:4: error: implicit declaration of function '_mesa_print_program'; did you mean '_mesa_parse_arb_program'? [-Werror=implicit-function-declaration]
133 | _mesa_print_program(&program->Base);
| ^~~~~~~~~~~~~~~~~~~
| _mesa_parse_arb_program
../../src/gitlab_mesa/src/mesa/program/arbprogparse.c:133:32: error: 'struct gl_program' has no member named 'Base'
133 | _mesa_print_program(&program->Base);
| ^~
cc1: some warnings being treated as errors
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23644>
this should enable compression on more intermediate fb attachments
it also means that VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT can now be set
on images where ZINK_BIND_MUTABLE is not set, so non-resource APIs need
to check ZINK_BIND_MUTABLE
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23514>
The hardware gets given a session context from userspace in each
submission, but if the session context changes the hardware wants
a FENCE to be emitted to know it can give up the current session.
IF a test submits interleaved session ctx access and uses a single
vulkan submit the hardware crashes, unless each IB is submitted
in a separate submission so the fence can be sent.
In theory it could be possible to construct a single command buffer
to trigger this so I do think the hardware should be smarter here.
Should this be fixed in the kernel to always emit a fence between
IBs?
Fixes: dEQP-VK.video.decode.h264_interleaved
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23641>
There's no reason to differentiate between primitive types and structs here. `cl_prop_for_struct`
can handle primitive types just fine.
Drop `cl_prop_for_type` and rename the existing `cl_prop_for_struct` to `cl_prop_for_type`.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
It's a layering violation and really the wrong tool for the job. Add a new fn to view a given slice
as a &[u8] instead of going though the clprop machinery which creates a new Vec.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
These are mostly just obvious patterns that somebody will eventually
want to add.
DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar
results (Ice Lake shown)
total instructions in shared programs: 20570033 -> 20570026 (<.01%)
instructions in affected programs: 7363 -> 7356 (-0.10%)
helped: 6 / HURT: 0
total cycles in shared programs: 902118781 -> 902118854 (<.01%)
cycles in affected programs: 419132 -> 419205 (0.02%)
helped: 4 / HURT: 2
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819500 -> 152819380 (-0.00%)
Cycles: 15014627187 -> 15014624437 (-0.00%)
Totals from 115 (0.02% of 662497) affected shaders:
Instrs: 28963 -> 28843 (-0.41%)
Cycles: 404582 -> 401832 (-0.68%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
v2: Fix a copy-and-paste bug s/('find_lsb', a)/a/ in the patterns. See
piglit!819.
DG2, Tiger Lake, Ice Lake, Skylake, and Broadwell had similar results (Ice Lake shown)
total instructions in shared programs: 20570063 -> 20570033 (<.01%)
instructions in affected programs: 452 -> 422 (-6.64%)
helped: 30 / HURT: 0
total cycles in shared programs: 902118723 -> 902118781 (<.01%)
cycles in affected programs: 1762 -> 1820 (3.29%)
helped: 0 / HURT: 29
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819969 -> 152819500 (-0.00%)
Cycles: 15014628652 -> 15014627187 (-0.00%); split: -0.00%, +0.00%
Totals from 469 (0.07% of 662497) affected shaders:
Instrs: 7644 -> 7175 (-6.14%)
Cycles: 31787 -> 30322 (-4.61%); split: -4.90%, +0.29%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
The needs of this pass are ever so slightly more than what
nir_opt_algebraic can do. :( Specifically, it needs to be able to look
at the relationship of constant values used in an expression tree.
v2: Add nir_mov_alu to handle swizzles on the original sources.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
Only retry when there's some kind of non-job failure, such as
runner-internal issues, or API/network issues, etc. If the job itself
fails or times out, then given the length of these jobs, there's no
point trying again and just tying up the job slots for even more hours.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
Rather than always retrying, only retry jobs on a limited set of causes.
This notably excludes retries when a job is stuck due to lack of runners
to schedule it; if we can't get a slot on a runner in time, there's no
reason to try again, since our window of opportunity has gone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of
last_grf_write 2400 times, multiple times through the scheduler. Just
resetting for the processed instructions reduces runtime from 21s to 16s.
No change on steam shader-db runtime across several runs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
We were zeroing it out per block, but it doesn't actually help to count
per block, since the question is "will scheduling this instruction free
the reg?". Saves some memsetting, which was showing up high in the
profile (but not from this source).
No change on iris SKL shader-db.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
Only found 3 shaders affected in Red Dead Redemption :
Totals from 3 (0.05% of 5969) affected shaders:
Instrs: 2246 -> 2230 (-0.71%)
Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05%
This will have a larger effect when we add the
load_ubo_uniform_block_intel intrinsic where we will have larger
blocks (vec8/vec16 vs vec4 only now).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
This new approach handles things as follows:
1. Fences won't be attached to events anymore, applications only wait on
the cv attached to the event.
2. Only the queue is allowed to update event status for non user events.
This will eliminate all remaining status updating races between the
queue and applications waiting on events.
3. Queue minimized flushing by bundling events
4. Increase cv wait timeout as there is really no point in waking up too
often.
Reduces amount of emited fences on radeonsi in luxmark 3.1 luxball by 90%
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed by Nora Allen <blackcatgames@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23612>
Since commit 'c98ddc778a3 broadcom/compiler: force a last thrsw for spilling'
we always ensure we signal the last thread section explicitly with a
last thread switch.
Relying on VPM stores to detect the last thread section is particularly bad,
because we can have VPM stores occurring quite early in a shader program,
which would disable TMU spilling almost entirely.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22461>
This was only used for version < 40 (See commit 22a02f3e3).
Adding some extra explanations and asserts of places where it is used.
As we are here also move the definition of a register with QFILE_VPM,
to avoid defining it if not needed.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22984>
Currently SAMPLER_CONFIG0 is always emitted to either update the active
configuration or disable the sampler. With NTE this always emits 32 state
dwords, while there are a lot of cases that only use a small number of
samplers and never change the other samplers from their disabled state.
Track the active samplers from the last emit, so we can skip the state
emission when the sampler is already disabled. Only emit the full state
after a context flush where we don't know the previous sampler state of
the GPU.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23579>
In addition to bringing us one backend closer to the scoped-only future, this
improves the generated code in cases like:
memoryBarrierBuffer();
memoryBarrierShared();
controlBarrier();
With scoped_barriers + nir_opt_combine_barriers, we now emit only one MEMBAR
instruction (and a BARRIER) rather than two MEMBARs.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
Specifically we would bail out previously when encountering any
control flow, now we would optimize it even when the second ARL/ARR
is inside a lower level if/else branch.
shader-db
RV530:
total instructions in shared programs: 132020 -> 131924 (-0.07%)
instructions in affected programs: 3374 -> 3278 (-2.85%)
helped: 4
HURT: 0
RV370:
no change (no control flow there)
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
We already do this for ARL, so just generalize the pass.
shader-db
RV530:
total instructions in shared programs: 132235 -> 132020 (-0.16%)
instructions in affected programs: 8492 -> 8277 (-2.53%)
helped: 41
HURT: 1
total temps in shared programs: 16900 -> 16887 (-0.08%)
temps in affected programs: 83 -> 70 (-15.66%)
helped: 13
HURT: 0
RV370:
total instructions in shared programs: 82395 -> 82320 (-0.09%)
instructions in affected programs: 4715 -> 4640 (-1.59%)
helped: 33
HURT: 1
total temps in shared programs: 12316 -> 12305 (-0.09%)
temps in affected programs: 75 -> 64 (-14.67%)
helped: 11
HURT: 0
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
Its particularly important to have the copy-propagate pass run first.
So that when the round is vectorized, we don't have to follow the MOVs
to find out if it leads to ARL or not (we don't vectorize ARR/ARL at the
moment).
No shader-db change.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
Specifically after the first copy propagate run but before the
second one. Removal of ARLs will enable the copy propagate to be more
aggresive, as it is very carefull in such cases.
shader-db
RV530:
total instructions in shared programs: 131861 -> 131503 (-0.27%)
instructions in affected programs: 23949 -> 23591 (-1.49%)
helped: 199
HURT: 15
total temps in shared programs: 16997 -> 16903 (-0.55%)
temps in affected programs: 767 -> 673 (-12.26%)
helped: 69
HURT: 9
RV370:
total instructions in shared programs: 82360 -> 82027 (-0.40%)
instructions in affected programs: 19516 -> 19183 (-1.71%)
helped: 183
HURT: 15
total temps in shared programs: 12370 -> 12262 (-0.87%)
temps in affected programs: 664 -> 556 (-16.27%)
helped: 73
HURT: 0
The hurt programs are due to some constant load being copy propagated
which leads to bad interaction with source conflict resolve pass later.
v2: add missing shader type initialized to the tests. Previously we were
checking for has_omod which also practically means we have a fragment
shader, however its less readable.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
On RDNA2, VRS rates are part of the HTILE buffer but if we disable
HTILE completely for eg. GENERAL, VRS rates aren't read by the hw.
Fix this by disabling HTILE compression which should have the same
effect without VRS.
Fixes recent
dEQP-VK.fragment_shading_rate.renderpass2.monolithic.attachment_rate.misc.*
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23209>
Otherwise we run into problems by putting this optimization loop
before I/O lowering, where there might still be 8-bit values that
haven't been lowered to 16 or 32. Once that's done, any remaining
movs or vec ops will have higher bit sizes already.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Add support for 16-bit UBO loads, delete handling of byte-addressed
UBO loads (which I think was never used anyway) and add handling
for the component const index to optimize out unneeded extractResults.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Derefs have index-based access semantics, which means we don't need
custom intrinsics to encode an index instead of a byte offset.
Remove the "masked" store intrinsics and just emit the pair of atomics
directly. This massively reduces duplication between scratch, shared,
and constant, while also moving more things into nir so more optimizations
can be done.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
We still use shader_temp as a temporary variable mode to differentiate
which variables have simple deref patterns vs ones that need to be
lowered to ssbo, but then we put it back to mem_constant when we're
done to restore sanity.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
There's a few changes in here that are very inter-related.
First, we stop lowering load_deref on shader_temp to load_ptr_dxil,
and just leave it as load_deref. In order for that to work, we need
the derefs to be in a shape that's acceptable to DXIL, so the only
current producer of shader_temp loads (the CLC frontend) needs to
run some lowering passes on them first.
The DXIL backend is augmented to just write out deref indices while
walking a deref chain, which will get combined in the load op into
a GEP instruction. For non-mesh/raytracing shaders, these are required
to be single-level scalar arrays, but the complexity here is preparation
for when we don't need to do that anymore.
Additionally, the const lookups are changed from using a hash table
to just putting an index on the variable.
All of this together is enough to enable the authored-forever-ago test
which uses indirect array access into a const packed struct. The
load_ptr_dxil handling didn't deal with packed structs / unaligned
accesses, but now that we're in a logical address space with derefs
instead of physical, there's no alignment to deal with anymore and
the fact that it's packed goes out the window.
This removes one custom DXIL intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
DXIL requires GEP chains to point to a global variable that's a flat
array of primitive types. If we're converting deref chains to GEP
chains, we're effectively in a logical address space, which means
we can do things like change sizes of variables, since we know
they won't alias with anything else. If they could alias, we'd be
lowering them to an explicit I/O op instead. That means we can
start disabling some of the low-bit-size lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Now that we try harder for memcpys, we can use nir's complex usage helper.
We also can just mark the vars instead of using a hash map, since location
doesn't mean anything for constant vars.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
For the case of memset, the SPIR-V translator produces a copy from
a byte array of 0s. If we wait to lower memcpys until after types
are sized, we can potentially turn those 0s into SSA zeros and remove
the entire constant array.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
I was seeing u2u64 still in my final shader after pack/unpack were
lowered, which sounds to me like some other optimizations are missing
for detecting the post-lowering pack/unpack patterns, but let's at
least add some patterns for the simple cases.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
The var splitting pass can rearrange the variables as long as their
position in memory doesn't matter. For block-arranged variables,
or things like memcpys or casts, the layout matters, but atomics
don't imply anything about the layout of the overall variable, so
don't treat them as "complex" for this use case.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Idiomatic DXIL has constants contained within global variables rather
than a big blob of data. Doing this allows us to have 16-bit and 64-bit
data as well, where normally bitcasts would be disallowed on variable
GEP chains.
Unfortunately, DXIL validation requires SOA to be turned into AOS,
which means we need to split structs. We want to be able to run this
on nir_var_mem_constant variables which have constant initializers,
so add a bit of logic to handle that case, and relax the mode validation.
There's nothing special about the modes it was set up to handle.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Instead of on Android. Which allows an end user to turn off expat
without breaking or disabling Intel support. I've additionally
refactored to separate expat and xmlconfig a bit more in the root
meson.build
This does make expat a hard dependency for building Intel tools, despite
the fact that only aubinator actually requires it. This simplifies the
build for the common case, and in the event that someone wants to build
the Intel tools and doesn't have libexpat, they can fall back to the
meson wrap for expat instead.
fixes: 75276deebc
closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8791
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23605>
Since Mesa drivers are now version-locked to the loader, that means that
we never need to support a newer hardware driver than the loader, and thus
don't need to generate dynamic dispatch stubs. This is great news, given
that we don't test those paths, and it involved delightful features like
arrays of hex for code to be pasted into executable memory.
More code removal will follow, this is the first cut of "don't generate,
and DCE generation code".
Fixes: #9158
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23451>
While this is a generic bit twiddling ALU instruction, it's especially useful
for address calculations, since the architecture's tiled textures use Morton
coding within the tiles.
This will be used when lowering image_texel_address on AGX, as part of the image
atomics implementation. I don't know if there's any other neat uses I could
detect with opt_algebraic, this doesn't seem like an operation a shader would
open-code... Maybe useful for BVH building or something...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
This change fixes a buffer overflow by implementing the
special swizzles. This behavior is already available with
evergreen_convert_border_color().
For instance, this issue is triggered on a cayman gpu with
"piglit/bin/texwrap bordercolor -auto -fbo" or "piglit/bin/max-samplers -auto -fbo":
==5610==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000012d20 at pc 0x7fb798cb876f bp 0x7ffd78670460 sp 0x7ffd78670458
READ of size 4 at 0x603000012d20 thread T0
#0 0x7fb798cb876e in cayman_convert_border_color ../src/gallium/drivers/r600/evergreen_state.c:2444
#1 0x7fb798cb876e in evergreen_emit_sampler_states ../src/gallium/drivers/r600/evergreen_state.c:2539
#2 0x7fb7989e6cb2 in r600_emit_atom ../src/gallium/drivers/r600/r600_pipe.h:655
#3 0x7fb7989e6cb2 in r600_draw_vbo ../src/gallium/drivers/r600/r600_state_common.c:2333
#4 0x7fb7985082c7 in u_vbuf_draw_vbo ../src/gallium/auxiliary/util/u_vbuf.c:1497
#5 0x7fb796ef2eda in cso_draw_vbo ../src/gallium/auxiliary/cso_cache/cso_context.h:262
#6 0x7fb796ef2eda in st_draw_gallium_multimode ../src/mesa/state_tracker/st_draw.c:170
#7 0x7fb7970d9cfd in vbo_exec_vtx_flush ../src/mesa/vbo/vbo_exec_draw.c:341
#8 0x7fb7970d32d7 in vbo_exec_FlushVertices_internal ../src/mesa/vbo/vbo_exec_api.c:693
#9 0x7fb7970d32d7 in vbo_exec_FlushVertices ../src/mesa/vbo/vbo_exec_api.c:1193
#10 0x7fb7975f237c in enable_texture ../src/mesa/main/enable.c:337
Fixes: 923d635357 ("r600: fix some border color swizzles on CAYMAN")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23435>
Now that everyone but SVGA is requesting NIR, this path had been
effectively disabled. I had done a partial port of the VS side in
9143c08125 ("st/nir: Fix the st->pbo.use_gs case.") for the sake of
nv50, but with it should be ready for all drivers. Affects nv50, v3d,
d3d12, svga (I think).
Note that this GS code is slightly different from the TGSI: We put a 0 in
pos.z, rather than leaving the layer value there, because apparently v3d
didn't like those denorm Z values.
Also, it's nice to see that the NIR code is shorter than the TGSI code
was, we've made great progress on nir_builder.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23114>
In most VS creation paths in the frontend, nir_lower_io_to_temporaries()
is called, which ensures that all outputs have a single write to them
(with a full writemask). The builtins don't, however, and this VS was an
oddball that overwrote one channel of an output that it had already
written. We can avoid surprises for backends (such as d3d12 and v3d) by
emitting a single write per output here.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23114>
Usually lower_io_to_temps sorts this out for us so you only get full
writes, but we should be able to handle it without that. Avoids a
regression with the mesa/st PBO VS with layer output.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23114>
Indeed, this function was not processing the linked
allocated list.
For instance, this issue is triggered with "piglit/bin/hiz-depth-read-fbo-d24-s0 -auto":
Indirect leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7f6795638987 in calloc (/usr/lib64/libasan.so.6+0xb1987)
#1 0x7f678bac13b9 in nouveau_heap_alloc ../src/gallium/drivers/nouveau/nouveau_heap.c:64
#2 0x7f678bb6c7e4 in nv50_program_upload_code ../src/gallium/drivers/nouveau/nv50/nv50_program.c:490
#3 0x7f678bb83b92 in nv50_vertprog_validate ../src/gallium/drivers/nouveau/nv50/nv50_shader_state.c:161
#4 0x7f678bba3000 in nv50_state_validate ../src/gallium/drivers/nouveau/nv50/nv50_state_validate.c:552
#5 0x7f678bba3c4d in nv50_state_validate_3d ../src/gallium/drivers/nouveau/nv50/nv50_state_validate.c:575
#6 0x7f678b9e3e92 in nv50_blit_3d ../src/gallium/drivers/nouveau/nv50/nv50_surface.c:1444
#7 0x7f678b9e3e92 in nv50_blit ../src/gallium/drivers/nouveau/nv50/nv50_surface.c:1832
#8 0x7f678a0b378a in blit_to_staging ../src/mesa/state_tracker/st_cb_readpixels.c:337
#9 0x7f678a0b7358 in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:516
#10 0x7f6789f82005 in read_pixels ../src/mesa/main/readpix.c:1178
#11 0x7f6789f82005 in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1195
#12 0x7f6789f82ac0 in _mesa_ReadPixels ../src/mesa/main/readpix.c:1210
...
SUMMARY: AddressSanitizer: 80 byte(s) leaked in 2 allocation(s).
Fixes: 67635a0a71 ("nouveau: get rid of tabs")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23592>
split ANV_PIPE_RENDER_TARGET_BUFFER_WRITES into separate CS_STALL,
RT_FLUSH & TILE_FLUSH flags in order to have finer control over cache
coherency.
Tigerlake CS has it's own cache fetching directly from the memory controller,
so we need to do a tile flush to ensure the query data is visible.
This fixes test_resolve_non_issued_query_data in vkd3d on TGL.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Fixes: 3c4c18341a ("anv: narrow flushing of the render target to buffer writes")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23500>
There was a race between the worker thread and flush, which could lead to
the last event flushed getting its status set to CL_SUCCESS before any
other event.
Just wait on all flushed events in order to solve this.
The current queue/event implementation isn't the best and we want to
rework it, so this is good enough for now.
Fixes: 47a80d7ff4 ("rusticl/event: proper eventing support")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23578>
The only part we strictly need repr(C) is in `CLObjectBase` and we already
didn't have it set for `Device` and it worked just fine.
We keep it on in `Platform` as this is a more hand rolled type and less
relevant.
With this we can make use of Rusts data layout which saves us some memory.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23573>
With MTL onwards, creating protected contexts too early
may block for a longer period. To prevent that, use the new
kernel GET_PARAM:I915_PARAM_PXP_STATUS interface to get the
status of PXP support immediately without blocking.
Using this same interface, we can also wait for platform
dependency readiness before attempting to create a protected
context. Use a longer timeout when user explicitly requests
for protected context as the kernel assures readiness will be
achieved.
Reference to kernel change: https://patchwork.freedesktop.org/patch/533241/?series=112647&rev=8
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23382>
NOTE: skipped AMD header update due to build error.
From drm-next at the following commit:
commit 2e1492835e439fceba57a5b0f9b17da8e78ffa3d
Merge: 85d712f033d2 43049f17b526
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Jun 2 13:38:48 2023 +1000
Merge tag 'drm-misc-next-2023-06-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23382>
In order to prioritize the upcoming pre-merge jobs on VanGogh, we
limit all the post-merge jobs requiring VanGogh runners to the
low-priority pool of runners.
This will allow Marge to use any of the VanGogh runners we have (6),
while leaving 3 of them unused by post-merge jobs inside Mesa, or
DXVK-CI.
Suggested-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23548>
When we have many DUTs with the same tags (or subset of tags), it is
possible for the gitlab runner name not to match the DUT picked by
valve-infra's executor.
This is needlessly confusing, and prevents specifically tagging some
runners with a low-priority tag but still have these runners execute
high-priority jobs... because a low priority job may get started by
gitlab but the corresponding DUT is being used by a high-priority
task.
To fix this, simply tell the executor which DUT we want, not just the
list of tags we want.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23548>
Different query providers have different behavior on when they produce
samples: the perfmon provider provides a sample at the start and at the
end of the query, while the occlusion query provider only adds another
sample when the query is complete.
Move the sample count manipulation to the providers to be able to take
those differences into account. Removes a useless always-zero sample
for each OQ resume/suspend pair.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23557>
radeonsi switched over to CU wavefront execution mode, but didn't tell
LLVM. This can lead to shaders requiring too many VGPRs to be executed in
CU mode and so cause GPU resets.
Pass along +cumode to LLVM so it properly spills VGPRs.
Fixes: 9d7eab2ab1 ("radeonsi: don't enable WGP_MODE because of high cost of workgroup mem coherency")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23569>
This will be required for OpenCL subgroup support on radeonsi, but also
fixes some regressions today as radeonsi started to use the subgroup id
for invocation_index calculation.
Fixes: 39da12b7c7 ("ac/llvm: clean up visit_load_local_invocation_index and visit_load_subgroup_id")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23551>
amdgpu enables gfxoff by default and the feature resets the RLC clock
counter on idle on raven/raven2. Querying AMDGPU_INFO_TIMESTAMP does
not work as expected on those platforms.
There was an attempt in amdgpu to read from the TSC register instead,
but it did not work without a firmware update[1]. Another possible
solution is to disable the clock counter reset by clearing
AMD_PG_SUPPORT_RLC_SMU_HS, but that causes a 0.2W increase of power
consumption on idle which is undesirable.
The clock counter reset affects vkCmdWriteTimestamp as well. The spec
is vague on whether that is allowed or not. The WG is aware of the
issue[2] but never really addresses it.
[1] https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
[2] https://github.com/KhronosGroup/Vulkan-Docs/issues/216
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23481>
Like our loads and stores, our global atomics support indexing with a 64-bit
base plus a 32-bit element index, zero- or sign-extended and multiplied by the
word size. Unlike the loads and stores, they do not support additional shifting
(it's not too useful), so that needs an explicit lowering.
Switch to using AGX variants of the atomics, running our address pattern
matching on global atomics in order to delete some ALU.
This cleans up the image atomic lowering nicely, since we get to take full
advantage of the shift + zero-extend + add on the atomic... The shift comes from
multiplying by the bytes per pixel.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23529>
Most of the API stubs are very very trivial to generate as the sole
purpose of those are to deconstruct the returned `Result` object.
Sadly we can't use external crates yet, so "syn" and "qoute" can't be used
for this :'(
The code is kinda hacky, but we also don't expose this to other people, so
we can keep this as a big hack until we can use external crates.
I wish there was a better solution here.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23413>
When the RS uses the pixel pipes it seems to destroy/invalidate any
content sitting in the color and depth caches from a previous draw.
Always flush the color and depth cache before using the RS to make
sure that any cache content written by the PE is properly flushed
to memory.
Fixes spec@!opengl 1.0@gl-1.0-drawpixels-depth-test and probably a
few others that are suffering from corruption of PE writes.
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23530>
I want to have the possibility to use asan in for etnaviv ci. As lot
devices in the my CI farm are arm32 based lets do some prep work.
I had to skip the mesa:util suite as there are some asan problems
on 32bit platform with the hash_map. Once they got sorted out we can
enable the suite again.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23493>
Delete unnecessary virtual functions, we need just two. Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.
Rename the virtuals so overrides don't hide the common convenience
functions. Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
This is done to better match the terminology used by the kernel
and also because the follower may not always be ACE in the future.
- Gang: a group of command streams that are submitted to
more than one HW queue at the same time.
- Leader: the main command stream of a command buffer that works
on the queue type of the command buffer.
- Follower: a command stream on a different HW queue that doesn't
have a separate command buffer state and is submitted together
with its leader.
During submission, a follower must always precede the leader in
the submitted command streams array.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23462>
conditional render is only supposed to be enabled during renderpasses,
and this ends up doing mismatched start/stop in and out of renderpasses
affects:
GTF-GL46.gtf30.GL3Tests.conditional_render*
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23511>
Currently only the six vulkan 1.0 pre-defined formats are supported,
but some basic infrastructure that will be useful for implementing
VK_EXT_custom_border_color (and vulkan 1.1) is included.
Only formats currently listed in the pvr_format_table in pvr_formats.c
are currently supported. Unlike most (all?) other drivers, the PowerVR
hardware requires each entry in the border color table to be encoded
for every hardware format (of which there are 128 available, plus 128
for compressed formats).
Also in this commit:
- Two new constants in rogue_texstate.xml:
- IMAGE_WORD0_TEXFORMAT_MAX_SIZE, and
- SAMPLER_BORDERCOLOR_INDEX_MAX_SIZE; and
- A new device feature (tpu_border_colour_enhanced)
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21555>
Fixes multiple warnings when building with g++ 13.1.1 that look like
```
./src/gallium/drivers/r600/sfn/sfn_scheduler.cpp:1111:9: warning: ‘virtual void r600::CheckArrayAccessVisitor::visit(const r600::InlineConstant&)’ was hidden [-Woverloaded-virtual=]
1111 | void visit(const InlineConstant& value) override {(void)value;}
| ^~~~~
../src/gallium/drivers/r600/sfn/sfn_scheduler.cpp:1125:9: note: by ‘virtual void r600::UpdateArrayWrite::visit(const r600::LocalArrayValue&)’
1125 | void visit(const LocalArrayValue& value) override {
| ^~~~~
(...)
```
What's going on here is when mixing overloading and virtual functions,
compiler will warn when one of the variants is not overriden. So tell
it to also use the base class definitions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23509>
The firmware can write an 8-bit output buffer, but still needs
a 10-bit dpb allocation.
This also puts the 8-bit format after the 10-bit format though
apps should be smart enough to pick the correct one.
Reviewed-by: Lynne <dev@lynne.ee>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23476>
These functions are used by gl[Get]TexImage, which imposes no
alignment restructions on the void *pixels parameter.
This fixes an unaligned access in GTK's "gtk:gdk / memorytexture" unit
test on SPARC, which causes the test to fail.
Fixes: 45ae4434b5 ("util: Use bitshift arithmetic to unpack pixels.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23482>
So the shuffleup tests did a shuffle up with const 5,
we'd use invocation id (0..8) shuffle it down by 5,
get (-5..3), then call llvmshufflevector with that
which is totally illegal.
There might be a nicer way to fix this, but I can't see
it straight away, just bail on the fast path.
Fixes:
dEQP-VK.subgroups.shuffle.compute.subgroupshuffleup*
Cc: mesa-stable
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23484>
this adds sufficient handling to pass the vkd3d-proton tests as well
as running cts on zink, which is gonna have to be enough since there's
no vkcts
it works by dynamically generating a vk_cmd_queue list of commands just
like the regular cmd queue would generate, with the minor change that
the final link has a nulled next pointer to correctly handle buffer copies,
where the last link would otherwise have a next pointer pointing to the
original cmd list
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23394>
when passing around BDA, it's important to be able to link the pointer
back to the pipe_resource since BDA doesn't have an explicit lifetime
this mapping enables cmds to receive a BDA pointer and then map it back
to a pipe_resource in order to avoid gymnastics with dynamically creating
pipe_resource objects which may or may not be able to be freed
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23394>
With this patch, we compile separately
- general shaders (raygen, miss, callable)
- closest-hit shaders
- traversal shader (incl. all intersection / any-hit shaders)
Each shader uses the following scheme:
if (shader_pc == shader_va) {
<shader code>
}
next = select_next_shader(shader_va)
jump next
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22096>
Neither RENDER_SURFACE_STATE nor VDENC_SURFACE_CONTROL_BITS have a
Tiled Resource Mode field anymore. The RENDER_SURFACE_STATE field
was also overlapping with the L1 Cache Control settings field.
This also drops the assignment of that field in isl, because we were
just explicitly setting it to NONE (0) which is already the default
value genxml packing will give us. That saves us some ifdefs.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23449>
The FCV feature is documented to occur on regular render writes. Blitter
writes don't fall under that category. So, if the destination resource
of a blitter write had the FCV aux usage, we would have a good reason to
change the aux usage of the access to plain CCS_E.
We don't actually need to write code to handle this. iris doesn't use
any compression on blitter writes. Make that obvious with an assert.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23220>
The FCV feature is documented to occur on regular render writes. Images
are written to with the DC data port however.
By using plain CCS_E for image writes, we can avoid the COMPRESSED_CLEAR
aux state in more cases. Doing this can avoid full resolves or partial
resolves for future accesses.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23220>
We're going to start toggling between FCV_CCS_E and CCS_E. When
switching aux modes, flush_previous_aux_mode would typically perform
cache flushes for good reason. In the case of switching between CCS_E
with FCV on vs off, we haven't found aux mode flushing to matter. Treat
both CCS_E variants as equivalent to avoid extra cache flushing.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23220>
Rename the isl_aux_usage enum to clarify that it is optional on gfx125.
The new name comes from the Alchemist docs, where the feature is
referred to as "Fast Clear Optimization (FCV)".
The rename was done with this command:
git grep -l "GFX12_CCS_E" | xargs sed -ie "s/GFX12_CCS_E/FCV_CCS_E/g"
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23220>
Use the string itself as a key for searching -- and the internal
allocated name as a key when storing.
Because record_key_hash doesn't consider the name field, which is
the only used field for a SUBROUTINE type, the hash key was always
the same for all types. Using the name fixes this.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23277>
If the viewport center is not positive we can't express it
through the fine coordinates, which are unsigned, and we
need to use the coarse coordinates too.
Fixes new crashes in Vulkan CTS 1.3.6.0:
dEQP-VK.draw.renderpass.offscreen_viewport.*negative*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23489>
These are split between fine and coarse coordinates. We have only been
using fine until now, so we kept the same naming convention we had
prior to V3D 4.1 for simplicity, but we will start using the coarse
coordinates soon too.
Also, the signedness was reversed: coarse coordinates are signed and
fine coordinates are unsigned.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23489>
Previously the USC temps count was incorrectly being used for the
PDS temps count.
It was causing:
csbgen/pvr_packet_helpers.h:79:
__pvr_uint: Assertion `v <= max' failed.
This fixes the assert being hit on:
dEQP-VK.ubo.random.basic_arrays.2
dEQP-VK.ubo.random.all_shared_buffer.3
dEQP-VK.ubo.random.all_shared_buffer.48
dEQP-VK.ubo.random.all_out_of_order_offsets.3
This does not fully fix the tests though.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23356>
16K * 16K * 16bpp = 4G, which overflows int32, so layer_stride needs to
have 64 bits. More generally, any "byte_stride * height" computation
can overflow int32.
Use (u)intptr_t in some gallium and st/mesa places where we do CPU access.
Use uint64_t otherwise.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23389>
v_add can be dual issued on gfx11, v_med3 cannot.
Don't use v_add directly to still optimize omod(fsat(x)).
Foz-DB GFX1100:
Totals from 32702 (24.24% of 134913) affected shaders:
Latency: 475008203 -> 474928037 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 59226198 -> 59140787 (-0.14%); split: -0.14%, +0.00%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21402>
Instead of incrementing the seqno, just directly invalidate any existing
tex cache entries when we update a sampler view.
No reason not to just directly clear stale entries, and avoiding to re-
assign the seqno will simplify the next patch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23470>
The internal layout used with compression partially depends on the pixel format.
Some limited reinterpretation is definitely allowed (linear vs sRGB views of the
same physical format are documented by Apple as allowed). Some reinterpretations
are definitely forbidden (R8G8B8A8 vs R32, I think). At some point we'll need to
work out the exact rule. I suspect the answer is that "you can reinterpret iff
the Channels field matches". Meaning that R8G8B8A8_UNORM and B8G8R8A8_SINT would
be compatible, but not R16G16_UNORM. But I haven't tested that.
Fixes all fails in:
dEQP-GLES31.functional.image_load_store.*.format_reinterpret.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
Despite being mathematically equivalent, the following code sequences are not
bit-identical under IEEE 754 rules due to differing internal precision:
fadd16 r0l, r2, 0.0 z = f2f16 x
fadd16 r1h, r0l, r0h w = fadd z, y
versus
fadd32 r1h, r2, r0h f2f16(w) = fadd x, f2f32(y)
This is probably fine under GL's relaxed floating point precision rules, but
it's definitely not ok with the more strict OpenCL or Vulkan. It also is a
potential problem with GL invariance rules, if we get different results for the
same shader depending whether we did a monolithic compile or a fast link. The
place for doing inexact transformations is NIR, when we have the information
available to do so correctly. By the time we get to the backend, everything we
do needs to be bit-exact to preserve sanity.
Fixes dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_vertex. We believe that
this is a CTS bug, but it's a useful one since it uncovered a serious driver bug
that would bite us in the much less friendly Vulkan (or god forbid OpenCL) CTS
later. It also seems like a magnet for GL app bugs, the fp16 support we do now
is uncovering bad enough bugs as it is.
shader-db results are pretty abysmal, though :|
total instructions in shared programs: 1537964 -> 1571328 (2.17%)
instructions in affected programs: 670231 -> 703595 (4.98%)
total bytes in shared programs: 10533984 -> 10732316 (1.88%)
bytes in affected programs: 4662414 -> 4860746 (4.25%)
total halfregs in shared programs: 483448 -> 474541 (-1.84%)
halfregs in affected programs: 58867 -> 49960 (-15.13%)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
We now have support for baseline MSAA, except for support for eMRT. But hey,
this gets us 99% of the way there, so it's worth flipping on at least in
agx/next.
We can also advertise dual-source blending again. It was reverted since Chromium
freaks out with dual-source blending on a GL 2.1 driver, but since we're
advertising GL 3.1 now, it's ok.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
The driver needs to lower MSAA (because only it knows the sample count). MSAA
lowering depends on discards getting lowered (in order to get sample masks on the
discards for sample shading to work properly). Discard lowering depends on all
discards emitted. But the driver needs to lower clip planes which generates
discards. To break the circular dependency, we have the driver call the
discard lowering pass itself (in between lowering clip planes and lowering
MSAA). Technically, this is probably a layering violation but it's the least
gross solution I see.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
We already lower discard in NIR when depth/stencil writes are used in the
shader. In this patch, we extend that lowering for when depth/stencil writes are
not used, in which case the discard is lowered to a sample_mask instruction.
This is a step towards multisampling, since the old lowering assumed
single-sample and there's no way to express a sample mask with a standard NIR
discard instructions so we need to lower in NIR anyway for sample shading (i.e.
if a discard_if diverges between samples in a pixel).
This changes the lowering for discard_if to be free of control flow (instead
executing a sample mask instruction unconditionally). This seems to be slightly
faster in SuperTuxKart and slightly slower in Dolphin, but I'm not too worried
right now.
To make this work, we do need some extra lowering to ensure we always execute a
sample_mask instruction, in case a discard_if is buried in other control flow
(as occurs with Dolphin's ubershaders). So that's added too. We need that for
MSAA anyway, so pardon the line count.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
Including indirectly via discard/demote.
Fixes graphical artefacts in Chromium when API sample masks are hooked up, which
will result in fragment programs that do not write colour/depth but do a lone
sample mask write. These need tag writes enabled (according to a trace from
Metal for a case constructed to test this scenario).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
We need to control both sources to implement multisampling properly. The
semantic is something like:
foreach sample in the first mask {
if correspond bit in second bit set {
make sample live
} else {
make sample dead
}
}
But I'm reticent to document more formally until the details are really
understood and properly tested.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
This move is done for decouple glsl_types.h from src/mesa/*
This is achieved by move gl_texture_index from src/mesa/main/menums.h to src/compiler/shader_enums.h
And move ATOMIC_COUNTER_SIZE,MAX_VERTEX_STREAMS from src/mesa/main/config.h to src/compiler/shader_enums.h
Move include main/[config|menums].h into glsl/glsl_parser_extras.h from glsl_types.h
As now glsl_types.h should not include headers from src/mesa/*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23420>
The previous implementation was copying the data using the
aligned length (size_dw). The aligned length could overflow
the original buffer size.
For instance, this issue is triggered with "piglit/bin/draw-batch -auto -fbo":
==5736==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff139c77e8 at pc 0x7f25b350a9a0 bp 0x7fff139c6cb0 sp 0x7fff139c6460
READ of size 8 at 0x7fff139c77e8 thread T0
#0 0x7f25b350a99f in __interceptor_memcpy (/usr/lib64/libasan.so.6+0x3c99f)
#1 0x7f25a8fcdf24 in radeon_emit_array ../src/gallium/include/winsys/radeon_winsys.h:760
#2 0x7f25a8fcdf24 in r600_draw_vbo ../src/gallium/drivers/r600/r600_state_common.c:2448
#3 0x7f25a8ae7ba1 in u_vbuf_draw_vbo ../src/gallium/auxiliary/util/u_vbuf.c:1791
#4 0x7f25a7bc18ca in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:1696
#5 0x7f25a7bc7e53 in _mesa_DrawElements ../src/mesa/main/draw.c:1824
Fixes: 0cf5d1f226 ("gallium: remove PIPE_CAP_INFO_START_WITH_USER_INDICES and fix all drivers")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23436>
In many cases the raw value is not really helpful,
since we only work with enums and the raw value is
already printed for indices without special printing.
If an index benefits from having special printing AND the
raw value, we can include the printing of the raw value
as part of its handler.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23375>
Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointless trimming an immediate in RADV and radeonsi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
Via Coccinelle patch:
@@
expression a, b, c;
@@
-a.src = nir_src_for_ssa(b);
-a.src_type = c;
+a = nir_tex_src_for_ssa(c, b);
@@
expression a, b, c;
@@
-a.src_type = c;
-a.src = nir_src_for_ssa(b);
+a = nir_tex_src_for_ssa(c, b);
Plus manual fixups, including...
* a few identity swizzles changed to nir_trim_vector in TTN and prog-to-nir to
fix the Coccinelle-botched formatting, and similarly a pointless nir_channels
* collapsing a now-pointless temp in vtn
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
We just need to set the ps_inputs_read_or_disabled mask correctly.
The VS outputs_written mask should set BFCn instead of COLn, which is why
this removes the is_varying parameter that forced COLn to be set for BFCn.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22833>
The dist parameter to calculate_light_attenuation() is the
reciprocal of ||VP|| used in the distance attenuation formula (2.4).
So get its reciprocal first before applying it to the distance attenuation
formula.
This fixes a lighting issue in Knights of the Old Republic.
Fixes: c5b3d488f9 ("mesa/main: make ffvertex output nir")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collaborar.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23404>
Previously, the call to ensure_device_info in the intercepted ioctl
would eventually result in another call to ioctl, recursing until stack
overflow:
- ioctl (intercepted)
- ensure_device_info
- intel_get_device_info_from_fd
- intel_device_info_i915_get_info_from_fd
- getparam
- intel_ioctl
- ioctl (intercepted)
Signed-off-by: Benjamin Lee <benjamin@computer.surgery>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23418>
This required updating the expected results in a number of test. The
vast majority of these are cases where Gfx12.5 platforms don't allow
mixing F and HF sources.
In all honesty... I just updated the half_float_conversion expected
results until the test passed.
The next commit will add changes specific to Gfx12.5.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.
v3: Remove redundant patterns. Noticed by Matt.
shader-db:
DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15
total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32
Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.
All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.
fossil-db:
DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210
Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922
Lost: 90
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
There should not be any isub at this point due to lowerings that happened
ages before getting to late algebraic.
shader-db:
DG2
total instructions in shared programs: 23103769 -> 23103767 (<.01%)
instructions in affected programs: 65 -> 63 (-3.08%)
helped: 1 / HURT: 0
total cycles in shared programs: 842348074 -> 842347714 (<.01%)
cycles in affected programs: 28572 -> 28212 (-1.26%)
helped: 3 / HURT: 0
One compute shader in Assassin's Creed Odyssey was affected.
fossil-db:
DG2
Instructions in all programs: 196400668 -> 196400676 (+0.0%)
helped: 8 / HURT: 5
Cycles in all programs: 13931740724 -> 13931758003 (+0.0%)
helped: 8 / HURT: 7
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
Sources that are already W, UW, or HF can be represented as those types
by definition. Pass them through. Previously an HF source on a MAD would
have been marked as !can_promote. I'm pretty sure this means it would
get moved out to a register, but I did not verify this.
For ADD3, a constant source could be D or UD. In this case, the value
must be tested to determine whether it can be represented as W or
UW. The patterns in opt_algebraic won't generate an ADD3 with constant
source, so this problem cannot occur yet.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This only impacts ADD3, so at this point it should not have any
affect. As soon as constants are propagated into ADD3 instructions, it
will be a problem.
The worst part is, the ADD3 instrutions that are broken by the old code
aren't even "progress" on this pass.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This is the big execution one, it implement the draw mesh callback.
This executes task shaders and mesh shaders with no overlap.
When task shaders finish, the mesh shader is launched using the payload
data, unless no task shader is run, then the mesh shader is launched.
Once the mesh shader is finished the data is supplied to the draw
pipeline to convert to linear vertex data with no per-prim data,
then passed to draw for clipping and further processing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23066>
This allows generating task and mesh variants of compute shaders.
It adds:
- vertex and primitive outputs support - aos writing.
- payload support
- mesh iface for the output and count callbacks.
- draw_id
- multiple iteration support to the exec fn to allow launches
in multiple passes to reduce memory usage
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23066>
mesh shading has a payload to pass between task and mesh shaders,
this acts like shared memory as well, so we use the standard memory
hooks to access it.
This current adds the payload after the 12-byte header which will
contain the x/y/z grid sizes.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23066>
Fixes seg faults in:
dEQP-VK.binding_model.shader_access.primary_cmd_buf
.sampler_immutable.no_access.single_descriptor.*
dEQP-VK.binding_model.shader_access.primary_cmd_buf
.sampler_immutable.no_access.multiple_contiguous_descriptors.*
It does not fix them. Now they just hit asserts in the compiler.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23357>
Fix defect reported by Coverity Scan.
Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. val < 0U.
unsigned_conversion: val is converted to an unsigned type because it's compared to an unsigned constant.
Fixes: 480bdff4b5 ("pvr: Add support to process transfer and blit cmds")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23416>
We not passing very long string, but multiple separate packages, using
the array seems to be more logical and clear solution, without
shellcheck complaining about word spliting and risk making of accidental
mistakes (missed backslash etc.).
Shebang change, because let's have it same everywhere.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23425>
This is a prepare step to remove depends on p_defines.h in src/util/*
This is done by:
replace pipe_prim_type with mesa_prim
replace shader_prim with mesa_prim
replace PIPE_PRIM_MAX with MESA_PRIM_COUNT
replace SHADER_PRIM_ with MESA_PRIM_
replace PIPE_PRIM_ with MESA_PRIM_
This patch only replace code only
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23369>
Originally the engines size was set to I915_ENGINE_CLASS_VIDEO + 1,
where video was the largest value, and INVALID had a value of -1. Since
then a COMPUTE member was added to the enum, and the INTEL_ENGINE class
moved invalid to the last value.
CID: 1530425
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23384>
On error during wsi_wl_surface_create_swapchain(),
wsi_wl_swapchain_chain_free() is called, followed by vk_free() of the
recently freed chain.
Move the vk_free() to wsi_wl_swapchain_destroy() to avoid the double
free.
Fixes dEQP-VK.wsi.wayland.swapchain.simulate_oom.*
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23383>
Bo flink shares a bo in the same GPU between process, flink precedes
dma-buf so it lacks features like syncronization.
So to workaround that, here during iris_bo_flink() it is
attaching a dma-buf to the bo.
Then in imported process it tries to attach a dma-buf again but
as it was already create it is just ref counted.
From this point on importer and exporter can syncronize access like
any other dma-buf.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22544>
We emulate secondary command buffers via record/replay, we don't
need actual D3D objects, and having them can cause debug layer errors
due to getting out of sync with the state of the Vulkan object.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23403>
if a DS3 pipeline enabling dynamic samples is not bound when samples
are set dynamically, then such a pipeline is later bound, min samples
would have been incorrectly set to 1
instead, flag the update for later and do it just before draw
cc: mesa-stable
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23368>
The big pile of crashes were all either not covered by the fractional run
and a full run hadn't been run to update them, or unsupported where
Crash->Skip is not being reported as UnexpectedPass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23385>
Now it's implemented as a RMW of the FRAG_RESULT_COLOR output var (or
adjusting the store_output intrinsic's value for lowered i/o), which
should be reusable other places we might want to emit shader code for fog
(ARB_fp, fixed function fragment shaders).
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23111>
Instead of treating all 16-bit values as "native 16-bit types,"
differentiate between concrete casts and mediump casts, where the
former requires native 16-bit types, and the latter only requires
DXIL min-precision. Additionally, UBO/SSBO loads/stores require
native 16-bit types.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23344>
The DXIL backend would like to distinguish between casts to 16-bit
that must cast, vs those that may. If a shader only ever produces
16-bit types from mediump casts and ALU ops on those values, then
the resulting shader can be annotated with DXIL's min-precision
qualifier, basically telling the driver to use 16-bit precision if
it's faster for them. If it uses concrete 16-bit casts, or loads/
stores to externally-visible memory, then it must use the "native"
16-bit flag, which is not supported on all hardware.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23344>
In commit 284f0c9a57 I refactored the
handling of the data source to just call a helper rather than special
casing opcodes with 0 or 2 sources. Unfortunately, I also dropped the
"else return 1", creating a fallthrough for all sources other than
SURFACE_LOGICAL_SRC_ADDRESS and SURFACE_LOGICAL_SRC_DATA.
The case below happened to return the correct value for all cases except
SURFACE_LOGICAL_SRC_SURFACE, which has been returning 2 instead of 1
since that commit.
Restore the else case. Thanks to Marcin Ślusarz for catching this.
Fixes: 284f0c9a57 ("intel/compiler: Add an lsc_op_num_data_values() helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23347>
Handle the other pipeline stats counters in order to implement
GL_ARB_pipeline_statistics_query. Note that this does away with
collecting *all* the counters if DEBUG_COUNTERS is enabled, other-
wise it was getting over-complicated.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23301>
We draw the gallium hud directly to the rendered buffer, meaning that if
the buffer age is queried and then a partial redraw is done, we get a
ghosting effect from the hud drawn in previous frames.
Since we need to draw the hud with updated values every frame anyway,
there's no harm in disabling the buffer age and partial redraw.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Chris Healy <cphealy@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20387>
This helper is useful to implement images. This implementation is based on the
one in Panfrost and extended to handle all pipe_image_views (notably including
tex2d_from_buf which did not exist when the panfrost version was written). It
will be used in both Panfrost and Asahi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23273>
Otherwise this can cause optimizations to fight resulting in infinite
optimization loops with opt_algebraic, constant_folding, and copy_prop.
Fixes: 368be872 ("nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23192>
For instance, this is triggered with "piglit/bin/ext_direct_state_access-named-program -auto -fbo":
==5695==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x606000050031 at pc 0x7f78dfca8d46 bp 0x7ffd9043b4a0 sp 0x7ffd9043ac50
READ of size 50 at 0x606000050031 thread T0
#0 0x7f78dfca8d45 (/usr/lib64/libasan.so.6+0x3fd45)
#1 0x7f78d450b18f in set_program_string ../src/mesa/main/arbprogram.c:385
#2 0x7f78d3fdbd3e in execute_list ../src/mesa/main/dlist.c:13025
#3 0x7f78d40c2564 in _mesa_CallList ../src/mesa/main/dlist.c:13451
#4 0x7f78d42f380a in _mesa_unmarshal_CallList ../src/mesa/main/glthread_list.c:43
#5 0x7f78d38e85c5 in glthread_unmarshal_batch ../src/mesa/main/glthread.c:122
#6 0x7f78d38ea20d in _mesa_glthread_finish ../src/mesa/main/glthread.c:382
#7 0x7f78d38ea20d in _mesa_glthread_finish ../src/mesa/main/glthread.c:347
#8 0x7f78d3d73f69 in _mesa_marshal_IsProgramARB src/mapi/glapi/gen/marshal_generated2.c:4256
Fixes: 0b196b40a3 ("mesa: don't compute the same SHA1 twice in glShaderSource")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23295>
As Matt Turner pointed out, the commit here fixed breaks in Iris and
ANV in kernel versions without support for DRM_I915_QUERY_ENGINE_INFO.
As compute engines are only present in gfx12 and newer, and support
for DRM_I915_QUERY_ENGINE_INFO was added before any gfx12 platform,
we can check for gfx version before trying to get engine info.
For ANV, this is done by checking if engine_info is not NULL, like in
other places in the ANV source code.
Fixes: a364f23a6c ("intel: Make gen12 URB space reservation dependent on compute engine presence")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9099
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23257>
This issue is happening on radeonsi. The reference was allocated
via _mesa_get_bufferobj_reference() with setup_arrays().
The same reference was never freed.
For instance, this issue is triggered on radeonsi with
"piglit/bin/gl-1.0-rendermode-feedback -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: ff8c2a1748 ("mesa/bufferobj: rename bufferobj functions to be more consistent.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22921>
When in an msaa feedback loop and when the image does not have tc-compat
cmask, we have to decompress and expand fmask. This can happen on gfx9
when sample count > 2 or when RADV_DEBUG=notccompatcmask is specified.
Fixes: a38de4c011 ("radv: disable tc_compatible_cmask on GFX9 in some cases")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23331>
Enabling or disabling the "microsoft-clc" option previously changed
shared logic for all compiler/clc users, which was surprising.
In addition, the option to avoid the use of system Clang headers at
runtime is useful outside the scope of Windows.
Separating the two concepts by making this a neutral feature option
addresses both matters.
Signed-off-by: Dor Askayo <dor.askayo@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23255>
==10199== Conditional jump or move depends on uninitialised value(s)
==10199== at 0xA107B13: radv_resume_queries (radv_meta.c:93)
==10199== by 0xA108097: radv_meta_restore (radv_meta.c:225)
==10199== Uninitialised value was created by a stack allocation
==10199== at 0xA1145B2: fill_buffer_shader (radv_meta_buffer.c:171)
saved_state is never memset, so the value should be inited.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23327>
Add proper support for ASTC texture compression in the v3d
gallium driver, instead of relying on the fallback software
conversion from gallium, as the hardware has native support
for ASTC compressed textures.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23260>
the base size of a vk_cmd_queue_entry is massive since there are a couple
union entries that have a trillion params. by allocating conditionally using
the union member size, memory can be reduced, which will affect some user-facing
api properties
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23322>
Fixes some tests when bindless is disabled, where the image format is
R32, we do atomics on it, but we didn't set the "typed UAV load with
additional formats" feature bit because when we loaded from it, we
only loaded one component. Since the image format on the DXIL side
was declared as U32x4, the DXIL validator said that we should have.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23266>
For loads/stores without formats, let's guess one based on how it's used.
The actual format doesn't matter, we just want to use it for the number
of components it has.
Also copy image formats from variables to intrinsics, to ensure that
deref-based intrinsics have formats assigned and lowered intrinsics
are up to date.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23266>
The GS and the FS needs to agree on the driver_location. But we just
used the num_outputs variable for the GS instead of matching the logic
from lower_aaline_instr in nir_draw_helpers.c.
This does that, but cleans up our copy slightly to avoid computing the
needless location, as well as using unsigned values.
This used to *mostly* work before, but only because we were lucky and
not too much crazy stuff went on with the inputs / outputs in
smooth-line cases.
Fixes: edecb66b01 ("nir: avoid generating conflicting output variables")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23316>
For:
v_mov_b32_e32 v0, 1.0
exp mrtz v0, off, off, off
we should completely remove the ALU entry before creating the EXP's WaR entry for v0.
Otherwise, the two will be combined into an entry which will wait for
expcnt(0) for later uses of v0.
gen_alu() should also be before gen(), since gen_alu() performs the clear
while gen() creates the WaR entry.
fossil-db (gfx1100):
Totals from 3589 (2.69% of 133428) affected shaders:
Instrs: 5591041 -> 5589047 (-0.04%); split: -0.04%, +0.00%
CodeSize: 28580840 -> 28572864 (-0.03%); split: -0.03%, +0.00%
Latency: 65427923 -> 65427543 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 11109079 -> 11109065 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23213>
Not sure we bumped it to 32 for the right reasons. This generates more
push constant data and because we're not tighly packing our push
constant data this can generate more register pressure.
We could tightly pack things at the cost of some CPU cycles but only
for some stages. RT stages would have to retain the current "sparse"
version of push constants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Now that descriptor sets are located a in a 1Gb area, we can avoid
storing the whole address to the descriptor and add the base address
of the area to a 32bit offset.
Replay a bunch of fossils with this and changes not really significant
one way or another :
Totals:
Instrs: 9278246 -> 9277148 (-0.01%); split: -0.01%, +0.00%
Cycles: 3547598421 -> 3547579435 (-0.00%); split: -0.00%, +0.00%
Totals from 353 (1.14% of 31021) affected shaders:
Instrs: 581546 -> 580448 (-0.19%); split: -0.23%, +0.04%
Cycles: 25885422 -> 25866436 (-0.07%); split: -0.31%, +0.24%
No difference on send messages or spills/fills.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
global load/store (or A64 messages) need the NIR bound checking which
is enabled by "robust" behavior even when robust behavior is disabled.
Many thanks to Christopher Snowhill for pointing out the pushed
constant related issue with the initial version of this patch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
This helps a lot with accessing surface handles in control flow. Our
resource_intel intrinsic has a non_uniform flag, in which case we
cannot apply this optimization. But in uniform cases, this is just a
massive win. We drop all kind of pipeline stalls due to
find_live_channel. We also reduce register pressure by doing the
surface handle computation in a single GRF (instead of 2 or 4).
There are some regressions in max dispatch width but those I think are
only on SIMD32 and due to the current heuristic disabling it after
throughput comparison with SIMD16. We know this heuristic is not
perfect, it should probably be updated in another change.
Here are some stats (all titles seem to have similar gains) :
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
red_dead_redemption2 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93%
---------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 4716 -37.29% -5.67% +0.95% +0.07% -81.26% -79.16% -70.62% -9.15% -8.47%
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93%
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
rise_of_the_tomb_raider_g2 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 11732 -37.27% -22.14% +0.01% +0.00% -99.01% -99.14% -98.65% -7.67% -5.11%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96%
PERCENTAGE DELTAS Shaders Instrs Cycles Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
total_war_warhammer2 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62%
-----------------------------------------------------------------------------------------------------------------------------------
All affected 335 -28.31% -12.77% -82.35% -88.46% -66.67% -6.25% -7.24%
-----------------------------------------------------------------------------------------------------------------------------------
Total 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62%
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
witcher_3_dxvk_g2 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00%
------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 693 -41.93% -58.45% +0.09% +0.01% -98.52% -97.29% -98.10% -10.25% -1.33%
------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00%
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
The resouce_intel intrinsic doesn't not result in an actual
instruction, it's just a wrapper around another value, usually a
load_const.
Allowing this intrinsic to be moved anywhere means it's going to be
closer to the value it wraps, enabling opt_gcm to move a load_ubo
using this resource_intel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Intel HW has multiple ways to access resources like UBO/SSBO/images :
- binding tables : a small ~240 heap of surfaces
- bindless surfaces : a 64Mb heap of surfaces up to Gfx12+, 4Gb on Gfx12.5+
- surfaces : a 4Gb heap on Gfx12.5+ (mostly unused at the moment,
only available through the LSC)
For samplers, we have 2 options since Gfx11+ :
- samplers indexed from the Dynamic State Heap (4Gb)
- samplers indexed from the Bindless Sampler Heap (4Gb)
Additionally our whole push constant promotion mechanism is based
around binding table indices. This is problematic if you want to also
promote to push constants things that would be accessed through the
bindless heap.
To solve this issue, we introduce a new intrinsic that will cary a
block index that is not based off the binding table index nor the
bindless table offset.
We will also use this intrinsic to identify whether the buffer/surface
index in load_ubo/load_ssbo/store_ssbo/etc... is relative to the
binding table or the bindless heap.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Due to how the decoders work, they will write garbage data into
the padding, and later using the image for sampling with linear
images will use the garbage to create broken results. Let the
user specify the image size and align it up in the driver, so
sampling of the image later has the correct w/h.
cc: mesa-stable
Reviewed-by: Lynne <dev@lynne.ee>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23227>
This seems to be the best compromise I can come up with so far.
I can't figure out to get the tier2 programming to match between
264 and 265, maybe they are just programmed different here, good
old firmware.
Fixes: 1693c03a39 ("radv/video: add initial h264 decoder for VCN")
Reviewed-by: Lynne <dev@lynne.ee>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23227>
Add a Mesa base style instead of copying the same rules multiple times.
This is especially annoying with foreach macros, where every
.clang-format file maintains it's own incomplete list of the same
macros.
Adding a tree wide .clang-format allows other drivers to derive their
code style from whatever is considered default Mesa style.
Since clang-format doesn't allow us to derive ForEachMacros, driver
specific foreach macros have to be added to the common file.
Having a tree wide clang format should also help (new) contributers with
working oin parts of the tree that don't have their own .clang-format
file. (With regards to code formatting)
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23275>
.clang-format needs to exist in the root of the project for the target
to be generated, but since we don't have a global config it's a dummy,
empty file.
.clang-format-include lists the files (files! not folders) that are to
be formatted.
.clang-format-ignore lists the files to exclude, even if they are in the
include list above. Useful for vendored code for instance.
See https://mesonbuild.com/Code-formatting.html
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23269>
So far we have 12 jobs for v3d-gl (OpenGL/ES and piglit), 1 job for
v3d-traces, and 10 jobs for v3dv-vulkan, but we only have 21 rpi4
devices for testing.
So let's reduce from 12 to 10 jobs in v3d-gl, so all jobs can run
simultaneously.
Also, as the ideal goal is that each job doesn't take more than 15
minutes, let's increase a little bit the fraction for v3dv, and include
a fraction for v3d-gl as well, so all jobs are ideally under the time
limit.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23285>
Let each of the different blit paths to decide if they need to do the
blit operation based on the blit mask, and update the mask once the
operation is done.
This allows to call all the different versions without needing to check
if they success or not.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23136>
radv_destroy_shader_upload_queue waits for a semaphore, which will in turn
call query_reset_status on hw_ctx that will fail due to being already
destroyed.
Fix radv/amdgpu: amdgpu_cs_query_reset_state2 failed. (-9) spam in the logs
with RADV_PERFTEST=dmashaders.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23276>
The existing guardband region calculation was mixing up x/y_min with
x/y_max in cmd_buffer_emit_viewport(), causing the calculated viewport
area to always be an empty region. Luckily intel_calculate_guardband_size()
returns a non-empty but bogus guardband region in that case, so this
doesn't seem to have led to conformance regressions, but the
off-center guardbands could potentially impact performance in
geometry-heavy rendering.
Fixes: 893fa30afe ("anv: Include scissors in viewport calculations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23174>
My second attempt at a minimally invasive reshuffle of the uAPI,
this time only forcibly aligning structures to 32-bits or 64-bits
depending on the members which follow, so that 64-bit userspace
is identical to the current unmerged kernel module, and the 32-bit
compat userspace aligns with that, and functions rather than
crashing.
Should work just fine with the current drm-xe-next Git tree, which
is rebased on 6.3.0-1, with a few extra changes, as of this commit.
Based on commit 2cd469458fcc24c5f345ad39721a1aedaf70ec0f ("drm/xe: Add explicit padding to uAPI definition")
Signed-off-by: Christopher Snowhill <kode54@gmail.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22652>
DISABLE_FOR_AUTO_INDEX disables primitive restart for non-indexed draws.
We set it in the preamble first, so that non-indexed draws can completely
ignore the primitive restart state.
There is a little bit of duplication that's needed to enclose the primitive
restart code in "if (index_size)" for indexed draws.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23216>
This adds a little extra work, since now dominance is computed and
blocks that don't just have then-return or else-return are looked at.
However it means that nir_lower_returns can now keep phis up to date
by inserting undefs without causing some phis to become non-trivial.
This ends up obviating a couple of tests for lower_returns.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22913>
We don't need to add an offset in the buffer, because we submit
the offset where the data was written to to the host. The
correction of this offset is also not needed and results in draw
errors.
Fixes: 0cf5d1f226
gallium: remove PIPE_CAP_INFO_START_WITH_USER_INDICES and fix all drivers
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23196>
The auto-detection code currently looks for a repo called "mesa" in the
current user's fork (ie. the user providing the api token), which is great for
the common use case, but sometimes needs to be able to be overridden, such as
when running a pipeline in another fork than one's own, when working with
someone else in their fork.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23230>
sample_mask_agx maps to the AGX instruction used to write out a sample mask.
api_sample_mask_agx is a system value that returns the value of glSampleMask
(or its Vulkan equivalent), used to lower glSampleMask (etc).
This is distinct from sample_mask_in, which we map to the hardware thing and
AND with this as a lowering.
sample_positions_agx is a system value returning the sample positions in a
packed fixed-point format matching the hardware register, used to lower
gl_SamplePositions.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23040>
A bunch of Piglits cause crashes, at least when run with PAN_MESA_DEBUG=sync.
For many, the crashes are due to faults. Although Piglits are nominally
process-isolated, faults can leak across processes to subpar recovery, meaning
these crashes are liable to cause robust passing tests to flakes. So, skip any
tests known to crash to make sure the coverage is solid.
Given that we run piglit on panfrost in pre-merge CI, but there's nobody
actively working on fixing piglits for panfrost, I think this is the best
compromise. It means we get to keep the coverage (and ensure we don't regress
piglits that are currently passing) but we don't risk flaking CI. Currently
deqp-runner is eating massive numbers of piglit flakes. While it's really great
that the infrastructure is robust in that way, it'd be better to not have those
flakes in CI in the first place (for run time, if not robustness).
If someone starts hacking on Bifrost + desktop OpenGL again for some reason and
fixes these tests locally, they can reenable them then.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23235>
Already in hard-freeze, so we don't have to worry about breaking changes.
Significant changes:
- LLVM 15 is used instead of 11 or 13
- /dev/shm has to be manually mounted
- Debian 12 uses libdrm 2.4.114
- reworked creating of rootfs, from debootstrap to mmdebstrap
- split `create-rootfs.sh` into `lava_build.sh`, `setup-rootfs.sh`, and `strip-rootfs.sh`
- dropped winehq repository for now (Debian wine is up-to-date enough)
- we use wine now, no need to call explicitly call wine64
- bumped libasan from version 6 to 8
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21977>
We have been relying on NIR's gather info pass for this
but it is not safe unless we are certain we are always
calling it after any other pass that may emit a control
barrier.
As it stands, nir_zero_initialize_shared_memory can emit a
control barrier and we don't call the gather info pass after
it, which is problematic. The only reason this is not really
a problem right now is because for non-scoped barriers (which
is what we currently use) it doesn't emit a scoped barrier, just
a regular memory barrier (which is probably a bug in the pass!),
but as soon as we move to scoped barriers, this is going
to be a problem, since we need to know when we emit a control
barrier to ensure supergroup calculations prevent deadlocks at
the barrier op.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23228>
Rather than translate piles of discrete memory_barrier/control_barrier
instructions, translate the unified scoped_barrier which maps almost directly to
SPIR-V's barrier. Yes, this means I cheated off vtn for the implementation.
v2: Use existing scope translation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23186>
If two threads deserialize the raw object at the same time, the
refcount could be more than 1 temporarily.
This can be reproduced with Granite during the multi-threaded pipeline
cache pre-warm on startup, and also with Dota2.
Fixes: cbab396f54 ("vulkan/pipeline_cache: replace raw data objects on cache insertion of real objects")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22853>
Vulkan says you can do things like image resolves or blits on transfer
queues, but D3D only allows literal copies. We could try to emulate
a Vulkan transfer-only queue backed by multiple D3D queues, where we
use the copy queue when possible but fall back to compute when needed,
but let's wait until there's a good reason to do that...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23218>
Forcing bindless on is nice for apps that don't use EXT_descriptor_indexing,
but for the CTS, whenever EXT_descriptor_indexing is supported, it's used.
To be able to more thoroughly test the not-bindless path, add a debug flag
to turn it off.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23218>
pvr_gpu_upload() can't be used in the case of pvr_gpu_upload_usc() as it expects
the source and destination buffers to be the same size. This isn't the case
because pvr_gpu_upload_usc() adds some padding bytes to the size passed in by
the caller.
Fixes: 547a10f870 ("pvr: switch pvr_cmd_buffer_alloc_mem to use pvr_bo_suballoc")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23185>
Usually the register demand before an instruction would be considered part
of the previous instruction, since it's not greater than the register
demand for that previous instruction. Except, it can be greater in the
case of an definition fixed to a non-killed operand: the RA needs to
reserve space between the two instructions for the definition (containing
a copy of the operand).
fossil-db (navi21):
Totals from 5 (0.00% of 135636) affected shaders:
PreVGPRs: 35 -> 40 (+14.29%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8807
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22446>
We make the compiler assume the worst possible case (it's not great
because we have to burn 32 GRFs of potential input data) and then we
push the actual value through push constants.
This enables VK_EXT_gpl usage on zink, which causes two traces to change
their results. Raven is an imperceptible change, blender has missing
original pngs but looks plausible.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22378>
We need to do 3 things to accomplish this :
1. make all the register access consider the maximal case when
unknown at compile time
2. move the clamping of load_per_vertex_input prior to lowering
nir_intrinsic_load_patch_vertices_in (in the dynamic cases, the
clamping will use the nir_intrinsic_load_patch_vertices_in to
clamp), meaning clamping using derefs rather than lowered
nir_intrinsic_load_per_vertex_input
3. in the known cases, lower nir_intrinsic_load_patch_vertices_in
in NIR (so that the clamped elements still be vectorized to the
smallest number of URB read messages)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22378>
nir_registers are only supposed to be used temporarily. They may be created by a
producer, but then must be immediately lowered prior to optimizing the produced
shader. They may be created internally by an optimization pass that doesn't want
to deal with phis, but that pass needs to lower them back to phis immediately.
Finally they may be created when going out-of-SSA if a backend chooses, but that
has to happen late.
Regardless, there should be no case where a backend sees a shader that comes in
with nir_registers needing to be lowered. The two frontend producers of
registers (tgsi_to_nir and mesa/st) both call nir_lower_regs_to_ssa to clean up
as they should. Some backend (like intel) already depend on this behaviour.
There's no need for other backends to call nir_lower_regs_to_ssa too.
Drop the pointless calls as a baby step towards replacing nir_register.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23181>
All platforms supported by Iris will have aperture_bytes set as 4Gb.
Also this value is not the actual aperture in i915, it actualy is the
GGTT size.
So here replacing it by the sram size, something that will vary
depending in the amount of RAM available.
This fix some tests with Xe KMD, as it is not setting aperture_bytes.
And will not do that as there is no UAPI to fetch this information
and it is not planned to it to Xe UAPI.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ack-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22969>
disable_tmu_pipelining has been recently set to false on two
strategies that should set it to true.
Fixes the following CTS test:
dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite
Fixes: c950098ab - broadcom/compiler: move buffer loads to lower register pressure
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23207>
Ideally we would like to trigger a compilation error like we do on
v3dv (VK_ERROR_UNKNOWN). But with v3d we can't really do that, as this
could happen on a draw call. Let's at least assert so debug builds
stops at this point.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23203>
Right now if we fail to register allocate, we return the qpu_insts
that we had at that point, even if the driver can't really use it.
Also v3dv_pipeline was already assuming that it would return NULL on
failure, returning VK_ERROR_UNKNOWN on that case.
This allows CTS tests with a lot of pressure, that regress now and
then to not being able to allocate, to finish with an error, instead
of blocking forever. For example:
dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23203>
Indeed, the locally allocated "stimg" reference was not freed
on a specific code path.
For instance, this issue is triggered on radeonsi or r600 with:
"piglit/bin/egl-ext_egl_image_storage -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: 6a3f5c6512 ("mesa: simplify st_egl_image binding process for texture storage")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23165>
This fixes an assert crash in UE4 when forcing the blit path for
image copies, caused by an image copy of a small miplevel which
pixel size is smaller than a single compressed block, leading to
an empty blit region.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23180>
When the fragment shader reads the VRS builtin, VRS flat shading
shouldn't be enabled, otherwise the value might not be what the FS
expects.
Fixes dEQP-VK.fragment_shading_rate.renderpass2.monolithic.multipass.*
on RDNA2 (VRS flat shading isn't yet enabled on RDNA3).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23187>
Convert various run-time conditionals into a single draw type
determination, and use template specialization to generate unique
optimized code paths for each. This also lets us fold the WFM needed
in some cases into normal barrier flushes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23149>
As a side effect solve:
```
[91/1401] Compiling C object src/util/libmesa_util.a.p/u_cpu_detect.c.o
../src/util/u_cpu_detect.c: In function '_util_cpu_detect_once':
../src/util/u_cpu_detect.c:889:11: warning: 'regs2[2]' may be used uninitialized [-Wmaybe-uninitialized]
889 | if (((regs2[2] >> 27) & 1) && // OSXSAVE
| ^~~~~~~~~~~~~~~~~~~~~~
../src/util/u_cpu_detect.c:823:16: note: 'regs2[2]' was declared here
823 | uint32_t regs2[4];
| ^~~~~
```
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23164>
To add mesh/task later we have to loop over more stages the other side
of compute. So explicitly skip compute for now.
This has a couple of subtle bits to it, and I think there might be a bug
in pre rast
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23065>
With Anv/Zink, the piglit test :
arb_shader_storage_buffer_object-max-ssbo-size -auto -fbo fsexceed
is failing validation after copy propagation :
load_payload(8) vgrf15:F, vgrf1+0.12<0>:F, vgrf1+0.0<0>:F, vgrf1+0.4<0>:F, vgrf1+0.8<0>:F, vgrf1+0.12<0>:F
../src/intel/compiler/brw_fs_validate.cpp:191: A <= B failed
A = inst->src[i].offset / REG_SIZE + regs_read(inst, i) = 2
B = alloc.sizes[inst->src[i].nr] = 1
In most cases it works because src[0] would be at offset 0 and so
reading a full reg passes validation, but Anv/Zink started emitting
slightly different code adding an offset maybe the size read 2 GRFs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23126>
Its http cache proxy has been returning `curl: (52) Empty reply from
server` for a while and rebooting it didn't help, so turn it off for now.
Suggested-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Eric Engestrom <eric@igalia.com>
Improve readability by using an auxiliar
struct v3d_device_info *devinfo = &screen->devinfo;
this was triggered by the use of the v3d_X macro, where just having a
devinfo makes is more friendly. As we are here, we used it on other
places of the code.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23172>
this adds indexing for ssa/reg defs with the accompanying current
type of a given def (inaccurate for objects but whatever), enabling
that type to be used directly in order to avoid bitcasts in some places
this upends the assumption that all stored srcs are uint type
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22934>
has_work controls whether a flush can be deferred, i.e., when unset
a flush may be deferred
since a promoted cmd must still be flushed to take effect, ensure this
is always set when promoted cmds are pending
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23035>
when needs_present_readback is set, reordering is disabled without hitting
the path that would normally disable promotion for the resource, so this
needs to be changed manually to avoid layout desync on the swapchain
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23035>
resources use a private refcount to avoid overhead from atomics on
descriptor binds, but this has the side effect of evading batch usage,
meaning that the usage may not be properly removed once the batch state
is reset, which will cause issues with detecting whether usage exists
for a given resource
to fix this, the mechanism for tc fence disambiguation can be reused,
namely adding the batch state's submit count to the usage info and
then using that to add a second set of comparisons such that it becomes
possible to check both whether the batch usage for a resource matches
a given batch AND whether the batch usage is the current state of the
batch
affects:
KHR-GLES3.copy_tex_image_conversions.required.cubemap_posy_cubemap_negz
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23035>
in a scenario where an ordered read op occurs for an image,
successive read-only barriers SHOULD be able to be promoted
...but they can't, because there isn't yet a mechanism for handling layout
transitions between the unordered cmdbuf and the ordered cmdbuf,
meaning that promoting e.g., a SHADER_READ_ONLY barrier after a TRANSFER_SRC
barrier will leave the image with the wrong layout for the transfer op:
TRANSFER_SRC(unordered) -> COPY(ordered) -> SHADER_READ_ONLY(unordered)
becomes
TRANSFER_SRC(unordered) -> SHADER_READ_ONLY(unordered) -> COPY(ordered)
ideally I'll get around to figuring this out at some point
affects:
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.texture2d_array_to_renderbuffer
Fixes: bf0af0f8ed ("zink: move all barrier-related functions to c++")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23035>
This is useful for iris to know what formats will be used for copy
operations.
The new function introduces a couple refactors. It makes use of the
ISL_GFX_VER() macro and it also makes more use of the
isl_surf_usage_is_depth() function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23087>
In blorp_copy, instead of checking if the surface's aux-usage is CCS_E,
check if its format supports CCS_E.
ISL won't report that a surface supports CCS_E if its format doesn't, so
this should strictly widen the scope of surfaces included in this path.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23087>
OpenGL has a goofy feature that allows creating an image view of a single layer
of an array texture... in which case that image is treated as non-arrayed in
shader. If you have a 16x16x16 3D texture and bind the third layer, you get a
16x16 2D texture instead of a 16x16x1 3D texture. That distinction matters to
the hardware on AGX, since the texture dimension needs to match between the
shader and the pipe_image_view. If the shader is going to use image2D, we need
to know that the pipe_image_view should be treated as 2D (even though the
underlying resource is 3D).
"But, Alyssa, we already have first_layer and last_layer. Surely you can just
check if first_layer == last_layer?" you ask. The problem is that doesn't
distinguish a 16x16x1 3D texture (accessed as image3D in the shader) from a
16x16 slice (accessed as image2D in the shader) of a 16x16x16 3D texture. To
solve, we add a boolean flag indicating we want to create a view (with a lower
dimension than the underlying resource). This provides an unambiguous way to
communicate this case to drivers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23142>
Distinguish between the "entry point not found" and "parsing error"
cases in the error text. For consistency, identify the unhandled
specialization index case as part of the verification function.
The verification function was renamed to make clearer its scope and
what module it belongs.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22976>
We were already lowering image atomics to lea_image + global atomic. It's a lot
nicer to make that lowering explicit in the NIR. This is much bigger win than in
the Bifrost compiler since here lea_image is used only for atomics, and here it
wasn't well abstracted in the compiler.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23120>
Copypaste fail when switching to unified atomics, missed becuase I don't have
any Valhall hardware and Valhall isn't in CI. (Good news, that means it probably
didn't affect anyone in the mean time :-p)
Fixes crashes with lots of dEQP-GLES31 tests observed under drm-shim.
Fixes: e258083e07 ("pan/bi: Use unified atomics")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23120>
Hardware that lacks dedicated image atomics can still implement image atomics
with regular atomics on global memory, as long as there is a way to get the
address of a texel in memory. I've open-coded this lowering in my first 2
compilers, so before I add another crappy vendored version in my 3rd, let's add
a common NIR pass to do the lowering.
Thanks to unified atomics, the pass itself is fairly concise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23120>
Some hardware has an instruction to load the address of a texel in a writeable
image, given the coordinates ("LEA_IMAGE"). This operation is defined only for
uncompressed images, but it is well-defined regardless of the underlying
twiddling. As such, it is not expected to be produced by APIs but is useful for
internal lowering when it is known that images will be uncompressed (e.g.
because image_store does not support compression on the hardware).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23120>
in the case where a draw is triggered after a flush, zink_update_descriptor_refs
will be called to set batch tracking for descriptors. this function also
handles refs for fb attachments, and everything is usually fine there
the problem with this approach is that tracking is no longer set on view
objects at renderpass begin, which makes them susceptible to early deletion
if a rp isn't started from a draw call
instead, apply batch tracking to fb attachment resources on renderpass
begin if the BATCH_CHANGED flag is set (need to rename this at some point)
in order to guarantee that the resource (object) lifetime will match the
cmdbuf runtime [since imageviews are now only freed upon batch completion]
fixes#9059
Fixes: f6bbd7875a ("zink: remove batch tracking/usage from view types"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23132>
Yuzu is running into a segfault because it writes the push descriptor
twice with 2 different layouts, but without a draw/dispatch in
between.
First vkCmdPushDescriptorSetKHR() writes descriptor 0 & 1 with a
uniform buffer. We toggle the 2 first bits of
anv_descriptor_set::generate_surface_states.
Second vkCmdPushDescriptorSetKHR() writes descriptor 0 with uniform
buffer and descriptor 1 with an image view. The first bit of
anv_descriptor_set::generate_surface_states stays, but the second bit
was already set before and it should now be off.
When we finally flush the push descriptor, we try to generate a
surface state for descriptor 1, but there is no valid buffer view for
it, we access an invalid pointer and segfault.
This fix resets the anv_descriptor_set::generate_surface_states when
the descriptor layout changes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b49b18f0b7 ("anv: reduce BT emissions & surface state writes with push descriptors")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23156>
We have an optimization for non-uniform if/else where if all channels meet the
jump condition we emit a branch to jump straight to the ELSE block. Similarly,
if at the end of the THEN block we don't have any channels that would execute
the ELSE block, we emit a branch to jump straight to the AFTER block.
This optimization has a cost though: we need to emit the condition for the
branch and a branch instruction (which also comes with a 3 delay slot), so for
very small blocks (just a couple of ALU for example) emitting the branch
instruction is typically worse. Futher, if the condition for the branch is not
met, we still pay the cost for no benefit at all.
Here is an example:
nop ; fmul.ifa rf26, 0x3e800000, rf54
xor.pushz -, rf52, 2 ; nop
bu.alla 32, r:unif (0x00000000 / 0.000000)
nop ; nop
nop ; nop
nop ; nop
xor.pushz -, rf52, 3 ; nop
nop ; mov.ifa rf52, 0
nop ; mov.pushz -, rf52
nop ; mov.ifa rf26, 0x3f800000
The bu instruction here is setup to jump over the following 4 instructions
(the last 4 instructions in there). To do this, we pay the price of the xor
to generate the condition, the bu instruction, and the 3 delay slots right
after it, so we end up paying 6 instructions to skip over 4 which we pay
always, even if the branch is not taken and we still have to execute those
4 instructions. With this change, we produce:
nop ; fmul.ifa rf56, 0x3e800000, rf28
xor.pushz -, rf9, 3 ; nop
nop ; mov.ifa rf9, 0
nop ; mov.pushz -, rf9
nop ; mov.ifa rf56, 0x3f800000
Now we don't try to skip the small block, ever. At worse, if all channels
would have met the branch condition, we only pay the cost of the 4
instructions instead of 6, at best, if any channel wouldn't take the
branch, we save ourselves 5 cycles for the branch condition, the branch
instruction and its 3 delay slots.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23161>
This shouldn't have been enabled at all. Depth-stencil formats were
accidentally disabled but not depth-only or stencil-only formats.
This doesn't seem allowed by DX12 and both AMD/NVIDIA don't enable it.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23122>
Super sampling on a 4K screen could hit this. 16k seems pretty big
but this image is only created on RDNA2 and on-demand if VRS attachments
are used without depth-stencil attachments, which should be rare
enough to care.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23105>
The idea of this pass is to promote small bit-sizes to larger, supported
bit-sizes for certain operations. It doesn't handle emulating large
bit-size operations on smaller bit-sizes; passes like nir_lower_int64
and nir_lower_doubles handle that.
So, assert that we aren't shrinking the bit-size, as this will almost
certainly produce incorrect results.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23123>
We only support 32-bit versions of ufind_msb, find_lsb, and bit_count,
so we need to lower them via nir_lower_int64.
Previously, we were failing to do so on platforms older than Icelake
and let those operations fall through to nir_lower_bit_size, which
used a callback to determine it should lower them for bit_size != 32.
However, that pass only emulates small bit-size operations by promoting
them to supported, larger bit-sizes (i.e. 16-bit using 32-bit). It
doesn't support emulating larger operations (i.e. 64-bit using 32-bit).
So nir_lower_bit_size would just u2u32 the 64-bit source, causing us to
flat ignore half of the bits.
Commit 78a195f252 (intel/compiler: Postpone most int64 lowering to
brw_postprocess_nir) provoked this bug on Icelake and later as well,
by moving the nir_lower_int64 handling for ufind_msb until late in
compilation, allowing it to reach nir_lower_bit_size which broke it.
To fix this, we always set int64 lowering for these opcodes, and also
correct the nir_lower_bit_size callback to ignore 64-bit operations.
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23123>
Using nir_gather_ssa_types works much better. There's 2 differences
compared to what I was doing before:
1. Multiple passes to allow data to propagate forward and backward
through the whole shader.
2. Allowing a value to have indeterminate types due to having both
int and float usages.
So this deletes some code and gets better results. Wish I'd known
this existed last week.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23062>
In i915 if the device has local memory it can only mmap bo with
I915_MMAP_OFFSET_FIXED, so all this set of ANV_BO_ALLOC_WRITE_COMBINE
were useless.
In Xe KMD there is no way to change mmap mode for all GPUs types.
So we can nuke bo->map_wc, ANV_BO_ALLOC_WRITE_COMBINE and related
dead code.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22483>
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT is also set in all memory types of
integrated GPUs.
This flag means that memory will be allocated in the most efficient
place for the GPU to access, which is true in integrated GPUs.
However, this was causing ANV_BO_ALLOC_WRITE_COMBINE to be set in
integrated GPUs in the block right below when allocating in the non-cached memory type.
But the comment only talks about lmem, so to still keep the write
combine behavior for iGPUs it was used VkMemoryPropertyFlags in mmap_calc_flags().
Additionally, this was causing anv_bo.has_implicit_ccs to always be
set, which could change the expected behavior of
anv_BindImageMemory2() in MTL.
Fixes: fbd32a04da ("anv: add a third memory type for LLC configuration") added a new heap
Fixes: 582bf4d9f7 ("anv: flag BO for write combine when CPU visible and potentially in lmem")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22483>
To ensure proper SSH functioning, the device IP should be added to the
LAVA device dictionary by setting device_ip. LAVA will then map the
value to lava-target-ip.
meson-g12b-a311d-khadas-vim3-cbg-4 has an IP in the dictionary, while
sun50i-h6-pine-h64-cbg-1 and meson-g12b-a311d-khadas-vim3-cbg-2 do not.
Since some devices are not yet properly configured, and device tag
fixing is not an option here, let's temporarily switch to a job
definition based on UART, until it gets fixed.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
To use the supported job definition depending on some Mesa CI job
characteristics.
The strategy here, is to use LAVA with a containerized SSH session to
follow the job output, escaping from dumping data to the UART, which
proves to be error prone in some devices.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
Create a separate job definition that runs the job via SSH session.
The DUT test only sets up the SSH server via dropbear, and another
deployed docker runner in LAVA dispatcher access the DUT via SSH with
pseudo-terminal to propagate the logs in real time.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
There's little point in emitting GLSL IR for the fixed-function fragment
shaders, when we can emit NIR directly instead.
This simplifies things a bit, and makes the fixed-function vertex and
fragment shaders look a lot more alike.
The reason the old code did the splats, was that TEXENV_SRC_ZERO and
TEXENV_SRC_ONE returned scalars. I decided to keep it vector, and let
the nir optimization passes clean this up instead when needed, as that
keeps the code a bit more straight forward.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22755>
The function didn't properly handle the case where the requested capacity was
less than the existing capacity. This led to the loop limit being some huge
number and it writing past the end of the 'buffers' array.
Partially fixes:
dEQP-VK.renderpass.suballocation.multisample_resolve.r16g16b16a16_unorm
.max_attachments_8_samples_2
The test no longer hangs, but segfaults instead.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23104>
Since this util moved to public place it turned out it could cause
a build error on current CI like the following:
../src/util/vl_vlc.h:225:52: error: 'vlc.data' may be used uninitialized in this function [-Werror=maybe-uninitialized]
225 | assert(vl_vlc_valid_bits(vlc) >= num_bits || vlc->data >= vlc->end);
| ^~
../src/util/vl_vlc.h:225:65: error: 'vlc.end' may be used uninitialized in this function [-Werror=maybe-uninitialized]
225 | assert(vl_vlc_valid_bits(vlc) >= num_bits || vlc->data >= vlc->end);
| ^~
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22202>
The PowerVR IP does not contain display hardware - this means it is
always necessary to open two separate devices for render and display.
The try_create_for_drm callback is not suitable for this configuration,
so we use the enumerate callback instead.
The previous implementation did not check that the discovered display
device was compatible with the render device - this is corrected by
unifying the compatibility lists into pvr_drm_configs.
The pvr driver is not currently supported on systems which contain
multiple compatible render or display devices, so the enumerate callback
implementation returns the first discovered render device and its
compatible display device.
This change also removes the workaround for drmGetDevices2() required
after libdrm commit 8cb12a2528d795c45bba5f03b3486b4040fb0f45. The
upstream fix has been in releases of libdrm for over a year now, and
mesa requires reasonably a recent version which is new enough to
contain it.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23023>
This sets us up to make use of the common physical device initialization
code.
As well as lifting the fd handles out of the implementations, this
pushes creation and destruction of the fds into the winsys layer. In
order for this to make sense, the winsys object is now created *before*
each pvr_device or pvr_physical_device. If there's an error setting up
the winsys instance, there's no point in continuing to create either
one.
Also lifts alloc to the winsys layer since there's nothing special about
it in either implementation.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23023>
The PBE input format must match format in clear color registers.
Fixes image clearing for following formats:
- B4G4R4A4_UNORM_PACK16
- A8B8G8R8_UNORM_PACK32
- R5G6B5_UNORM_PACK16
- A1R5G5B5_UNORM_PACK16
- R8G8B8A8_SNORM
- R8G8_UNORM
- R8G8_SNORM
- R8_UNORM
- R8_SNORM
- A2B10G10R10_UINT_PACK32 - only packmode U32 supported
For some of the norm formats the clear color register format was
changed from integer (pvr_float_to_sfixed) to float (pvr_float_to_f16).
This change was done to match the default PBE emit Norm settings.
An alternative way to fix this would have been to change the PBE
Norm setting.
Signed-off-by: Oskar Rundgren <oskar.rundgren@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23082>
Previously we allocated the sync prim at the end of the block and
also freed from the end. This is problematic if things are freed
out of order and some new ones allocated within the frees.
This commits uses the idalloc to keep track of the sync prim block
allocations.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23076>
Float conversions continue to be lowered early at the same time as
nir_lower_doubles, which we run early so we don't have to run it for
every shader key variant. However, all other int64 lowering is now
done late, after nir_opt_load_store_vectorize(), allowing it to
comprehend basic arithmetic on 64-bit addresses.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23064>
We'd like to postpone most int64 lowering until pretty late in the
process, because e.g. turning iadd@64 into (unpack + add-low + add-high
+ compare + b2i32 + repack) sequences makes it difficult for many
optimization passes to detect basic arithmetic patterns. In particular,
nir_opt_load_store_vectorizer becomes unable to handle basic offset math
on 64-bit addresses.
We'd like to do double precision lowering earlier in the process,
however. One snag is that nir_lower_int64's lower_2f and lower_f2 can
produce operations that may need lowering by nir_lower_doubles(), so
it's crucial to run those sets of lowering together.
To handle this, we make a new entrypoint that does nir_lower_int64
but skips everything except float conversions. Note that the newly
produced instructions will still be lowered according to the full set
of int64 lowering options; this shouldn't be a huge deal.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23064>
Some code still used our custom `status` field within the command
buffer. This could lead to unreliable error handling since we're
using the common vk_command_buffer handling.
This commit removes the field and changes the error paths to use
the common code instead.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23069>
On RDNA1&2, the driver needs to support both NGG and legacy for
primitives generated query because we can't know that before starting
queries.
To get the query pool results, we check the availability bit wrote by
the SAMPLE_STREAMOUTSTATS packet but the GDS copy was emitted after,
which means the availability bit might be TRUE before the GDS copy is
actually done.
Fix this by emitting the GDS copy before to ensure the availability is
TRUE for both results.
This fixes recent updates in
dEQP-VK.transform_feedback.primitives_generated_query.* because the
tests no longer wait for the fence.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23080>
Commit c65bde7b1e introduced a regression where under certain
circumstances `front` may be NULL, thus leading to a crash. It's not
currently known what exactly causes `front` to become NULL, nor we can
revert the offending commit, because there had been too many unrelated
changes that now depend on this commit.
So until someone comes up with a proper fix, let's add a workaround so
instead of crashing we just return from the function early.
This commit was tested with the bug `8982` and helps with the crash
with no other noticeable problems.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8982
Fixes: c65bde7b1e ("frontend/dri: inline __DRIdrawable in dri_drawable, make __DRIdrawable opaque")
Cc: mesa-stable
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23093>
Since 624e799cc3 ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA
defs don't have names, making the name argument unused. Drop it from the
signature and fix the call sites. This was done with the help of the following
Coccinelle semantic patch:
@@
expression A, B, C, D, E;
@@
-nir_ssa_dest_init(A, B, C, D, E);
+nir_ssa_dest_init(A, B, C, D);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
Include the expected and actual values in the errors -- since
very frequently we care about them to diagnose issues.
Since these helpers are meant to be inlined, also pull the
failure code out of the way into a separate function (not meant to
be inlined). This way, extra calls to to_string will not harm
the existing client code size. Verified this with GCC release build.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22977>
Fix issue by handling the OpString instructions when walking through
the preamble for validation.
The gl_spirv_validation() creates a vtn_builder() and walks the
instructions looking for a subset of the information. However
our current way to walk the instructions will also perform tracking
of OpLine/OpNoLine, that may make references to OpString instructions
that were being previously ignored by gl_spirv_validation().
This would cause the parsing to fail.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9004
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22973>
The transfer command "struct pvr_transfer_cmd" has support for
features not used by Vulkan: colour key, pattern, rop blit and
alpha blending
The whole "struct pvr_transfer_blit" can be removed. Also all code
related to transfer alpha blending can be removed.
This is an optimisation and doesn't fix any dEQP tests.
Signed-off-by: Oskar Rundgren <oskar.rundgren@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22958>
The byte unwind workaround can be used when source texture virtual
address doesn't meet HW requirements (is unaligned) and the pixel
format can't be changed i.e. destination is compressed. If
destination texture is not compressed the simpler texel extend
workaround can be used.
Currently byte unwind workaround has bugs so removing the
workaround fixes tests in
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.* when they
instead use texel extend workaround.
Signed-off-by: Oskar Rundgren <oskar.rundgren@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22958>
The rectangles in "struct pvr_rect_mapping" are unsigned so a
flipped rectangle mapping isn't possible. Add new struct members
flip_x and flip_y to specify flipped mapping.
Add support for flipped rectangles in transfer copy blit path.
Support for flipped rectangles in the clip blit path is not done
in this change.
The new booleans are false by default because transfer command
"struct pvr_transfer_cmd" is zeroed on allocation in
pvr_transfer_cmd_alloc (pvr_blit.c).
Fixes:
dEQP test case: dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests
.mirror_xy.nearest
Signed-off-by: Oskar Rundgren <oskar.rundgren@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22958>
The swizzle of pipe formats is described in
"struct util_format_description". The struct has array
"unsigned char swizzle[4];". The index of the array is the colour
channel (in the order R, G, B and A). The value is what position
the colour channel is sourced from.
In PBE register settings (REG_SWIZ_CHAN[0-3]) the register index is
output channel position (and not colour). The colours are in the PBE
source channels - SWIZ_SOURCE_CHAN0 typically red.
The function pvr_get_pbe_hw_swizzle doesn't translate the swizzle
correctly. Remove function and replace with switch for each colour.
This could be done in a for loop, but there is just as much code
in the loop, it involves pointers and it's less readable for humans.
That's why I opted for this implementation.
Fixed test:
dEQP-VK.api.copy_and_blit.core.image_to_image
.all_formats.color.2d.r4g4b4a4_unorm_pack16.b4g4r4a4_unorm_pack16
and other with this pixel format.
Signed-off-by: Oskar Rundgren <oskar.rundgren@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22958>
Some values like the transform feedback offset or the number of output
vertices in VS can be obtained knowing how many vertices and primitive
type are used in the drawcall.
But when the primitive restart is enabled, doing this is quite more
complex, as we should parse the vertex buffer to know where is the
restart values, and so on.
In this case, delay this computation after the drawcall is executed, by
querying the GPU to know these values.
Similarly, this delay is also applied to compute the transform feedback
buffer offsets when there is a geometry shader, as we don't know
beforehand how many vertices it is going to output.
This fixes `spec@!opengl 3.1@primitive-restart-xfb flush` and
`spec@!opengl 3.1@primitive-restart-xfb generated`.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22716>
Now p_wqm always kills its operand, so no movs will be created for it.
Long term we want to remove p_wqm in favor of a Definition flag,
so this is also a step in that direction.
Foz-DB Navi21:
Totals from 45351 (33.63% of 134864) affected shaders:
VGPRs: 2099552 -> 2116192 (+0.79%); split: -0.14%, +0.93%
CodeSize: 179530772 -> 179072104 (-0.26%); split: -0.29%, +0.03%
MaxWaves: 1054740 -> 1052262 (-0.23%); split: +0.10%, -0.33%
Instrs: 33238535 -> 33188347 (-0.15%); split: -0.17%, +0.02%
Latency: 451000471 -> 450869384 (-0.03%); split: -0.11%, +0.08%
InvThroughput: 86026785 -> 86286288 (+0.30%); split: -0.11%, +0.41%
VClause: 633291 -> 623920 (-1.48%); split: -1.91%, +0.43%
SClause: 1436708 -> 1431395 (-0.37%); split: -0.60%, +0.23%
Copies: 2166563 -> 2122592 (-2.03%); split: -2.29%, +0.26%
Branches: 706846 -> 706838 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 1976162 -> 1976592 (+0.02%)
PreVGPRs: 1797409 -> 1794704 (-0.15%)
MaxWaves regressions in Detroit: Become Human MaxWaves seem to be due
to the scheduler choosing to schedule more aggressively.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22956>
This should make writing some lowering/meta code easier. It also keeps
the num_inputs/outputs updated, when sometimes passes forgot to do so (for
example, nir_lower_input_attachments updated for one of the two vars it
creates). The names of the variables change in many cases, but it's
probably nicer to see "VERT_ATTRIB_POS" than "in_0" or whatever.
I've only converted mesa core (compiler and GL), not all the driver meta
code.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22809>
Replacing pvr_get_transfer_pbe_packmode with pvr_get_pbe_packmode
fixes many depth stencil deqp tests.
Replacing assert "Handle depth stencil format swizzle." with an
actual swizzle fixes tests using S8_UINT.
The swizzle for VK_FORMAT_S8_UINT returned from
pvr_get_format_swizzle (in pvr_image_state_set_codegen_defaults)
isn't correct. Modify the swizzle for format VK_FORMAT_S8_UINT.
Signed-off-by: Oskar Rundgren <oskar.rundgren@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22918>
Revert "pvr: Don't advertise S8_UINT support"
Adding back support for S8_UINT format. It's used in many deqp tests.
Example:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats
.depth_stencil.2d.d24_unorm_s8_uint_d24_unorm_s8_uint.optimal_optimal
This reverts commit ff07610462d5100a1ade101c1960beb4a454e7ce.
Signed-off-by: Oskar Rundgren <oskar.rundgren@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22918>
Previously, in the descriptor set layout, if there were gaps
within the binding numbers, the code would remove the gap and
assign a sequential binding number to each.
This is causes problems when looking up the binding on a
vkUpdateDescriptorSets() as the user would still be providing the
original binding numbers. If gaps were removed and binding
number re-assigned, the binding could either not be found, or a
different binding was found instead of the desired one.
Let's not re-assign binding numbers and just use the original
ones.
This fixes the following assert being hit:
`pvr_descriptor_set.c:1890: pvr_write_descriptor_set:
Assertion `binding' failed.`
on dEQP tests such as:
dEQP-VK.glsl.opaque_type_indexing.ubo.uniform_vertex
dEQP-VK.glsl.opaque_type_indexing.ubo.uniform_fragment
...
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22918>
Seems like `dEQP-VK.api.null_handle.destroy_device` was already
passing but let's add the null check in case of future changes
which might not accept NULL.
Fixes:
dEQP-VK.api.null_handle.destroy_descriptor_set_layout
dEQP-VK.api.null_handle.destroy_pipeline_layout
dEQP-VK.api.null_handle.destroy_query_pool
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22918>
VDMCTRL_INDEX_LIST3 is packed conditionally which can cause the
generation of a corrupted control stream as the function mandated
the provided buffer to be of a fixed size always including the
possibly unpacked word. This would leave a gap in the control
stream when the caller ends up copying the buffer into the control
stream.
Reported-by: James Glanville <james.glanville@imgtec.com>
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22918>
Delete the code. This isn't quite as nice as for the C drivers, because we can't
use a designated initializer in C++ without matching the order and this is an
autogenerated struct where it may not necessarily make sense to fix an order.
Not a big deal to workaround though.
Tested by diff'ing vulkaninfo output before/after the patch and confirming no
changes (other than the driverInfo git sha, the pipelineCacheUUID, the
driverUUID, and slight fluctuation in the memory budget).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23000>
The following flakes were found in the latest stress run:
* dEQP-VK.dynamic_rendering.primary_cmd_buff.basic.2_cmdbuffers_resuming
* dEQP-VK.dynamic_rendering.primary_cmd_buff.basic.contents_secondary_primary_cmdbuffers_resuming
Rather than documenting them directly, let's use a broad regular
expression, to match the already-existing `dEQP-VK.dynamic_rendering.basic.*`.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23014>
Then we can guarantee the settings correct, otherwise the 'screen->info.have_EXT_extended_dynamic_state3 = false' and 'screen->info.have_EXT_vertex_input_dynamic_state = false'
will be enable, but actually we should disable it when 'have_EXT_extended_dynamic_state2 = false'.
Fixes: d5cf6f7d2f ("zink: disable dynamic state exts if the previous ones aren't present")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23046>
This change is inspired from iris_destroy_context().
For instance, this issue is triggered with
"piglit/bin/glsl-1.50-gs-max-output -scan 1 20 -auto -fbo":
Direct leak of 320 byte(s) in 2 object(s) allocated from:
#0 0x7f34fc769987 in calloc (/usr/lib64/libasan.so.6+0xb1987)
#1 0x7f34f4fa168a in bo_calloc ../src/gallium/drivers/crocus/crocus_bufmgr.c:288
#2 0x7f34f4fa168a in alloc_fresh_bo ../src/gallium/drivers/crocus/crocus_bufmgr.c:350
#3 0x7f34f4fa168a in bo_alloc_internal ../src/gallium/drivers/crocus/crocus_bufmgr.c:419
#4 0x7f34f4fe50a9 in crocus_get_scratch_space ../src/gallium/drivers/crocus/crocus_program.c:2678
#5 0x7f34f55e8954 in crocus_upload_dirty_render_state ../src/gallium/drivers/crocus/crocus_state.c:6871
#6 0x7f34f55e8954 in crocus_upload_render_state ../src/gallium/drivers/crocus/crocus_state.c:7812
#7 0x7f34f5d9f680 in crocus_simple_draw_vbo ../src/gallium/drivers/crocus/crocus_draw.c:332
#8 0x7f34f5d9f680 in crocus_draw_vbo ../src/gallium/drivers/crocus/crocus_draw.c:438
#9 0x7f34f1d2eeba in tc_call_draw_single ../src/gallium/auxiliary/util/u_threaded_context.c:3735
#10 0x7f34f1d12e03 in batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:394
#11 0x7f34f1d12e03 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:445
#12 0x7f34f1d22c9a in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:680
#13 0x7f34f1d238f8 in tc_texture_map ../src/gallium/auxiliary/util/u_threaded_context.c:2754
#14 0x7f34f120b9d9 in pipe_texture_map_3d ../src/gallium/auxiliary/util/u_inlines.h:579
#15 0x7f34f120b9d9 in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:530
#16 0x7f34f10d7355 in read_pixels ../src/mesa/main/readpix.c:1178
#17 0x7f34f10d7355 in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1195
#18 0x7f34f10d7e10 in _mesa_ReadPixels ../src/mesa/main/readpix.c:1210
Fixes: f3630548f1 ("f3630548f1da crocus: initial gallium driver for Intel gfx 4-7")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23019>
while swapchains themselves are protected against early deletion
during presentation, there is nothing protecting them from
deletion while they are rendering if a swapchain updates
while rendering but before presentation
to address this, add batch usage to swapchains which can be
checked during pruning to ensure a rendering swapchain isn't
pruned
Fixes: dc8c9d2056 ("zink: prune old swapchains on present")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22962>
enabling VK_EXT_pageable_device_local_memory guarantees that host memory
allocations will not consume device-local memory and enables overallocation
of device memory when paging can be done
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22855>
this is technically illegal even though it works everywhere,
though future spec changes may make it legal
affects KHR-GLES3.copy_tex_image_conversions.required.texture3d_cubemap_negz
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22960>
some resources may not be destroyed immediately and may instead be
queued for deletion onto the current batch state, so ensure that the
current state is the last one to be destroyed so that all deferred resources
are also destroyed
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23033>
Use COMPRESSED_NO_CLEAR for the initial CCS aux state instead of
COMPRESSED_CLEAR. This removes a dependency on the initial clear color,
meaning that some resolves related to clear color management are now
avoided.
In the Car Chase benchmark, this avoids all 50 CCS resolves. These only
happen during the warm-up phase of the benchmark, so I'm not sure there
is an impact on FPS. This was tested on a DG2 in small-BAR mode.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22857>
I hadn't been keeping this up-to-date as development was rapid but
now that we're starting to stabilize and new work is largely going
to be new extensions, it makes sense to start tracking this better.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23038>
The values isn't used later.
Resolves gcc warning:
```
../src/virtio/vulkan/vn_queue.c:1006:13: error: variable 'sem_feedback_count' set but not used [-Werror,-Wunused-but-set-variable]
uint32_t sem_feedback_count = 0;
```
Fixes: a55d26b566 ("venus: add back sparse binding support")
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23034>
Some SRGB formats don't get the expected linear counterparts in
isl_format_srgb_to_linear() in the generated isl_format_layout.c.
The replace() of string in python returns the unchanged input
string when no replacement occurred, so the first rule
('_SRGB', '') returns the original SRGB format name that passes
the following check unintendedly.
Another quirk is needed for a pair of formats not following
the patterns of other formats.
Signed-off-by: Zhang, Jianxun <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22247>
Vulkan's concept of dedicated resources is dangerously close to
D3D12's concept of a "committed" resource, where the memory and
resource are inextricably tied. This is a minor optimization,
but will start to be more important going forward with external
memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22879>
At this point, we already have the index of the declaration itself in
the tcs_vertices_out_word variable, so we only need to add the offset
from the start of the exec_modes buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23027>
This is a big delete-the-code win. Tested by diff'ing vulkaninfo output
before/after the patch and confirming no changes (other than the driverInfo git
sha and the pipelineCacheUUID).
Note: removes handling for VkDeviceMemoryOverallocationCreateInfoAMD. This was
surely added as a mistake.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22999>
The helper was creating input locations for some builtin bariables.
This caused validation errors in zink because those builtins can't be
used as input.
Fixes: e2220ee55e ("zink: filled quad emulation gs generation function")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22871>
The helper was creating input locations for some builtin bariables.
This caused validation errors in zink because those builtins can't be
used as input.
Fixes: d0342e28b3 ("nir: Add helper to create passthrough GS shader")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22871>
This is unnecessary because the hardware doesn't execute a FS when it
has no effect and it's possible to execute pre-rasterization stages
without a FS.
This might improve depth-only pass performance very slightly because
the number of packets emitted is reduced a bit.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22848>
aco does not implement fpow, need nir to lower it
first. llvm will do by itself in the same way, so
we always lower fpow in nir now.
Remove the llvm fpow implementation that has special
handling for the muliplication. It's not used any
more and does not match GLSL spec as fpow(0,0)=NaN
but here we get 0.
There's some pixel changes for gl-radeonsi-stoney:
ror-default 2 (no tolerance), 0 (1% tol.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
ACO only support nir_fsin/cos_amd.
There's some pixel changes for gl-radeonsi-stoney trace.
Different pixels:
furmark 61 (no tolerance), 0 (1% tol.)
gimark 93867 (no tolerance), 888 (1% tol.)
tessmark 39 (no tolerance), 0 (1% tol.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
Use case: radeonsi will generate internal tgsi shader
with 64bit udiv instruction, and we want all 64bit udiv
to be lowered in nir by lower_int64_options.
For GLSL shaders, this is done in glsl to nir, so we do
the same for tgsi here.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
For ALU ops where input types are known, we can store that info on
the input sources. This can be used to produce the correct overloads
of load instructions that don't immediately need to be followed by
bitcasts, or similarly to produce a constant value which can be directly
consumed by the relevant instruction without needing a bitcast.
Similarly for values that will be stored in an output, we know type
information. And using that info, we can use more-correct information
for phis instead of forcing all phi sources to be bitcast to int just
to be bitcast back to float on the other side for an alu or an output
store.
One missing piece is SSBO stores, where we can support int or float.
If the input is coming from a phi, we don't influence the phi's type,
so it'll be int, even though the incoming sources might've been float.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22972>
For each phi src, ensure that it's only used as a phi src.
This lets us give each phi their own unique types without worrying
about them stomping on each other. Also scalarize phis.
For each constant, ensure that it's only used once. The DXIL backend
will already dedupe these consts within the module, but this lets a
single load_const have multiple types depending on how it's used.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22972>
Use ACCESS_* flags in call sites instead of GLC/DLC/SLC.
ACCESS_* flags are extended to describe other aspects of memory instructions
like load/store/atomic/smem.
Then add a function that converts the access flags to GLC, DLC, SLC.
The new functions are also usable by ACO.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22770>
Refactor framebuffer to renderpass to mirror previous INTEL_MEASURE
changes.
We dump hashes/pointers for shaders and framebuffer/renderpass.
Reduce from 64bit to 32bit pointers. We don't benefit from the
extra precision and reduced output size is convenient.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22723>
nir_validate checks that the format of an atomic (if specified) is compatible
with the atomic operation. For example, we can't fadd R64_UINT texels. The logic
can't be extended as-is to unified atomics because it's split across different
switch cases for different atomic-op intrinsics. So we add our own validation
case, porting over the logic from the separate existing cases below.
(The redundant logic will be deleted once we delete legacy atomics.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
In the future, we'd like to have all drivers only ingest unified atomics, and
all frontends only produce unified atomics, and garbage collect the existing
non-unified atomics. To get to that future, it's a lot nicer to convert drivers
one-by-one. Add a pass to translate old-style atomics to new-style atomics so
drivers can opt-in to the new form one-by-one. Once all drivers are converted,
we can convert producers one-by-one. Finally, we can just drop the calls to the
pass and garbage collect this pass and the old atomics. That's probably a while
out, though, so this will be out bridge to get there.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
Currently, we have an atomic intrinsic for each combination of memory type
(global, shared, image, etc) and atomic operation (add, sub, etc). So for m
types of memory supported by the driver and n atomic opcodes, the driver has to
handle O(mn) intrinsics. This makes a total mess in every single backend I've
looked at, without fail.
It would be a lot nicer to unify the intrinsics. There are two obvious ways:
1. Make the memory type a constant index, keep different intrinsics for
different operations. The problem with this is that different memory types
imply different intrinsic signatures (number of sources, etc). As an
example, it doesn't make sense to unify global_atomic_amd with
global_atomic_2x32, as an example. The first takes 3 scalar sources, the
second takes 1 vector and 1 scalar. Also, in any single backend, there are a
lot more operations than there are memory types.
2. Make the opcode a constant index, keep different intrinsics for different
operations. This works well, with one exception: compswap and fcompswap
take an extra argument that other atomics don't, so there's an extra axis of
variation for the intrinsic signatures.
So, the solution is to have 2 intrinsics for each memory type -- for atomics
taking 1 argument and atomics taking 2 respectively. Both of these intrinsics
take an nir_atomic_op enum to describe its operation. We don't use a nir_op for
this purpose, as there are some atomics (cmpxchg, inc_wrap, etc) that don't
cleanly map to any ALU op and it would be weird to force it.
The plan is to transition to these new opcodes gradually. This series adds a
lowering pass producing these opcodes from the existing opcodes, so that
backends can opt-in to the new forms one-by-one. Then we can convert backends
separately without any cross-tree flag day. Once everything is converted, we can
convert the producers and core NIR as a flag day, but we have far fewer
producers than backends so this should be fine. Finally we can drop the old
stuff.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
Serious preprocessor voodoo here. There are two tricks here.
1. Iterating only phis. We know that phis come only at the beginning of a block,
so all over the tree, we open-code iteration like:
nir_foreach_instr(instr, block) {
if (instr->type != phi)
break;
/* do stuff */
}
We can express this equivalently as
nir_foreach_instr(instr, block)
if (instr->type != phi)
break;
else {
/* do stuff */
}
So, we can define a macro
#define nir_foreach_phi(instr, block)
if (instr->type != phi)
break;
else
and then
nir_foreach_phi(..)
statement;
and
nir_foreach_phi(..) {
...
}
will expand to the right thing.
2. Automatically getting the phi as a phi. We want the instruction to go to some
hidden variable, and then automatically insert nir_phi_instr *phi =
nir_instr_as_phi(instr_internal); We can't do that directly, since we need to
express the assignment implicitly in the control flow for the above trick to
work. But we can do it indirectly with a loop initializer.
for (nir_phi_instr *phi = nir_instr_as_phi(instr_internal); ...)
That loop needs to break after exactly one iteration. We know that phi
will always be non-null on its first iteration, since the original
instruction is non-null, so we can use phi==NULL as a sentinel and express a
one-iteration loop as for (phi = nonnull; phi != NULL; phi = NULL).
Putting these together gives the macros implemented used.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22967>
Up until now, we have been initializing MCS with fast clears. This is
mostly safe, but there's a corner case that can be an issue.
The issue is with a workaround for MCS that requires the sampler not see
any fast-cleared blocks for certain surfaces (14013111325). Even though
we have been initializing MCS with fast clears, we expect most
applications to be safe because we expect that they would only sample
the samples they've rendered to previously (and the render would've
removed the fast-cleared blocks). In other words we don't expect that
apps would transition from VK_IMAGE_LAYOUT_UNDEFINED to
VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL and start sampling immediately.
If an application took the unexpected path of sampling undefined
samples, it's possible they'd hit the issue described in the workaround.
Fix this corner case by using an ambiguate to initialize MCS.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
Add support for using BLORP's ambiguate pass to initialize MCS instead
of mapping and memsetting it on the CPU. Note that this won't be used if
the first operation on the MSAA layer is a fast clear.
Since we're no longer mapping, this removes a blocker towards getting
MCS_CCS enabled in small-BAR mode.
This functionality is difficult to test because of the way iris is set
up. It always tries to compress writes. So, a test would only read the
ambiguated MCS element if it tries to read from undefined samples.
To test this, I locally disabled fast clears and rendering with MCS (via
iris_resource_render_aux_usage). I continued to allow sampling with MCS
in iris_resource_texture_aux_usage. So, writes go directly to the main
surface and reads go through the ambiguated MCS surface.
When I then ran the test group, dEQP-GLES3.functional.multisample.*, all
48/64 supported tests passed on my Ice Lake. If I slightly changed
BLORP's ambiguate pass, I observed several tests failing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
Select a horizontal alignment value that matches the main MSAA surface.
We need a valid horizontal alignment to perform MCS ambiguates. The
halign value doesn't actually affect test behavior, but it is validated
by isl_surf_fill_state. We currently have an invalid halign for gfx125.
This patch fixes that.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22545>
This tests the following matrix:
- Format: RGBA8Unorm, RGBA16Unorm, RGBA32Float
- Samples: 2 or 4
- Layers: 1 or 2
- Width: Interesting values 1..4097
- Height: Interesting values 1..4097
Compression is based on the dimensions (that is, everything that can be
compressed is). This test compares both the total texture size and the
compression metadata offset.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
For multisampled textures, the decision about whether to compress or not
is based on the effective width and height in samples, not pixels.
Introduce ail_can_compress() to encode this logic in ail, so the driver
can use it to decide whether to compress or not before the full layout
is determined.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
BOs can be written from several contexts, so writing to this member is
racy. We only care about this for the purposes of exporting BOs after a
submission (and if the app is racing writers/submissions at that point
all bets are off), so just keeping track of the last written value is
sufficient.
Switch to atomic operations to eliminate the race, and drop the assert
in the batch cleanup path that no longer holds when the BO might have
been written to from another context.
Fixes: asahi/mesa#20
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
We track buffers written by batches, but this gets messy when we end up
with an empty batch that is never submitted, since then it might have
taken over writer state from a prior already submitted batch (for its
framebuffers).
Instead of trying to track two tiers of resource writers, let's just
defer initializing batch state until we know we have a draw (or compute
launch, or clear). This means that if a batch is marked as a writer for
a buffer, we know it will not be an empty batch.
This should be a small performance win for empty batches (no need to
emit initial VDM state or run the writer code), but more impontantly it
eliminates the empty batch writer state revert corner case.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
This seems to flake some dEQPs due to some kind of race/UB (which
doesn't even always cause the dEQPs to fail due to leeway in the image
comparison, since the problem is usually just a few pixels, but it's
there).
I spent a bunch of time trying other flags/things, and almost everything
changed the bad pixel pattern randomly but nothing fixed it. Let's
revisit this one later, since it looks like a pretty deep rabbit hole.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
With the shader cache enabled, intel_clc attempts to write to ~/.cache.
Many distributions' build systems limit file-system access, and will
kill the process thus causing the build to fail.
Fixes: 639665053f ("anv/grl: Build OpenCL kernels")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22968>
VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT should always be set, as
required by the spec.
VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT should be set when
radv_ahb_format_for_vk_format knowns the format. That is,
radv_create_ahb_memory should at least know how to call
AHardwareBuffer_allocate.
VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT is always set. We can't know
if gralloc can allocate the format/flags/usage combo or not (gralloc
might use a private format for the combo).
Fixed
dEQP-VK.api.external.memory.android_hardware_buffer.image_formats.*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
anv_ahb_format_for_vk_format needs to know the format at least. There
is no guarantee that AHardwareBuffer_allocate will succeed, but we are
reluctant to check with AHardwareBuffer_isSupported which may
test-allocate internally and is expensive.
v2: add anv_ahb_format_for_vk_format to anv_android_stubs.c
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
When allocating a VkDeviceMemory exportable as AHB, it seems incorrect
to fall back to AHARDWAREBUFFER_FORMAT_BLOB when the image has no known
AHB format. We should fail the allocation instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22619>
The base nir options were assuming all bit sizes were supported at
shader model 6.2. Multiple callers were then changing properties
based on actual support.
Standardize behavior by providing the majority of things that can
impact nir options when getting them. Some callers (e.g. meta blit
shaders or libclc) don't bother, because they are known to have
contents that are unaffected by these options. Other callers might
munge more properties afterwards, but this minimizes that.
Note that lower_helper_invocation was incorrectly being turned off
for SM6.6+ by some callers, despite load_helper_invocation being
unimplemented by the backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22952>
If we're compiling shaders in shader-db, with shader-db's ./run and
ZINK_DEBUG=shaderdb, we won't get much state set on the graphics pipeline, since
shader-db doesn't actually do any rendering. For a driver like RADV, that is
*almost* ok... Since we use dynamic vertex input, we don't need to make up any
state for vertex inputs; since we use dynamic rendering, we don't need to make
up any render attachments. All of that being said, we *do* need to make up a
blend state to ensure that the Vulkan driver doesn't optimize away all of
store_derefs in the fragment shader (and in turn, optimize the entire fragment
shader away, if there are no image/SSBO writes.) So set the obvious blend state,
fixing fragment shaders in shader-db with zink + radv.
I don't know why other people would want to use Zink with shader-db, but for me
it's an easy way to test ACO, at least until radeonsi gains aco support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22948>
Due the lack of APIs to set mmap modes, Xe KMD can't support the same
memory types as i915.
So here adding a i915 and Xe function to set memory types supported
by each KMD.
Iris function iris_xe_bo_flags_to_mmap_mode() has a table with all the
mmaps modes of each type of placement.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22906>
We have an imad instruction and our iadd has a small immediate shift on the
second source. Together, these allow expressing lots of integer multiplies more
efficiently. Add some rules to optimize these now that the backend compiler can
ingest the optimized forms.
Half-register changes are from load_const scheduling changing in some vertex
shaders.
total instructions in shared programs: 1539092 -> 1537949 (-0.07%)
instructions in affected programs: 167896 -> 166753 (-0.68%)
total bytes in shared programs: 10543012 -> 10533866 (-0.09%)
bytes in affected programs: 1218068 -> 1208922 (-0.75%)
total halfregs in shared programs: 483180 -> 483448 (0.06%)
halfregs in affected programs: 1942 -> 2210 (13.80%)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
Models `(a * b) + (c << d)` in general, as implemented in various forms on AGX.
This will be fused with backend NIR opt algebraic rules, both for the literal
pattern as well as to strength reduce certain multiplications, e.g. replacing
a * 5 with `a + (a << 2)` expressed as imadshl_agx(a, 1, a, 2).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
We have a few ALU instructions that take a constant source. Technically, they
have a swizzle so you can't just nir_src_as_uint them, even though a bunch of
backends do. To help backends do the right thing, add a helper that's just as
easy to use that will chase the swizzle properly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
This reverts commit 5489033fa8.
The problem I was trying to address is that we were programming the
3DSTATE_PS::PositionXYOffsetSelect bit differently with GPL (CENTROID)
than without (NONE).
I failed to understand that this bit also impacts the thread payload
layout. GPL fragment shaders don't know ahead of time if pos_offset is
going to be used. It'll be choosen at runtime base on push constant
bits. So we need to program this bit different just to have a payload
matching the compiled shader code.
This fixes the freedoom replay with GPL FS shader in SIMD32.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22938>
loader_get_pci_driver calls os_read_file on linux to get the pci id, and
os_read_file uses open instead of fopen.
This allows loader_get_pci_driver to work rather than falling back to
loader_get_kernel_driver_name.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22951>
We cannot immidiately free VMA range when BO is freed, we have to
wait until kernel stops considered BO as busy and frees its internal
VMA range. Otherwise userspace and kernel VMA will get desynchronized.
To fix this and re-enable replaying of BDA we place BO's information
into a queue. The queue is drained:
- On BO allocation;
- When we cannot allocate an iova passed from the client.
For more information about this see:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/7106
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18254>
The main reason is to simplify BO managment when
bufferDeviceAddressCaptureReplay would be enabled.
Having to track some BO information in physical device and some
info in logical device gets challenging when BOs are shared
between logical devices.
Other benefits:
- Isolation from hangs in other logical devices;
- Each logical device limited only by its own address space size.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18254>
It is very common in games to read just the .x channel of a vec4 shadow
result (since GL defaults to either LUMINANCE or RED depth mode depending
on context). So, we can avoid shader recompiles to handle the other
components, in that case.
Fixes some recompiles in CS:GO.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22912>
This will give better visibility in perfetto, and prepares for the next
commit where we could have per-subpass clears.
While we are at it, start adopting vulkan terms for tile load/store. No
need to be pointlessly different.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22895>
batch->fast_cleared will be per-subpass. But we can use the cleared
bitmask instead in the few places where we just need to know if there
was a clear in any subpass. For the conditional-ib it is even
preferable since we know a clear touched the contents of the tile so
we know what the result of the conditional would be.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22895>
This helps by reducing the number of branches with their corresponding
delay slots, at the expense of additional register pressure. It also helps
a lot with SFU stalls, probably because removing control-flow blocks
gives us more QPU scheduling flexibility to hide them.
Shader-db results below correspond to the "closed shaders" set, since the
full set is very dominated by the massive impact this change has on Skia's
shaders (for the better), so this is probably more representative of real
impact:
total instructions in shared programs: 11887255 -> 11854898 (-0.27%)
instructions in affected programs: 538170 -> 505813 (-6.01%)
helped: 1653
HURT: 43
Instructions are helped.
total threads in shared programs: 385924 -> 385872 (-0.01%)
threads in affected programs: 236 -> 184 (-22.03%)
helped: 22
HURT: 48
Inconclusive result (%-change mean confidence interval includes 0).
total uniforms in shared programs: 3552808 -> 3547894 (-0.14%)
uniforms in affected programs: 157486 -> 152572 (-3.12%)
helped: 1673
HURT: 35
Uniforms are helped.
total max-temps in shared programs: 2062403 -> 2064720 (0.11%)
max-temps in affected programs: 18209 -> 20526 (12.72%)
helped: 168
HURT: 369
Max-temps are HURT.
total spills in shared programs: 1937 -> 1994 (2.94%)
spills in affected programs: 79 -> 136 (72.15%)
helped: 0
HURT: 1
total fills in shared programs: 2652 -> 2717 (2.45%)
fills in affected programs: 115 -> 180 (56.52%)
helped: 0
HURT: 1
total sfu-stalls in shared programs: 19349 -> 18010 (-6.92%)
sfu-stalls in affected programs: 2321 -> 982 (-57.69%)
helped: 674
HURT: 74
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 11906604 -> 11872908 (-0.28%)
inst-and-stalls in affected programs: 541339 -> 507643 (-6.22%)
helped: 1656
HURT: 43
Inst-and-stalls are helped.
total nops in shared programs: 245740 -> 238085 (-3.12%)
nops in affected programs: 19282 -> 11627 (-39.70%)
helped: 1335
HURT: 76
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22922>
We are seeing frequent hangs in other workloads when something using
mesh shaders runs at the same time, so gate the feature behind an
environment variable until we figure out what's going on.
v2: (Sagar)
- Give the mesh enabled variable a more descriptive name
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22910>
The 'struct drm_amdgpu_cs_chunk_fence' is processed as
'struct drm_amdgpu_cs_chunk_data' which is a union.
This change ensures the proper alignment for this structure
to be processed as 'struct drm_amdgpu_cs_chunk_data'.
The presence of __u64 as one member of
'struct drm_amdgpu_cs_chunk_data' makes the
whole structure expected to be 64-bit aligned.
This is a minor issue detected by the gcc sanitizer (ubsan), for instance at the libdrm library:
../amdgpu/amdgpu_cs.c:937:26: runtime error: member access within misaligned address 0x63100001484c for type 'struct drm_amdgpu_cs_chunk_data', which requires 8 byte alignment
0x63100001484c: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
Fixes: ae7e4d7619 ("amd: rename ring_type --> amd_ip_type and match the kernel enum values")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22920>
Similar to what was done to alloc buffer but now for userptr bos.
There is no changes in i915 modes but Xe may different values in
future.
While at it, also setting bo->real.heap to IRIS_HEAP_SYSTEM_MEMORY
as it was already implicit set as IRIS_HEAP_SYSTEM_MEMORY is the
value 0 of the enum.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22240>
if a sampler is never used (no derefs) then its binding will never be
applied here, leaving it with binding=0. this will clobber the real binding=0
sampler in driver backends, leading to errors, so try to iterate using
the same criteria as above and apply bindings in the same way
fixes#8974
cc: mesa-stable
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22902>
This can generate things like fneg! of load_const, which is silly.
Fold those away into an actual constant. Only do so on the scalar
backend because there's a comment above that the vec4 backend doesn't
want any new constants this late, and I'm inclined to believe it.
fossil-db stats show a very minor improvement:
Totals:
Instrs: 203091223 -> 203091099 (-0.00%); split: -0.00%, +0.00%
Cycles: 14410638075 -> 14410577067 (-0.00%); split: -0.00%, +0.00%
Totals from 20 (0.00% of 665070) affected shaders:
Instrs: 27067 -> 26943 (-0.46%); split: -0.47%, +0.01%
Cycles: 2687958 -> 2626950 (-2.27%); split: -2.27%, +0.00%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22881>
Add back support for vkQueueBindSparse that works with fence and timeline
semaphore feedback.
For each vkQueueBindSparse batch, if it contains feedback then move the
signal operations to a subsequent vkQueueSubmit with feedback cmds.
This requires queue families that support vkQueueSubmit alongside sparse
binding support so any queue familes that exclusively support sparse
binding will be filtered out.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22616>
the inner conditional here didn't include uncached readback, meaning
that many (most?) buffers allocated with uncached memory (i.e., BAR) were
being read back directly instead of using staging resources to be faster
at some point this inner conditional should be reevaluated to determine
whether it still does anything, but this is not that time
fixes, among other things, loading in DOOM2016 on some GPUs
Fixes: 52f27cda05 ("zink: allow direct memory mapping for any COHERENT+CACHED buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22907>
in the case where a cmdbuf was submitted with write access and the subsequent
batch promotes an op to unordered, it's important for associated barriers
to happen-before those ops to guarantee synchronization
the fixes tag is wrong on this, but it's all in the same release
Fixes: bf0af0f8ed ("zink: move all barrier-related functions to c++")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22907>
WARP has a temp register limit, and the control flow needed to convert
indirect to direct accesses on large temps ends up bloating shaders massively.
We can just go ahead and spill these large temps to scratch, which maps
to an alloca in DXIL.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22787>
Previously we would implement position_t by
applying the inverse of the viewport, and
advertising clipping was going to occur with
the cap CLIPTLVERTS.
However when the cap is advertised, clipping
is supposed to be disabled via sw emulation
when D3DRS_CLIPPING is set to FALSE.
Since we don't support that either, instead take the
approach of disabling at least depth clipping, and
not advertising the cap.
Ideally, clipping should be totally disabled.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Right now subsampled images are the same as non-subsampled images, this
will change when we actually implement them which will be an ABI break.
Disallow importing/exporting them with modifiers until that's stabilized
to force users to match the driver UUID.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
When rendering a scaled tile, we need to use the original, hardware
FragCoord when accessing input attachments that are on-tile (i.e. were
rendered to in a previous subpass) because they are also scaled in the
same way that FragCoord is scaled. For input attachments that aren't
already on-tile, however, we need to use the fixed gl_FragCoord. Add a
new intrinsic and a bitfield of input attachments which should use it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
We scale the actual rendering by patching the viewport state. This is
helped by a HW bit to make the viewport index equal to the view index,
so that we can have a different scaling per-view.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
FDM is implemented pretty much entirely inside the driver, by patching
various structures for each bin. This adds the core infrastructure to
sample the density map, compute the scaled bin sizes we will use, create
patchpoints, and apply them at the start of each bin before executing
the IB2.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
In order to patch the command stream on the gpu, we need two features:
1. The ability to use a read-write BO instead of a read-only one, when
patching might be performed.
2. The ability to get the iova of the current position after reserving
some number of dwords, even with externally-allocated command
streams.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
This is similar to old gens which couldn't support loading from GMEM
automatically. It will be needed for loads with a fragment density map,
because we need to scale the image when loading to GMEM.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
Different uses in various registers and the texture descriptor have
different shifts, and we already had a few ugly workarounds to handle
this. Remove the foot-gun by specifying it in bytes and letting users
handle the shift themselves using the correct macro.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
Otherwise accesses to non-0 views of input attachments may be considered
out-of-bounds and return 0. This should've been removed when enabling
multiview for GMEM, not sure how it was missed.
Fixes: def56b531c ("tu: Support GMEM with layered rendering and multiview")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
to comply with ES2+ line clipping rules, guardband clipping should be
used so that the rasterizer will clip lines without using clip planes
fixes (llvmpipe):
dEQP-GLES*.functional.clipping.line.wide_line_clip_viewport_center
dEQP-GLES*.functional.clipping.line.wide_line_clip_viewport_corner
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17284>
Below is the summary from the power validation.. "it looks like the only
workload where I see savings from DCC is PLT and it is only about 65mW
which is just run to run variation. For Idle I am seeing ~280mW increase
in power, ~200mW increase for power_VideoCall, and ~80mW increase for VP"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22771>
When an ACTIVE batch takes over the active writer role from a SUBMITTED
batch, the written BO has the syncobj from the latter even though the
writer is the former. This is correct and an intended state, but it
means that then we can't gate the syncobj cleanup in agx_batch_cleanup
on being the active writer, since the SUBMITTED batch won't be.
Fixes: asahi/mesa#18
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This code was originally based on the Panfrost implementation, but has been
improved in a number of ways.
1. Transform feedback programs are dispatched generically with Gallium calls,
rather than emitting something hardware-specific. This is cleaner and
portable to future GPUs.
2. Transform feedback with indexed draws is now fixed, by lowering to an index
buffer pull.
3. Transform feedback with buffer overflows is now fixed, by correctly
bounds checking in transform feedback programs.
4. Transform feedback with strips/fans/loops are fixed, by correctly
tessellating to the underlying primitives as required by OpenGL.
5. Transform feedback with QUADS is fixed, by tessellating to triangles as
required by OpenGL.
That said, the code is still not in its final form.
1. It still does not support indirect draws. This will require a substantial
overhaul to do tracking on the GPU instead of the CPU. Currently we force
unroll indirect draws (slow but kosher in GL, treif in Vulkan). This isn't
hard to solve but I'm not going to duplicate the code until the algorithms
are otherwise complete because it's a lot easier to hack on the CPU versions
than the GPU versions.
2. It still does not support primitive restart. This has especially nasty
interactions with transform feedback. Again we force unroll to non-primitive
restart forms, again slow but kosher in GL but treif in Vulkan. This is a lot
harder to deal with. I sketched out something really nasty in my notebook
(hinging on efficient GPU prefix sums) but I'm not in a hurry to type this
out.
3. There will be interactions with geometry and tessellation shaders and I don't
think I can get the core code here future-proofed without actually bringing
up the new shader stages.
As such, this is a hard fork of the panfrost code for now, I'm not trying to
share the code (although it *would* clear out almost all of panfrost's transform
feedback related piglit failures).
Passes dEQP-GLES3.functional.transform_feedback.* and most of the relevant
piglits.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This shortcuts all headaches about how big this should be. It does increase
memory usage a bit if there are lots of shader variants compiled, but this
should be tolerable, and can be optimized later if so required. Thanks to the
previous commit, the disk cache size should be unaffected.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
We were being sloppy with the sizes before. It mostly worked out, but there were
some corner cases where we would end up with mixed sized collects and that won't
end well for us. Let's rework the logic to make all the sizes explicit in NIR --
32-bit for depth and 16-bit stencil -- and then do the needed promotions to make
it happen in the AGX IR side.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
In case .x isn't read, it'll be null which has the wrong size and will fail
the validation added later in this series. We fix this by padding with sized
undefs (something that exists of defined size but undefined value) rather than
nothingness (of undefined size).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Sometimes a shader might index with a non-power-of-two stride. For example, if
it's indexing into an array of structures where the structure size is not a
power of two, we'll get a multiply with a constant as opposed to a shift. We
want to handle these cases, too. To do so, we generalize our pattern matching to
look for any kind of multiply (with our new helper), rather than hardcoding
logic for ishl. This eliminates right-shifts in a pile of compute shaders, which
makes me happy from a "I read lots of shader assembly when debugging"
perspective.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Currently, we hardcode logic in the addressing chasing code to look for ishl
instructions that shift by constants. We can generalize this to looking for
integer multiplies by constants to optimize more addressing patterns. Add a
helper to do so.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This is totally pointless. This saves some waits at the ends of compute kernels
(waiting for stores to complete before terminating the thread). I don't know
how much this would matter for performance, since the hardware may have to do
these waits internally, but it makes the generated code less silly which is
always nice.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This lets us shadow textures updated in the middle of rendering in Quake3.
They're big memcpys, but as long as the texture memory is cached it's ok. We use
a heuristic to avoid too many memcpys from uncached memory, which would cause
slideshow performance in quake.
We need to be careful to avoid shadowing shared resources, though, that's
invalid and would break WSI pretty hard.
It would be better to blit on the GPU for large shadowing, but that's more
involved and left for future work.
Reduces stuttering in Quake3.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Comparison with PowerVR's XML shows that this is the actual name... And it needs
to be set a bit more carefully than "no colour output" in order to get correct
behaviour for depth-only passes that use sample mask / discard. Fix the name
first, the extra conditions will come when they're needed for multisampling.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
When possible. Only occassionally possible because the loads are pretty limited
in the addressing arithmetic. This probably doesn't matter for performance but
it saves some noise in dEQP tests which makes for nicer debugging, plenty of
optimizations end up worth it for that alone.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
They should already work, we just need it enabled. The comment here claimed
(incorrectly) that there is no hardware support for linear 2D arrays. In fact,
there is support, it's just not advertised in the public Metal API. With some
awful tricks, I managed to reverse-engineer the hardware interface and hooked it
up, so we can take advantage of it now.
In fact, we can stop checking the target explicitly at all. The only case where
we can't compress is 1D/buffer textures, which are necessarily less than 16
height so will be dropped in the next check.
When I originally wrote this cuhange, dolphin's MeltyMoltenGalaxy trace with
specialized shaders at 4K was helped from 28fps to 43fps, which is massive :-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Staging resources need to be linear for efficient CPU side mapping. This is a
problem for access to 3D and cube textures, since we don't have linear 3D
textures or linear cube textures. But we do have linear 2D array textures, which
can be reshaped to the same effect. So use a 2D array staging resource even for
3D textures and cube maps.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
We handle linear 2D arrays internally for blit shaders, so we need textureSize
to work for these. That requires some special casing, because there's a line
stride where the layer count would otherwise be. But it's not too bad.
Fixes
dEQP-GLES3.functional.shaders.texture_functions.texturesize.sampler2darray_*
when forcing linear textures.
Since we clamp array access to the maximum layer, we need textureSize() to work
for even the most basic array texturing. So this should fix blits from linear 2D
arrays as well, which finally unlocks support for compressed arrays/cubes/3D
textures.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
From the commit message of 94f7c011d6 ("v3d: Track write reference to the
separate stencil buffer."), anholt says:
Otherwise, a blit from separate stencil may fail to flush the job that
initialized it, or new drawing could fail to flush a blit reading from
stencil.
Fixes
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Our staging resources need to be LINEAR, however we don't support LINEAR with
DEPTH_STENCIL. The APIs don't actually require this, we just need to make sure
we don't generate internal staging blits to linear depth/stencil resources. For
uploading to compressed depth/stencil textures, we could use a depth/stencil
staging (since we can read from linear depth/stencil). However, for downloading
from compressed depth/stencil, we can't use a depth/stencil staging (since we
can't write linear depth/stencil). So, to handle both cases in a unified way,
just use colour blits for depth/stencil resources, using a compatible colour
format. This wouldn't be ok for an application to do itself, but within the
driver we know that it's safe, since there's no difference in memory between
depth/stencil and colour on AGX. In particular, Z16 is compressed exactly the
same as R16, Z8 as R8, and so on.
Fixes depth/stencil compression.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
We go to initialize the disk cache before we've compiled any shaders so
agx_compiler_debug is 0 at this point. Don't try to read it, instead go through
sa safe getter that will do the right thing.
Fixes: 5e9538c12e ("agx: isolate compiler debug flags")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
When an empty batch is submitted, nothing happens. However, this batch
may have taken over writer status for some BOs which still have a
pending submitted batch that hasn't finished yet. If we drop writer
status at this point, two bad things happen:
- We spuriously clear bo->writer_syncobj, which breaks syncing on
post-facto BO exports
- We break agx_sync_writer(), since we no longer know about the old
writer to properly block on it.
To fix this (hopefully rare) case, take advantage of bo->writer_syncobj
to find the currently submitted writer batch again, and revert the
writer to it. If this turns out to be common and a performance issue
iterating through submitted batches for each written BO, we could
implement it with two writer batch arrays instead, one for active
writers and one for submitted writers... but hopefully that isn't
necessary.
This splits the cleanup path in agx_batch_cleanup() depending on whether
the cleanup is for a reset or proper completion. Since this is only used
within agx_batch.c, drop the public prototype while we're at it.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
There's a mixture of indent styles here, with either two or three
spaces. We have standardized on three spaces for .rst-files in the
editorconfig, so let's apply that.
While we're at it, make sure math-blocks are indented into their
opcode-block. While the result might look the same most of the time,
this matters when we have textual explaination following math-blocks,
like we have in a few caess. If we don't indent the math there, we
end up with having to unindent the text following the math-block for it
not to count as a part of the math block, which looks very confusing
when reading the source code.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21893>
../src/microsoft/vulkan/dzn_image.c: In function ‘dzn_GetImageMemoryRequirements2’:
../src/microsoft/vulkan/dzn_image.c:918:91: error: passing argument 6 of ‘dzn_ID3D12Device12_GetResourceAllocationInfo3’ from incompatible pointer type [-Werror=incompatible-pointer-types]
918 | &image->castable_format_count, &image->castable_formats,
| ^~~~~~~~~~~~~~~~~~~~~~~~
| |
| DXGI_FORMAT **
In file included from ../src/microsoft/vulkan/dzn_private.h:67,
from ../src/microsoft/vulkan/dzn_image.c:24:
../src/microsoft/vulkan/dzn_abi_helper.h:64:107: note: expected ‘const DXGI_FORMAT * const*’ but argument is of type ‘DXGI_FORMAT **’
64 | const UINT *num_castable_formats, const DXGI_FORMAT *const *castable_formats,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
ninja: build stopped: subcommand failed.
Fixes: 71dbb3120a ("dzn: Use GetResourceAllocationInfo3 for castable formats")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22877>
With the following test :
dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_out_of_bounds_load
There is a :
shader_start:
... <- no control flow
g0 = some_alu
g1 = fbl
g2 = broadcast g3, g1
g4 = get_buffer_size g2
... <- no control flow
halt <- on some lanes
g5 = send <surface>, g4
eliminate_find_live_channel will remove the fbl/broadcast because it
assumes lane0 is active at get_buffer_size :
shader_start:
... <- no control flow
g0 = some_alu
g4 = get_buffer_size g0
... <- no control flow
halt <- on some lanes
g5 = send <surface>, g4
But then the instruction scheduler will move the get_buffer_size after
the halt :
shader_start:
... <- no control flow
halt <- on some lanes
g0 = some_alu
g4 = get_buffer_size g0
g5 = send <surface>, g4
get_buffer_size pulls the surface index from lane0 in g0 which could
have been turned off by the halt and we end up accessing an invalid
surface handle.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20765>
Comparisons which produce 32-bit boolean results (0 or 0xFFFFFFFF)
but operate on 16-bit types would first generate a CMP instruction
with W or HF types, before expanding it out. This CMP is a partial
write, which leads us to think the register may contain some prior
contents still. When placed in a loop, this causes its live range
to extend beyond its real life time.
Mark the register with UNDEF first so that we know that no prior
contents exist and need to be preserved.
This affects:
flt32, fge32, feq32, fneu32, ilt32, ult32, ige32, uge32, ieq32, ine32
On one of Cyberpunk 2077's most complex compute shaders, this reduces
the maximum live registers from 696 to 537 (22.8%). Together with the
next patch, Cyberpunk's spills and fills are cut by 10.23% and 9.19%,
respectively.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22835>
Fixes:
```
[829/1646] Compiling C object src/panfrost/vulkan/libpanvk_v6.a.p/panvk_vX_meta_clear.c.o
In function 'panvk_meta_clear_zs_img',
inlined from 'panvk_v6_CmdClearDepthStencilImage' at ../src/panfrost/vulkan/panvk_vX_meta_clear.c:457:7:
../src/panfrost/vulkan/panvk_vX_meta_clear.c:415:26: warning: storing the address of local variable 'view' in '((struct pan_fb_info *)((char *)commandBuffer + 144))[23].zs.view.zs' [-Wdangling-pointer=]
415 | fbinfo->zs.view.zs = &view;
| ~~~~~~~~~~~~~~~~~~~^~~~~~~
../src/panfrost/vulkan/panvk_vX_meta_clear.c: In function 'panvk_v6_CmdClearDepthStencilImage':
../src/panfrost/vulkan/panvk_vX_meta_clear.c:393:26: note: 'view' declared here
393 | struct pan_image_view view = {
| ^~~~
../src/panfrost/vulkan/panvk_vX_meta_clear.c:393:26: note: 'commandBuffer' declared here
[844/1646] Compiling C object src/panfrost/vulkan/libpanvk_v7.a.p/panvk_vX_meta_clear.c.o
In function 'panvk_meta_clear_zs_img',
inlined from 'panvk_v7_CmdClearDepthStencilImage' at ../src/panfrost/vulkan/panvk_vX_meta_clear.c:457:7:
../src/panfrost/vulkan/panvk_vX_meta_clear.c:415:26: warning: storing the address of local variable 'view' in '((struct pan_fb_info *)((char *)commandBuffer + 144))[23].zs.view.zs' [-Wdangling-pointer=]
415 | fbinfo->zs.view.zs = &view;
| ~~~~~~~~~~~~~~~~~~~^~~~~~~
../src/panfrost/vulkan/panvk_vX_meta_clear.c: In function 'panvk_v7_CmdClearDepthStencilImage':
../src/panfrost/vulkan/panvk_vX_meta_clear.c:393:26: note: 'view' declared here
393 | struct pan_image_view view = {
| ^~~~
../src/panfrost/vulkan/panvk_vX_meta_clear.c:393:26: note: 'commandBuffer' declared here
```
Cc: mesa-stable
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22829>
pvrsrvkm type sync objects can have a pending state where,
the fence is unsignaled but does not have a valid sync file
due to not yet being submitted to kernel.
The wait function therefore needs to handle these types of syncs
through a spin loop.
This was seen as crashes in dEQP-VK.synchronization.timeline_semaphore.*
Signed-off-by: SoroushIMG <soroush.kashani@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22822>
Seems the only thing that really needs this is fpow(0, 0), which should
return NaN, but then gets multiplied with zero. Let's fix that by doing
a bcsel instead of fmul to select the result here. While we're at it,
get rid of the fabs for stop, which isn't needed.
This fixes a piglits failure for most (if not all?) drivers that doesn't
support legacy math rules.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22789>
Prevent GCC warning:
```
[230/1401] Compiling C object src/compiler/nir/libnir.a.p/nir_lower_io_to_vector.c.o
In function 'get_flat_type',
inlined from 'create_new_io_vars' at ../src/compiler/nir/nir_lower_io_to_vector.c:300:10:
../src/compiler/nir/nir_lower_io_to_vector.c:208:14: warning: 'base' may be used uninitialized [-Wmaybe-uninitialized]
208 | return glsl_array_type(glsl_vector_type(base, 4), slots, 0);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/compiler/nir/nir_lower_io_to_vector.c: In function 'create_new_io_vars':
../src/compiler/nir/nir_lower_io_to_vector.c:163:24: note: 'base' was declared here
163 | enum glsl_base_type base;
| ^~~~
```
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8957
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22840>
These are unneeded. Events can't be used to indefinitely stall work
like you can with a semaphore or timeline semaphore. The signals
still need to happen so that execution will modify the state that
can be polled from the CPU though.
Fixes dEQP-VK.synchronization.basic.event.single_submit_multi_command_buffer
Fixes: 04fa6c71 ("dzn: Batch command lists together")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22842>
In theory, ResolveSubresourceRegion should be able to resolve just
the depth or just the stencil. In practice, WARP had bugs, which
means that was never tested, so just do it via blits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22836>
with tc, query results can be fetched async, and it's impossible to
sync tc in this scenario. to avoid needing to sync when a sync is not
possible, sync ahead of time in all cases
Fixes: 7c96e98975 ("zink: always start/stop/resume queries inside renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22827>
Without this, it treats the src[1] as a coordinate (it's actually LOD)
and may try to read more than one component. I don't think this usually
hurts anything as the coordinate should get ignored later but it can
result in OOB memory reads while translating NIR.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22834>
The shader can write to a specific RT, but the blend descriptor gets
to decide if the RT is really updated. We need to take that into
account when initializing the rt_written local variable in
pan_allow_forward_pixel_to_kill() otherwise we might get inconsistent
results.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22465>
UE4's Vulkan backend uses vkCmdWriteTimestamp with TOP_OF_PIPE
to measure how long a workload took in the GPU Benchmark. This is wrong
and writes the timestamp before the workload is actually finished,
making it seem like the GPU is much faster than it actually is.
This caused subsequent benchmark passes to contain way too big workloads,
which caused soft hangs on slower GPUs.
Fixes GPU hangs with Splitgate during automatic settings configuration.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22823>
If we are trying to lower register pressure this can make a big
difference in some cases. To avoid adding even more strategies,
merge this with disabling ubo load sorting, since they are basically
trying to do the same.
total instructions in shared programs: 12848024 -> 12844510 (-0.03%)
instructions in affected programs: 236537 -> 233023 (-1.49%)
helped: 195
HURT: 87
Instructions are helped.
total uniforms in shared programs: 3815601 -> 3814932 (-0.02%)
uniforms in affected programs: 31773 -> 31104 (-2.11%)
helped: 67
HURT: 115
Inconclusive result (value mean confidence interval includes 0).
total max-temps in shared programs: 2210803 -> 2210622 (<.01%)
max-temps in affected programs: 9362 -> 9181 (-1.93%)
helped: 114
HURT: 34
Max-temps are helped.
total spills in shared programs: 2556 -> 2330 (-8.84%)
spills in affected programs: 1391 -> 1165 (-16.25%)
helped: 39
HURT: 9
total fills in shared programs: 3840 -> 3317 (-13.62%)
fills in affected programs: 2379 -> 1856 (-21.98%)
helped: 39
HURT: 23
total sfu-stalls in shared programs: 21965 -> 21978 (0.06%)
sfu-stalls in affected programs: 2618 -> 2631 (0.50%)
helped: 45
HURT: 81
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 12869989 -> 12866488 (-0.03%)
inst-and-stalls in affected programs: 238771 -> 235270 (-1.47%)
helped: 193
HURT: 87
Inst-and-stalls are helped.
total nops in shared programs: 303501 -> 303274 (-0.07%)
nops in affected programs: 4159 -> 3932 (-5.46%)
helped: 87
HURT: 105
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22824>
When making a custom-theme, I accidentally hard-coded the edit-URL to
point to the index-file in the mesa3d.org repo instead of pointing to
the correct file in our docs. This fixes that, so the "Edit this
page"-links in the footer works the same way as the old "Edit on
GitLab"-links did.
Fixes: 7da0482636 ("docs: add custom html theme")
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22791>
This value depends on the per-sample value which can be unknown at
compile time with graphics pipeline libraries. So we need to have this
dynamic has well and pick the right value when generating the
3DSTATE_PS/3DSTATE_WM packet.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d8dfd153c5 ("intel/fs: Make per-sample and coarse dispatch tri-state")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22728>
rf0 is affected by restrictions in some scenarios so we rather use
a register that does not cause conflicts for scheduling.
total instructions in shared programs: 12850958 -> 12848024 (-0.02%)
instructions in affected programs: 331974 -> 329040 (-0.88%)
helped: 2559
HURT: 201
Instructions are helped.
total max-temps in shared programs: 2210893 -> 2210803 (<.01%)
max-temps in affected programs: 1486 -> 1396 (-6.06%)
helped: 96
HURT: 7
Max-temps are helped.
total sfu-stalls in shared programs: 21975 -> 21965 (-0.05%)
sfu-stalls in affected programs: 32 -> 22 (-31.25%)
helped: 16
HURT: 6
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 12872933 -> 12869989 (-0.02%)
inst-and-stalls in affected programs: 332036 -> 329092 (-0.89%)
helped: 2560
HURT: 189
Inst-and-stalls are helped.
total nops in shared programs: 305911 -> 303501 (-0.79%)
nops in affected programs: 11215 -> 8805 (-21.49%)
helped: 2131
HURT: 3
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22797>
Indeed, these references are not freed.
For instance, this issue is triggered on an evergreen card with
"piglit/bin/shader_runner tests/spec/arb_shader_atomic_counter_ops/execution/all_touch_test.shader_test -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: 06993e4ee3 ("r600: add support for hw atomic counters. (v3)")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22798>
Indeed, the objects are not freed when the function returns NULL.
"psurf->texture = tex;" is redundant with
"pipe_resource_reference(&psurf->texture, tex);".
For instance, this issue is triggered with
"piglit/bin/ext_texture_array-compressed teximage pbo -fbo -auto"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: f3630548f1 ("crocus: initial gallium driver for Intel gfx 4-7")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22799>
Android apps commonly use HWUI (a GLES-based UI framework provided by
the system), that generally performs eglInitialize() before the app can
do the same for its custom rendering needs.
If an app is going to set VIRGL_DEBUG to enable case-by-case driver
behaviors (e.g. experimental shader_sync option), it should be checked
again during context creation.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22744>
UBO descriptors can re-use the default descriptor when their upper
bound is 64KiB, not the end of the buffer.
Fixes dEQP-VK.memory.pipeline_barrier.host_write_uniform_buffer.1048576
Fixes: d34ac0a70b ("dzn: Re-design custom buffer descriptors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22805>
Due to a copy-paste error, we asserted pipelineStageCreationFeedbackCount == 1
and wrote the stage feedback of the combined shader into the feedback of the first
stage. This is fixed.
Instead, we now write the precompilation feedback for each stage. This not ideal,
but definitely an improvement.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22100>
We track fences in a global list and have a per context "current" fence
which we randomly attach things to. If we take such a fence and emit it
without also creating a new fence for future tasks we can get out of sync
leading to random failures.
Some of our queries could trigger such cases and even though this issues
appears to be triggered by the MT rework, I'm convinced that this was only
made more visible by those fixes and we had this bug lurking for quite a
while.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7429
Fixes: df0a4d02f2 ("nvc0: make state handling race free")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Acked-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22722>
It might happen that a raw data object (from pipeline cache creation)
was never looked up, and thus never deserialized, before it gets
inserted again into the cache. In this case, the deserialized object
got replaced by the raw data object.
Instead, replace the raw data object with the real object in the cache.
Fixes: 8b13ee75ba ('vulkan: Fall back to raw data objects when deserializing if ops == NULL')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22735>
Depth/stencil surfaces are not PBE renderable, so a compatible format must
be used instead. The code to calculate this compatible format was not called
when configuring the PBE, and it was missing formats.
Also the code to calculate PBE swizzles was throwing an error in unhandled
cases, rather than using the pre-calculated defaults which was the correct
behaviour.
Fixes blits in dEQP-VK.draw.renderpass.depth_clamp.d32_sfloat
Signed-off-by: James Glanville <james.glanville@imgtec.com>
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22748>
The Vulkan spec says:
"If the depth clamping state is changed dynamically, and the pipeline
was not created with VK_DYNAMIC_STATE_DEPTH_CLIP_ENABLE_EXT enabled,
then depth clipping is enabled when depth clamping is disabled and
vice versa"
Fixes: e48c0fbd8f ("radv: add support for dynamic depth clamp enable")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22777>
The check for max_dw means that none of checks triggered reliably
when we had an issue. Use a stricter reserved dw measure to increase
the probability of catching issues.
Adds a radeon_check_space to some places after cs_create as they
previously relied on the min. cs size, but that would still trigger
the checks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20152>
Removes duplicate ARL reads from the same source when the original ADDR
register is still live. This is the remaining low-hanging fruit from #7723
Should account for most of the potential improvements and is also
trivial as no source or destination rewrite is needed.
RV530:
total instructions in shared programs: 132447 -> 131488 (-0.72%)
instructions in affected programs: 33396 -> 32437 (-2.87%)
helped: 331
HURT: 0
total temps in shared programs: 17035 -> 17015 (-0.12%)
temps in affected programs: 361 -> 341 (-5.54%)
helped: 30
HURT: 10
RV370:
total instructions in shared programs: 83555 -> 82659 (-1.07%)
instructions in affected programs: 28310 -> 27414 (-3.16%)
helped: 312
HURT: 0
total temps in shared programs: 12418 -> 12426 (0.06%)
temps in affected programs: 302 -> 310 (2.65%)
helped: 21
HURT: 29
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22752>
There's little point in going from fixed-function to ARB progs just to
convert it to NIR in the end. So let's emit NIR code directly here
instead.
This all made sense back when we had DRI drivers that didn't use NIR at
all in the tree, but these days we unconditionally call prog_to_nir.
Since we're no longer generating something that resembles ARM asm
shaders, we also no longer pass is_arb_asm as true to NewProgram. But
we still require legacy math rules, so we set use_legacy_math_rules in
the shader_info instead. This should do the same thing, but
communicates exactly what we actually need rather than having a half-
truth about the source of the shader.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22520>
SQTT stands for SQ Thread Trace but it's shorter.
Note that environment variables aren't renamed because this might
break external applications.
This renames:
- ac_thread_trace_data to ac_sqtt (this is the main struct)
- ac_thread_trace_info to ac_sqtt_data_info
- ac_thread_trace_se to ac_sqtt_data_se
- ac_thread_trace to ac_sqtt_trace (this contains trace only)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22732>
Propagating array elements has the problem that we would have to
check whether the last load is not overwritten by an indirect store.
Indirect kcache buffer loads require starting a new CF, and we would
have to make sure that we don't split the LDS fetch/read group with
that, so don't do this.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21347>
Previously the code was saving the mask as a VkShaderStageFlags
but when allocating shareds it checked against pvr_stage_allocation.
This causes problems as only the vertex bit matches the
VkShaderStageFlagBits so the push constants utilized in fragment
shaders weren't picked up properly.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22731>
The list logic checks for list->next->next (+ some other checks)
to point to the list itself to determine that there is just one
single element.
┌───────────────────────┐
└< { HEAD } >─< { E0 } >┘
When the list_head is copied as was being done previously, the
list element's next pointer still points at the old head so
the `list_is_singular()` check fails.
Fixes pvr_cmd_buffer.c:605:`list_is_singular(&bo_list)` assertion
dEQP-VK.api.image_g.core.clear_color_attachment.cube_layers.b8g8r8a8_unorm
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22730>
When using INTEL_DEBUG=bat, INTEL_DEBUG_BATCH_FRAME_START and
INTEL_DEBUG_BATCH_FRAME_STOP can limit dumping of batches for
particular frame ranges. Batch dumps are huge. Smart filtering
allows debugging of single frames during game play. Initial
commit to debug infrastructure.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22564>
tex_format should be `enum V3DX(Texture_Data_Formats)`, but using that enum
type in the header requires including `v3dx_pack.h`, which triggers circular
include dependencies issues, so use a `uint32_t` for now.
"fix" the one place that was using the correct enum, because doing so
triggers `-Wenum-int-mismatch` in GCC 13 as the function declaration
doesn't match the function definition.
Reported-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Eric Engestrom <eric@igalia.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22739>
A new attribute on source GPRs reflecting if a certain usage of a
value is the last usage of it was added in A7xx. This is seemingly
a performance hint and doesn't affect anything when not applied.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21498>
The new stg.a/ldg.a addressing form supersedes the a6xx's one.
The new form is:
ldg.a.f32 r4.y, g[c0.z+r4.y+2], 4
There are no shift comparing to the a6xx:
ldg.a.f32 r4.y, g[r0.z+(r4.y)<<2], 4
Also on a7xx the first src is allowed to be both const and gpr.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21498>
Commit f9a074dd55 ("dri2/android: Bypass throttling") dropped
unnecessary throtting in the SwapBuffers() path for android. But
unfortunately MSAA resolve got tangled up in the throttle reason
flag. So add a new flag that indicates "no throttingling, but yes
please do MSAA resolve".
Fixes: f9a074dd55 ("dri2/android: Bypass throttling")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22719>
reason:
extra bytes are not needed and not necessary
in h264/h265 bitstreams, because they are in
between NALs, the only problem is they consumed
extra bits. And for av1 streams, that could be
explained to something else, especially in
multi-layer cases, that can cause syntax errors.
ptr[6] represents the bitstream size,
ptr[8] represents the extra zero bytes.
The total number of bytes of the output
should be ptr[6] - ptr[8]
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22585>
2 pass search map is a feature supported by VCN,
the main purpose is to enlarge motion search
range that in pre-encoding path the center global
motion vectors could be obtained and used in the
final path as a block center base. When 2pass is
used, this feature will be automatically enabled.
2 pass feature can be enabled by ffmpeg command
line "-compression_level 1"
and also correct some typos and move quality
package from vcn3.0 to vcn2.0 since it is availabe
in vcn2.0 and vcn3.0 can use it directly. Correct
vcn3.0 hevc spec misc IB package.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22585>
If Driver.GetGraphicsResetStatus exists for one context, other contexts
will be able to use it; so there's no need to inherit reset status from
the other contexts.
This also prevented implementing the spec correctly: we're supposed to
report GL_NO_ERROR when the reset is completed (after reporting GL_*_RESET
at least once):
If a reset status other than NO_ERROR is returned and subsequent
calls return NO_ERROR, the context reset was encountered and
completed. If a reset status is repeatedly returned, the context may
be in the process of resetting.
With the existing code, the contexts will report INNOCENT_CONTEXT_RESET
forever.
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290>
On older kernel the completion of the reset isn't signalled to userspace,
yet we need it to implement the EXT_robustness extension correctly.
In this situation, try to create a new context and submit a no-op job. If
the reset isn't done the kernel will reject the submission (-ECANCELED);
otherwise the submission will go through and we'll know that the reset is
done.
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290>
We have been stopping as soon as we find a conflict but that doesn't
mean we can't merge it in an earlier slot, so keep going. Going by
shader-db, this sometimes allows us to merge the final thrsw a bit
earlier and avoid emitting NOP instructions at the program end to
make up for its delay slots. I have not observed cases where this
helps with regular thrsw though, but it doesn't hurt to try with
those too.
total instructions in shared programs: 11526876 -> 11526354 (<.01%)
instructions in affected programs: 10760 -> 10238 (-4.85%)
helped: 236
HURT: 0
Instructions are helped.
total max-temps in shared programs: 2231705 -> 2231677 (<.01%)
max-temps in affected programs: 276 -> 248 (-10.14%)
helped: 27
HURT: 0
Max-temps are helped.
total inst-and-stalls in shared programs: 11545177 -> 11544655 (<.01%)
inst-and-stalls in affected programs: 10777 -> 10255 (-4.84%)
helped: 236
HURT: 0
Inst-and-stalls are helped.
total nops in shared programs: 321624 -> 321152 (-0.15%)
nops in affected programs: 751 -> 279 (-62.85%)
helped: 236
HURT: 0
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22679>
This change updates the sqtt layer to add mesh shader specific RGP
instrumentation logic. This should allow RGP to correctly identify GPU
work derived from vkCmdDrawMeshTasksEXT, vkCmdDrawMeshTasksIndirectEXT,
and vkCmdDrawMeshTasksIndirectCountEXT API calls.
This change also updates the mesa-to-RGP shader stage translation logic
to handle the mesh & task stages.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21917>
this should match the functionality of GPL, but it should also (theoretically)
have significantly less CPU overhead, so I've enabled this to be the new
default when available
currently I'm not changing any of the requirements for shader object enablement,
so this is probably only be usable on desktops
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22725>
this matches up the rest of the codebase using zink_shader_object
zink_gfx_program::objects is left in place for shader binding so that
the entire array can always be bound in one call
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22725>
drivers that use nir_assign_io_var_locations() with EXT_shader_object all
have the same bug with a shader interface that looks like this:
shader output block:
* PSIZ
* VAR0
* VAR8
shader input block:
* VAR0
* VAR8
in this case, output driver locations will be assigned like:
* PSIZ=0
* VAR0=1
* VAR8=2
and input driver locations will be:
* VAR0=0
* VAR8=1
which breaks the shaders even though this is a totally legitimate thing
to do
thus, a second set of shaders have to be created without PSIZ to work around
the bug since I've already spent 18+ hours trying to fix it and have only succeeded
in breaking every driver that uses it
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22725>
This change makes usage bits verification closer to the Vulkan spec.
i.e. VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT does not always require all formats
to support all the requested usage bits.
Also, VK_IMAGE_CREATE_EXTENDED_USAGE_BIT, when combined with
VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT can relax the requirements for the
usage supported by the original image format.
v2: Removed strict verification of the format_list_info formats usage
per chadversary's suggestion. Other minor style/comments tweaks.
v3: Added checking of all compatible formats when
VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT and VK_IMAGE_CREATE_EXTENDED_USAGE_BIT
are specified, but no list of possible formats was given.
v4: Add VK_IMAGE_CREATE_BLOCK_TEXEL_VIEW_COMPATIBLE_BIT handling.
Cc: 22.2 <mesa-stable>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6031
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17182>
To get Sphinx and Bootstrap to work well together, we need to massage
the output from Sphinx a bit. This adds an extension to do such changes,
based on work from here:
https://github.com/pydata/pydata-sphinx-theme
...However, because we don't ship as an external theme, we can't just do
things as a part of __init__.py, so instead we register an extension
that does the heavy lifting for us.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8399>
Indeed, the hardcoded framebuffer cleanup doesn't handle "resolve".
For instance, this issue is triggered with "piglit/bin/glx-copy-sub-buffer -samples=2 -auto"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: f5bde99cbd ("gallium: plumb resolve attachments through from frontends -> pipe_framebuffer_state")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22554>
We had a lot of implicit locks going on even though there was strictly no
need in doing so. This makes the compilation APIs more atomic while also
providing a cleaner interface.
Not in the mood of splitting it up without deadlocking in the middle. So
it's one big commit sadly.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22434>
0: KIL -none.1111
Negate is not allowed for texturing opcodes, so the incorrect swizzle
was detected, however later optimization, where we try to rewrite incorrect
swizzles from constant (immediate) registers by adding a new ones with
correct order was interfering and not handling this correctly, so we
ended with
CONST[0] = { -1.0000 -1.0000 -1.0000 -1.0000 }
0: KIL const[0].xyz-w;
Even if it would get the swizzle right, texturing opcodes can't read from
constant registers, so just skip it and let this be handled by a later
part which inserts an extra mov instead.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Fixes: a8e1e5b5c2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22704>
Instead of forcing vertex buffer stride to be 4 byte aligned only,
DX10 actually allows the stride to be non 4-byte aligned but the
alignment of an element must be the nearest power of 2 greater or equal to the
width of the element's format, or 4, whichever is less. So the requirement is
better met with PIPE_CAP_VERTEX_ATTRIB_ELEMENT_ALIGNED_ONLY which if set to
TRUE, the sum of vertex element offset + vertex buffer offset + vertex buffer
stride must be aligned to the vertex attributes component size.
Note: PIPE_CAP_VERTEX_ATTRIB_ELEMENT_ALIGNED_ONLY cannot be set
with other alignment-requiring CAPs, so we have to return 0 for all the
other alignement CAPs.
This avoids some unnecessary software vertex translate fallback.
cc: mesa-stable
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22689>
Indeed, the function nir_to_tgsi() returns an ureg_get_tokens() allocated
object which is assigned locally. The ureg_get_tokens() allocated object
should be freed.
For instance, this issue is triggered with a llvm enabled lima,
"piglit/bin/gl-1.0-rendermode-feedback -auto -fbo":
Direct leak of 512 byte(s) in 1 object(s) allocated from:
#0 0x7faeaa4500 in __interceptor_realloc (/usr/lib64/libasan.so.6+0xa4500)
#1 0x7fa4a88f1c in tokens_expand ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:239
#2 0x7fa4a88f1c in get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:262
#3 0x7fa4a900f4 in copy_instructions ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2079
#4 0x7fa4a900f4 in ureg_finalize ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2129
#5 0x7fa4a91dfc in ureg_get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2206
#6 0x7fa4b20a2c in nir_to_tgsi_options ../src/gallium/auxiliary/nir/nir_to_tgsi.c:4011
#7 0x7fa4a0c914 in draw_create_vertex_shader ../src/gallium/auxiliary/draw/draw_vs.c:77
Fixes: b5e782f5f4 ("aux/draw: use nir_to_tgsi for draw shader in llvm path")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21924>
Commit 60bc7c0e22 ("freedreno: Specify GMEM tile alignment per GPU")
changed the tile_align_h from 32 to 16 (which _should_ be the correct
value). But this is causing failure in android 9 skqp dstreadshuffle.
(But not, seemingly, with the android 11 version of skia+skqp, which
picks the same tile size. So this is likely papering something over.)
For now, to unblock things, revert back to the previous tile_align_h.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22683>
We can now no longer rely on certain dirty bits to re-trigger draw time
resource tracking. We need to use the new fd_dirty*_resource() APIs.
Fixes `org.skia.skqp.SkQPRunner#gles_recordopts` on android 9.
Fixes: 0a62a874fc ("freedreno: Re-work dirty-resource tracking")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22683>
Previously, custom buffer descriptors were owned by a descriptor set. Now,
custom buffer descriptors are owned by the buffer. Additionally, we respect
the app-provided sizes when they're smaller than the buffer size, even if
robustness is not enabled, so that size queries work correctly.
This new design fixes several issues:
* Descriptor set copies were broken when they involved custom descriptors,
because the original descriptor set owned the lifetime of the custom
descriptor, the new one was just borrowing it. If those lifetimes didn't
line up, problems would arise.
* A single buffer with the same sub-view placed in multiplel descriptor sets
would allocate multiple slots, when it only really needed one.
* Custom buffer descriptors now lower the base offset to 0 to allow merging
multiple overlapping (ending at the same upper bound) descriptors. Since
the shader is already doing an offset add, making it nonzero is free.
* Dynamic buffer descriptors were incorrect before. The size passed into the
descriptor set is supposed to be the size from the *dynamic* offset, not the
size from the static offset. By allocating/populating the descriptor when
placed into the set, it prevented larger offsets from working correctly. This
buffer-owned design prevents cmdbufs from having to own lifetime of custom
descriptors.
Fixes dEQP-VK.ssbo.unsized_array_length.float_offset_explicit_size
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22639>
Fixes dEQP-VK.draw.renderpass.depth_bias.depth_bias_triangle_list_point
This is not complete, there's no slope scale or clamp handling, but it
does handle static or dynamic (though dynamic is untested).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22639>
Previously it was assumed that between the and the variable there was
only one deref.
To handle all cases a new function is introduced that recreates a chain
of derefs.
Fixes: 5a4083349f ("zink: add provoking vertex mode lowering")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22678>
We use dwords (32 bit) quite a bit around the code base. Previously
we used '* 4', '<< 2', or '* sizeof(uint32_t)' to go from dwords to
bytes. The conversion isn't always clear when other operations
happen in the same line, which can leave one wondering where the
multiplication came from.
PVR_DW_TO_BYTES() should make the code more obvious as well as
making the conversion more consistent.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22658>
Some chips support firmware based mcbp. If supported this means
radeonsi needs to allocate 3 buffers and pass them to the firmware.
From there, the firmware will handle mcbp and register shadowing
on its own so we don't need to insert LOAD packet in the preamble.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986>
src/virtio/vulkan/vn_common.c: In function ‘vn_ring_monitor_acquire’:
src/virtio/vulkan/vn_common.c:129:16: error: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Werror=implicit-function-declaration]
129 | pid_t tid = gettid();
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22489>
It's not a lazy loaded type so doing the Once::call_once in every
Platform::get gives us a pointless atomic, which might be slow on some
platforms.
Every application has to call clGetPlatformIDs so we only need to do it
there.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22649>
Simplify the code which puts query results into the destination
buffer:
* Use a uint64 or uint32 pointer instead of uint8 to write the results
to the buffer for less casting and simplifient pointer incrementing.
* Use MIN2() macro to be more concise.
And fix some indentation.
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22467>
The first attempt at the sprintf would have consumed part of va, so if
we're going to recurse on overflow to try again in a new allocation then
we have to do our work on a copy.
This was a common failure mode for MESA_GLSL=source, where it would just print:
Mesa: info: GLSL source for fragment shader 1:
Mesa: info: (null)
Fixes: 7a18a1712a ("util/log: improve logger_file newline handling")
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22618>
Instead of doing a slow depth clear, we can do depth fast clear in
vkClearAttachments.
Sascha Willems occlusionquery demo shows more than 2% perf boost with
this series.
On Felix's Tigerlake with the GPU at fixed frequency, this patch
improves performance of RoTR by +0.5%.
v2: (Nanley Chery)
- Clear stencil surface along with depth.
- Check for multilayer resources.
- Lookout for state.attachments.
- Fallback on slow clear for BDW and CHV if conditional rendering
enabled.
- Keep flush in same function.
v3: (Nanley Chery)
- Return immediately after fast clearing.
- Remove unnecessary comment.
v4: (Nanley Chery)
- Add assertion for BLORP_BATCH_NO_EMIT_DEPTH_STENCIL.
- Remove unnecessary local variable.
- Add 3DSTATE_WM_HZ_OP comment.
v5: (Nanley Chery)
- Fix comments.
- Don't take fast depth clear path if BLORP_BATCH_PREDICATE_ENABLE set.
- Refactor code in can_hiz_clear_att.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20175>
The VK pipeline cache passes a NULL bytes with a nonzero size to a
NULL-data blob to set up the size of the blob. In this case, we don't
actually execute the memcpy, so the non-existent "bytes" doesn't need to
have defined contents. Avoids a valgrind warning:
==972858== Unaddressable byte(s) found during client check request
==972858== at 0x147F4166: blob_write_bytes (blob.c:165)
==972858== by 0x147F4166: blob_write_bytes (blob.c:158)
==972858== by 0x14695FFF: vk_pipeline_cache_object_serialize (vk_pipeline_cache.c:240)
[...]
==972858== Address 0x0 is not stack'd, malloc'd or (recently) free'd
Fixes: 591da98779 ("vulkan: Add a common VkPipelineCache implementation")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22617>
When a GPU hung happens, all contexts are notified. They will receive
INNOCENT_CONTEXT if they are not the context that triggered the reset,
or GUILTY_CONTEXT otherwise.
At radv_check_status(), we return on the first context that was notified
as [GUILTY, INNOCENT]_CONTEXT, without further checks. This can make an
app think that it's innocent if the guilty context is not the first one
on the list of hw_ctx to be checked.
Check every context for a guilty one before returning CONTEXT_INNOCENT.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22648>
it's stupid to do optimized background compiles if the driver is going
to create the exact same pipeline, so add a workaround to disable
this behavior
should improve ci runtimes on lavapipe by some amount
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22613>
Before testing the waddr for SFU we should first validate this
is indeed a valid (not NOP) magic write. Use the helper we have for
this which gets this right.
total instructions in shared programs: 12898957 -> 12850958 (-0.37%)
instructions in affected programs: 4328937 -> 4280938 (-1.11%)
helped: 19974
HURT: 439
Instructions are helped.
total max-temps in shared programs: 2211503 -> 2210893 (-0.03%)
max-temps in affected programs: 12924 -> 12314 (-4.72%)
helped: 509
HURT: 20
Max-temps are helped.
total sfu-stalls in shared programs: 22233 -> 21975 (-1.16%)
sfu-stalls in affected programs: 722 -> 464 (-35.73%)
helped: 297
HURT: 54
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 12921190 -> 12872933 (-0.37%)
inst-and-stalls in affected programs: 4337977 -> 4289720 (-1.11%)
helped: 20015
HURT: 404
Inst-and-stalls are helped.
total nops in shared programs: 333743 -> 305911 (-8.34%)
nops in affected programs: 86902 -> 59070 (-32.03%)
helped: 14545
HURT: 76
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22593>
Previously, whenever a vertex was emitted immediately after emitting a
primitive, that vertex would not use the attributes that where assigned
last because the position variable got set.
Now the temporary attributes array is treated as a ring buffer and
whenever the position is set to 0 it's previous value is used as an
offset when accessing it. This way when a new primitive is created the
attributes at index 0 correspond to the last attributes written.
Fixes: 5a4083349f ("zink: add provoking vertex mode lowering")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22599>
otherwise the pool is freed before the query and zink will
give the vulkan driver NULL query pool which can make it crash.
this was seen when running the following cases with
primitivesGeneratedQueryWithRasterizerDiscard and color write
features disabled:
dEQP-GL45.functional.tessellation.invariance.outer_triangle_set.triangles_fractional_odd_spacing
dEQP-GL45.functional.tessellation.invariance.outer_triangle_set.triangles_fractional_even_spacing
dEQP-GL45.functional.tessellation.invariance.outer_triangle_set.quads_equal_spacing
dEQP-GL45.functional.tessellation.invariance.outer_triangle_set.quads_fractional_odd_spacing
dEQP-GL45.functional.tessellation.invariance.outer_triangle_set.quads_fractional_even_spacing
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_equal_spacing_ccw
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_equal_spacing_ccw_point_mode
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_equal_spacing_cw
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_equal_spacing_cw_point_mode
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_fractional_odd_spacing_ccw
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_fractional_odd_spacing_ccw_point_mode
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_fractional_odd_spacing_cw
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_fractional_odd_spacing_cw_point_mode
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_fractional_even_spacing_ccw
dEQP-GL45.functional.tessellation.invariance.tess_coord_component_range.triangles_fractional_even_spacing_ccw_point_mode
Fixes: e5d517f362 ("zink: rework query pool overflow")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22575>
this reimplements the same functionality that exists already, but
using shader object instead of GPL
it must be disabled by default, as this extension is not (currently)
compatible with feedback loops
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22612>
When wfog support is advertised, unless an orthogonal
projection matrix is detected, w is supposed to be used
instead of z for the fog equation when done in the pixel
shader.
Due to the spec being ambiguous, and tests being incomplete,
it seems we had got things wrong.
New tests confirm the behaviour.
For the explanation we will denote z_vs and w_vs the position
output's z and w channels in the vertex shader, and
z_ps, w_ps the position input z and w channels in the pixel shader.
w_ps = 1/w_vs
z_ps = z_vs/w_vs
In the programmable pixel shader, we used z_ps/w_ps, thus z_vs.
As basically z_vs and w_vs are usually in the same range, we didn't
notice an obvious difference with the correct behaviour.
In the ff pixel shader, we used z_ps for zfog and w_vs else.
z_ps was always used if a programmable vertex shader was detected.
This latter behaviour led to issue
https://gitlab.freedesktop.org/mesa/mesa/-/issues/8341
While using z_ps/w_ps like for programmable ps fixes the issue visually
for the same reason as it did for programmable ps, it breaks
wine tests using XYZRHW. These tests show that when passing
pre-transformed vertices and an orthogonal projection matrix,
z_vs is used, and due to the XYZRHW property, this is not
recovered by the z_ps/w_ps computation (instead z_ps=z_vs).
For the game affected by the issue, the projection matrix set
is not orthogonal.
The direct3D spec indicates that the projection matrix must be
set correctly for fog to work properly, even if we do not use the
transformation pipeline (could be related to xyzrhw, or programmable vs
or both). Previous tests had shown that the projection matrix
has the last two values of the last column tested against 0 and 1,
in order to activate zfog or wfog.
The R500 spec indicates that either z or 1/1/w can be used as source
for the fog computation, but it is not clear whether this is z_vs or
z_ps.
Tests confirmed the intuition that the correct behaviour
is to use z_ps (zfog) when an orthogonal projection matrix is set
(the spec spirit being that in that case z_ps=z_vs),
and 1/w_ps (wfog) else (even if programmable shaders are used).
This patch introduces this behaviour.
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8341
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22583>
It appears to be possible that IDLE is observed before COMPLETE.
In this case, an application may access present_id in subsequent
QueuePresentKHR and race against the fence worker reading present_id.
Solve this by adding a separate signal_present_id that is used when
completing to avoid the race.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22633>
Previously, continue preambles and postambles were added directly to
the CS array which means all BOs were correctly added to the BO list,
and this has been broken recently. IB BOs need to be added to the list.
When a BO isn't added to the list as part of a submission, it might
randomly VM faults.
This fixes VM faults and random GPU hangs on NAVI21 in Mesa CI.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8849
Fixes: 41a9bced31 ("radv: Fill continue preambles and postambles properly.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22625>
Previously zink ignored whether multisampling was enabled and rendered
with mulisampling whenever the target buffer had multiple samples.
This change now will only render with multisampling when it is enabled
and will use a lowering pass to make sure this case is handled correcly.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22626>
when a ms fallback texture is created, it has to actually be a ms texture
in order to be consistent with driver expectations for a given sampler in
a shader
this adds sample querying to both ends of the fallback creation to ensure
that a sample count is passed to the driver
affects:
KHR-GL46.sample_variables.position.fixed.samples_0
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22492>
this is only hit when populating multisampled fallback textures, so
don't assert if it fails since some drivers are able to handle it
d3d12 can't, however, and this should be enough to work around that issue
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22492>
vkcts-navi10-valve has the nasty habit on hanging the GPU, so we
introduced an auto-retry... but for every radv job. Let's stop doing
that, and instead limit the auto-retry to vkcts-navi10-valve only.
Additionally, let's increase the number of attempts to 3 (2 retries),
as sometimes, it may still fail and we don't want to flag it as a
fail in nightly runs.
Let's hope we'll get to the bottom of this hang sooner rather than
later, so that we can remove this hack!
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22596>
So far we were only considering the number of vertices to draw to
compute the offset in a stream output buffer.
But this is not correct, as it depends on the primitive type too. For
instance, with 4 vertices, if we use a triangle strip primitive, then 2
triangles are generated from those 4 vertices, so 6 vertices will be
captured.
This fixes spec@!opengl es
3.0@gles-3.0-transform-feedback-uniform-buffer-object.
CC: 23.1
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22607>
VulkanCTS can receive patches from a reference to an upstream commit or by a
file stored in Mesa. Those locally stored patches for VulkanCTS should be
stored in the specific directory for patches with a prefix like skqp does.
The schema of how both sources apply patches has received a slight
modification to resemble each other.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22463>
This lets us follow the Vulkan spec requirements for MSAA line
rasterization, using a width of 1.0 instead of D3D's proscribed
width of 1.4. There's no reason to predicate this on MSAA being
enabled, since quadrilateral lines with a width of 1.0 are actually
the most desired type of line rasterization for Vulkan.
Follow-ups:
- We can probably turn on 'strict lines' when this is supported.
- We should enable the line rasterization mode extension.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22606>
Initially this was just adding a missing popd, but actually there's no
reason to pushd into the build dir, so let's just pass the build dir as
arguments to cmake & ninja instead.
`--arch x64` was also dropped as it only applies to Windows builds,
which this script doesn't support anyway.
Fixes: 512f1c160a ("ci/zink: Add coverage using the vulkan validation layer on lvp.")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22488>
For MTL (verx10 == 125), float64 is supported, but int64 is not.
Therefore we need to lower cluster broadcast using 32-bit int ops.
For gfx12.5+ platforms that support int64, the register regions
used by cluster broadcast aren't supported by the 64-bit pipeline.
On MTL, dEQP-VK.subgroups.clustered.*_double* and
dEQP-VK.subgroups.clustered.*_dvec* were failing to validate the
compiled shader in debug mode, and reportedly gpu-hanging in release
mode.
With this change dEQP-VK.subgroups.clustered.*_double* passed all 48
tests and dEQP-VK.subgroups.clustered.*_dvec* passed all 140 tests on
MTL.
Rework:
* Move from generator to brw_fs_lower_regioning.cpp. (Suggested by
Francisco)
* Apply to verx10 >= 125.. (Suggested by Francisco)
Cc: 23.1 <mesa-stable>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>
This is really noticeable for games that resolve a bunch of occlusion
queries (in this case 4096) because it seems that emitting 4096
WAIT_REG_MEM packets can stall more than expected. Fixes this by
waiting for queries in the resolve query shader.
This improves performance of an unreleased game by +~10% (71->78 FPS).
RADV should now be really close to Windows performance for that title.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22579>
This function was returning the size of a single region header as the stride
when it was supposed to be returning the total size of the region headers for a
single render target. This went unnoticed due to the fact this function had two
variables with basically identical names. To avoid any future confusion, rename
rgn_header_size to single_rgn_header_size throughout the code.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22574>
if an attachment other than the msrtss blit attachment has clears pending,
unbinding the other attachment will trigger a clear flush, which will then
recurse into the msrtss blit that's being triggered
instead, save/restore these clears around the msrtss blit since they
can be executed during the normal renderpass
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22577>
with the new zsbuf elimination handling, the fb state calculated in
u_blitter's fb restore may be incorrect if the zsbuf has indeed been
eliminated, so ensure the right fb is stored to be reapplied so that
misrenders will be avoided
fixes some crashes/misrenders in webgl
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22577>
The new code splits the work into a few passes instead of trying to do
everything with a single pass. This helps to apply the new clarified
rules for structured control flow in the SPIR-V specification, in
particular the "exit construct" rules.
First find an appropriate ordering for the blocks, based on the
approach taken by Tint (WebGPU compiler). Then, with those blocks
in order, identify the SPIR-V constructs start and end positions.
Finally, walk the blocks again to emit NIR for each of them, "opening"
and "closing" the necessary NIR constructs as we reach the start and
end positions of the SPIR-V constructs.
There are a couple of interesting choices when mapping the constructs
to NIR:
- NIR doesn't have something like a switch, so like the previous code,
we lower the switch construct to a series of conditionals for each
case.
- And, unlike the previous code, when there's a need to perform a
break from a construct that NIR doesn't directly support (e.g. inside
a case construct, conditionally breaking early from the switch), we
now use a combination of a NIR loop and an NIR if. Extra code is
added to ensure that loop_break and loop_continues are propagated
to the right loop.
This should fix various issues with valid SPIR-V that previously
resulted in "Invalid back or cross-edge in the CFG" errors.
Thanks to Alan Baker and David Neto for their explanations of
ordering the blocks, in the Tint code and in presentations to
the SPIR-V WG.
Thanks to Jack Clark for providing a lot of valuable tests used to
validate this MR.
Closes: #5973, #6369
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17922>
- VAR31 was ignored.
- Only a half of the 16-bit slot was passed through, though I'm not sure
if nir_lower_io handles vec8. The slots are only for GLES and I don't
think a passthrough TCS is possible with GLES.
Fixes: a8e84f50bc - nir: Add helper to create passthrough TCS shader
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21861>
The buffer max_index value in translate_generic struct is relevant for
indexed draw only. So do not clamp the element index in generic_run() as it
is called for non-indexed draw only.
This patch passes index_size to the common generic_run_one function
so index clamping is only performed when a non-zero index_size is specified.
This fixes a text selection bug with kitty terminal emulator running on ARM
when it falls back to the generic translate path for unsigned byte vertex
array.
cc: mesa-stable
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22568>
This commit introduces the StructuredLogger module, which provides a
robust and flexible logging utility supporting multiple data formats
(CSV, JSON, and YAML). By incorporating this module into our CI system,
we enhance our log management capabilities, making it easier to:
1. Monitor and analyze logs: The StructuredLogger is a dict-like data
abstraction which autosaves into a structured data file, whenever it
is updated. With this file, one can easily know specifics of the job
execution without having to grep it in the traces logs or exploring
the job artifacts. The autosave feature makes it useful even when the
CI job fails unexpectedly, since the partial dict is always written
back to the disk.
2. Maintain data integrity: The module includes context managers for
file locking and editing log data, ensuring data integrity and
preventing race conditions.
3. Support multiple formats: With built-in support for CSV, JSON, and
YAML formats, this module caters to a wide range of use cases and
user preferences.
4. Increase maintainability: The modular design of the StructuredLogger
and its corresponding strategies simplifies maintenance and allows
for seamless integration of additional formats in the future.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>
Refactor some pieces of the submitter to improve the clarity of the
functions and create a simple dictionary with aggregated data from the
submitter execution which will be dumped to a file when the script
exits.
Add support for the AutoSaveDict based structured logger as well, which
will come in a follow-up commit.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>
Update the LogFollower class to improve section handling and provide a
history of sections encountered during log processing:
1. Add section_history attribute to store the history of encountered
GitlabSections.
2. Make LAVA job submitter use the section history feature to improve
structural logging.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>
Let's make lava_job_submitter.py cleaner with only parsing and retry
mechanism capabilities.
Moved out from the submitter script:
1. proxy functions
- moved to lava.utils.lava_proxy.py
2. LAVAJob class definition
- moved to lava.utils.lava_job.py
- added structural logging capabilities into LAVAJob
- Implemented properties for job_id, is_finished, and status, with
corresponding setter methods that update the log dictionary.
- Added new methods show, get_lava_time, and refresh_log for improved
log handling and data retrieval.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>
This does two things:
1. Flush the command buffer and associate a fence with each
glLinkProgram().
2. Force the application calling glLinkProgram() to wait on the
associated fence, matching the semantics of native drivers.
This important for some workloads and some environments. For example, on
ChromeOS devices supporting VM-based android (ARCVM), an app's HWUI thread
may be configured to use skiagl, while the app may create its own GLES
context for custom rendering. Virgl+virtio_gpu supports a single fencing
timeline, so all guest GL/GLES contexts are serialized by submission
order to the guest kernel.
If the app's submits multiple heavy shaders for compliation+linking
(glCompileShader + glLinkProgram()), these are batched into a single
virtgpu execbuffer (with one fence). Then rendering performed by the
HWUI thread is blocked until the unrelated heavy host-side work is
finished. To the user, the app appears completely frozen until finished.
With this change, the app is throttled in its calls to glLinkProgram(),
and the HWUI work can fill in the gaps between each while hitting most
display update deadlines. To the user, the UI may render at reduced
framerate, but remains mostly responsive to interaction.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22341>
This deduplicate two identical bytes_to_hex implementation into one.
The intention is to ease the introduction of a new hash algorithm, which
will also have its formatting helper (to ensure seamless transition from
sha1).
Note that the new functions always take the size of the binary buffer,
unlike the old disk_cache_format_hex_id which took `binary * 2` which was
inconsistent (binary size is `binary` and string size is `binary * 2 + 1`).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22527>
For MUX.v2i16 should be:
`MUX.v2i16.bit A, B, mask` calculates `(A & mask) | (B & ~mask)`
For MUX.v4i8 should be:
`MUX.v4i8.bit A, B, mask` calculates `(A & mask) | (B & ~mask)`
Signed-off-by: Signed-off-by: Aleksey Komarov <q4arus@ya.ru>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20467>
Since there are concerns that the VK_EXT_GPL implementation may have
issues with mesh shading, disable it by default but give users a knob to
turn it on to experiment.
This doesn't automatically enable GPL use in zink, because we lack
extendedDynamicState2PatchControlPoints, but it means that you only need
to set ZINK_DEBUG=gpl and not both env vars.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With independent sets, we're not able to compute immediate values for
the index at which to read anv_push_constants::dynamic_offsets to get
the offset of a dynamic buffer. This is because the pipeline layout
may not have all the descriptor set layouts when we compile the
shader.
To solve that issue, we insert a layer of indirection.
This reworks the dynamic buffer offset storage with a 2D array in
anv_cmd_pipeline_state :
dynamic_offsets[MAX_SETS][MAX_DYN_BUFFERS]
When the pipeline or the dynamic buffer offsets are updated, we
flatten that array into the
anv_push_constants::dynamic_offsets[MAX_DYN_BUFFERS] array.
For shaders compiled with independent sets, the bottom 6 bits of
element X in anv_push_constants::desc_sets[] is used to specify the
base offsets into the anv_push_constants::dynamic_offsets[] for the
set X.
The computation in the shader is now something like :
base_dyn_buffer_set_idx = anv_push_constants::desc_sets[set_idx] & 0x3f
dyn_buffer_offset = anv_push_constants::dynamic_offsets[base_dyn_buffer_set_idx + dynamic_buffer_idx]
It was suggested by Faith to use a different push constant buffer with
dynamic_offsets prepared for each stage when using independent sets
instead, but it feels easier to understand this way. And there is some
room for optimization if you are set X and that you know all the sets in
the range [0, X], then you can still avoid the indirection. Separate
push constant allocations per stage do have a CPU cost.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With independent sets, we cannot bake into the shader the binding
table entry of input attachments anymore because that final location
is affected by multiple sets.
We can still access them by looking into the descriptor buffer. This
change enables the image handle to be stored in the descriptor buffer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
With variable fragment shading rate, the last pre-rasterization stage
is responsible to write the shading rate value.
The current checks is as follow :
If the fragment shader can be dispatched at variable shading rate,
look for the last pre-raster stage to force the write.
We change this to :
If we're the last pre-raster stage, force the write.
That way this works for pre-rasterization shaders compiled without a
fragment shader.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
Indeed, the current framebuffer hardcoded cleanup
is not sufficient.
For instance, this issue is triggered with:
"piglit/bin/fbo-depthstencil clear default_fb -samples=2 -auto"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22234>
For CHIP_GFX1100, there are 2 VCN instances but using unified queue i.e.
decode and encode will go to HW via same ring type. With AMDGPU kernel
scheduler, since the trancode is sharing the same pipe context, so that
the gpu scheduler assign the decode and encode into the same VCN engine.
In order to use both engines with transcode case, the new pipe context will
be created when the case being detected, with that the transcode can be
load balanced with multiple VCN engines.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22471>
To implement some of the features of the layer, we need to enable some
of the feature bits at device/command_buffer creation. To do so, we
need to edit some of the structures coming from the application. Most
of those are const so we need to clone them before edition.
This change disables some of the layer features if we run into a
situation where one of the structure we need to clone is unknown such
that we can't make a copy of it (since we don't know its size).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7677
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19897>
Indeed, the reference was overwritten.
For instance, this issue is triggered with:
"piglit/bin/shader_runner tests/spec/arb_shader_image_load_store/execution/write-to-rendered-image.shader_test -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: a6b3792843 ("r600: add core pieces of image support.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22394>
This is the analogous of
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9490 but for
r600.
Discoloration of NV12 video frames was observed in Chrome/ChromeOS and
the problem was tracked down to the fact that Mesa was following the
PIPE_FORMAT_R8_G8B8_420_UNORM/lower_yuv_external() path. The symptom is
that (for an unknown reason) the YUV-to-RGB conversion is using the
value of Y as the value of Y, U, and V. So, for example, if the input
value is YUV = (50, 120, 130), then what actually gets converted to RGB
is YUV = (50, 50, 50).
Considering that PIPE_FORMAT_R8_G8B8_420_UNORM was introduced for
freedreno
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6693) and it
is already being reported as unsupported for radeonsi, it's reasonable
to assume that GPUs targeted by r600 don't support this path either.
Note: I tested this patch with an AMD Palm device which follows the
evergreen_is_format_supported() path. I did not have access to a device
to test the r600_is_format_supported() path.
v2: Changed >= 2 to > 1.
Fixes: 826a10255f ("st/mesa: Add NV12 lowering to PIPE_FORMAT_R8_G8B8_420_UNORM")
Tested-by: Andres Calderon Jaramillo <andrescj@chromium.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22511>
previously there was a fallback path here (broken by f6d3a5755f)
which would attempt to demote BAR allocations to other heaps on failure
to avoid oom
this was great, but it's not the most robust solution, which is to iterate
all the memory types matching the given heap and try them in addition to having
a demotion fallback
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22479>
This fixes some CTS compiler tests where they relied on the cl_kernel
object to be released in time so it can recompile a program without
throwing CL_INVALID_OPERATION due to still having active kernel objects.
Fixes: 47a80d7ff4 ("rusticl/event: proper eventing support")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22510>
I doubt the shebang line in the backend script has any effect now,
since the frontend scripts just source it directly.
v2:
* Use "set -e" instead of adding -e to shebang (Eric Engestrom)
v3:
* Apply to the clang wrapper scripts as well (Eric Engestrom)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22438>
The previous indirect method was more complicated and still error prone.
v2:
* Use "grep -E" (Eric Engestrom)
* Exclude spaces and slashes in the grep pattern, to avoid accidentally
matching across unrelated compiler arguments.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22438>
INTEL_MEASURE normally measures timing of GPU events. However, it
is sometimes useful to instead measure when these gfx API calls
were requested of the driver. INTEL_MEASURE cpu can be used in
in conjunction with other driver debug capabilities, like
INTEL_DEBUG=pc for analyzing stalls/flushes or when debugger is
attached, to track which frame you're currently on or where in
the frame you're at.
Initial commit, without plumbing into anv/iris.
"INTEL_MEASURE=cpu" will collect a cpu timestamp for each
INTEL_MEASURE event instead of GPU timestamps.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21505>
Some comparisons were wrong depending on swapped endpoints and
the bottom_edge_rule (if the endpoint was exactly touching
pixel center), hence the code assuming a line endpoint which
should be drawn according to diamond exit rules would already be
drawn (so not adjusting the endpoint) when in reality it was not,
in which case the line would end up one pixel too short.
Note that this is still not fully correct - the logic as such
should be correct now, however these comparisons can give wrong
results due to float math vs. fixed point planes (an endpoint very
close to a pixel center might be exactly at pixel center in fixed
point), but this should do for now. (Also, the logic is still
completely wrong for state trackers not using half_pixel_center
setting, but this is only really a concern for dx9 state tracker.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22481>
For instance, this issue is triggered with: "piglit/bin/useprogram-flushverts-2 -auto -fbo" or
"piglit/bin/primitive-restart-draw-mode line_loop -auto"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: 27dcb46629 ("gallium: add take_ownership param into set_vertex_buffers to eliminate atomics")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22395>
Ensure builds flag the use of Intel sources when the i915 Gallium driver
is configured (`-Dgallium-drivers=i915`). Otherwise, a build may fail if
other Intel-based configuration options are not enabled:
./src/gallium/winsys/i915/drm/meson.build:21:0: ERROR: Unknown variable "libintel_common".
Signed-off-by: James Knight <james.d.knight@live.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22490>
At the moment it's an all or nothing. A driver supporting fine-grained
system SVM can enable it in order to get full SVM support.
Lower levels could be emulated by userptrs and placing the bo at the same
locations in the GPU's VM as well, but that would require reworking quite
a bit on the drivers side.
For now supporting mmu_notifiers on the kernel side is the only way of
getting SVM support with Rusticl.
The only driver having the gallium bits wired up atm is Nouveau, but I
suspect it shouldn't be all to hard for iris and radeonsi as well.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19712>
Add more system values for XFB. This should be good enough for lowering GL3.1 +
transform_feedback2 + transform_feedback3. More will probably be needed for
geom/tess but that will be easier to work with when I'm actually bringing up
geom/tess. At any rate, we're splitting out XFB from the rasterization pipeline
and since XFB happens only in the last shader pre-rasterization stage, VS+XFB is
an orthogonal problem from e.g. VS+GS+XFB. Yeah, the combinatorics suck.
These will be used by Asahi, and hopefully eventually Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22123>
ctx->get_sample_position doesn't change what it returns based on the programmed
positions, it's just supposed to return the defaults. For most (all?) hardware,
those are the Vulkan standard sample positions. In bf9a1e0a4b ("zink: add a
pipe_context::get_sample_position hook"), Mike wondered why there wasn't a
common implementation. So here's one to fix that :~)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22383>
With the on_demand shaders feature, meta pipelines are only created
when they are used, otherwise they are NULL. Though, inside secondary
cmdbuffers, the graphics pipeline might be also NULL. In this specific
case, radv_is_{dcc,fmask}_decompress_pipeline() would return
TRUE because these pipelines are NULL too...
This fixes flakes with tests that use secondary cmdbuffers with
TC-compat images.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22440>
For the CL spec it really matters how a program object was created. We
never really cared all that much, but it didn't support the corner case of
having an empty string as the OpenCL C source code.
Enums feel like the more Rust way to do this kind of stuff anyway.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22280>
We want to validate the actual passed in SPIR-V, but we can only report
errors back on build/compile time. So instead of storing the initial IL
in the devices `ProgramBuild` objects, just store it on the Program
instead. This also simplifies setting spec constants as this is only valid
on program directly created from IL and not e.g. linked ones.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22280>
In OpenCL, we can have no shader-defined shared memory but some dispatch-time
variable memory. This is not reflected in ss->info.wls_size, so check the right
variable instead so we allocate the appropriate memory.
Fixes page faults accessing shared memory with Rusticl, e.g. in the vstore_local
test.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22228>
OpenCL can generate large loads and stores that we can't support, so we need to
lower. We can load/store up to 128-bits in a single go. We currently only handle
up to 32-bit components in the load and no more than vec4, so we split up
accordingly.
It's not clear to me what the requirements are for alignment on Valhall, so we
conservatively generate aligned access, at worst there's a performance penalty
in those cases. I think unaligned access is suppoerted, but likely with a
performance penalty of its own? So in the absence of hard data otherwise, let's
just use natural alignment.
Oddly, this shaves off a tiny bit of ALU in a few compute shaders on Valhall,
all in gfxbench. Seems to just be noise from the RA lottery.
total instructions in shared programs: 2686768 -> 2686756 (<.01%)
instructions in affected programs: 584 -> 572 (-2.05%)
helped: 6
HURT: 0
Instructions are helped.
total cvt in shared programs: 14644.33 -> 14644.14 (<.01%)
cvt in affected programs: 5.77 -> 5.58 (-3.25%)
helped: 6
HURT: 0
total quadwords in shared programs: 1455320 -> 1455312 (<.01%)
quadwords in affected programs: 56 -> 48 (-14.29%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22228>
The query buffer contains a batch to implement the multi pass
replay/accumulation of results. So we can't clear it with a memset.
An optimization for later would be to move the batches to the very end
of the query buffer so we can clear the query data without touching
the batches.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4dc7256bf9 ("anv: reset query pools using blorp")
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22421>
Primitive export used the gs_accepted variable after culling,
so we overwrote this variable after vertex compaction to make
sure not to hang the GPU.
This had an unintended side effect when storing the primitive ID
to LDS on GS threads: the LDS store was done even on threads whose
triangle was culled; potentially causing issues.
As a fix, create a separate boolean variable that remembers
which invocations need to export a primitive; and don't store
the primitive ID to LDS when gs_accepted is false.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8805
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22424>
when a new resource is created for an extant swapchain, the existing
acquire (if any) should be transferred to the resource to ensure
expected behavior
this should be enough to fix piglit's glx-make-current
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22431>
this is important for handing off the swapchain between resources
on makecurrent since a context that is made not-current will have its
swapchain resources destroyed while the swapchain persists
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22431>
This information was wrong in some places, let's fix it now.
GFX6:
The GPU has 64KB LDS, but only 32KB is usable by a workgroup.
NGG:
There was some misinformation about NGG only being able to
address 32 KB LDS, it turns out this is actually not true
and it can address the full 64K.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21935>
in a scenario like:
* bind fb
* clear
* bind fb attachment as sampler
* begin rp
* draw
* end rp
* flush
* bind new fs
* begin rp
* draw
the first draw will have an implicit feedback loop, but the second one will not
need a feedback loop. since no samplers or attachments are changed between
draws, however, the feedback loop will remain active for successive renderpasses,
which is problematic since the shader part of the driver (zink_update_barriers)
attempts to eliminate these same feedback loops, leading to layout desync
instead, add handling to attachment prep here to eliminate feedback loops
in the event that an attachment can be switched from a write layout to a read layout
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22423>
Imported buffers may be created in a device with different
memory alignment and this can cause vm bind to fail because bo
size can be smaller than the calculated vm bind range using the
importer device memory alignment.
So here adding actual_size to anv_bo, this will be set with the actual
size of the bo allocated by kmd for bos allocate in the current device.
For other bo the lseek or the Vulkan API size will be used.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22219>
gbm_bo_map returns a stride for the mapping, which may differ from the
stride of the underlying BO. Drivers may implement mappings via staging
blits, returning a map of a temporary resource instead. That temporary
may have fewer stride restrictions (i.e. it isn't used for display), and
thus be more tightly packed, saving memory.
However, gbm_gralloc has a design flaw where after calling gbm_bo_map,
it asserts that the stride exactly matches the original BO's stride:
assert(stride == gbm_bo_get_stride(bo));
This is a bad assumption, as the GBM API returns a stride explicitly
precisely because it -can- differ. But, this would require significant
changes to gbm_gralloc to fix. So, to work around it, we add a driver
hack for Android-only that forces staging maps of any external BO to use
the original resource's stride.
This should fix issues with mapping cursor planes and SW media codec
uploads on Android-x86.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7974
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22156>
We'll want to create temporary staging images with explicit strides
in the next commit. This extends iris_resource_create_with_modifiers
to have an explicit row_pitch_B parameter (0 continues to mean "let
ISL pick one").
Because resource_create_with_modifiers() is a driver hook, we can't
just add a parameter, so unfortunately we gain another wrapper layer.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22156>
this was improperly added into the conditional for removing a prog from the
ctx hash when it had no relation to that code, leading to refcount
leaks that ended up leaking the whole thing
Fixes: 487ac6dbd6 ("zink: implement cross-program pipeline library sharing")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22397>
This fixes some cases where flush is called from the app without work being scheduled before, causing d3d12_promote_to_permanent_residency
to be called with garbage pointers/arguments.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22415>
ms.sample_mask is only 16b, while VkSampleMask is 32b and it is allowed
to have all of them set even if maximum 16 samples are supported.
E.g. happens with Zink running supertuxkart:
supertuxkart: ../../../source/mesa/src/vulkan/runtime/vk_graphics_state.c:2346: vk_common_CmdSetSampleMaskEXT: Assertion `(dyn)->ms.sample_mask == (*pSampleMask)' failed.
vk_common_CmdSetSampleMaskEXT (commandBuffer=0x5556e903f0, samples=VK_SAMPLE_COUNT_1_BIT, pSampleMask=0x5556819ccc) at vk_graphics_state.c:2346
zink_draw<(zink_multidraw)1, (zink_dynamic_state)5, true, false> (...) at zink_draw.cpp:639
zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)5, true> (...) at zink_draw.cpp:922
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22360>
This commit consolidates the queue submit code paths into one.
Now we always allocate BOs for every CS, but when IBs aren't
allowed, we simply submit every BO to the kernel.
A microbenchmark done by Bas indicated that submitting more IBs to
the kernel only adds a negligible overhead. Additionally, this
allows us to stop copying the command buffer contents in system
memory and get rid of a lot of legacy code.
In order to be able to submit every BO, we make sure to add the
last BO to the old_ib_buffers array on cs_finalize. This also
necessitates some changes in cs_execute_secondary and other
functions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22354>
This will still make it so that RADV_DEBUG=hang will only submit
one command buffer at a time, but otherwise let's pass all CS
objects into one submission and let the winsys split them if
necessary.
The winsys can do a better job at splitting them because
radv_queue has no knowledge of IBs and ignores chaining in the
splitting logic.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22354>
In a gang submit, the follower (typically ACE) and leader
(typically GFX) can have synchronization between each other.
We must ensure that these end up in the same submission,
otherwise we can deadlock the GPU.
We rely on radv_queue here to order follower before the leader
in the submitted CS array.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22354>
Currently, radv_queue already splits submissions but we want to
change this and be able to split them in the winsys code as well.
Necessary because we want to split based on number of actual
IBs not number of command buffers, but radv_queue is not
aware of IBs.
Note that this commit does not actually take this new split into
use yet, that will be done in a following commit when it is ready,
this is why we set the max IB count higher than radv_queue here.
This commit is the first step in making "fallback" the default and
only submission code path.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22354>
Skipping the continue preamble can allow other processes to mess
up some registers set by the current process.
Originally, we could omit generating the continue preamble when
no shader rings were used, because the register initialization
happened at the beginning of every main cmdbuf. However, this
isn't the case anymore.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22354>
Indeed, the unnecessary drmDevice objects were not freed.
For instance, this issue could be triggered with: "piglit/bin/egl_ext_platform_device -auto -fbo":
SUMMARY: AddressSanitizer: 2796 byte(s) leaked in 12 allocation(s).
Fixes: e39d72aec2 ("egl: only take render nodes into account when listing DRM devices")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22408>
The base SGPR and the number of SGPRs can be equal but it was incorrect
because one VS can have draw_id and one can have base_instance. Fix
this by invalidating the vertex user SGPRs unconditionally.
Though they should also be invalidated after executing secondaries,
otherwise nothing is invalidated if the same pipeline is bind to the
primary again.
This fixes dEQP-VK.dynamic_rendering.primary_cmd_buff.random.seed*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21652>
Replaces the RADV pipeline cache with an implementation
based on the common vk_pipeline_cache.
We use a dual-layer approach with two types of cache entries.
1. radv_shader:
- serialized as radv_shader_binary
- uses SHA1 of the binary as key
2. radv_pipeline_cache_object:
- contains pointers to associated radv_shaders
- serialized as list of SHA1
- uses the pipeline hash as key
In combination with single-file disk-cache, this reduces the cache size by ~60%.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22030>
This function takes a radv_shader_binary and writes it to the
disk cache before creating and returning a radv_shader cache entry.
The key of the cache entry is the full SHA1 hash of the binary.
This way, we will be able to deduplicate identical shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22030>
The instructions manipulation cr0 use the default mask on lane0. So if
for some reason that lane is disabled in some of the dispatchs, we can
end up not executing the instructions.
Fixes flakyness in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_float_32_to_16.uniform_matrix_float_rtz_frag
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22314>
In addition to requiring per-sample interpolation, sample shading
changes the behaviour of gl_SampleMaskIn, so we need per-sample shading
even if there are no shader-in variables at all. In that case,
uses_sample_shading won't be set by glsl_to_nir. We need to do so here.
Affected dEQP test on asahi:
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.bits_unique_per_two_samples.multisample_texture_4
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22402>
We're going to enforce clang-format in CI, so get with the program! This doesn't
change a *ton* all considered, because panvk was already aiming for the style we
have in the panfrost clang-format file.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22372>
There's bad interactions with dynamic buffers at this point:
* Perf issues due to allocating and freeing the buffer to store indices/offsets
* Large dynamic uniform buffer offsets (above 65K) cause out-of-bounds reads
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22371>
The scenario of:
* App binds multiple descriptor sets
* App binds a pipeline that uses a subset of them
* App binds a pipeline that uses more of them
was broken. We were only copying the descriptors for the accessible
subset before, but then clearing all dirty bits, so simply changing
the pipeline wouldn't result in more descriptors being copied.
When running not-bindless, the right thing to do is to copy *all*
descriptors if we're copying any. When running bindless, each parameter
is set separately, and more importantly, *can't* be set on the command
list if the root signature can't access them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22371>
This case is a new addition in GFX11, and according to PAL, when no
depth/stencil attachment is bound, it must be set to the number of coverage
samples (the number of SampleMask bits - which is MSAA_EXPOSED_SAMPLES):
4640888b57/src/core/hw/gfxip/gfx9/gfx9UniversalCmdBuffer.cpp (L6978)
Without this change, the maximum of depth/stencil and color sample counts
is used, and if there are no depth/stencil or color attachments (target-
independent rasterization), the Depth Block assumes 1 coverage sample, and
thus Primitive Ordered Pixel Shading doesn't work correctly (and fails 4xAA
fragment shader interlock CTS tests), and occlusion queries don't count the
correct number of samples (according to the "Sample Counting" section of
the Vulkan specification, "the occlusion query sample counter increments by
one for each sample with a coverage value of 1...")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22375>
There's no guarantee that this is a SSA value. Use the helper to handle
both SSA values and register correctly. Otherwise we read trash when we
encounter a register and make bad decisions on types, possibly leading
to our destination being UQ typed when the VGRF is only 32-bit.
Fixes compilation with -Dintel-clc=enabled since 7f6491b76d
(nir: Combine if_uses with instruction uses) but the bug is much older
than that, circa 2017. We were just getting lucky before.
Fixes: 069bf7c907 ("i965/fs: Match destination type to size for ballot")
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22374>
Like nir_instr_rewrite_ssa but without the asserted extra argument. Works on ifs
too, now that we have a unified use list.
We do need to assert that the source has actually been inserted and has valid
use/def chains. Previously, asserting on the parent instruction accomplished
that indirectly. For the more general helper, we instead directly assert that
there exists a non-null parent, whatever it is.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Every nir_ssa_def is part of a chain of uses, implemented with doubly linked
lists. That means each requires 2 * 64-bit = 16 bytes per def, which is
memory intensive. Together they require 32 bytes per def. Not cool.
To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we sneak it into the bottom bit of a pointer.
However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
There's a bit of churn, but this is largely a mechanical set of changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Sanitizes properties returned through GetPhysicalDeviceFormatProperties2.
Bit-31 is not valid to return in the original
vkGetPhysicalDeviceFormatProperties{2,}. Sanitize the bit returned from the
internal to ensure invalid bits aren't return to the application.
Falls in line with the other vulkan drivers.
Based on original commit by Ryan Houdek.
Closes: #8733
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22217>
When the driver gets a cache hit for the binary, we still have to
retain shaders because we can't know if the LTO pipeline will be a
cache hit as well.
Though, serializing the NIR is too costly and most of the libraries
took more than 10ms to be created, which isn't acceptable. To fix this,
keep track of the shaders stage info for libs with the RETAIN flag.
This might be replaced by NIR caching later if it's worth a try.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22327>
`nir_to_spirv` has flat arrays to map driver_location to sampler
variables.
Now when bindless textures are used together with non binless textures
the sampler vars are in different descriptor sets and the binding can be
the same between different descriptor sets, this causes a collision in
arrays.
This patches chamges `nir_to_spirv` to also index the array by whether
the texture is bindless.
Fixes: bc202553e9 ("zink: implement bindless textures")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22358>
If vertex does not complete a primitive, it should not set the odd
flag which miss lead liveness check when culling is enabled.
For example, if odd flag is set regardless of complete flag, when
culling is enabled, 3 vertices of a triangle's init prim flag:
[0x00 0x04 0x01]
then after culling, this triangle has been culled, their prim flag:
[0x00 0x04 0x00]
the second vertex is miss treat as live because its odd flag (code
check prim_flag!=0 for liveness).
Fixes: 1bdeb961bd ("ac/nir/ngg: add gs culling")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8725
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22304>
Our display controller can handle arbitrary GPU imports, so there is no
reason to use dumb KMS buffers. Allocate everything on the GPU instead.
This also allows us to be lazy about mapping things to the KMS side, so
only clients that really want a KMS handle actually do that, which stops
us from ending up with a bunch of junk mapped to DCP (e.g. X11 clients
always request SCANOUT even under XWayland).
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
A common pattern is to allocate a vertex/etc buffer and write to it in
subsets. Some games interleave this with draw calls using the buffer.
This causes very expensive flushing for every draw call.
Fix this by tracking which range of a buffer has been written to, and
elide syncs when the range was previously uninitialized.
Fixes Source engine game performance and probably helps a bunch of
others.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
This is the name applegpu is currently using, to capture the semantics of a
pixel fence. I'm not sure what Apple calls this but wait_pix is closer than
writeout for sure.
This commit just does the rename. It doesn't fix the broken semantics we've had,
this is to ease review and bisection.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
There's a second instruction here, and a second source in the first instruction.
applegpu has known about the encodings for a while but I never updated the
packing code. We will need to stop hardcoding this for multisampling support, as
preparation tease apart the magic pieces.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
Khronos APIs require that we support mipmapping even for 1D textures. However,
it isn't clear if this is supported in the hardware, and how it would work even
if it is. But 1D textures are pretty useless, so we just lower 1D textures to 2D
textures instead of worrying about that.
Fixes piles of Piglits relating to 1D textures.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
Use the same silly workaround that Metal does, to fill in texture descriptors
when there's nothing bound in the interest of robust behaviour.
Fixes null pointer dereference in
arb_shading_language_420pack-active-sampler-conflict.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
Try harder to coalesce collects, by trying to allocate collects only to regions
of the register file where we actually have a full vector worth of registers
free. If we already know that the vector will be blocked later, it's not a good
base register to pick since we'd be force to shuffle later. So, this tweak to
the collect coalescing heuristic lets us eliminate a pile of pointless copying.
shader-db results are excellent. Note that, although we use more registers,
none of the shaders tested had their thread count affected, likely because the
max HURT isn't too high and most of the scary % here is from using a few more
registers when the register pressure is already low. In the near future, that
property will become guaranteed thanks to live range splitting, too.
total instructions in shared programs: 1507337 -> 1500562 (-0.45%)
instructions in affected programs: 428137 -> 421362 (-1.58%)
helped: 2658
HURT: 167
helped stats (abs) min: 1.0 max: 34.0 x̄: 2.63 x̃: 2
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.04% x̃: 2.14%
HURT stats (abs) min: 1.0 max: 10.0 x̄: 1.24 x̃: 1
HURT stats (rel) min: 0.20% max: 23.81% x̄: 3.90% x̃: 3.57%
95% mean confidence interval for instructions value: -2.49 -2.31
95% mean confidence interval for instructions %-change: -2.76% -2.51%
Instructions are helped.
total bytes in shared programs: 10333670 -> 10293172 (-0.39%)
bytes in affected programs: 2996682 -> 2956184 (-1.35%)
helped: 2660
HURT: 175
helped stats (abs) min: 2.0 max: 204.0 x̄: 15.70 x̃: 12
helped stats (rel) min: 0.08% max: 23.08% x̄: 2.64% x̃: 1.83%
HURT stats (abs) min: 2.0 max: 60.0 x̄: 7.26 x̃: 6
HURT stats (rel) min: 0.12% max: 22.39% x̄: 3.19% x̃: 2.78%
95% mean confidence interval for bytes value: -14.81 -13.76
95% mean confidence interval for bytes %-change: -2.39% -2.18%
Bytes are helped.
total halfregs in shared programs: 417284 -> 427363 (2.42%)
halfregs in affected programs: 49814 -> 59893 (20.23%)
helped: 95
HURT: 3018
helped stats (abs) min: 1.0 max: 8.0 x̄: 2.29 x̃: 2
helped stats (rel) min: 2.44% max: 28.57% x̄: 9.20% x̃: 6.06%
HURT stats (abs) min: 1.0 max: 14.0 x̄: 3.41 x̃: 4
HURT stats (rel) min: 2.08% max: 150.00% x̄: 36.54% x̃: 27.27%
95% mean confidence interval for halfregs value: 3.17 3.31
95% mean confidence interval for halfregs %-change: 34.05% 36.23%
Halfregs are HURT.
total threads in shared programs: 16465280 -> 16465280 (0.00%)
threads in affected programs: 0 -> 0
helped: 0
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
Add information about the relationship between program register usage and
program occupancy (the maximum number of threads that may execute concurrently
on a single shader core). This table is derived from studying the
maxTotalThreadsPerThreadgroup property in Metal while varying the register
usage, something I blogged about a few years back. It's probably not 100%
accurate and it hasn't been tested against hardware, but it matters "only" for
performance (not correctness) so I'm not super stressed about the details.
In the (near) future, RA will be able to make use of this information to know
exactly when it can use more registers without hurting performance. In the
present, it's just used for better shader-db statistics.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
Compiling this can cause jank. This is still an issue in Quake3. There is a way
to solve it but it's rather involved and certainly not this weekend's project.
Better perf debugging on the other hand apparently is ^_^
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
The current implementation leaves a lot of perf on the table, so call it out on
ASAHI_MESA_DEBUG=perf to help debugging perf problems, especially if this
ever happens in a real application (i.e. not a benchmark).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
To simplify live range splitting, RA will soon assume that DCE has run (removing
extraneous vectors). So run DCE even when otherwise disabling backend
optimizations. AGX_MESA_DEBUG=noopt is still useful for disabling instruction
combining, which is the more-likely-to-be-buggy pass anyway.
This also fixes IR not being printed with noopt.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
bitstream buffer is unmapped, resized and mapped again if new size
is greater than the current bitstream buffer size. This will be done
for each input buffer. This patch will avoid that and do resize
only once irrespective of number of input buffers. With the new logic,
total size is calculated first and call unmap, resize and map only once.
Signed-off-by: Sajeesh Sidharthan <sajeesh.sidharthan@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22308>
NIR's docs for sampler_index say
The following operations do not require a sampler and, as such, this
field should be ignored:
- nir_texop_txf
- nir_texop_txf_ms
- nir_texop_txs
- nir_texop_query_levels
- nir_texop_texture_samples
- nir_texop_samples_identical
Contrary to this documentation, we were still printing the sampler_index anyway,
even though the value is formally undefined. This was helpful for
PIPE_CAP_TEXTURE_BUFFER_SAMPLER drivers that (despite the NIR docs) respected the
sampler_index anyway. There are no longer any such drivers, so we should stop
printing sampler_index for txf to avoid confusion (and noise).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
Now that PIPE_CAP_TEXTURE_BUFFER_SAMPLER is gone, txf does not require samplers
for any texture on any Gallium driver. NIR already requires drivers to ignore
sampler_index for non-sampler operation (mainly txf), and nowadays all Gallium
drivers ingest NIR. So, document that samplers aren't bound for txf (etc) as
part of the Gallium frontend-driver contract.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
Now that we upload workaround samplers for txf, sampler 0 is guaranteed to be
valid but other samplers are not. So ignore whatever the current sampler_index
value is (it's formally undefined in NIR) and use 0, which we know is valid. We
already do this on Valhall for OpenCL, just need to generalize for Midgard and
Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
The hardware requires a valid sampler even for texelFetch (txf), even though its
contents are ignored. We'd rather not pass on this requirement to the frontends,
so we should handle it by uploading our own workaround sampler in the case when
no sampler is already present. We already do this on Valhall (for rusticl), so
we just need to port the same workaround back to Midgard/Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
The one test that remains would have an automatically generated name
that would conflict with another test. This test is also a little
special (per the comment in the test), so it's probably best to leave it
separate anyway.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
If loop iterator is incremented with something other than regular
addition, it would be more error prone to calculate the number of
iterations theoretically. What we can do instead, is try to emulate the
loop, and determine the number of iterations empirically.
These operations are covered:
- imul
- fmul
- ishl
- ishr
- ushr
Also add unit tests for loop unrollment.
Improves performance of Aztec Ruins (sixonix
gfxbench5.aztec_ruins_vk_high) by -1.28042% +/- 0.498555% (N=5) on Intel
Arc A770.
v2 (idr): Rebase on 3 years. :( Use nir_phi_instr_add_src in the test
cases.
v3 (idr): Use try_eval_const_alu in to evaluate loop termination
condition in get_iteration_empirical. Also restructure the loop
slightly. This fixed off by one iteration errors in "inverted" loop
tests (e.g., nir_loop_analyze_test.ushr_ieq_known_count_invert_31).
v4 (idr): Use try_eval_const_alu in to evaluate induction variable
update in get_iteration_empirical. This fixes non-commutative update
operations (e.g., shifts) when the induction varible is not the first
source. This fixes the unit test
nir_loop_analyze_test.ishl_rev_ieq_infinite_loop_unknown_count.
v5 (idr): Fix _type parameter for fadd and fadd_rev loop unroll
tests. Hopefully that fixes the failure on s390x. Temporarily disable
fmul. This works-around the revealed problem in
glsl-fs-loop-unroll-mul-fp64, and there were no shader-db or fossil-db
changes.
v6 (idr): Plumb max_unroll_iterations into get_iteration_empirical. I
was going to do this, but I forgot. Suggested by Tim.
v7 (idr): Disable fadd tests on s390x. They fail because S390 is weird.
Almost all of the shaders affected (OpenGL or Vulkan) are from gfxbench
or geekbench. A couple shaders in Deus Ex (OpenGL), Dirt Rally (OpenGL),
Octopath Traveler (Vulkan), and Rise of the Tomb Raider (Vulkan) are
helped.
The lost / gained shaders in OpenGL are an Aztec Ruins shader that goes
from SIMD16 to SIMD8. The spills / fills affected are in a single Aztec
Ruins (Vulkan) compute shader.
shader-db results:
Skylake, Ice Lake, and Tiger Lake had similar results. (Tiger Lake shown)
total loops in shared programs: 5514 -> 5470 (-0.80%)
loops in affected programs: 62 -> 18 (-70.97%)
helped: 37 / HURT: 0
LOST: 2
GAINED: 2
Haswell and Broadwell had similar results. (Broadwell shown)
total loops in shared programs: 5346 -> 5298 (-0.90%)
loops in affected programs: 66 -> 18 (-72.73%)
helped: 39 / HURT: 0
fossil-db results:
Skylake, Ice Lake, and Tiger Lake had similar results. (Tiger Lake shown)
Instructions in all programs: 157374679 -> 157397421 (+0.0%)
Instructions hurt: 28
SENDs in all programs: 7463800 -> 7467639 (+0.1%)
SENDs hurt: 28
Loops in all programs: 38980 -> 38950 (-0.1%)
Loops helped: 28
Cycles in all programs: 7559486451 -> 7557455384 (-0.0%)
Cycles helped: 28
Spills in all programs: 11405 -> 11403 (-0.0%)
Spills helped: 1
Fills in all programs: 19578 -> 19588 (+0.1%)
Fills hurt: 1
Lost: 1
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
These operations are covered:
- imul
- fmul
- ishl
- ishr
- ushr
The only cases that can be currently affected are those where the
calculated loop-trip count would be zero.
v2 (idr): Split out from original commit. Rebase on lots of other work.
v3 (idr): Move operand size assertion. This code only cares that the
operands have the same size for the iadd and fadd cases. In other
cases, such as shifts, the sizes may not match. Fixes assertion
failures in
tests/spec/arb_gpu_shader_int64/glsl-fs-loop-unroll-ishl-int64.shader_test.
No shader-db or fossil-db changes on any Intel platform.
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
This dramatically simplifies will_break_on_first_iteration, and, much
more importantly, makes it significantly more flexible. It is now
possible to handle loops with more complex exit condition and other
kinds of increment operations.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
This ensures that scenarios like
nir_loop_analyze_test.iadd_inot_ilt_rev_known_count_5 don't regress in
the next commit. It also means we don't change float comparisons. These
are probably fine... but it still made me a little uneasy.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
...with a substitution. This function is largely a copy-and-paste of
try_fold_alu (nir_opt_constant_folding.c), and an argument could be made
that this function belongs in that file.
v2: Some changes were mistakenly squashed in to "nir/loop_analyze: Use
try_eval_const_alu and induction variable basis info" that should have
been here.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
Loop analysis doesn't currently treat values updated by shifts as
induction variables. Future patches will change this.
v2: Don't use the contradiction ilt(x, INT_MIN).
v3: Delete some errant code in UNKNOWN_COUNT_TEST. Noticed by Tim.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
initial bitstream size was set to width * height * 2 which is
larger than yuv size. set initial bitstream size to encoded
bitstream size approximately to optimize memory consumption.
This is just an initial size setting, it will get resized later
if it's not big enough. As a result of this change, we don't need to
allocate super big size at the every beginning. Only allocate
big size when needed in order to save some memory
Signed-off-by: Sajeesh Sidharthan <sajeesh.sidharthan@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21918>
If the app passes us unaligned buffer offsets, we need to align them
down to the nearest aligned offset, and then put the difference into
the descriptor set buffer.
Fixes: 8bd5fbf8 ("dzn: Bind buffers for bindless descriptor sets")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Cache coherent UMA implies that the GPU is reading data through the
CPU caches. Using write-combined CPU pages for such a system would
be bad, since the GPU would then be reading uncached data. One
example of such a system is WARP. This significantly improves WARP's
performance for some apps (including the CTS).
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Unlike DXBC, DXIL's shift instructions don't have the implicit behavior
that they only take the 5 bits. This is observable if you try to have
DXC do a shift of a dynamic value, e.g. a constant buffer value, where
the compiler inserts the appropriate 'and' op. We need to do the same.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Shifts should always use 32bit shift values, and when lowering to
masked, we need to use 32-bit atomics. That means that we should also
treat 24bit stores as a single masked op rather than one 16bit unmasked
and one 8bit masked.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22225>
Unlike the legacy CreateContext path, we would try to send the
GLXCreateContextAttribs request regardless of whether we'd successfully
created the client context state. And there's not a lot on the server
side to go wrong besides BadAlloc, so if the request succeeded but
the client side didn't we'd need to destroy the server context and
synthesize an X error. Since that itself involves more X protocol it's
tricky to get the request number right in the error, and tests and apps
can notice when you get it wrong.
Since we have now fixed client-side validation to generate the right
errors at the right times, this patch does something simpler, we match
CreateContext and fail early if the client-side setup fails. Now there's
no question of what request number to use, because we haven't sent any
protocol, the error is for the request as if it'd been sent.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4763
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12006>
There's two kinds of "bad version" you might encounter here, either the
combination does not name a defined version (like 1.7) or it names
something the driver can't do (like asking r300 to do 4.0). EGL does not
distinguish these cases, but GLX calls them BadMatch and GLXBadFBConfig
respectively.
Since api_mask is the set of driver supported APIs, and we can only
support defined APIs, don't check it early in driCreateContextAttribs,
just let it fall out from validate_context_version.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12006>
This has no functional change because everyone calling this is
discarding the error code, because we're relying on the server to
generate the right thing for us. But we create the direct context first
and the server isn't going to enforce everything we want it to
(supported GL versions for example). Convert out from DRI error codes to
X/GLX error codes so we can fail the right way on the client side. We're
still throwing the error away in all of the callers but that'll change
shortly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12006>
This pass rarely makes any changes, so work a little harder to preserve
more meta data.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.2% ± 0.1% (n = 5, pooled s = 0.431885).
v2: Add some parenthesis. Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Two linked list management changes:
- Use the list head sentinel as the initial cursor. It is, after all, a
proper node in the list.
- Iterate the list of blocks starting with the second block instead of
skipping the first block in the loop.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-0.24% ± 0.09% (n = 5, pooled s = 0.324106).
v2: Use nir_cursor instead of direct list manipultion. Suggested by
Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-1.09% ± 0.084% (n = 5, pooled s = 0.354471)
Reduces the size of a release build by 26k.
text data bss dec hex filename
23163641 400720 231360 23795721 16b1809 before/lib64/dri/iris_dri.so
23137264 400720 231360 23769344 16ab100 after/lib64/dri/iris_dri.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Since one of the register must always be either VGRF or FIXED_GRF, much
of regions_overlap and reg_offset can be elided.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.29% ± 0.097% (n = 5, pooled s = 0.361697).
Using a release build, improves performance of compiling shaders from
batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s =
0.178312).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
This function only exists in builds with assertions, so it only matters
there.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-5.2% ± 0.16% (n = 5, pooled s = 0.657887).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Other than nouveau, every single Gallium driver relies on
u_pipe_screen_get_param_defaults to get the default values of CAPs.
For the driver, this is much more concise. Unsupported new features -- or
supported features that virtually all Gallium drivers support -- do not need to
written out explicitly. Their absence (or presence) is implied as the default.
If there's any doubt over whether the CAP is exposed, it's easy to check in
u_pipe_screen_get_param_defaults.
For the Gallium tree in general, this brings a number of benefits:
* Unused CAPs are easy to delete, because there is only a single place
(u_pipe_screen_get_param_defaults) where they are referenced and need to be
deleted from.
* New CAPs are easy to introduce, for the same reason.
* It's straightforward to audit which drivers support (or don't support) a given
CAP by grepping for the name (for example, when determining whether a CAP is
unused and can be garbage collected, or a CAP is so widely supported that it
can be made default.). You still need to check the source code in case it's
conditionally exposed (common for layered drivers) but the search space is
limited to drivers that reference the CAP by name.
Unfortunately, all of these benefits rely on all Gallium drivers cooperating.
The status quo is much less nice:
* Unused CAPs need to be deleted both from common code, and also specially from
nouveau. Why is nouveau special?
* New CAPs need to be added both to common code, and also specially to nouveau.
Again, why is nouveau special?
* When grepping for CAPs, nouveau (only) needs to be ignored, since it's
spurious. Unless sometimes it's not, in which case you need to open nouveau
source code anyway to check.
Compounding on the fun, you have to do the special nouveau step twice, once for
nvc0 and once for nv50.
Why might it be benefical to list CAPs explicitly instead of relying on the
defaults?
* Maybe easier auditing nouveau driver for CAP correctness? In practice this has
not been an issue for any of the drivers I've worked on, especially because
the defaults are quite reasonable.
* Maybe forcing people adding CAPs to think about nouveau specially? This isn't
fair to the tree in general, why should nouveau get this special
treatment? Instead, CAPs are generally added to gate functionality that may
not be supported on all drivers, and the default is disabling the new
functionality until a developer for a given driver can wire it up. There's
already no expectation that the person adding CAPs needs to also add the
functionality to nouveau (if that's even possible) -- unless the CAP is being
added for the particular nouveau's benefit of course -- so this isn't helpful.
* Maybe forcing people removing CAPs to think about nouveau specially? Similar
issues apply here, and it's not clear how this would even work.
* Maybe keeping novueau developers aware of CAP churn? Again nouveau should not
be special here and it isn't sustainable to do this for every driver. So, if
this is something that nouveau developers want to do -- and they choose not to
follow Gallium-tagged merge requests -- then the git log of
src/gallium/include/pipe/p_defines.h or indeed
src/gallium/auxiliary/util/u_screen.c may be consulted.
So, without an excellent reason why nouveau should be treated specially, and
with several reasons why it should not, let's bring nouveau in line with the
rest of Gallium and rely on the defaults.
I've left in CAPs with attached comments even when they are returning the
default value to preserve information from before the commit. Otherwise, this
commit aims to remove explicit cases that match the default value, as other
drivers generally aim to do.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22245>
The glsl compiler has been reworked to avoid passing gl_context around
so that we can avoid expensive recompiles across the code base for
minor changes. This helper will help us avoid passing gl_context around
where its otherwise unrequired.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22305>
Rename radv_layout_fmask_compressed and make it return an enum. We will
add partial compression (fmask decompressed and not expanded) in a
following commit.
Drop the check for VK_IMAGE_USAGE_STORAGE_BIT and
VK_IMAGE_USAGE_TRANSFER_DST_BIT. When transitioning to
VK_IMAGE_LAYOUT_GENERAL, we should decompress and expand FMASK even when
those usage bits are not set.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21615>
It's been popular for flakes due to oomkilling or kernel kmalloc failure
recently. Is it ultimately the source of running out of memory? Who
knows, but hopefully it's at least a big part of the problem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
The GLSL frontend was already lowering 32-bit frexp, so only 64-bit frexp
is possible as an op in the incoming NIR. However, svga and nouveau don't
set PIPE_SHADER_CAP_DFRACEXP_DLDEXP_SUPPORTED, leaving just r600's
non-default TGSI mode potentially using it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
This was needed for nir_lower_frexp, but it's a win anyway. shader-db
results:
total gpr in shared programs: 1143621 -> 1143502 (-0.01%)
gpr in affected programs: 33918 -> 33799 (-0.35%)
total instructions in shared programs: 7829415 -> 7820124 (-0.12%)
instructions in affected programs: 1204967 -> 1195676 (-0.77%)
total bytes in shared programs: 71802760 -> 71717352 (-0.12%)
bytes in affected programs: 11031888 -> 10946480 (-0.77%)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
We previously had GLSL do ldexp lowering to bitops, but NIR can do it
instead. It's tempting to just pass the NIR op through to the host Vulkan
driver, but to do that we'd need to split up NIR's flag between 32 and
64-bit support, and that's not worth anyone's time for an op we've never
seen used.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
If we move the blob-cache path into the async queue, then
path_init_failed is no longer a good way to check if puts
should be a no-op. But fortunately checking if the queue
is initialized is, and is a more obvious check because
what it is guarding is a util_queue_add_job().
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22248>
MTL support fp64, but not int64. The fsign(double(x))*FOO optimization
would try to use a 64-bit int xor operation to conditionally toggle
the sign bit off the result.
Since this only affects high bit of the result, we can do a 32-bit
move of the low dword, and a 32-bit xor on the high dword.
Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp64.input_args.modf_denorm_flush_to_zero
on MTL.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22259>
-1 is an invalid buffer index which breaks app expectations, specifically
apitrace, which checks for return value of 0 from checking buffer bindings
to determine whether to inject user vertex buffer bindings and create functional
traces
this should fix capturing traces with drivers using glthread
fixes#8383
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22293>
These were running on armhf because that's the default in the custom
distro that Raspberry Pi provides, but arm64 is ~20% faster, and we
already run weekly tests on both arm64 & armhf, so let's keep only the
faster one in the pre-merge path.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22272>
When a graphics pipeline library is created with only the vertex input
state, the driver binds this state at pipeline bind time. Though the
vertex binding stride is not necessarily dynamic, in this case the
pipeline stride should be used.
This fixes GPU hangs with recent
dEQP-VK.pipeline.fast_linked_library.vertex_input.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22285>
In V3D we were doing this incorrectly by peeking into the sampler state
unconditionally, which is not correct if the TMU operations don't use
sampler state at all (like PBOs). This was causing us to fail the second
test in this sequence when both tests run back back to back in the same
process:
dEQP-GLES3.functional.texture.shadow.2d.linear.greater_or_equal_depth_component32f
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rg32f_cube
Here, the first test would setup sampler state for shadow comparisons and
the second test would setup a PBO upload, which would incorrectly pick
up the sampler state to decide about the TMU output size for the PBO
operation.
In V3DV we were doing this right looking through each texture/sampler
instruction and checking if they all involved shadow comparisons or had
relaxed precission, defaulting to 32-bit otherwise.
This special-casing for shadow comparisons also leaks from drivers
into the compiler where we are forced to emit some pieces of sampler
state for 32-bit outputs, so we had to special-case shadow instructions
there as well and we also had a fix for CS textures not having correct
sampler state representing shadow operations too. Finally,
we also had at least a couple of bugs where forcing 32-bit TMU output
through V3D_DEBUG wasn't correctly forcing shadow comparisons to actually
be 32-bit in all the right places, leading to visual bugs with the
option enabled (Sponza being one example of this). This change eliminates
all of these issues.
Finally, the performance improvement observed from special casing shadow
comparison is negligible, and in specific scenarios it can even be
detrimental to performance due to increased register pressure (Sponza with
PCF filtering set to 4 is an example of this again).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8684
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22284>
the spec for vkGetDrmDisplayEXT says:
"If there is no VkDisplayKHR corresponding to the connectorId on the
physicalDevice, the returning display must be set to VK_NULL_HANDLE.
The provided drmFd must correspond to the one owned by the physicalDevice.
If not, the error code VK_ERROR_UNKNOWN must be returned. (...)
The given connectorId must be a resource owned by the provided drmFd.
If not, the error code VK_ERROR_UNKNOWN must be returned"
We were only setting the display pointer to VK_NULL_HANDLE if the provided
drmFd was valid, however, there are CTS tests checking that it is also set
to NULL when it is not.
Fixes the following test on all drivers exposing EXT_acquire_drm_display
(tested with Intel and V3DV):
dEQP-VK.wsi.acquire_drm_display.acquire_drm_display_invalid_fd
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22283>
Usually, we postpone acquisition until a swapchain is created, but there are
some cases with display extensions (at least with EXT_acquire_drm_display)
where we need to acquire before a swapchain is ever created.
Fixes various tests in:
dEQP-VK.wsi.acquire_drm_display.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22283>
We were not initializing the PS invocation count to zero before
computing the sum of the per-thread results.
This fixes an issue where querying the result of the query more
than once would cause the result to grow larger each time.
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22281>
there are two cases to be handled here:
* normal
* software
the latter case requires env vars based on the frontend, and if a sw
device isn't found then init should fail
the former case should (in theory) just yolo the first device and assume
that's what the user wanted based on whatever env vars and layers are
in use
fixes#7508, #7132
maybe also affects #8152
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22184>
Note: The meaning of clockwise vs counter-clockwise changes after the
yz flip, therefore the determination of winding needs to be done before
the yz flip logic. Therefore the yz flip is moved to the GS and applied
as a lowering on top of the base GS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22277>
For instance, with "piglit/bin/arb_shader_image_load_store-host-mem-barrier --quick -auto -fbo":
==18549==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61200000a059 at pc 0x7f65d8937b80 bp 0x7fff6ed19a00 sp 0x7fff6ed199f8
READ of size 1 at 0x61200000a059 thread T0
#0 0x7f65d8937b7f in evergreen_set_shader_images ../src/gallium/drivers/r600/evergreen_state.c:4277
#1 0x7f65d6b471b8 in st_bind_images ../src/mesa/state_tracker/st_atom_image.c:172
#2 0x7f65d6b76b26 in st_validate_state ../src/mesa/state_tracker/st_util.h:129
#3 0x7f65d6b76b26 in prepare_draw ../src/mesa/state_tracker/st_draw.c:88
#4 0x7f65d6b77c8a in st_draw_gallium ../src/mesa/state_tracker/st_draw.c:141
#5 0x7f65d72698a2 in _mesa_draw_arrays ../src/mesa/main/draw.c:1202
Fixes: a6b3792843 ("r600: add core pieces of image support.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22273>
When using multiple binaries, we don't know the required number of VGPRs beforehand,
which means we either have to over-allocate VGPRs or avoid shared VGPRs.
As bpermute is the only instructions needing shared VGPRs, we decide for the latter.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22267>
This was the last missing feature for GPL. The main problem is that
the on-disk shaders cache size will increase a lot because we don't
deduplicate shaders but there is on-going work to improve that.
We also can't use the shaders cache for libraries created with the
RETAIN_LINK_TIME_OPTIMIZATION flag and module identifiers because we
don't know the SPIR-V and thus can't retain NIR shaders for linking.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
This is to generate a different key for a library created with
FRAGMENT_SHADER_BIT and no FS (ie. it would generate a noop FS) and
a library created with FRAGMENT_OUTPUT_INTERFACE with no CB attachments.
Otherwise, the same key would be generated and this would corrupt
the cache.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
With GPL, a stage can be imported from a library which means that the
binary is NULL (it's freed right after compilation) but the shader is
non-NULL. To avoid crashing, rely on non-NULL binaries because this
implies that the shader is non-NULL as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
compute is a special case because the zink_shader itself is created
in a thread, which means it cannot be accessed directly at bind time
since it may not have finished creating itself yet
to avoid prematurely waiting on an async fence, the one value needed
at bind time can instead be stored separately
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22266>
nir_shader objects are hefty, and they really add up when there's a lot
of them. there's also not much use in keeping them around, as any time
they'll be used, they're always cloned first, and deserializing isn't
likely to be any slower than a clone
cuts driver memory usage by ~40% for tomb raider
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22266>
Even if the VkPipelineRasterizationStateCreateInfo sets
depthBiasEnable, internally we comput if it is really makes sense, and
use that to decide for example if we emit the Depth Offset packet.
But we were not using this to enable Depth Bias through the depth
offset enable field on the CFG packet.
So in some tests we were enabling depth bias, but not emitting the
packet to configure it, that seemed somewhat inconsistent.
This didn't cause any issue so far, but let's be conservative.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22252>
This records any rendering pipeline flags in the render pass. This
provides much-needed information for the VK_KHR_fragment_shading_rate
and VK_EXT_fragment_density_map extensions as well as provides an
alternative to VkRenderingSelfDependencyInfoMESA which is based on
VK_EXT_attachment_feedback_loop_layout.
v2 (Connor): Name something more general
v3 (Faith): Also add the FSR flag
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22191>
Verifying that the branch instruction reads exec is not actually
necessary because the pattern that we look for already implies that.
This prepares for the next commit which will remove the exec operand
from branches that have the same target. These branches will no
longer read exec, but they should still get the same optimization.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21493>
While analyzing cache loading performance, hashing the pipeline layout was
surprisingly consuming around 4% of time, sometimes close to the cost of
hashing shader modules.
Turns out we were hashing the pipeline layout on every pipeline creation.
Considering that pipeline layouts are usually deduplicated by the
application, this was amplifying the hashing cost by a big margin.
With Graphics Pipeline Library, we do need to rebuild the pipeline layout
by combining those from each library, but we can memoize the hash of the
descriptor set layout. The cost of re-hashing hashes is negligible since
each descriptor set layout can amount to 1–2KB in size.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22254>
Android CTS 13_r4 tests dEQP-VK.memory.allocation.random* fail
with VK_ERROR_OUT_OF_DEVICE_MEMORY on ADL boards with 32GB memory
as memory allocation requests from DEQP are much larger(~2.9GB+)
based on device heap size/8.
Increase the limit to unsigned 32bit max(~4GB) which helps to
fix the dEQP-VK.memory.allocation.random* tests.
v1: Bound allocation by the largest memory heap size (Lionel Landwerlin)
v2: Clean up comments to reflect the code change (Ivan Briano)
Update the value of MAX_MEMORY_ALLOCATION_SIZE (Lionel Landwerlin)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22066>
If a resource is dirty but already tracked by the current batch, no need
to process it at draw time.
Note that the batch could change (ie. new fb state bound, etc) after the
check if we need resource dirty tracking, but in these cases all the
dirty-resource state is marked dirty.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
This wasn't taking into account a change in corresponding bit in
writeable_bitmask, causing problem if an SSBO was first bound for
read, and then rebound for write, we wouldn't update the buffers
valid range. Instead just drop the premature optimization.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
Previously when there was an & or | with two BitmaskEnums, the compiler
would try to cast the RHS and find a matching overload, but there were
many different casts (to the enum itself, to an integer, to a boolean,
etc.) each with a matching overload which meant that it couldn't pick
one and errored out due to an ambiguous overload. Fix this by
explicitly providing an overload that takes a BitmaskEnum on the RHS.
It has to also provide a BitmaskEnum output, so that subsequent
operators with the result on the LHS (e.g. when or'ing together three
BitmaskEnums without any parentheses tricks) also get the right
overload.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
Inspired by Nicolai Hähnle's commit in LLPC.
Instead of using a SALU instruction to add to the scalar
offset, rely on the buffer swizzling and use constant offset.
Fossil DB stats on GFX1100:
Totals from 47910 (35.51% of 134913) affected shaders:
CodeSize: 87927612 -> 86968136 (-1.09%)
Instrs: 17584007 -> 17440094 (-0.82%)
Latency: 97232173 -> 97126311 (-0.11%)
InvThroughput: 9904586 -> 9905288 (+0.01%); split: -0.02%, +0.02%
VClause: 544430 -> 542566 (-0.34%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22227>
Bos that will be scanout in display need to be allocated with
flags = XE_GEM_CREATE_FLAG_SCANOUT in Xe and that implies to different
caching rules for this buffer.
So here not allowing to get scanout buffer from cache or allow it
to be placed in a cache bucket for reuse.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22060>
This lets us provide a vk_device_memory_range helper similar to what's
provided for buffers for dealing with VK_WHOLE_SIZE. We can also handle
flags and some annoyance around Android hardware buffer import.
Reviewed-by: Lina Versace <lina@kiwitree.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22038>
Modify the code path taken in `u_pipe_screen_get_param_defaults`
to call DRM to check if `PIPE_CAP_DMABUF` is supported. This is
required for overriding the behavior in `dri2_init_screen_extensions`
to support importing DMA bufs on drivers that don't support DRM, by
simply changing how `PIPE_CAP_DMABUF` is handled in their driver.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21654>
Add enum for pv emulation primitives and `lower_pv_mode`
to `zink_gs_key`
The enum contains the possible values of the lower_pv_mode key
This key will be non 0 whenever provoking vertex mode needs to be
emulated and it's exact value encodes relevant information about the
primitive that needs to be emulated
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22162>
We dropped support for hardware macOS drivers in afe134a49c ("asahi: Drop macOS
backend"), so drop the corresponding documentation. Layered and software drivers
are still supported on macOS for better or worse, so the main "Notes on macOS"
page can stay I think.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22213>
Shader model 6.2 was the upper bounds of what *could* be generated
before, but not all devices support it. And other devices support
even more. So, let's pass in the shader model / validator that will
be used by the API caller.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21178>
Currently, we always use a hierarchy mask with all levels enabled. While this is
efficient for geometry-heavy workloads like 3D games, it is wasteful for 2D
applications that draw very few vertices. For drawing just a few textured quads,
the overhead of small bin sizes outweighs any performance advantages, so it's a
bit slower. More problematically, small bin sizes require tremendous amounts of
memory for the polygon lists, leading to significant memory consumption (~10MB)
for the polygon list for even the simplest of 2D blits.
To reduce our memory footprint, we need to choose our hierarchy masks more
carefully. In general, we want to allow small bin sizes for geometry-heavy
workloads but not for geometry-light workloads. We estimate vertex count in the
driver as a proxy for this, and use a simple heuristic to select a bin size
based on the estimated vertex count. None of this is an exact science, and the
heuristic could probably be tuned. Nevertheless, the heuristic used (comparing
framebuffer size to vertex count) works well in practice, significantly reducing
the memory footprint of 2D applications like Firefox without hurting the
performance of 3D applications.
I originally wrote this patch while diagnosing high memory footprints on my
Midgard laptop, which is why only Midgard is in scope here. On Bifrost and
Valhall, we have a similar hiearchy mask selection problem. It seems likely that
the same heuristic would work there too, but it's a different code path that I
have not integrated or tested. I'll leave that for the adventurous reader, to
get the memory footprint win there too.
(It's also possible the win is smaller on newer Malis than on Midgard, since Arm
claims they optimized the tiler data structures on the newer parts. There's
probably still some merit to the idea.)
On Mali-T860, glmark2 -bdesktop frametime decreased by 1.35% +/- 0.91% at 95%
confidence, showing a slight win for 2D workloads No statistically significant
difference for glmark2 -bshading:shading=phong, since 3D workloads continue to
use the same hierarchy masks.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
In the next commit, we will refine our algorithm to select hierarchy masks based
on the vertex count. In preparation, augment the driver to track rough estimates
of the vertex count so we have a "geometry complexity" input for the heuristic.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
On Valhall, we were calling emit_const_buf in two places:
1. The main "handle dirty flags" code shared with Bifrost
2. A Valhall-specific shader environment emitter
The latter was not dirty tracked, and the former was not used. That meant we
were calling emit_const_buf way too much. It's not a cheap routine, either.
Instead, use the results from the dirty tracked function in the shader
environment emitter, to avoid the redundant call and get the expected dirty
tracking.
In a Dolphin trace I'm looking at, fps increases 27->33.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
I tried to be too clever and ended up wasting cpu cycles. it's
much, much, much, much faster to just generate this one struct array
every time than it is to do set lookups with thousands of members
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
vstate draws bind their own vertex buffers unrelated to the bound
gallium buffers, so any draw occurring after a vstate draw must
rebind vertex buffers to ensure the correct ones are bound
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
vulkan resolves only provide "extents" instead of src and dst regions like
GL, which means vk resolves can't be used to downscale images, as such
operations will instead just crop the image
fixes#8655
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22195>
We've had drm/sched support on the kernel side for more than a year and
a half. This makes submit ioctl async by handling fence waits from the
sched's kthread, which is what threaded submit was originally working
around. For now, threaded submit is only used for virtgpu, which does
not (yet?) have drm/sched support.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
We've had gpu-sched support in the kernel for a while now, so our fence
waits are not synchronous in the ioctl path. The only reason this path
still exists is that virtgpu does not have gpu-sched. So lets disable
it on msm.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
Buffers are added to the deferred freelist at the tail. And frequently
the last reference is dropped immediately after the submit. So almost
always, once we see a still-busy BO, the remaining in the list will also
still be busy.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
this ensures that the buffer is marked active and prevents promotion
in cases where reordering would break rendering
unordered_read prohibits write reordering for buffers, so setting
this flag must be done when the buffer is actually used, ideally as
late as possible
setting it at the time of (re)bind catches all the buffer rebind cases
which might otherwise erroneously permit reordering
fixes#8381
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22205>
Requires some hand rolled assembly:
- DC CVAC / DC CIVAC for aarch64
- DCCMVAC / DCCIMVAC for arm32, unfortunately it seems that it is
illegal to call them from userspace.
- clflush for x86-64
We handle x86-64 case because Turnip may run in x86-64 guest
e.g. in FEX-Emu or Box64.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20550>
vkd3d requires cached memory type.
MSM backend doesn't have a special ioctl for memory
flushing/invalidation, we'd have to use cvac and civac
arm assembly instructions (would be done in following commit).
KGSL has an the ioctl for this, which is used in this commit.
Note, CTS tests doesn't seem good at testing flushing and
invalidating, the ones I found passed on KGSL with both
functions being no-op.
Based on the old patch from Jonathan Marek.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7636
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20550>
Indeed, the parameter "mem_ctx" was not processed.
For instance, this issue is triggered with the crocus driver and
"piglit/bin/shader_runner tests/spec/arb_tessellation_shader/execution/compatibility/tes-clip-vertex-different-from-position.shader_test -auto -fbo":
SUMMARY: AddressSanitizer: 235216 byte(s) leaked in 48 allocation(s).
Fixes: 96ba0344db ("intel: Use common helpers for TCS passthrough shaders")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22173>
Cloning NIR shaders consumes too much RAM and this can easily explode
in memory for games that create a ton of graphics libraries. Using
serialized NIR shaders help considerably.
This reduces RAM usage in dota2 with GPL from 3GiB to 400MiB.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22143>
The previous commit has freed up a couple of runners, so let's repurpose
them to make vk test jobs take less time; with that spare time, let's
increase the coverage a little bit.
Most jobs now take 10-12 minutes, just like they used to.
Stress-tested over 40+ runs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21902>
Without reducing the coverage and using 4 runners instead of 9, most
runs take 10-13 minutes instead of 12-13 minutes for the egl job, 9-11
minutes for the piglit job, and 6-8 minutes for the deqp job.
Stress-tested over 40+ runs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21902>
These allocate way more memory than is reasonable, a bunch of times. I'd
guess they pushed the machine pretty deep into memory pressure which is
why it was all taking like 3 minutes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22193>
This is way more horrifying than I hoped -- I can't figure out a way to
have the method be on TraceContext, so it's a static method of the
datasource, but then you have to name the templated types over and over.
You have to pass in a TraceContext because intel emits the clock sync
packet within a Trace(), and perfetto just silently corrupts the trace if
you Trace() in a Trace().
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22157>
Deduplicates some code from intel/tu/freedreno, and will be a common place
to put other shared code.
The downside I can see is this logging:
[013.129] tu_perfetto.cc:122 Tracing started
[013.129] intel_driver_ds.cc:133 Tracing started
("oh, huh, apparently data sources for both drivers are registered? wild")
becomes:
[142.906] erfetto_renderpass.h:50 Tracing started
[142.907] erfetto_renderpass.h:50 Tracing started
("huh, why is my driver's data source being started twice?").
Unfortunately we can't easily get a string for the data source type due to
not having rtti.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22157>
I was frustrated trying to write code and not be able to just mash ^K^F to
format what I'd written. This .clang-format is just cargo-cult of turnip
with a few tweaks to reduce the diff to the current directory contents.
The remaining deltas in the reformat look decent to me, and mostly bring
things closer to mesa-vague-consensus style.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22157>
ftruncate() allocates disk space lazily. If the disk is full and it is
unable to allocate disk space when accesed via mmap(), it will crash
with a SIGBUS.
Switch to posix_fallocate(), which ensures disk space is allocated
otherwise it fails if there isn't enough disk space. The disk cache
won't be enabled in this case.
For normal cases, a small increase in disk usage as the 1.3MB index
file will be fully allocated when initialized now.
fallback to ftruncate() if posix_fallocate() isn't found.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22097>
To be helpful, the thing inside the fsat has to be used with and without
the fsat. Otherwise it just moves a saturate destination modifier
around. To not be harmful, the fsat has to only be used by the bcsel.
All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20174475 -> 20174449 (<.01%)
instructions in affected programs: 3913 -> 3887 (-0.66%)
helped: 13 / HURT: 0
total cycles in shared programs: 866844832 -> 866844719 (<.01%)
cycles in affected programs: 46037 -> 45924 (-0.25%)
helped: 10 / HURT: 1
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 161491468 -> 161491372 (-0.0%)
helped: 31 / HURT: 8
Cycles in all programs: 10933090736 -> 10933024716 (-0.0%)
helped: 32 / HURT: 18
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>
There are already NIR algebraic optimizations (see also ac6646129f
("nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.") that
will try to remove the saturate from things like
fmax(0.5, fsat(x))
This basically reverts 40aeb558ce ("i965/fs: Allow propagation of
instructions with saturate flag to sel"). That commit message had no
shader-db information, so it's unclear whether this actually helped
anything ever.
No shader-db changes on any Intel platform.
One shader in Far Cry New Dawn was affected.
Cycles in all programs: 10933090738 -> 10933090736 (-0.0%)
Cycles helped: 1
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>
Zink now exposes the `PIPE_PRIM_QUADS` among supported primitives and
handles them with geometry shaders.
Previously, while not exposing this capability, gallium would internally
generate an index buffer to draw them with triangles.
However the information necessary to avoid drawing the diagonal line
when using the line primitive was not preserved.
fails are added for wireframe xfb quads tests
xfb is expected to output tessellatated quads while showing a quad
without a diagonal, however there is no sane way of achieving this.
As part of the test quads will be rendered with and without xfb and the
results compared.
Now to avoid breaking xfb zink has to always split quads into triangles
when xfb is enabled. This means that the test will fail.
Previously the diagonal was always present so the test passed
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
Fixes a bug where, whenever a primtiive that has more than 2 vertices is rendered
with line stipple, the edge between the first and last vertex will have
stretched out stipple.
This happens because interpolation will occur between two non consecutive
stipple counters for the last edge
(which is between the last and first vertices).
Forcing `nir_create_passthrough_gs` to generate a line strip avoids
this because the last vertex will be duplicated and will have
the correct stipple counter for each edge.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
`nir_create_passthrough_gs` will now take a boolean argument to decide
whether it needs to handle edgeflags.
When true is passed it will output a line strip where edges that
shouldn't be visible are not emitted.
This is usefull because geometry shaders will generally throw away
edgeflags so for a passthrough GS to act transparently it needs to emulate them.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
NIR i/o lowering and sysval lowering can handle the compact var fine at
this point.
Affects: nouveau, virgl, svga, radeonsi, r600, llvmpipe. Does not affect
PIPE_CAP_NIR_COMPACT_ARRAYS drivers like crocus, iris, d3d12, freedreno,
zink.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21940>
It is observed that in display resolutions where width is not equal to
stride, vulkan rendering is being distorted. This is happening due to
stride calculation mismatch between minigbm and mesa.
This fix makes sure that the stride calculated in minigbm is passed to
anv and isl.
The issue was found while debugging the following android cts tests and
thus fixes them as well.
android.graphics.cts.VulkanPreTransformTest#testVulkanPreTransformNotSetToMatchCurrentTransform
android.graphics.cts.VulkanPreTransformTest#testVulkanPreTransformSetToMatchCurrentTransform
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22163>
The OpenGL specification requires that seamless cube maps ignore the wrap mode,
but some hardware may try to respect the wrap mode even for seamless cubes
contrary to the spec. Since OpenGL maps samplers 1:1 to textures (at least
without bindless texture support...), it's easy to override the wrap mode for
seamless cubes to something that works for the hardware.
I'm not sure if there is value in gating this behaviour behind a CAP. On one
hand, there is a tiny bit of extra CPU overhead added to change samplers. On the
other hand, normalizing wrap modes might improve CSO caching, and normalizing to
a non-BORDER mode avoids the expensive border colour code later in the function.
We will need a different workaround in our Vulkan driver. Potentially, we'll
have to duplicate *every* sampler to have a cubemap version and a non-cubemap
version, selecting a sampler in the shader based on the texture opcode. That
sucks and implementing it would depend on subtle details of how we implement
descriptor sets, so it's not like we would share that code with the GL driver
anyway. In the mean time, let's get this right for GL without the performance
hit of duplication.
Fixes dEQP-GLES3.functional.texture.filtering.cube.* on Asahi, as well as a
smattering of dEQP-GLES31.functional.texture.filtering.cube_array.* fails on
softpipe.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21978>
6148e3aae7 ("mesa: Fix ctx->Texture.CubeMapSeamless") introduced a hack, where
seamless cube maps would be requested even for GLES2 contexts despite the spec,
on the assumption that GLES2 gallium drivers would ignore the bit. But that
requires Gallium drivers to know what GLES version they advertise, which is a
horrible layering violation. When the commit was written 8 years ago, there were
classic drivers to contend with so it made sense as a fix to get GLES 3.0 up and
running. With classic drivers gone, it's time to sunset the hack and restore the
intended behaviour by setting ctx->Texture.CubeMapSeamless only once we know the
version.
In addition to fixing a semantic issue in the Gallium contract and preventing a
regression from the next commit, this fixes cube maps on Mali-T720 under
Panfrost. In general, Panfrost supports GLES3 (and honours the seamless flag
everywhere) but on T720 we only advertise GLES2 due to missing MRT support on
older Midgard devices, so we need the flag set properly to distinguish these
cases.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21978>
A future platform requires that mesh shader URB space be allocated
before task shader URB space.
If task shader is enabled, it will align the mesh shader URB size to
8Kb and give the remaning back to task shader. Otherwise, no aligment
is needed, and mesh shader will have all the URB space.
BSpec: 56229, 56230
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21603>
When this debug flag is set, the driver sets the umd metadata for
all color textures and enables the use of extended metadata.
Extended metadata allows umr to import textures and setting these
on all color texture allows to import non-exported textures
(eg: dGPU draw surface when DRI_PRIME=1 is used).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21984>
Update the metadata format. For gfx8- chips nothing change.
For gfx9 chips:
* for textures without a valid modifier a dw is added at index=10
containing the stride
* for textures with a valid modifier the modifier is stored at
index 10 and 11. Then the number of planes is stored at 12.
Then for each plane the offset and the stride are stored.
The goal here is to be able to create textures from dmabuf from
umr - without these changes this is impossible because these
values can't be guessed.
The new layout is compatible with version=1 so old/new UMD can
be used together without issues and isn't used by default.
For radeonsi, it will be possible to use it with a AMD_DEBUG=...
option.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21984>
Shader info is needed for mesh in linking (in nir_lower_io_to_scalar_early,
see commit 5e144454) and will be needed once MR !17622 (anv: work around
for per-prim attributes corruption) lands.
We still need to call nir_shader_gather_info in anv_pipeline_lower_nir,
because the information got stale between anv_graphics_pipeline_load_nir
and anv_pipeline_lower_nir. Some examples:
- some FS inputs were marked as per-primitive during linking
(brw_nir_link_shaders) affecting per_primitive_inputs mask
- some inputs and outputs were removed, because they are not used
(nir_remove_unused_varyings) affecting outputs_written and inputs_read
This fixes func.mesh.ext.outputs.per_primitive.unused crucible test on DG2.
(I didn't know this test wasn't fixed by 5e144454, because I was testing
with !17622 merged-in, which added its own nir_shader_gather_info before
nir_lower_io_to_scalar_early).
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21787>
Also drop my email address in the copyright lines and fix some "Copyright 208
Alyssa Rosenzweig" lines, I'm not *that* old. Together this drops a lot of
boilerplate without losing any meaningful licensing information. SPDX is already
in use for the MIT-licensed code in turnip, venus, and a few other scattered
parts of the tree, so this should be ok from a Mesa licensing standpoint.
This reduces friction to create new files, by parsing the copy/paste boilerplate
and being short enough you can easily type it out if you want. It makes new
files seem less daunting: 20 lines of header for 30 lines of code is
discouraging, but 2 lines of header for 30 lines of code is reasonable for a
simple compiler pass. This has technical effects, as lowering the barrier to
making new files should encourage people to split code into more modular files
with (hopefully positive) effects on project compile time.
This helps with consistency between files. Across the tree we have at least a
half dozen variants of the MIT license text (probably more), plus code that uses
SPDX headers instead. I've already been using SPDX headers in Asahi manually, so
you can tell old vs new code based on the headers.
Finally, it means less for reviewers to scroll through adding files. Minimal
actual cognitive burden for reviewers thanks to banner blindness, but the big
headers still bloat diffs that add/delete files.
I originally proposed this in December (for much more of the tree) but someone
requested I wait until January to discuss. I've been trying to get in touch with
them since then. It is now almost April and, with still no response, I'd like to
press forward with this. So with a joint sign-off from the major authors of the
code in question, let's do this.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rose Hudson <rose@krx.sh>
Acked-by: Lyude Paul [over IRC: "yes I'm fine with that"]
Meh'd-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22062>
The current implementation has a number of issues:
- it doesn't flush the depth cache, even though this can also be changed
due to fragment shader operations and thus is included in the definition
of glTextureBarrier
- it doesn't flush the vertex sampler cache
- it doesn't stall the pipeline until the flushes are done
Fix those issues and drop the comment, as it's pretty clear from the
code what is being done.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22104>
If a sampler resource is changed the vertex texture caches also need to
be flushed, as those are separate from the fragment texture caches.
It seems that some cores need the VS sampler cache flush to be in a
separate state. I have seen no adverse effects of merging the TEXTUREVS
flush into a single flush state emission on GC3000 and up, but the blob
always emits the vertex sampler cache flush as a separate state, so do
the same here to avoid nasty surprises.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22104>
Having two top-level headings in an article confuses Sphinx, and makes
both appear as separate articles in the toc-tree.
It doesn't seem like there's a good reason why the following headings
should be nested under the "Turnip"-heading anyway, so let's just make
it a sibling to the "Hardware architecture" heading.
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21948>
We never implemented it, and having broken mipmaps works out better
for applications and CTS. Actually implementing stencil adjust is
going to be a major pain due to stuff like the GENERAL layout.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21869>
The zap shader knows the offset of the embedded shader within the zap
sqe instructions, but uses this control reg to get it's own address in
memory, in order to calculate the address of the compute shader part of
the zap shader.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21748>
At ring creation, if supported by renderer, we can request
ringMonitoring. During driver ring waits, the ring's new ALIVE status
bit will be checked periodically at the configured rate. If the bit is
not set, the renderer must have crashed and the driver should do the
same to signal a problem to the app/user.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22036>
This reverts commit dda85cf94b ("venus:
move exp features init back to use ring submit"), and additionally adds
per stream shmem caching to determine when vkSetReplyCommandStreamMESA
is needed.
Checking renderer features before setting up ring means that the bound
shmem for replies on the ring will no longer be implicitly set on first
shmem creation (it was set for the renderer stream instead). So the
test for when another vkSetReplyCommandStreamMESA is needed must
independently consider the last stream set on renderer/ring(s)
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22036>
Use a new calling contract so we can do pre/post-work around every ring-waiting
iteration. All looping uses of `vn_relax()` must now call `vn_relax_init()` and
`vn_relax_fini()` before/after their loop bodies.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22036>
Performance jobs should work better if we fix the device under test to
be the same in every test, instead of using any device from a group of
devices of the same type.
We can do it quickly in LAVA, but it seems more
complicated on Google's farm. So, let's replace the a630 (in Google
farm) with a fixed a618 device to test freedreno traces performance.
Add a618-traces job as well, as we need to confirm that a618 is
generating stable traces with good results before proceeding to track
its performance
Co-authored-by: David Heidelberg <david.heidelberg@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22065>
.lava-test hidden job was setting the HWCI_TEST_SCRIPT variable to deqp
runner. But that is not always the case. When we run piglit traces jobs,
we use piglit-traces.sh instead, for example.
Splitting into:
- .lava-test-deqp (deqp-runner + deqp)
- .lava-traces (deqp-runner + piglit)
- .lava-piglit (piglit-runner + piglit)
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22065>
We filter out traces that work only in standard replay mode but not
profile one via yq (jq for YAML) manipulation.
The previous query needed to be fixed in some scenarios, such as traces
labeled with only `["no-perf"]`, which was being ignored by the query.
This commit updates the yq query with newer syntax to cover all current
cases (at least for freedreno).
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22065>
mpv didn't ask to be on this list, was never consulted about being
on this list and to the best of my knowledge has no problem with
adaptive sync. If there is an issue exposed by mpv having adaptive
sync enabled, then it should be reported to mpv, so that it can be
fixed in mpv.
The only problem I could remotely imagine with mpv and VRR is that
its display-resample mode tries to do something similar, and the
two mechanisms will likely race each other to the bottom, but the
display-resample mode is not the default and this is already a
known issue on Windows so users wouldn't expect this to behave any
differently on Linux.
In short, please don't try to make a list of all applications that
are not video games, it is not conducive to having a good time on
the computer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20701>
We don't provide executable properties for the prolog shader.
Fixes: f123d65e9f ('radv/rt: use prolog for raytracing shaders')
Fixes: dEQP-VK.pipeline.monolithic.shader_module_identifier.pipeline_from_id.ray_tracing_libs.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22111>
The goal is to make radv_shader_create() a function that creates a shader
from a binary without any additional information.
Postprocessing the config is only needed after compilation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22029>
Indeed, gs_copy_shader was not freed.
Fixes: commit 1371d65a7f
r600g: initial support for geometry shaders on evergreen (v2)
For instance, with "piglit/bin/shader_runner generated_tests/spec/arb_gpu_shader_int64/execution/built-in-functions/gs-abs-i64vec2.shader_test -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22087>
This changes gallium to decompose quad strips into quads instead of triangles
when the driver advertises support for them.
This should result in a more correct result when those are drawn
with the line raster primitve (avoids showing the diagonal line).
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21987>
Previosuly it was assumed that primitives where always converted to
triangles if the driver did not support all primitives, however that's
not true for a driver that supports quads but not quad strips.
Fixes piglit spec@!opengl 1.1@dlist-fdo3129-01 on Panfrost
Fixes: dcbf2423d2 ("vbo/dlist: add vertices to incomplete primitives")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21987>
Since 50b82ca818 ("nir/lower_blend,agx,panfrost: Use lowered I/O"),
nir_lower_blend needs to be called after lowering I/O rather than before.
Furthermore, after lowering blend, we need (in general) to lower the resulting
load_output intrinsics. Now that we have a proper preprocess_nir hook, there is
a natural place in panvk_vX_shader to do this.
Fixes dEQP-VK.pipeline.blend.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Add a NIR pass to lower all the sysvals seen in the GL driver to load_ubo
intrinsics. These load_ubo intrinsics will be pushed to uniforms by the backend
compiler as usual. This will let us remove all sysval handling from the backend
compilers.
This is a direct NIR port of the existing pan_sysvals.c infrastructure and the
consumers in the Midgard/Bifrost compilers. It aims to be bug-for-bug compatible
to ease bisection.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This is a flag-day change to how we compile. We split preprocessing NIR into a
separate step from compiling, giving the driver a chance to apply its own
lowerings on the preprocessed NIR before the final optimization loop. During
that time, the different producers of NIR (panfrost, panvk, blend shaders, blit
shaders...) will be able to (differently) lower system values.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
See previous commits for justification. Later, we'll split up NIR processing in
a few steps to give the caller a chance to lower the sysval, at which point the
goofy inputs here will go away.
v2: Only lower in fragment shaders. Likely harmless to run elsewhere but still
wrong because the location enum is defined per-stage.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This uses the new NIR sysvals to avoid materializing magic sysvals in the
driver, getting us closer to the Ekstrand Rule.
v2: Only lower for fragment shaders. Lowering in vertex shaders should be a
no-op, except that FRAG_RESULT_SAMPLE_MASK shadows a VARYING_SLOT for fog
coords, causing v1 of this patch to regress fog. Caught by the G52 piglit job in
CI. Thank you, Marge.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
These two switches are redundant.
Furthermore, bi_tex_op could previously assume its input was a supported texop,
so it returned undefined values for unsupported texops. Now, without the guard
in front of it, bi_tex_op should check for supported texops, so we need to drop
the unsupported texops from the switch.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Mali's LD_TILE instruction (mapping to NIR's load_output) requires a "conversion
descriptor" specifying how to convert from the register foramt to the tilebuffer
format. To implement framebuffer fetch on OpenGL without shader variants, we
generate these descriptors in the driver and pass them in a uniform. However, to
comply with the Ekstrand Rule, we can't have magically materialized system
values -- they should come only from the NIR where the driver can lower as it
pleases (e.g. PanVK can lower to a constant because it knows the framebuffer
format at pipeline create time). Add intrinsics to model this.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
We want to lower this in NIR instead of the backend IR to give the driver a
chance to lower the "is multisampled?" system value, which makes more sense to
do in NIR. This gets rid of one of the magic compiler materialized sysvals.
Plus, this will let us constant fold away the lowering in Vulkan when we know
that the pipeline is single-sampled / multi-sampled.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Depending on the ordering of includes, GFX_VER may not defined for
intel_device_info.h. The failure mode of this case is silent:
BITSET_TEST will be called when it could be compiled out.
GFX_VERx10 should be used in place of GFX_VER. GFX_VERx10 is defined
by a compiler flag, and is always present for genX compilation units.
Fixes: 3c9a8f7a6d ("intel/dev: generate helpers to identify platform workarounds")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21908>
intel_device_info.h tests macros in the form `INTEL_WA_{id}_GFX_VER`.
gen_wa_helpers.py produced macros in the form `INTEL_GFX_VER_WA_{id}`
Change the generated code to follow intel_device_info.h
Fixes: 3c9a8f7a6d ("intel/dev: generate helpers to identify platform workarounds")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21908>
This is needed so that we can handle two special cases:
* Dynamic buffer data is allocated out of a command-buffer-owned buffer,
rather than a descriptor-set-owned buffer, so the remapping puts them
in their own register space.
* Static samplers should be left alone and not converted to bindless.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21913>
Rather than trying to perfectly defrag, let's just allow re-use.
When a set is allocated for the first time, it locks in its range of
the heap that it'll use. If the last set in the heap is used, then
those descriptors go back to being free, but if a set in the middle
of the heap is freed, those descriptors remain assigned to that set.
A later allocation attempt can reclaim them, as long as the new set
fits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21913>
When operating in "bindless" mode, the device will own 2 descriptor
heaps, one for views, and one for samplers. Every time a view is
created (image view, buffer view), a slot is allocated for it out
of the device view heap for each usage type (sampled vs storage).
Then, in a future change, descriptor sets will just contain view/
sampler indices instead of actual descriptors. Instead of copying
these to a cmdbuf-owned descriptor heap, we can directly bind the
descriptor set as a buffer. We'll also modify shaders to perform
an indirection and index into the device heap.
Buffers also get views set up on creation. In a perfect world, we
could just put addresses/sizes in the descriptor set, but DXIL
doesn't support loading from addresses, we need descriptors. When
robust buffer access is disabled *or* descriptor set buffer views
reference the remainder of the buffer, we can just re-use a view
from the buffer and use an offset.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21913>
When running in a bindless mode, we won't ever be using SRVs for these.
Change terminology for determining descriptor offsets from "writable"
to "alt" to match naming already used elsewhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21913>
A future change is going to add descriptor heaps *to* the dzn_device,
and having 3x ID3D12Device pointers in a single object just seems
wrong. All of the callers already had a device, so just pass it
along where needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21913>
Currently we can get one by looking up already-emitted resource
metadata, but in the future we'll want to be able to get this
info from a call site alone. Depending on the type of call site,
we'll have different sets of info, so add helpers for the
various different kinds of call sites we can support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21913>
```
../src/freedreno/decode/rddecompiler.c:242:65: error: 'sscanf' may overflow; destination buffer in argument 3 has size 32, but the corresponding specifier may require size 33 [-Werror,-Wfortify-source]
if (sscanf(info->name, "%32[A-Z0-6_][%32[x0-9]].%32s", reg_name,
^
../src/freedreno/decode/rddecompiler.c:243:21: error: 'sscanf' may overflow; destination buffer in argument 4 has size 32, but the corresponding specifier may require size 33 [-Werror,-Wfortify-source]
reg_idx, field_name) != 3) {
^
../src/freedreno/decode/rddecompiler.c:243:30: error: 'sscanf' may overflow; destination buffer in argument 5 has size 32, but the corresponding specifier may require size 33 [-Werror,-Wfortify-source]
reg_idx, field_name) != 3) {
^
```
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22015>
Fixes reg names with headergen2, so that if we have separate a6xx and
a7xx variants for a register we get REG_A6XX_foo and REG_A7XX_foo
instead of both being REG_A6XX_foo. Otherwise generated headers for the
kernel wouldn't compile.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22035>
When stencil is enabled and it isn't non-op, Early-Z must be disabled.
The condition that checks this for stencil[0] is correct, but the one
for stencil[1] is wrong: it uses an "and" instead of "or" condition.
This affects dEQP-GLES3.functional.fragment_ops.interaction.basic_shader.14
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22081>
The mediump lowering tests are important for poking at the lowering pass
behavior, since you can't really assert the behavior in any given driver,
given that the GLSL spec allows any mediump op to be done in highp.
But, in hacking on mediump lowering, I wanted several things that the old
test couldn't do:
- Be able to assert about the actual NIR code we expect to generate for a
hypothetical driver (important if other compiler stages might do invalid
transformations like eliminating highp temps, or if we were to move the
lowering after GLSL IR)
- Run faster (gtest unit tests rather than python forking off the standalone
glsl compiler per testcase).
- Express expectations with a lot less escaping of typical syntax.
- High-quality logs for displaying failures.
This new test does all of that, I think, though I haven't converted all of
the unit tests over yet. In converting, I dropped some of the
combinatorial explosion for float/int variations, instead only doing so
when it gets at some different code path (default precision flags). I've
also included some new tests I wrote in the process of writing my proposed
gl_nir mediump lowering.
Even if the conversion isn't complete, getting these tests to run faster
is probably a good idea on its own, for anyone iterating running Mesa's
unit tests (80 tests in 25ms, compared to 109 tests in 1.5s!).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21886>
We set the "mode" to 1 for CS because we want CP_SET_DRAW_STATE to
immediately execute the state groups. But in the 3d path, we don't
restore the value in the sysmem path. This was causing GPU faults
on 7c3 and presumably other a6xx gen4 things. But somehow not on
a6xx gen1.
Let's just set it as part of initial state restore where we are
ensuring that the GPU is in a sane state.
Fixes: dec49ec50a ("freedreno/a6xx: Move CS state to PROG state group")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22080>
KHR-GL46.multi_bind.dispatch_bind_image_texture has been failing on
both Navi10 and VanGogh, so let's document that.
Zmike says he could not reproduce the fails on a newer version of
glcts, so the next release should address this issue.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22055>
As part of https://gitlab.freedesktop.org/tanty/mesa-valve-ci/-/jobs/38416444,
we saw the following flakes:
- dEQP-VK.dynamic_rendering.primary_cmd_buff.basic.2_cmdbuffers_resuming
- dEQP-VK.dynamic_rendering.primary_cmd_buff.basic.contents_secondary_2_primary_cmdbuffers_resuming
- dEQP-VK.pipeline.fast_linked_library.extended_dynamic_state.two_draws_static.topology_line
And the following failure (seen 4/4 times in the run):
- dEQP-VK.draw.dynamic_rendering.primary_cmd_buff.linear_interpolation.offset_min_2_samples,Fail
Samuel told me that these are usual flakes, so let's document all of them.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22055>
All that's really needed here is to handle the array offsetting by using
an Z or array offset instead of the Y offset.
This patch originally changed get_image_offset_sa_gfx9_1d(), but since
we only use linear with the 1d case, it was dropped.
Rework:
* Jordan: Include ISL_TILING_64 as well
* Jordan: Drop change to get_image_offset_sa_gfx9_1d as
recommended by Nanley
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21113>
Per the spec. This helper is only used in nv50 and panfrost, the latter is known
to have a completely broken transform feedback implementation and I'd be
unsurprised if the same is true for nv50. So unsurprising that compatibility
profile interaction was missed.
This is part of the Piglit ext_transform_feedback-tessellation quads puzzle.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22013>
When dealing with multiple Transform Feedback buffers, each of them
needs to have their own offset, so when resuming from one to another we
know exactly were to continue adding primitives.
Fixes "spec@arb_transform_feedback2@change objects while paused (gles3)"
piglit test.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17373>
Most tools looking at shader stats assume that there is only a single
resulting binary shader out of a single input. On Intel HW this is not
always the case. So having a statistic on each variant that reports
the maximum dispatch width helps showing improvement on a single
shader in terms of how large we manage to compile it.
For shaders that can be compiled in multiple SIMD width (like fragment
shaders), this will report the maximum dispatch width in the
statistics of each variants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22014>
When cs allocation fails in radv_create_cmd_buffer,
radv_destroy_cmd_buffer is called before returning
VK_ERROR_OUT_OF_HOST_MEMORY. At that point, the upload list is not
initalized yet, so SIGSEGV will occur when trying to iterate through the
upload bo list. Initialize the upload list earlier to avoid this.
Signed-off-by: Benjamin Cheng <ben@bcheng.me>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22016>
this improves handling for barriers that originate from a write in the
unordered cmdbuf, adding tracking to resources to better determine access
in the unordered cmdbuf and then utilizing that to generate a single split
memory barrier added at the end of the unordered cmdbuf for all the buffers
written to on that cmdbuf
the next step will be to also merge the read access down onto the end-of-cmdbuf
barrier so that all stream upload-type functionality becomes a single barrier
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22027>
there's no way to link up image layouts between the unordered cmdbuf
and the main one, so if an op is promoted to unordered after an image
is used as a descriptor, the layout will be broken
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22027>
previously the unordered access flags would be set before the deferred
barrier was added, which would guarantee no descriptor barriers could
be deferred and thus terminate renderpasses any time a new descriptor
was bound that was both an image and needed a layout change
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22027>
if this scenario occurs:
* bind fb on ctx A
* draw
* flush + change context to B
* read fb on ctx B
* delete ctx A
then a dead batch write will be left on the fb bo
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22027>
the only "safe" rp ends are:
* set_framebuffer_state (new rp)
* flush_resource (present)
* flush (end of rp)
any other rp end needs its rp info sanitized to avoid e.g., reapplying clears
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22027>
bitCount is still special in that our lowering would try to demote its arg
based on the precision of its output, and it shouldn't do that. But the
other special cases now have appropriate qualifiers on them at the IR
level.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21666>
These have precision qualifiers defined in the spec, in which case we
should emit them them while generating builtin signatures and code. We've
been special-casing them in GLSL lower_precision, but now we can just rely
on the precision qualifier of the builtin if non-NONE.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21666>
No need to generate a temp in this case. Cleanup I noticed while looking
at lower_precision behavior (and I've included a testcase to sanity check
that things work out).
This causes a tiny amount of scheduling change on freedreno:
total instructions in shared programs: 11010012 -> 11010012 (0.00%)
instructions in affected programs: 147 -> 147 (0.00%)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21666>
With GPL it's not possible to know the primitive topology when
compiling the pre-rasterization stages. For NGG, we use the maximum
number of vertices per prim and rely on the hardware to ignore the
extra bits for points/lines.
Though, this can't work for NGG streamout because the number of
vertices per prim is used to compute a streamout offset. The only
way to solve this is to pass the number of vertices per prim through
a new user SGPR.
This fixes a bunch of streamout tests with Zink/RADV on GFX11.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21833>
Unlike most other cases, we don't put the YAML-file in a ci-folder,
because we already have one for the CI-specific docs. So let's just
leave the YAML file directly in the docs-folder.
This should fix the problem that any docs-changes that touches the
CI-rules needs a full CI run just because of touching the root
.gitlab-ci.yml file. This causes needless friction and wastes CI
resources.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21953>
Many of the `V3D_DIRTY_*` flags are above 32 bits, but for now the only
one used here is V3D_DIRTY_SSBO.
`shader->uniform_dirty_bits`, where `dirty` ends up, is already 64 bits.
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22019>
Before we can present the buffer we need to wait for the fence to
finish. This fixes severe flickering of unfinished rendering in
many demos/tests. This has been broken for a while, I think.
Note, this is for the non-DRI / Xlib-based GLX.
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21993>
as long as a few bits of state are swapped around and none of the "main"
cmdbuf state is applied, it becomes possible to promote the entire
u_blitter operation to the unordered cmdbuf and execute it there as
a "transfer" operation that can continue to enable further reordering
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21906>
this simplifies some codepaths at runtime by short-circuiting some
of the more complex operations since it's already known in advance
exactly which images will be used for which purpose
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21906>
this avoids the (sizable) overhead of going through the previous path
with set_frame_buffer state et al, instead just firing off a quick
begin+end rendering with a clear
it's also easily reorderable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21906>
if a feedback loop hasn't yet been added for an image with both
descriptor and fb binds, queue a check for that to avoid mismatch
affects godot-tps-gles3-high.trace
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21906>
Can you spot the problem?
exp param0 v6, v5, v5, v5
exp param1 v7, off, off, off
exp param1 v7, off, off, off
radeonsi uses ac_nir_optimize_outputs to eliminate output stores with
identical SSA defs (i.e. duplicated), which then causes 2 outputs to
map to the same parameter export.
This is a regression. The old LLVM code was correctly emitting each
export only once. vs_output_param_mask was supposed to be used for
this instead of vs_output_param_offset.
Fixes: 80506be31b - ac/nir/ngg,radv,radeonsi: nogs use ac_nir_export_(position|parameter)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21920>
This way when replacing a broken context we don't need to ask to
kernel what is the priority of the context being replaced.
Also this will be necessary for Xe kmd as it don't have any uapi to
query engine priority.
While doing that also taking the oportunity to move more code from
iris_bufmgr.c/h that only has one caller.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21965>
Intel Gen9 GPUs have hardware ASTC support, but have a bug where they
don't handle denormalized values in void extent blocks correctly. This
isn't that hard to work around - on upload, we can detect such blocks,
and flush any denorms to zero. Because we're altering the data behind
the application's back, and applications can theoretically ask to
download the original unaltered image data, we unfortunately need to
maintain shadow copies of the data.
To make sure that we don't accidentally skip the void-extent flushing
via any fast-upload paths, and support download correctly, we plug this
into the st/mesa compressed texture format fallback paths, which store
a CPU copy of the original image data, and upload altered data.
This is unfortunately common code for what's likely to be a single
driver's issue (on a single generation), but it beats replicating an
entire framework we already have inside the driver.
Fixes dEQP-GLES3.functional.texture.compressed.astc.void_extent_ldr.*
using iris on Intel Gen9 GPUs.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4167
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21943>
Some drivers use clang-format exclusively. We would like to lint for correct
formatting in CI to catch style issues before they land, because mixing
clang-format and not clang-format within a codebase is a recipe for conflicts.
We don't expect this lint to ever fail in "normal" usage, since we expect
developers on these drivers to setup automatic formatting in their editor.
However, it can be useful as a failsafe or for drive-by contributors who don't
know the style guide.
Enable the linting for Asahi. We'll enable for Panfrost shortly, but Panfrost
isn't clang-format clean quite yet.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20553>
Requires -Wno-error=... to be passed to the linking stage.
NOTE: This does not imply that it's safe to enable LTO for Fedora
package builds yet. It just helps prevent moving further away from that
long term goal.
v2:
* Keep passing -Wno-error=array-bounds & -Wno-error=stringop-overread.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
With LTO, some compiler warnings are generated only at the compiler's
linking stage. Therefore -Werror needs to be passed to the linking stage
as well for warnings to be turned into errors.
Meson should really do this when both werror and b_lto are enabled, but
meanwhile let's do it ourselves.
We can't just add -Werror to c{,pp}_link_args, because those are passed
for Meson's feature checks, some of which generate warnings, resulting
in false negatives. We use gcc/g++ wrapper scripts instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
Pointed out by GCC with LTO:
../src/gallium/drivers/iris/iris_context.c: In function 'iris_create_context':
../src/gallium/drivers/iris/iris_context.c:304:7: error: 'free' called on pointer 'block_180' with nonzero offset 48 [-Werror=free-nonheap-object]
304 | free(ctx);
| ^
[...]
../src/gallium/drivers/iris/iris_context.c:313:7: error: 'free' called on pointer 'block_180' with nonzero offset 48 [-Werror=free-nonheap-object]
313 | free(ctx);
| ^
v2:
* Use ice pointer instead of ctx. (Karol Herbst)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
Pointed out by GCC with LTO:
../src/gallium/drivers/crocus/crocus_context.c: In function 'crocus_create_context':
../src/gallium/drivers/crocus/crocus_context.c:261:7: error: 'free' called on pointer 'block_174' with nonzero offset 48 [-Werror=free-nonheap-object]
261 | free(ctx);
| ^
v2:
* Use ice pointer instead of ctx. (Karol Herbst)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
Fixes strict aliasing violation:
In function 'r600_init_resource_fields',
inlined from 'r600_buffer_create' at ../src/gallium/drivers/r600/r600_buffer_common.c:578:2:
../src/gallium/drivers/r600/r600_buffer_common.c:139:48: warning: array subscript 'struct r600_texture[0]' is partly outside array bounds of 'unsigned char[264]' [-Warray-bounds]
139 | if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/util/os_memory.h:37,
from ../src/util/u_memory.h:38,
from ../src/gallium/include/pipe/p_state.h:47,
from ../src/gallium/auxiliary/util/u_inlines.h:34,
from ../src/gallium/auxiliary/pipebuffer/pb_buffer.h:49,
from ../src/gallium/include/winsys/radeon_winsys.h:46,
from ../src/gallium/drivers/r600/r600_pipe_common.h:37,
from ../src/gallium/drivers/r600/r600_cs.h:33,
from ../src/gallium/drivers/r600/r600_buffer_common.c:27:
In function 'r600_alloc_buffer_struct',
inlined from 'r600_buffer_create' at ../src/gallium/drivers/r600/r600_buffer_common.c:576:34:
../src/util/os_memory_stdc.h:41:27: note: object of size 264 allocated by 'malloc'
41 | #define os_malloc(_size) malloc(_size)
| ^~~~~~~~~~~~~
../src/util/u_memory.h:46:24: note: in expansion of macro 'os_malloc'
46 | #define MALLOC(_size) os_malloc(_size)
| ^~~~~~~~~
../src/util/u_memory.h:54:41: note: in expansion of macro 'MALLOC'
54 | #define MALLOC_STRUCT(T) (struct T *) MALLOC(sizeof(struct T))
| ^~~~~~
../src/gallium/drivers/r600/r600_buffer_common.c:554:19: note: in expansion of macro 'MALLOC_STRUCT'
554 | rbuffer = MALLOC_STRUCT(r600_resource);
| ^~~~~~~~~~~~~
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
This matches the type of the underlying size member, and is consistent
with other getSize methods.
Avoids compiler warning with LTO enabled:
In member function '__ct ',
inlined from 'convertToSSA' at ../src/nouveau/codegen/nv50_ir_ssa.cpp:401:26,
inlined from 'convertToSSA' at ../src/nouveau/codegen/nv50_ir_ssa.cpp:310:28,
inlined from 'nv50_ir_generate_code' at ../src/nouveau/codegen/nv50_ir.cpp:1331:22:
../src/nouveau/codegen/nv50_ir_ssa.cpp:407:48: error: argument 1 value '18446744073709551615' exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
407 | stack = new Stack[func->allLValues.getSize()];
| ^
/usr/include/c++/12/new: In function 'nv50_ir_generate_code':
/usr/include/c++/12/new:128:26: note: in a call to allocation function 'operator new []' declared here
128 | _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)
| ^
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
This can have two main uses:
* If we suspect a problem with TFU copies, we can disable it and
check if other codepaths gets a test/app working.
* To test other codepaths, as in general, TFU is the preferred
option for copies.
Note that for now this is only for v3dv, as for v3d, mipmap generation
uses TFU without an alternative codepath.
With this option we also adds an assert if we try to submit a TFU job,
just in case we keep adding other methods that use TFU, and forget to
include the debug option there.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21952>
the tricky part of tracking renderpass info in tc is handling batch
flushing. there are a number of places that can trigger it, but
there are only two types of flushes:
* full flushes (all commands execute)
* partial flushes (more commands will execute after)
the latter case is the important one, as it effectively means that
the current renderpass info needs to "roll over" into the next one,
and it's really the next info that the driver will want to look at.
this is made trickier by there being no way (for the driver) to distinguish
when a rollover happens in order to delay beginning a renderpass for
further parsing
to solve this, add a member to renderpass info to chain the rolled-over info,
which tc can then process when the driver tries to wait. this works "most"
of the time, except when an app/test blows out the tc batch count, in which
case this pointer will be non-null, and it can be directly signaled as a less
optimized renderpass to avoid deadlocking
also sometimes a flush will trigger sub-flushes for buffer lists, so
add an assert to ensure nobody tries using this with driver_calls_flush_notify=true
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
this enables some more under-the-hood changes without touching the header
that will force all of gallium to be recompiled
also update/clarify rules for using rp tracking; these haven't changed,
but the documentation was less clear before
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
not sure if this actually affects anything, but if a renderpass has
to be split for some reason, ensure subsequent renderpasses don't lose
data
also ensure that rp data isn't lost when triggering primgen clears and
delete a now-invalid assert
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
This fixes the behavior of register coalesce in cases where the source
of a copy is written elsewhere in the program by a force_writemask_all
instruction, which could cause the overwrite to be executed for an
inactive channel under non-uniform control flow, causing
can_coalesce_vars() to give incorrect results. This has been reported
in cases like:
> while (true) {
> x = imageSize(img);
> if (non_uniform_condition()) {
> y = x;
> break;
> }
> }
> use(y);
Currently the register coalesce pass would coalesce x and y in the
example above, which is invalid since in the example above imageSize()
is implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence coalescing y and x into the same register changes the
behavior of the program.
Note that this is a regression introduced by commit a4b36cd3dd. In
order to avoid the problem without reverting that patch, we prevent
register coalesce if there is an overwrite of the source with
force_writemask_all behavior inconsistent with the copy and this
occurs anywhere in the intersection of the live ranges of source and
destination, even if it occurs lexically before the copy, since it
might be physically executed after the copy under divergent loop
control flow.
Fixes: a4b36cd3dd ("intel/fs: Coalesce when the src live range is contained in the dst")
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
This fixes the behavior of copy propagation in cases where either the
source or destination of an ACP is overwritten elsewhere in the
program by a force_writemask_all instruction, which could cause the
overwrite to be executed for an inactive channel under non-uniform
control flow, causing the current per-channel dataflow propagation to
give incorrect results. This has been reported in cases like:
> while (true) {
> x = imageSize(img);
> if (non_uniform_condition()) {
> y = x;
> break;
> }
> }
> use(y);
Currently the copy propagation pass would propagate copy 'y = x' into
'use(y)', which is invalid since in the example above imageSize() is
implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence replacing 'y' with 'x' at 'use(y)' changes the behavior
of the program.
This patch extends the global dataflow analysis algorithm to determine
whether there is any control flow path from a given copy to an
overwrite of its source or destination which has force_writemask_all
behavior inconsistent with the copy, and in such case prevents copy
propagation for that ACP entry at any point of the program which can
be reached from the overwrite, even if the copy is statically
re-executed along all such control flow paths (as in the example
above), since the execution of the overwrite for a given channel i may
corrupt other channels j!=i inactive for the subsequently re-executed
copy.
Note that a simpler solution has been attempted which fully shuts down
copy propagation if such a force_writemask_all ACP overwrite is
present /anywhere/ in the program regardless of its location in the
control flow graph, however that led to large shader-db regressions in
some programs from shader-db (like a CS from Car Chase which would
emit 53% more instructions). With this solution the only handful of
shaders that suffer instruction count regressions seem to be getting
misoptimized right now (e.g. some compute shaders from Deus Ex
Mankind). This solution doesn't seem to affect the run-time of
shader-db significantly, it's less than 1% higher with the fix
applied.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
force_writemask_all determines whether all channels of the copy are
actually valid, and may be required to be set for it to be propagated
safely in cases where the destination of the copy is used by another
force_writemask_all instruction, or when the copy occurs in a
divergent control flow block different from its use.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
generate_tex() asserts that sampler_index.type == UD, but commit
83fd7a5ed1 removed the uint temporary, which caused us to see D at
some points. Really, either should be fine, but let's just put the
UD retype back. This fixes a ton of things in crocus.
Fixes: 83fd7a5ed1 ("intel: Use nir_lower_tex_options::lower_index_to_offset")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
tc fence lifetimes can exceed the lifetimes of their parent contexts,
which means they can be destroyed after mfence->fence has been destroyed
to avoid invalid memory access on a destroyed fence, store all the assigned
tc fences into an array on the real fence and then use that to unset fence
pointers on any outstanding tc fences
fixes flakiness in dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.images.texsubimage2d.12
in caselist:
dEQP-EGL.functional.query_context.get_current_surface.rgba4444_pbuffer
dEQP-EGL.functional.create_surface.platform_window.rgba5551_depth_no_stencil
dEQP-EGL.functional.query_surface.simple.pbuffer.rgb888_depth_no_stencil
dEQP-EGL.functional.color_clears.multi_context.gles2.rgb888_pixmap
dEQP-EGL.functional.color_clears.multi_context.gles1_gles2.rgba8888_window
dEQP-EGL.functional.color_clears.multi_context.gles1_gles2_gles3.rgb888_window
dEQP-EGL.functional.render.multi_thread.gles2_gles3.rgba5551_pbuffer
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.buffers.buffersubdata.3
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.programs.link.6
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.images.texsubimage2d.12
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21843>
The cuttlefish-runner.sh script was failing before reaching the test
suite execution (which was not executing the complete test suite due to
the previous non-catched failures, and was erroneous passing) and we
were not catching that.
Add set -e so we can catch those.
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21941>
When a BO is exported, implicit sync convention requires that writers
signal a fence on the object when complete. We already do this for BOs
that are *already* exported, but it is possible for a BO to be written
to, then exported for the first time.
Add a field to agx_bo to keep track of the current writer syncobj
handle. On first export, we use this to import it into the DMA-BUF.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21620>
We don't want a writer hash table with persistent pointers to resources, because
the resources could be freed without the hash table being updated (even though
the underlying BO will not be freed until it's ready). To avoid the reference
count hell, do away with the pointer hash table and instead use a flat dynarray
for mapping BO (handles) to writer (batch indices).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21620>
This introduces tracking of the required semaphore values in pipelines,
which is then propagated to cmd_buffers on bind. Each queue also keeps
track the maximum count it has waited for, so that we can avoid the waiting
overhead once all the shaders are loaded and referenced.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
Following PAL's implementation, this patch avoids allocating shader code
buffers in BAR and use SDMA to upload them to invisible VRAM
directly.
For some games like HZD, shaders can take as much as 400MB, which exceeds
the non-resizable BAR size (256MB) and cause inconsistent spilling
behavior. The kernel will normally move these to invisible VRAM on its own,
but there are a few cases that it does not reliably happen. This patch does
the moving explicitly in the driver to ensure predictable results.
In this patch, we upload the shaders synchronously; so the shader will be
ready as soon as vkCreate*Pipeline returns. A following patch will make
this asynchronous and don't block until we see a use of the pipeline.
As a side effect, when SQTT is used we now store the shaders on a cacheable
buffer which would speed up writing the trace to the disk.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
For consistency with the sdma_copy_buffer helper that will be added next.
As a general justification, SDMA commands require little state tracking and
using radeon_cmdbuf makes it more suitable for driver internal use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
Avoid all the issues of having to keep them in sync, and few-enough
people (read: probably no-one ever) will be working on the asahi driver
from a Windows machine, so symlinks can be relied upon, especially for
something optional like automatic code formatting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21951>
We are about to enable pre-merge testing for radv-zink on vangogh,
which would mean the steam decks would be used for the following jobs:
* Mesa pre-merge CI:
* zink: 3 (~12 minutes)
* Mesa Post-merge CI:
* vkcts: 4 (~30 minutes)
* vkd3d: 1 (~5 minutes)
* DXVK CI: 1 (takes ~4 hours)
This means we could have 9 jobs running at the same time on steam
decks, despite only having 6 available. By reducing the number of decks
allocated for VKCTS runs from 4 to 2, we get closer to the actual
availability, and since vkd3d is so short + DXVK CI runs so
infrequently, we should never have to wait for a deck for too long!
Unfortunately, with the change of parallelism, a known flake started
failing more consistently, so I added it to the flakes list.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21873>
This requires that we limit the number of max combinded SSBOs to 31,
otherwisewe shaders that use SSBO binding points with higher values
will break on the host.
Fixes CTS:
KHR-GL43.shader_storage_buffer_object.basic-atomic-case1
KHR-GL43.shader_storage_buffer_object.basic-atomic-case2
KHR-GL43.shader_storage_buffer_object.advanced-indirectAddressing-case2
KHR-GL43.shader_storage_buffer_object.advanced-usage-case1
KHR-GL43.shader_storage_buffer_object.advanced-usage-sync
KHR-GL43.shader_storage_buffer_object.advanced-matrix
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21586>
When a shader uses SSBOs in various shader stages, then we have to track
the binding locations in order to be able to properly bind these SSBOs.
Therefore add a flag that enables adding the start index of the bindings to
the SSBO index.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21586>
Blob resources are mapped directly, no need to copy data around, and
in any case, neither the resource nor the transfer info will have an
IOV attached to it, so the transfer would result error out on the host
anyway.
In addition, blob resources should not use re-allocation.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21586>
Radeonsi is going to use nir tess factor write, so need to
remap tess factor location.
RADV set tess factor driver location to be 0 and 1 in
get_linked_variable_location(). While radeonsi also set them
to be 0 and 1 in st->map_io aka. si_shader_io_get_unique_index_patch().
We could just set them to be 0 and 1 at the beginning of
ac_nir_lower_hs_outputs_to_mem(), but in order to keep the
location map at the same place, we still do this in
lower_hs_output_store().
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21437>
Rounding up the polygon list BO can waste large amounts of memory. In a common
case I observed, it rounded up 11MB to 16MB, wasting 5MB. That adds up quickly
across processes, especially on the 2GB machines.
This only applies to Midgard. On Bifrost and newer, the driver does not
explicitly allocate this data structure. Cc stable because this rounding is
incorrect and the increase in RAM usage can cause real problems (especially
given how slow the shrinker is).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21831>
Genshin Impact does a depth+stencil invalidate (or discard, not sure
which entrypoint they are using) and then proceeds to do draws with
depth test enabled. For IMRs (or freedreno in sysmem mode) this is no
problem. But for tilers that use this as a hint that they can skip the
z/s tile load, it is.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21916>
HSD 1306463417 is a hardware defect. The originating software
workaround for the issue is Wa_1409433168. Convert all references to
the software workaround number, and use generated helpers instead of
GFX comparisons.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21914>
Xe KMD does a special caching handling for buffers that will be
scanout to display, so that is why it needs a flag set during
allocation.
Checking if VK_STRUCTURE_TYPE_WSI_MEMORY_ALLOCATE_INFO_MESA
is available in AllocateMemory() and marking the buffer as scanout.
All WSI code paths but one sets
VK_STRUCTURE_TYPE_WSI_MEMORY_ALLOCATE_INFO_MESA.
The only one that doesn't requires that WSI is initialize with
wsi_device_options.sw_device = true to be executed, what is not the
case for ANV.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21885>
The logic that updates `ctx->gfx_pipeline_state.final_hash` assumed that
the program is replaced. It is supposed to xor `final_hash` with the
hash first and then with the new hash however when the program is
updated it end up xor-ing the new hash twice so it does nothing.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: 15450d2c2e ("zink: incrementally hash all pipeline component hashes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21925>
Having driver name in the renderer will be useful to differentiate
between open source and proprietary drivers as they can have different
feature sets/quirks.
Vulkan API version is also added to the name to match up with ANGLE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21922>
We haven't measured any noteworthy perf improvement
from these, and they are difficult to port to NIR,
so remove them before the NIR based VS input lowering
in order to make it easier to bisect and analyze stats.
Fossil DB stats on Rembrandt (GFX10.3):
Totals from 21750 (16.12% of 134913) affected shaders:
VGPRs: 868512 -> 868664 (+0.02%); split: -0.00%, +0.02%
CodeSize: 64406804 -> 64397572 (-0.01%); split: -0.08%, +0.07%
MaxWaves: 567904 -> 567888 (-0.00%); split: +0.00%, -0.00%
Instrs: 12327212 -> 12324851 (-0.02%); split: -0.10%, +0.08%
Latency: 61367324 -> 61371204 (+0.01%); split: -0.04%, +0.05%
InvThroughput: 9687734 -> 9686000 (-0.02%); split: -0.03%, +0.01%
VClause: 248207 -> 303449 (+22.26%); split: -0.02%, +22.28%
SClause: 314942 -> 315564 (+0.20%); split: -0.09%, +0.29%
Copies: 921581 -> 921820 (+0.03%); split: -0.16%, +0.19%
Branches: 341964 -> 341967 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16805>
In vkCreateImage() if block comrpessed format and VK_IMAGE_TILING_LINEAR is
used, then the app crashes in vega gpu.
This is because addrlib does not support BC + linear as from function
ValidateSwModeParams(). From Marek Olšák the addrlib behaviour is correct.
In pal driver, flags.texture is not set in DetermineSurfaceFlags() function
if BC + linear. pal driver does it because it is expected that the
BC + linear image is only used as transfer resource.
This patch sets RADEON_SURF_NO_TEXTURE flag if usage is not
VK_IMAGE_USAGE_SAMPLED_BIT and and VK_IMAGE_USAGE_STORAGE_BIT.
flags.texture flag is not set if RADEON_SURF_NO_TEXTURE and this fixes
the crash.
v1: set NO_TEXTURE if not SAMPLED or STORAGE (Marek Olšák)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21422>
Block compressed + linear format is not supported in addrlib. But these
surface can be used as transfer resource. RADEON_SURF_NO_TEXTURE flag
indicates not to set flags.texture flag in gfx9_compute_surface().
This will help to fix the vkCreateImage() crash where block
compressed + linear format image is requested.
v2: combine RADEON_SURF_NO_TEXTURE to below line (Marek Olšák)
v1: add RADEON_SURF_NO_TEXTURE flag (Marek Olšák)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21422>
With the CTS coverage and tu featureset extending, these jobs have been
reliably timing out for a while. I've updated the xfails based on a
single run, we'll see how that goes in the upcoming nightlies.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21879>
Indeed, ureg_get_tokens() returns an allocated string that should be
freed using ureg_free_tokens().
For instance, with "piglit/bin/arb_shader_image_load_store-invalid -auto -fbo"
Direct leak of 768 byte(s) in 2 object(s) allocated from:
#0 0x7fa819a78b48 in __interceptor_realloc (/usr/lib64/libasan.so.6+0xb1b48)
#1 0x7fa80e189e04 in tokens_expand ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:239
#2 0x7fa80e189e04 in get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:262
#3 0x7fa80e191f6e in copy_instructions ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2079
#4 0x7fa80e191f6e in ureg_finalize ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2129
#5 0x7fa80e19447b in ureg_get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2206
#6 0x7fa80ec68b91 in si_create_fmask_expand_cs ../src/gallium/drivers/radeonsi/si_shaderlib_tgsi.c:564
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21871>
Now that we don't do GLSL IR constant propagation or constant folding, we
can leave constant variable detection and handling to NIR. This also
avoids some OOB array access in GLSL IR in a piglit test!
Freedreno stats again look like noise:
total instructions in shared programs: 2718412 -> 2718746 (0.01%)
instructions in affected programs: 80497 -> 80831 (0.41%)
total last-baryf in shared programs: 110015 -> 110510 (0.45%)
last-baryf in affected programs: 35263 -> 35758 (1.40%)
total full in shared programs: 189486 -> 189480 (<.01%)
full in affected programs: 52 -> 46 (-11.54%)
total constlen in shared programs: 494540 -> 494496 (<.01%)
constlen in affected programs: 452 -> 408 (-9.73%)
total sstall in shared programs: 198297 -> 197928 (-0.19%)
sstall in affected programs: 3691 -> 3322 (-10.00%)
total systall in shared programs: 432150 -> 431799 (-0.08%)
systall in affected programs: 6070 -> 5719 (-5.78%)
total waves in shared programs: 435098 -> 435110 (<.01%)
waves in affected programs: 92 -> 104 (13.04%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21751>
NIR is happy to take care of constant folding for us, and it's easier to
do in SSA.
This required adjusting of lower_precision unit tests to have un-folded
constants.
freedreno results look like noise. Some excerpts:
total instructions in shared programs: 2718343 -> 2718412 (<.01%)
instructions in affected programs: 6847 -> 6916 (1.01%)
total last-baryf in shared programs: 109992 -> 110015 (0.02%)
last-baryf in affected programs: 117 -> 140 (19.66%)
total sstall in shared programs: 198312 -> 198297 (<.01%)
sstall in affected programs: 148 -> 133 (-10.14%)
total systall in shared programs: 432163 -> 432150 (<.01%)
systall in affected programs: 1016 -> 1003 (-1.28%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21751>
The whole usage of this flag is to call iris_use_pinned_bo() with
writable argument, for that we don't need any i915_drm.h specific type.
IRIS_BLORP_RELOC_FLAGS_EXEC_OBJECT_WRITE could have any other value but
keeping the same as i915_drm.h.
With this we can drop 2 i915_drm.h imports from generic Iris code.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21887>
This is really dumb.
But this fixes arb_shader_language_420pack-active-sampler-conflict on v7 which
otherwise dereferences a null pointer trying to access the nonexistant texture
arrays, or DATA_INVALID_FAULTs if you give it a texture array filled with
zeroes. But it seems happy if you bind in null textures. This is dumb but less
faults in Piglit is good for reducing flakes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
mesa/st doesn't like to use 24-bit textures, preferring RGBX over true RGB even
for texture views where this isn't valid. Given how silly true RGB is in
practice, I'd rather drop support and fix texture views than go against the
grain and risk more issues down the line since nobody else in tree is testing
these paths and apps really shouldn't be caring.
Fixes page faults in arb_texture_view-rendering-formats_gles3 which tries to
sample an R8G8B8_UINT texture with a R8G8B8X8_UNORM view in one subcase. That
test is now passing reliably.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
This is signed, not unsigned. We were already passing negatives and silently
relying on 2's complement and C to do the right thing. But that's silly. We
should just, actually do the right thing.
Found while struggling to debug primitive-restart-draw-mode.
v2: Update the other architectures too, including a decode_csf.c change for the
v10 incarnation of this v4-era field.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
We just want a bit-exact transfer for integers. Using .auto32 accomplishes this
without any clamping shenanigans. Fixes gl-3.0-vertexattribipointer.
Note we can't use .auto32 unconditionally, since reading a uint vertex as float
is supposed to convert (or something like that, gl-2.0-vertexattribpointer tests
the bad case at any rate).
Fixes: 482cc273af ("pan/bi: Implement load attribute with the builder")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
draw->index_bounds_valid tells drivers that the values of min_index/max_index
are set correctly and can be used e.g. to allocate memory for varyings. If set
incorrectly, the GL promises badness.
But, with primconvert, we go mucking with index buffers and then never update
the bounds. So it doesn't matter if the original index bounds were valid, we
can't promise the original bounds are *still* valid. If we were trying to
optimize CPU overhead, we could try to preserve the new min/max index but seeing
as only older Mali cares about this flag, and if you're using primconvert you're
already screwed, I'm not too inclined to go rework primconvert.
Fixes* page faults in primitive-restart-draw-mode on Mali-G52 for GL_QUAD_STRIPS
and GL_POLYGON, which hit the primconvert path. The full dmesg splat looks like:
[ 5438.811727] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x000000100A16BAC0
Reason: TODO
raw fault status: 0x25002C1
decoded fault status: SLAVE FAULT
exception type 0xC1: TRANSLATION_FAULT_1
access type 0x2: READ
source id 0x250
Notice that a high bit is randomly set in the address, this is trying to read
a varying from the actual varying buffer in the vicinity of 0xa16bac0. What's
actually happening is that we're trying to read index #0 despite promising the
driver a minimum index of 2, causing an integer underflow as we try to read
index -2, or as the hardware sees, 4294967294.
As long as we stop lying to panfrost about the bounds being correct, panfrost is
able to calculate the real (post-primconverted) bounds on its own, fixing the
test.
* Alternatively, maybe Panfrost should just ignore this bit, in which I don't
know why we have it in Gallium, since it's probably not conformant to fault on
out-of-range glDrawRangeElements.
Fixes: 72ff53098c ("gallium: add pipe_draw_info::index_bounds_valid")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
depth_stencil_attachment has been changed from a pointer to the attachment idx
to just the attachment idx, as this avoids the driver having to check for NULL
when comparing attachments indexes with depth_stencil_attachment.
Anyplace that relies on depth_stencil_attachment being a valid index must
already check that depth_stencil_attachment is not VK_ATTACHMENT_UNUSED, so
this change avoids having to check both the pointer and the index for the same
information.
Noticed when running dEQP-VK.api.smoke.triangle
Signed-off-by: Jarred Davies <jarred.davies@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21690>
vn_instance_roundtrip does 2 things:
1. vn_instance_submit_roundtrip
- before: encode a cmd to write vq seqno to ring extra field
- after: encode a cmd to update vq seqno against a ring
- submit the encoded cmd via vq
2. vn_instance_wait_roundtrip
- before: wait until ring extra field has the vq seqno
- after: let renderer ring thread wait for the vq seqno
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21716>
All the farms have been updated, and the `out of files` error has been
fixed, and I also believe that the vast majority of the
`file could not be opened successfully` should also be fixed with this
update.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21872>
First make sure shadow mask change sets dirty state.
Second move shadow mask bit removal to unbind_samplerview which
is cleaner and correctly clears the shadow bit when binding buffer texture.
Fixes: 5193f4f712 ("zink: add a fs shader key member to indicate depth texturing mode")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21571>
usually zink_get_cmdbuf() is enough for reordering operations, but
with new technology, it becomes possible to promote even the most stubborn
buffers to the unordered cmdbuf
first, check the src buffer to ensure that there's no pending writes in
the main cmdbuf that would prohibit reordering
second, apply a TRANSFER_DST to the dst buffer using the util function
to determine whether it can be reordered
if both the src and dst can be reordered for their respective regions
and read/write usage, then the entire op can be promoted regardless of
the unordered_read/unordered_write flags
this optimizes out patterns like
upload index buffer (offset=0)
draw
upload index buffer (offset=128)
draw
upload index buffer (offset=256)
draw
...
so that the uploads and draws can be separated and batched
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21802>
I wanted to find slow pieces of code in our Anv driver using our
drm-shim stub.
The last bit of code still talking to the compositor was the WSI
swapchain code and failing because none of the submissions are taking
place (because of the stub).
This change introduces a new variable MESA_VK_WSI_HEADLESS_SWAPCHAIN
which when set turns every swapchain creation into a headless
swapchain. This swapchain does not present anything, allowing the
application to spin as many frames as possible. Thus helping to
identify slow spots in command buffer building path.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6156>
nir->info.subgroup_size can be set to an enum :
SUBGROUP_SIZE_VARYING = 0
SUBGROUP_SIZE_UNIFORM = 1
SUBGROUP_SIZE_API_CONSTANT = 2
SUBGROUP_SIZE_FULL_SUBGROUPS = 3
So compute the API subgroup size value and compare it to the dispatch
size to determine whether we need some bound checking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ac192d79d ("intel/fs: bound subgroup invocation read to dispatch size")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>
I'm not planning to stand mesa-swrast back up until we get Kata set up, so
turn the testing back on at a reduced fraction on so that
venus/llvmpipe/etc. dev can still get some coverage.
I haven't turned lavapipe back on, because it is now unstable in memory
model / atomics tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21880>
For instance, to load uniform data with the LSC we usually rely on
tranpose messages which have to execute in SIMD1. Those end up being
considered as partial writes so within loops their life span spread to
the whole loop, increasing register pressure.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21867>
on pipeline bind with dynamic state, depth_clip_near needs to either be set by
* applying the dynamic state
* using the pipeline state
the previous code always used the pipeline state
fixes:
dEQP-VK.pipeline.*.extended_dynamic_state.between_pipelines.depth_clamp_enable
Fixes: 650880105e ("vulkan,lavapipe: Use a tri-state enum for depth clip enable")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21814>
-iN, --idle-time=N
Set the idle timeout to N seconds. The default timeout is
300 seconds. When there has not been any user input for the idle
timeout, Weston enters an inactive mode. The screen fades to black,
monitors may switch off, and the shell may lock the session.
A value of 0 effectively disables the timeout.
We don't want the session to get locked and monitors to switch off while tests
are running, as many of them depend on swapping buffers.
Cc: mesa-stable
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21877>
if it's known that a renderpass is active and the driver wants to do
renderpass optimizing, help out by not forcing a sync and instead doing
what the driver would do: create a staging buffer and copy it to the
image
this requires that the driver already handles buffer -> image copies
with resource_copy_region
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21801>
It is legal to pass in nullptr as an instance into
vkGetInstanceProcAddr when resolving any global addresses, this
wasn't handled correctly and an illegal access to a member of
a null struct was made.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21827>
This commit rewrites the KGSL backend to utilize vk common wherever
possible to bring the codebase in line with DRM while implicitly
fixing minor API bugs that may have occurred as a result of manually
implementing VK functions.
As a part of moving to vk common, KGSL sync is now implemented
atop vk common sync and vastly expanded in terms of functionality
such as:
* Import/Export of sync FDs - A required capability for properly
supporting the Android WSI and as these functions were stubbed
when a presentation operation used semaphores, it would cause a
leak of FDs that were imported due to the expectation that the
driver would close them. As well as causing UB around due to
ignoring the imported FD or not exporting a valid FD.
* Supporting pre-signalled fences - Vulkan allows fences to be
created in a signalled state which was stubbed prior and can
lead to UB.
* Timeline semaphore support - As a result of utilizing vk common
as the backbone for synchronization, its timeline semaphore
emulation has been utilized to provide support for them without
needing kernel support. (Note: On newer versions of KGSL,
timeline semaphores can be implemented natively rather than
using emulation as they support wait-before-signal)
Fixes freezes due to semaphore usage with presentation on:
* Genshin Impact
* Skyline Emulator
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21651>
Originally if we had an anonymous field (ie. field declared as part of
the register definition itself) the name in the generated field struct
would include the gen prefix (ie. .a6xx_rb_stencil_buffer_pitch), but
this doesn't work for variants because the variant regs would have
different gen prefixes. Fix this by using reg name instead of the
full_name.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
For regs with multiple variants, generate a template'ized function to
pack the reg value. If the template param is known at compile time
(which is the expected usage) this will optimize to the same thing as
the "traditional" reg packing.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
They have more similarities than differences, so merge them and use
"variant" attribute as needed to manage differences.
Note initially using "variant" conservatively when it comes to regs
known on a7xx but not a6xx. It could be that they exist also on later
versions of a6xx as well, for example. For ex, LPAC related regs/bits
likely existed on later a6xx (eg. a660 family) but BV stuff is not.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
To merge a7xx and a6xx regs, using variant property to manage the
differences, we'll want regs/etc to be named according to the first
generation it is use rather than the domain name. Add a new prefix
type to accomplish this. By default, if no variant property, things
will still be named based on domain (ie. REG_A6XX_...), and things
that have variant="A6XX" will also end up as they currently are
(since the chip enum matches domain name), but things that have
variant="A7XX" will end up as REG_A7XX_...
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
Clang seems more relaxed about this, allowing C99 style initializers
without requiring ordering. But unfortunately g++ is more picky :-/
TODO this doesn't completely fix everything with g++, namely sparse
array initialization.. for ir3 driver-params, I think we can convert
these to structs. But there are still one or two others to deal with.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
C++ is more picky about a goto jumping over variable initialization,
even if unused after the goto label (presumably because of destructors
that can be called after a variable goes out of scope). Since there is
only a single fallback path, get rid of the goto.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
If both, any-hit and intersection shader, use scratch vars,
it could happen that they end up in the same location and
overwrite each other.
Found by inspection.
Fixes: c3d82a9622 ('radv: Add pass to lower anyhit shader into an intersection shader.')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21863>
We have 3 new/changed functions with this commit:
1. _mesa_alloc_dispatch_tables creates all dispatch tables that are not
created on demand and sets them to nop. This operates on gl_dispatch,
so it's reusable (e.g. glthread will want to use it)
2. _mesa_free_dispatch_tables frees everything
3. _mesa_initialize_dispatch_tables initializes gl_dispatch for GL
(not glthread)
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21777>
Upstream moved Host.h from Support to TargetParser in LLVM 17.
This shouldn't lead to a FTBFS, since there is a forwarding include left
behind. Sadly the added deprecation warning #pragma is invalid and thus
causes a build failure right away. But since we would have to follow the
move anyway in the future, just do it right away.
Reference: d768bf994f
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Closes: #8275
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21263>
Currently, the generated tests consist of some boilerplate, generated
test cases, and at the very end the actual test. This is bad for readability,
because the actual code is all the way at the bottom. It's also bad for
clang-format linting: even though the test cases are /* clang-format off */,
they still take an exceptionally long time to parse when linting. I suspect this
is a clang-format bug, but it's easy enough to workaround.
To solve these issues, restructure so that the test cases are in separate files
(containing the actual data), but the manually written test functions are
consolidated into a new family of generated layout tests. This is probably
cleaner.
Parallel clang-format linting is now 10x faster on the M1, which means it's
now practical to lint in my "publish branch" hook.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21854>
Rather than ingesting separate control and memory barriers, ingest only the
combined and optimized scoped_barrier intrinsic. For barriers originating from
GLSL, this makes it easier to ensure correctness. For barriers originating from
SPIR-V, this is required for translation at all, as spirv_to_nir knows only
scoped barriers. So this gets us closer to Vulkan and OpenCL.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21752>
Instead of lowering to bitwise ops. Yet another way of subdividing in NIR.
Probably insignificant but makes it easy to check that the pass ordering from the
previous pass is right. It does let us get much better codegen for
unpacksnorm2x16, whatever that's worth.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
As intended. We can't CSE with partial null destinations in the way, so we
shouldn't eliminate dead destinations until after CSE has run. But we should
still eliminate dead instructions to ensure CSE doesn't move things around
needlessly, hurting register pressure.
Noticed while debugging live range splitting.
No GLES3.0 shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Our dead code elimination pass does two things:
1. delete instructions that are entirely unnecessary
2. delete unnecessary destinations of necessary instructions
To deal with pass ordering issues, we sometimes want to do #1 without #2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
We should handle nir_op_unpack_32_2x16_split_* natively, since we can generate
better code with agx_subdivide (coalescing the ops away) than the bitshift
lowering.
That said, we do need some extra instructions for the floating point
conversions.
No shader-db changes (which makes sense because we're targetting the GLES3.0
shader-db, which doesn't have the packing GLSL functions).
The real motivation of this change isn't optimizing some GLSL pack functions,
though, it's avoiding a code regression from using NIR's memory bit size
lowering in a future MR. That lowering will turn things like "load i16vec4" into
"load i32vec2 + unpack_32_2x16", so we need to be able to coalesce that unpack.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
At some point in a refactoring long ago, our 'Piglit' runs on arm64
started actually being dEQP-GLES2 runs. Oh dear.
Surprisingly, there are a number of expectation changes; added every
fail I saw from a long overnight stress test.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21851>
Minimum/maximum LOD and LOD bias are unsigned and signed fixed point formats
respectively. They are not unsigned integers. Introduce fixed-point types into
our GenXML and use them in the XML, rather than packing in sidebands. This makes
the XML more correct and fixes pretty-printing of texture and sampler
descriptors.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
Connor Abbott wrote a nice explanation of how instance divisors work on Mali.
Let's add it to the driver docs instead of letting it languish in a forgotten
header file.
This is mostly pasted from the existing header in tree, with a few local changes
applied.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
Nowadays, formats are defined with GenXML, not the old panfrost-job.h, so most
of the format #defines in panfrost-job.h are unused. That said, a few are still
in use as a backdoor for compressed format queries to avoid a GenXML dependency.
That's not great but cleaning that up isn't the subject of this MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
XE architecture enables many more metrics, perhaps too many for
the average user. Reduce reported metrics to smaller subset,
known as non-extended metrics, by default. Can re-enable extended
metrics with env var INTEL_EXTENDED_METRICS=1
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21841>
Implement buffer textures in full generality. There are a few issues here:
* OpenGL requires buffer textures support a minimum size of 65536 elements,
however 1D textures in AGX are (at most) 8192 elements.
* OpenGL 4.0 (and OpenGL ES) require buffer textures to support the "RGB32"
texture formats. These are 3 packed channels of 32-bits each. In general,
non-power-of-two texel sizes are problematic. AGX does not support any such
formats and we rely on the GL frontend to lower to a padded format (RGBX) if
necessary. Such a lowering cannot work for buffer textures, however, so we
need to find a way to implement RGB32 buffer textures.
We solve these issues in the follow way:
* Use 2D texture descriptors for buffer textures, with a large fixed
power-of-two size along one axis. Then large texel indices may be accessed at
a small vec2 texel coordinate, and since the fixed dimension is a
power-of-two, that vector may be recovered by simply shifting and masking.
This effectively avoids size restriction. We do need to clamp texel indices to
the buffer size to avoid faulting on OOB reads, since we may read past the end
of the buffer (if the app binds a non-page-aligned offset into the buffer).
* Use a general purpose memory load for RGB32 buffer textures. Lower the texture
load instruction to a memory load from the buffer and some address arithmetic.
There's no format conversion needed for RGB32, other than maybe filling in a
format-appropriate alpha, so this is straightforward. Again, we need to clamp
the texel index for robustness with OOB reads.
Each of these solutions brings its own problem.
* Using 2D textures instead of 1D requires physically rounding up the buffer
size when packing the descriptor, so we can no longer implement textureSize()
by reading off the texture descriptor like normal.
* We don't know at compile-time whether a given texture load will read from an
RGB32 buffer texture or not, so we need to emit code for both. In Vulkan, we
can't key the shader to this property, either, since it's descriptor set state
and not pipeline state.
And each of these problems in turn brings its own solution:
* The texture descriptor is linear, so the "compression buffer address" field is
ignored by the hardware. We stash the real buffer size there so that
textureSize becomes a load from the texture descriptor like usual, without
requiring a sideband (which would complicate bindless textures).
* If we determine a texture descriptor contains RGB32 data, then it will never
be interpreted by the hardware and hence does not need to be a valid texture
descriptor. So, we extend the hardware's format enum to contain a
software-defined RGB32 format enum. Then, when lowering texture buffer loads,
we either read it as a typed RGB32 memory load or as a texture load depending
on the value of the format field in the texture descriptor.
All of this is accomplished with a big NIR pass generating a pile of strange
looking code. But it should be good enough in practice for this silly feature.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
In NIR, texelFetch (txf) does not use a sampler, but in AGX, it does -- even
though the contents of the sampler are semantically irrelevant. Rather than
requiring the state tracker to bind a sampler anyway (indicated for texture
buffers with PIPE_CAP_TEXTURE_BUFFER_SAMPLER), just add a dummy sampler
ourselves if txf is used and there are otherwise no samplers. This is helpful
because PIPE_CAP_TEXTURE_BUFFER_SAMPLER isn't honoured by Rusticl or seemingly
mesa/st's PBO code, and after implementing this dummy sampler workaround in
Panfrost for Rusticl, I realized this CAP is silly and shouldn't exist in the
first place. (And I regret pushing for its reinclusion.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
Setting `set -x`can be useful to known via trace which URL baremetal
used to download artifacts.
Today its only printed the command with the environment variables.
Also, this commit fixes multiple `section_end` for the related Gitlab
sections.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21804>
This commit ensures that we are using mesa release builds in performance
jobs.
To achieve that, some modifications were made on top of
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21492.
- Append the `BUILDTYPE` variable into the S3 artifact name
(MINIO_ARTIFACT_NAME environment variable) to allow for better
artifact management.
- The ./artifacts directory has been added to the list of artifact
directories for build-common. This ensures that the debian-release and
debian-arm64-release jobs are the only ones necessary for running
performance jobs. These jobs only produce artifacts via
prepare-artifacts.sh when we are under performance workflow.
- Make lava-submit.sh behave similar to baremetal jobs regarding
MINIO_ARTIFACT_NAME variable. For example, users can now easily
differentiate between mesa-arm64.tar.zstd and
mesa-arm64-release.tar.zstd by looking inside the `Downloading
artifacts from s3` Gitlab section.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21804>
The check in alloc_bo_from_cache() was skiping any try to get a bo
from cache but after use a protected bo was still being put in some
cache bucket and could be used for cases that don't require a
protected bo.
Using a protected bo in cases that don't require it can have
performance implications.
So here returning NULL when trying to get a cache bucket for a
protected bo, this will cause bo->real.reusable to be set to false
avoiding the bo to be reused.
Fixes: 9402ac8023 ("iris: handle protected BO creation")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21824>
We don't have a way to tell the ZLS hardware to use linear buffers, so if a
buffer could be used for depth/stencil, we have to twiddle. This isn't a problem
in practice, since depth/stencil buffers can't be shared across processes or
mapped directly as linear.
Fixes faults in depthstencil-render-miplevels, which was picking linear for one
buffer because of a STAGING bind flag. But that won't work :-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21753>
This job sometimes - very, very, rarely - fails to start Cuttlefish,
the Android VM environment. Given that we don't have any structural
monitoring and restarting (unlike LAVA/BM/B2C) for this, just stick a
more aggressive timeout on the job, so it'll be retried if it fails to
start.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21837>
On Intel platforms, the uclz lowering if ufind_msb is either one
instruction better (Gfx7 and newer) or two instructions better (all
older platforms) than the ifind_msb implementations.
On platforms that use lower_find_msb_to_reverse, there should be no
difference.
All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19938662 -> 19938634 (<.01%)
instructions in affected programs: 850 -> 822 (-3.29%)
helped: 2 / HURT: 0
total cycles in shared programs: 858467067 -> 858465538 (<.01%)
cycles in affected programs: 10080 -> 8551 (-15.17%)
helped: 2 / HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
4d802df3aa loosened the type restrictions
on these opcodes to enable support for 64-bit ballot operations. In
doing so, it enabled 8-bit and 16-bit sizes as well.
It's impossible to get these sizes through GLSL or SPIR-V. None of the
lowering in nir_opt_algebraic can handle non-32-bit sizes. Almost no
drivers can handle non-32-bit sizes.
It doesn't seem possible to enforce anything other than "one bit size"
or "all bit sizes" in nir_opcodes.py. The only way it seems possible to
enforce this is in nir_validate. This is not ideal, but it be what it
be.
v2: Remove restriction on find_lsb. It is acutally possible to get this
via GLSL by doing findLSB() on a lowp value. findMSB declares its
parameter as highp, so that path is still impossible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
The 31-ufind_msb_rev(x) lowering only produces the correct result for
32-bit sources. ufind_msb_rev can also have 64-bit sources, and most
platforms are expected to lower this to 32-bit instructions with extra
logic operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Stipple lines now appear correctly when they are oblique.
Previously the number of steps of the stipple counter between two vertices
was calculated as the euclidian distance between them in screen space, however
the length occupied by pixel along a line is only `1` for lines that are either
vertical or horizontal and will be anywhere between `1` and `sqrt(2)`
for other cases.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21290>
Get the texture/sampler index from the texture/sampler_offset source (which
is an offset from 0 thanks to the lower_index_to_offset lowering) and feed it in
as corresponding 16-bit texture instruction sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
Some of the symbols listed in PLATFORM_SYMBOLS are not only specific
to Linux, but rather specific to the GNU toolchain. Hence, use them
when inspecting ELF binaries produced by a GNU toolchain: this means
on Hurd ('GNU'), and on e.g. kFreeBSD ('GNU/kFreeBSD').
Signed-off-by: Pino Toscano <toscano.pino@tiscali.it>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21825>
This is an inline function with a compile-constant switch, so I expect
the compiler wouldn't produce any better code like this, but for humans
it's easier to read when function calls are not embedded into other
function calls.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21835>
When enabled, on gfx12 plus, we will add the sync nop instruction after
each instruction to make sure that current instruction depends on the
previous instruction explicitly.
This option will help us to get a hint if something is missing or broken
in software scoreboard pass.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>
This mainly lets the software scoreboarding pass correctly mark the
instructions, without needing to resort to fragile manual handling in
the generator.
We can also make small improvements. On Gfx 8LP-12.0, we no longer have
the restrictions about DWord alignment, so we can simply write each half
into its intended location, rather than writing it to the low DWord and
then shifting it in place.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
I originally thought that we were intentionally emitting the legacy
opcodes here to make them opaque to the optimizer, so that it wouldn't
eliminate the explicit type conversions, as they're actually required
to do the quantization. But...we don't actually optimize those away
currently anyway. So...go ahead and use the helpers for consistency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
We used to set RADEON_FLAG_GTT_WC when wsi_info is set. This changes it
to set the flag for any external mem on vram, extending the logic for
apps using external memory directly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21803>
CTS runs on stoney are currently taking ~20min to complete, which seems
to have begun with the upgrade to CTS 1.3.5.0. This is a bit too long in
and of itself, but it means that - assuming zero contention - a job that
has to be retried because the machine hung can take 40 minutes.
Aim to drop this to 15min turnaround by lowering the overall fraction
from 1/8th of the CTS to 1/11th.
As the jobs we run have been reshuffled, this adds a lot more expected
fails. As most of them categorise easily into patterns, group the
failures together in the file. Non-strict wide lines has passed since we
last ran it; the other failures all group into existing classes seen
for a long time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21791>
I used an old version of the script to generate the notes, which didn't
generate this. It is being kept separate instead of being squashed so
that the commits on the 23.0 branch and those on main match
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21500>
The HW can technically execute 16-bit operations, but the restrictions on
16-bit ALU ops are so great that it ends up not being a win for
GLES-on-Vulkan to lower mediump to 16-bit operations, at least with the
current state of the Intel compiler. This brings zink-on-anv in line with
iris and angle-on-anv for mediump behavior (ANGLE uses RelaxedPrecision,
which we ignore).
Perf on some angle traces on my brya (ADL) and i9-9900K (CFL):
ADL zink pubg_mobile_battle_royale: +13.4574% +/- 5.2046% (n=5)
CFL zink pubg_mobile_battle_royale: +29.5332% +/- 0.646585% (n=6)
ADL zink aztec_ruins_high: +5.78027% +/- 4.80645% (n=4)
CFL zink aztec_ruins_high: -1.10641% +/- 0.140562% (n=12)
ADL zink trex_200: +5.86956% +/- 2.09633% (n=10)
CFL zink trex_200: +9.72136% +/- 0.749261% (n=10)
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21775>
Our TGL machines are currently slightly oversubscribed (max. 17 jobs in
a pipeline on 15 DUTs). They're also currently suffering from
thermally-induced GPU throttling (being investigated), and a
thundering-herd network load effect: as all 15 jobs start at once, we
end up saturating one of our network links.
The combination of all three of these things means that TGL is often our
long pole in CI runs. Until we can ameliorate the two issues
constraining throughput (and a third where an unreliable hardware UART
sometimes kills jobs when it shouldn't), halve the workload so we at
least have some breathing room to absorb them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21790>
Seems to fix a hang in the following titles :
- Age of Empire 4
- Monster Hunter Rise
where the HW is hung on a PIPE_CONTROL after a GPGPU_WALKER but no
MEDIA_INTERFACE_DESCRIPTOR_LOAD was emitted since the switch from 3D
to GPGPU.
This would happen in the following case :
vkCmdBindPipeline(COMPUTE, cs_pipeline);
vkCmdDispatch(...);
vkCmdBindPipeline(GRAPHICS, gfx_pipeline);
vkCmdDraw(...);
vkCmdDispatch(...);
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17247>
The previous solution breaks potential loop header phis.
Move the dummy-break to the bottom of the loop.
Fixes: dEQP-VK.reconvergence.subgroup_uniform_control_flow_ballot.*
Fixes: a9c4a31d8d ('aco: handle NIR loops without breaks')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21736>
the recording rp_info may be a pointer to a member of the array being
reallocated, so test for this and re-set it to avoid invalid memory
access
found with this caselist:
KHR-GL46.texture_gather.offset-gather-unorm-2darray
KHR-GL46.texture_view.view_sampling
cc: mesa-stable
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21729>
PA_SC_VRS_OVERRIDE_CNTL is emitted when a framebuffer is bound because
it controls the VRS surface enable bit. Though, if a pipeline is bound
after the framebuffer is emitted, it can override the state. Remove it
completely since VRS for flat shading and RADV_FORCE_VRS are disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20333>
NGG streamout performance is limited by the workgroup size, so make it as
large as possible.
Since this uses si_get_max_workgroup_size() to set the NGG workgroup size,
the side effect is that all GS is also getting an increase to 256, which
is OK.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
The hardware uses the register to premultiply GS vertex indices
in input VGPRs.
This changes the behavior as follows:
- VGT_ESGS_RING_ITEMSIZE is always 1 on gfx9-11, set in the preamble.
- The value is passed to the shader via current_gs_state (vs_state_bits).
- The shader does the multiplication.
The reason is that VGT_ESGS_RING_ITEMSIZE will be removed in the future.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
The gallium disk cache is about to depend on these, and I don't want to
create a dependency on agx_opcodes.h.py for that. So, make a new header
for them that doesn't have build dependencies.
Rename them to agx_compiler_* too, to avoid collisions with the other
driver debug flags.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21776>
The function _mesa_symbol_table_add_symbol() copies the string with strdup(),
the original string should be freed.
For instance, with "piglit/fp-fragment-position -auto -fbo":
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0xffff99c59050 in __interceptor_strdup (/usr/lib64/libasan.so.6+0x59050)
#1 0xffff8f53d24c in handle_ident ../src/mesa/program/program_lexer.l:129
#2 0xffff8f53d24c in _mesa_program_lexer_lex ../src/mesa/program/program_lexer.l:312
#3 0xffff8f529d10 in yylex ../src/mesa/program/program_parse.y:289
#4 0xffff8f529d10 in yyparse src/mesa/program/program_parse.tab.c:2140
#5 0xffff8f5341a4 in _mesa_parse_arb_program ../src/mesa/program/program_parse.y:2589
#6 0xffff8f51e96c in _mesa_parse_arb_fragment_program ../src/mesa/program/arbprogparse.c:82
#7 0xffff8f4d867c in set_program_string ../src/mesa/main/arbprogram.c:402
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21728>
For instance, with "piglit-2000/bin/asmparsertest ARBfp1.0 tests/asmparsertest/shaders/ARBfp1.0/shadow-02.txt":
Direct leak of 192 byte(s) in 2 object(s) allocated from:
#0 0x7f6e8378f987 in calloc (/usr/lib64/libasan.so.6+0xb1987)
#1 0x7f6e7769d620 in asm_instruction_copy_ctor ../src/mesa/program/program_parse.y:2146
#2 0x7f6e7769d620 in yyparse ../src/mesa/program/program_parse.y:439
#3 0x7f6e776a6725 in _mesa_parse_arb_program ../src/mesa/program/program_parse.y:2590
#4 0x7f6e77687f69 in _mesa_parse_arb_fragment_program ../src/mesa/program/arbprogparse.c:82
#5 0x7f6e77630765 in set_program_string ../src/mesa/main/arbprogram.c:402
#6 0x7f6e76ec3e8a in _mesa_unmarshal_ProgramStringARB src/mapi/glapi/gen/marshal_generated2.c:4152
#7 0x7f6e76a0e585 in glthread_unmarshal_batch ../src/mesa/main/glthread.c:122
#8 0x7f6e76a1031d in _mesa_glthread_finish ../src/mesa/main/glthread.c:383
#9 0x7f6e76a1031d in _mesa_glthread_finish ../src/mesa/main/glthread.c:348
#10 0x7f6e76e6a062 in _mesa_marshal_GetError src/mapi/glapi/gen/marshal_generated1.c:1809
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21728>
For instance, with "piglit/bin/asmparsertest ARBfp1.0 tests/asmparsertest/shaders/ARBfp1.0/swz-04.txt":
Direct leak of 18 byte(s) in 2 object(s) allocated from:
#0 0x7f97e99050 in __interceptor_strdup (/usr/lib64/libasan.so.6+0x59050)
#1 0x7f8d4160ac in handle_ident ../src/mesa/program/program_lexer.l:129
#2 0x7f8d4160ac in _mesa_program_lexer_lex ../src/mesa/program/program_lexer.l:312
#3 0x7f8d402b50 in yylex ../src/mesa/program/program_parse.y:289
#4 0x7f8d402b50 in yyparse src/mesa/program/program_parse.tab.c:2140
#5 0x7f8d40d01c in _mesa_parse_arb_program ../src/mesa/program/program_parse.y:2590
#6 0x7f8d3f77ac in _mesa_parse_arb_fragment_program ../src/mesa/program/arbprogparse.c:82
#7 0x7f8d3ad468 in set_program_string ../src/mesa/main/arbprogram.c:402
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21728>
Wa_14017989577 is a clone of Wa_14015360517, which applies to several
platforms beyond INTEL_PLATFORM_DG2_G10.
Update references to Wa_14017989577, and use the generated workaround
helper to ensure application to the proper platforms.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21744>
Modifying pitch for all LINEAR surface isn't correct;
the original change that modified surf_pitch was only
intended for YUV textures.
This fixes vkGetImageSubresourceLayout rowPitch return value
for VK_FORMAT_BC3_UNORM_BLOCK + VK_IMAGE_TILING_LINEAR.
Fixes: fcc499d5 (ac/surface: adjust gfx9.pitch[*] based on surf->blk_w)
v2: add check for UYVY format (Pierre-Eric)
v3: move blk_w division to above if check (Pierre-Eric)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21595>
It's unnecessary and already checked elsewhere like in
vkCmdBindDescriptorSets(). This improves performance of vkoverhead
test #76 (descriptor_1image) by +18%. It's the same performance as
PRO on my Threadripper 1950X now. This should also slightly improve
texel and buffer descriptors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20699>
If we manage to fold in a left shift that's bigger than the hardware can do, we
should at least avoid generating a useless right shift to feed the hardware
rather bailing completely.
For motivation, this form of address arithmetic is encountered when indexing
into arrays with large power-of-two element sizes (array-of-structs).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
Optimize address arithmetic of the form
base + u2u64((index << shift) + const)
into hardware operands
base, index << (shift - format_shift) + const'
which (if format_shift = shift) can be simply
base, index + const'
rather than the current naive translation
base, ((index << shift) + const) >> format_shift
This saves at least one pointless shift. We can't do this optimization with
nir_opt_algebraic, because explicitly optimizing "(a << #b) >> #b" to "a" isn't
sound due to overflow. But there's no overflow issue here, which is what this
whole pass is designed around.
For motivation, this address arithmetic implements "dynamically indexing into an
array inside of a C structure", where the const is the offset of the array
relative to the structure.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
this otherwise may have been a surface that was never drawn to or
already had its layout corrected, in which case a deferred barrier
is not only unnecessary, it might be broken
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21739>
For GLSL, we want to optimize code like
memoryBarrierBuffer();
controlBarrier();
into a single scoped_barrier intrinsic for the backend to consume. Now that
backends can get scoped_barriers everywhere, what's left is enabling backends to
combine these barriers together. We already have an Intel-specific pass for
combining memory barriers; it just needs a teensy bit of generalization to allow
combining all sorts of barriers together.
This avoids code quality regression on Asahi when switching to purely scoped
barriers. It's probably useful for other backends too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21661>
Some backends can handle a constant texture index or a dynamic texture index but
not a constant texture index plus a dynamic texture offset. Add a nir_lower_tex
option to lower to one of these options.
v2: Use more straightforward code proposed by Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>
Though not many people run full test runs, it occupies 2/7 a630 slots
for nearly 2 hours. If more than one person does this at a time, it can
be an effective DoS and make merges time out.
Limit full runs to a subset of the runners, such that at least some of
them will always be available for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21737>
this ensures that batch refs are added for fb surfaces on unbind, which
prevents stale batch tracking from persisting on resources
after the context is destroyed
fixes:
*EGL.functional.render.multi_context*
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21727>
This commit updates LogFollower class to handle carriage return
characters in LAVA logs. LAVA treats carriage return characters as a
line break, so each carriage return in an output console is mapped to a
console line in LAVA.
The updated LogFollower class now merges lines that end with a carriage
return character into a single line, making the Gitlab sections work
correctly. In addition, the `remove_trailing_whitespace` method has been
updated to remove trailing `\r\n` characters from log lines.
The `test_lava_log_merge_carriage_return_lines` test function has also
been updated to test for carriage returns at the end of the previous
line.
Closes: #8242
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21614>
reason:
some h264 streams have some strange pictures, from
vaapi input these pictures don't have a reference frame,
however, they are not intra only pictures, in MB layer
these pictures are looking for some references, if they
cannot find it. It could cause PF.
when reference pictures exist, it will need to set used_for
reference_flags, therefore if that is set, however the
number of reference frames is zero, which is odd, it
should be avoided.
solution:
In the above case, to scan the ref list so that it will
make at least one reference available to avoid crash, since
this is not accurate enough, it could cause some artifacts.
And in that case, it will need to be checked individually
for another solution.
closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1462
closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8401
Cc: mesa-stable
Tested-by: llyyr <llyyr.public@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21732>
these don't actually work; it creates the marker for the op, but
then the "end" of the marker is effectively determined to be the end
of the cmdbuf
instead, detect whether a draw is from u_blitter and add a marker there
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21583>
This is preliminary work for separate shader functions.
The ray_tracing_module is eventually intended as self-contained
pipeline struct per RT group.
For now, these modules only contain the group handles.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21667>
In mesa/mesa !20272, some bash functions introduce a standard wait to setup
gitlab ci sections, but the file collecting them needed to be included in the
artifacts exported by mesa. Other projects that use tests like deqp-runner
need to load these bash functions.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21680>
* if the proposed box is smaller than an existing box then don't add it,
* if the proposed box is adjacent to an existing box, expand
* if the proposed box is larger than an existing box, replace
this reduces the chances of having a ton of copy boxes to iterate over
also add a perf warning in case a ton of copy boxes exist
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21665>
if a single query is being started and stopped frequently, update
the internal qbo with a single copy call instead of one copy per result
not actually that useful in practice because of how query pools are shared,
but could help somewhere in theory
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21581>
for drivers lacking one of:
* EXT_color_write_enable
* primitivesGeneratedQueryWithRasterizerDiscard
terrible things must happen. specifically, dummy surfaces have to
be used in a framebuffer with rast-discard enabled for the duration
of the query
now that queries are only started/stopped in renderpasses, however,
there are new hurdles. with tc renderpass optimizing, queries can be
started outside the renderpass, which would trigger recursion when
trying to start a primgen query outside the renderpass if any clears
are enabled, as those must be flushed onto the real surfaces
to solve all of this:
* block tc renderpass optimizing if at least one of the above features is missing
* detect a pending primgen query start during renderpass start
* activate rast-discard and set dummy surfaces before beginning renderpass
* this recurses and automatically flushes clears
* finally, start the real renderpass
BUT WAIT THERE'S MORE!
because there's also drivers that support EXT_color_write_enable and don't support
primitivesGeneratedQueryWithRasterizerDiscard, which means they do need rast-discard,
but they don't need dummy surfaces, and so the clears still have to be flushed,
so they need an explicit (recursive) renderpass start/stop in advance to
ensure the clears are applied as expected
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21628>
Now that we have multiple sysval tables, implementing compute kernels --
including with indirect dispatch and load_num_workgroups -- is straightforward.
This patch adds the straightforward launch_grid implementation.
As usual needs UAPI support patches to actually do anything, but the relevant
compute tests are passing downstream.
It's not possible to properly test compute shaders support right now (pending
support for images), so we don't update the CAPs or features.txt here. This is
more about flushing out the piles of downstream patches we have (and getting
reviewed!) in preparation for cutting a downstream release soon.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21703>
The previous lowering was insufficient in two areas:
* No support for indirection. This is required for dynamically indexing into
UBOs, SSBOS, etc in OpenGL ES 3.2
* Only a single table supported. Multiple tables are required to implement
indirect dispatch/draws efficiently, in order to bind the indirect buffer as
uniforms.
The first problem is addressed here by reworking the lowering of
system values to happen in NIR, decoupled from the uniform register assignment
details, such that we can handle 1:n lowerings in a straightforward way.
Namely, indirect sysvals are lowered to indirect memory loads relative to the
base address of the sysval table, where the table address is itself pushed as a
(direct) sysval.
The second problem is addressed in this patch by generalizing to multiple
uniform tables.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21703>
Fragment shaders with side effects need to be lowered to ensure they execute for
all shaded pixels but no helper threads. Add a lowering pass to handle this.
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21712>
Most integer immediates are only 8-bit, but load/store instructions allow their
immediate offsets to be 16-bit instead. Take advantage of this in the optimizer.
This eliminates 36% of the instructions in
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36, a fitting
percentage.
Insignificant effect on dEQP-GLES31.functional.ssbo.* performance... Only a
small % of our compile-time pie is actually spent in the backend anyway (as
opposed to NIR passes or GLSL IR).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
This avoids creating silly preambles that don't actually do anything except push
a constant that we could've inlined for cheaper anyway, since nir_opt_preamble's
cost model is sensitive to e.g. constant folding.
This avoids a pointless preamble in split-hell.
As a nice bonus, this also improves compile-time on address-heavy shaders. With
a release build, CPU time in dEQP-GLES31.functional.ssbo.* reduces from 12.87s
to 10.77... a 16% improvement is nothing to sneeze at.
shader-db results are mostly noise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
Kinda silly but fixes
dEQP-GLES31.functional.state_query.integer.max_framebuffer_samples_* which
queries the number of samples of a NONE format, required for
ARB_framebuffer_no_attachments to make sense.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21710>
Sample index and layer index are both 16-bits, even though they are zero
extended for compiler simplicity in some cases. In particular this means that 2D
MSAA arrays consume 6 half-regs for their coordinates, not 8. This is what the
IR translation (actually agx_nir_lower_texture) produces, we just need to fix
the calculation in agx_read_registers to agree.
Fixes validation failure in tests like
dEQP-GLES31.functional.texture.multisample.samples_4.use_texture_color_2d_array
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21708>
vulkancts test dEQP-VK.wsi.direct_drm.surface.create_simulate_oom is failing
because in wsi_display_alloc_connector() function memory allocation for
connector is not checked for return NULL. create_simulate_oom test simulates
out of memory, hence memory allocation fails for connector and later when
tried to dereference connector program will segfault.
This patch fixes the dEQP-VK.wsi.direct_drm.surface.create_simulate_oom test
segfault issue by checking if connector is NULL afer memory allocation.
Cc: mesa-stable
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21701>
This NIR pass lowers stores in fragment shaders to:
if (!gl_HelperInvocaton) {
store();
}
This implements the API requirement that helper invocations do not have visible
side effects, and the lowering is required on any hardware that cannot directly
mask helper invocation's side effects. The pass was originally written for
Midgard (which has this issue) but is also needed for Asahi. Let's share the
code, and fix it while we're at it.
Changes from the Midgard pass:
1. Add an option to only lower atomics.
AGX hardware can mask helper invocations for "plain" stores but not for
atomics. Accordingly, the AGX compiler wants this lowering for atomics but
not store_global. By contrast, Midgard cannot mask any stores and needs the
lowering for all store intrinsics. Add an option to the common pass to
accommodate both cases.
This is an optimization for AGX. It is not required for correctness, this
lowering is always legal.
2. Fix dominance issues.
It's invalid to have NIR like
if ... {
ssa_1 = ...
}
foo ssa_1
Instead we need to rewrite as
if ... {
ssa_1 = ...
} else {
ssa_2 = undef
}
ssa_3 = phi ssa_1, ssa_2
foo ssa_3
By default, neither nir_validate nor the backends check this, so this doesn't
currently fix a (known) real bug. But it's still invalid and fails validation
with NIR_DEBUG=validate_ssa_dominance.
Fix this in lower_helper_writes for intrinsics that return data (atomics).
3. Assert that the pass is run only for fragment shaders. This encourages
backends to be judicious about which passes they call instead of just
throwing everything in a giant lower everything spaghetti.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21413>
These just don't seem to work. macOS falls back to eMRT here...
dEQP-GLES3.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.13
from Fail -> Crash. Proper solution will come when we implement eMRT later on.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21705>
venus utilizes the host side shader cache.
This is a WA to generate shader cache files containing headers with
a unique cache id that will change based on host driver identifiers.
This allows fossilize replay to detect if the host side shader cache
is no longer up to date.
The shader cache is destroyed after creating the necessary files and
not utilized by venus.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21664>
Strictly, this extension was already advertised, but it didn't seem nice to mark
it DONE when there was no sync support in the driver whatsoever. Mark it done
now that we do have proper explicit sync (well, in conjunction with the "add
Linux UAPI" patches that are blocked on getting the kernel driver upstream, but
that's needed for *any* of these extensions to be there!)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21676>
We need to align the extent to 4 for the GPU, but we can't copy more
than the original size out of the user buffer. Always alloc the exact
size we need.
This does mean the GPU gets an IB extent that could include some other
stuff later in the pool, if not aligned. This is probably safe? Given
the base alignment, it should never cross a page boundary and fault or
anything like that.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21686>
The issue here is that for draw indirect count variants, we want to
jump after the last generated draw call to the next location where
commands are. But if we have more than 67M draws (8k * 8k chunks), we
only know the location once we've generated each of the 8k * 8k
chunks.
This change adds a CPU side pointer in the push constant struct so
that we can create a single linked list of chunks to edit and go
through to write the correct jump address after all the generated
space has been allocated.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c950fe97a0 ("anv: implement generated (indexed) indirect draws")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20497>
As we start to refactor the iris code base to support Xe KMD here I'm
dropping DRM_IOCTL_I915_GEM_CREATE usage as much as possible and
unifying all graphics memory allocation calls to
DRM_IOCTL_I915_GEM_CREATE_EXT.
The kernel version that implemented DRM_I915_QUERY_MEMORY_REGIONS uAPI
also implemented DRM_IOCTL_I915_GEM_CREATE_EXT so we can use that
to safely call DRM_IOCTL_I915_GEM_CREATE_EXT.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21369>
The plan is to compile all the Xe files but in run time it will fail
to detect the KMD loaded and it will fall back to software
rendering(if build).
Compiling Xe files makes sure newer commits don't break Xe even if
developers don't have Xe enabled in their build folder.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21368>
This probably doesn't affect Vulkan or GL because they can't have
anything bigger than a vec4 anyway unless it's a u64vec4 and those have
to be at least 8B aligned. This may affect CL apps if they use
__attribute__((packed)) on something with big vectors, depending on how
LLVM decides to translate that.
Fixes: f8aa83f0c8 ("intel/nir: Use nir_lower_mem_access_bit_sizes()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>
Indeed, "resolve" is not freed at the gl_framebuffer destroy
stage.
For instance, this issue is triggered and detected with
"piglit/bin/fbo-depthstencil clear default_fb -samples=2 -auto"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: f5bde99cbd ("gallium: plumb resolve attachments through from frontends -> pipe_framebuffer_state")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21663>
Instead of having ac_set_reg_cu_en that sets the register, replace it with
ac_apply_cu_en that only returns the modified register value,
which allows a large simplification in both drivers because a lot of code
becomes duplicated after it's switched to ac_apply_cu_en.
RADV also didn't apply it to a few registers. Fixed.
This removes 82 lines of code in total.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
We expect to forward GPU fault information to userspace. Since Mesa can
get that information, we can look up the fault address to log what was
the containing or nearest BO. Add a helper for that, so it can be called
from the driver.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
With macOS support out of the way, we can start implementing a lot of
the Linux driver interface and bookkeeping without actually adding the
UAPI proper. Let's do that to reduce the size of the UAPI patchset.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
Destroy the surface dmabuf feedback proxy before destroying the event
queue that the proxy is attached to.
This silences a warning that libwayland 1.22 emits for programs that use
Vulkan/Wayland:
warning: queue 0x557a4efbcf70 destroyed while proxies still attached:
zwp_linux_dmabuf_feedback_v1@18 still attached
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21647>
Destroy the display wrapper proxy before destroying the event queue that
the proxy is attached to.
This silences a warning that libwayland 1.22 emits for programs that use
EGL/Wayland:
warning: queue 0x562a5ed2cd20 destroyed while proxies still attached:
wl_display@1 still attached
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21646>
We need to have the scratch buffer added to the pipeline BO tracking
list, so it's added to the batch buffer and finally to the execbuffer
list. Otherwise we pagefault (or read the default scratch page on
i915).
Fixes
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_u16vec4
on CI (and probably other tests).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2028f1caa3 ("anv: emit 3DSTATE_HS in cmd_buffer_flush_gfx_state")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21653>
Using texelFetch to read samples from an 8xMSAA fast cleared image on
Haswell can read transparent black pixels around triangles from where
there should be none. This issue isn't present when using sample
shading, resolving the image using vkCmdResolveImage or in a copy the
image. The easiest way to fix this is by just disabling non-zero fast
clears for 8xMSAA images.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7587
Cc: mesa-stable
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21444>
We are seeing endless DRM_IOCTL_SYNCOBJ_WAIT ioctl when system memory is
under pressured.
Commit f9d8d9acbb ("iris: Avoid abort() if kernel can't allocate
memory") avoids the abort() on ENOMEM by resetting the batch. However,
when there's an ongoing OpenGL query, resetting the batch will make the
snapshots_landed never be flipped, so iris_get_query_result() gets stuck
in the while loop forever.
Since there's no guarantee that the next batch after resetting won't hit
ENOMEM, so instead of resetting the batch, be patient and wait until kernel has
enough memory. Once the batch is submiited and snapshots_landed gets
flipped, iris_get_query_result() can proceed normally.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6851
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20449>
If we're just loading memory, we can take the scalar offset_is_uniform
paths even the first active invocation is nonzero, saving a bunch of
looping and bounds checking for per-element loads. And, if we don't have
an active invocation, doing the load for element 0 (which is
bounds-checked to return 0 if element 0 had a bad value in it) before
throwing away the result is still better than doing bounds-checked loads
for each element before throwing away the result.
dEQP-VK.ubo.random.16bit.scalar.92 goes from 16.5 to 14.0 seconds.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21142>
gallivm doesn't actuially jump across branches where no invocations are
active, so my previous assertion about the exec mask being nonzero was
incorrect. This means that we'll always use a defined invocation for the
various LLVMBuildExtractElements using the result value, which is an
improvement over my even the code before my cttz change that would use
undefined values for the element to be extracted.
Fixes: 8c2493d041 ("gallivm: Use cttz instead of a loop for first_active_invocation().")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21142>
Workarounds for defects in Intel silicon have been manually
implemented:
- consult defect database for the current platform
- add workaround code behind platform ifdef or devinfo->ver checks
Some bugs have occurred due to the manual process. Typical failure
modes:
- defect database is updated after a platform is enabled
- version checks are overly broad (eg gfx11+) for defects that were
fixed (eg in gfx12)
- version checks are too narrow for defects that were extended to
subsequent platforms.
- missed workarounds
This commit automates workaround handling:
- Internal automation queries the defect database to collate and
summarize defect documentation in json.
- mesa_defs.json describes all public defects and impacted platforms.
Defects which are extended to subsequent platforms are listed under
the original defect.
- gen_wa_helpers.py generates workaround helpers to be called
in place of version checks:
- NEEDS_WORKAROUND_{ID} provides a compile time check suitable for
use in genX routines.
- intel_device_info_needs_wa() provides a more precise runtime
check, differentiating platforms within a generation and
platform steppings.
Internal automation will generate new mesa_defs.json as needed.
Workarounds enabled with these helpers will apply correctly based on
updated information in Intel's defect database.
Reviewed-by: Dylan Baker <dylan@pnwbakers>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20825>
Performance regressed by 31% in one VP2020/Creo subtest because
the draw_always_async flag wasn't implemented correctly. Remove it
instead of fixing it.
While removing it, I noticed that our DrawIndirect async conditions
were incorrect. I fixed them.
Fixes: 3b897719e6 - glthread: add ctx->GLThread.draw_always_async to simplify draw checking
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21566>
Coverity complains that we could end up rolling over on a 32bit
platform, which isn't really true because of the assertion, but there's
also no harm in ensuring that we have exactly the same behavior for both
32 bit and 64 bit platforms.
CID: 1515989
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21572>
When running in the CI environment, instead of crashing the test
binary, it is preferable to just fail gracefully (in this case return
a NULL shader) like is done in release mode, so other tests continue
to be executed.
For convenience add a variable break_on_failure to the test so the
breaking behavior can be re-enable in individual tests when debugging.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21512>
Currently the iris driver is adding the buffer objects to zombie list
without checking if it is busy or not. It checks for it after 1 second
which adds delay to buffer release.
This fix checks if the bo is busy or not before adding it to zombie list.
Without this fix, the applications expecting immediate buffer release would
fail.
The fix is identified while debugging below android cts tests:
android.graphics.cts.BitmapTest#testDrawingHardwareBitmapNotLeaking
android.graphics.cts.BitmapTest#testHardwareBitmapNotLeaking
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21460>
Now that the upload memory is tied to the shader itself, the trap handler
shader no longer needs an additional wrapper.
This is a cleanup to ease introduction of a new shader uploading code path.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21541>
It seems that when batches are submitted back to back, the USC can
retain cache contents between them. This causes a problem when the CPU
updates a VBO between batches, since some of those updates might not be
visible to the USC.
Looks like the VDM barrier command with one magic bit set fixes this, so
let's try that.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21538>
With the workflow keyword, we can have more control over how pipelines
are created.
One of the features is to set a variable for the entire pipeline
depending on the rule. These variables would be available for all jobs
manifest and can be used inside job rules, for example.
We can use that to set a variable to enable performance jobs in the
pipeline, both at the YAML and script levels.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21492>
It's only used by fragment shaders, so halving it matches the size
used in the most optimal primitive pipeline (VS + FS).
This change frees some URB space for mesh and task shaders and as
a result improves vk_meshlet_cadscene performance by up to 2%,
depending on the model.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21559>
Coverity points out that we can pass a negative value to `close()`,
which results in an unchecked error. While this is technically true, it
really isn't a problem as `close()` is speced to return -1 in that case
(which we ignore). However, what is true is that if we fail to dup the
fd (the only case where we could end up with a negative value), then
we're in an unrecoverable error state anyway, and should go to the error
cleanup code.
CID: 1521539
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21568>
Enable support for the following extensions, which are already supported
by the driver and shared wsi code, and were just missing enables inside
v3dv_device:
VK_EXT_direct_mode_display, VK_EXT_acquire_drm_display,
VK_EXT_acquire_xlib_display.
Successfully tested on RPi 400, RaspberryPi OS 11, with X11 RandR output
leasing to lease a RandR output and use it for direct display mode.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21529>
Doing that won't avoid linking wsi headers, and in fact we have already
included both android and common wsi headers. For swapchain info, it's
currently disabled by the swapchain spec version advertised on Android.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21379>
Mappable memory support is a must for Venus core, but the support of
such can be transparent to the driver. Thus the renderer external memory
type won't expose opaque fd type.
External memory over vtest can be exposed and the wsi support on top can
be made explicit as long as masking out the importable bit.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21379>
Register writes from the pre-header might not be correct for any but
the first loop iteration because they can be clobbered inside the loop.
Foz-DB Navi21:
Totals from 18 (0.01% of 134913) affected shaders:
CodeSize: 251384 -> 251508 (+0.05%)
Instrs: 47644 -> 47664 (+0.04%)
Latency: 801801 -> 801852 (+0.01%)
InvThroughput: 177579 -> 177593 (+0.01%)
Copies: 4752 -> 4771 (+0.40%)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8376
Fixes: d3b0f78110 ("aco/optimizer_postRA: Initialize loop header with preheader information")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21540>
this avoids potentially splitting renderpasses by ensuring that
all (non-cs) query operations always occur inside renderpasses
zink_query_update_gs_states() now has to be called inside renderpass
to catch the active queries
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21534>
now when a query pool is full, a new query pool can be created and the
previous one can be dropped from reuse to be freed at a later time
this has the added benefit of avoiding yet another place where a renderpass
might get split
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21534>
if the starts array has been reset, then the counter will be inaccurate,
and some of the members will leak, so this needs to iterate over the capacity
of the array instead of the contents
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21534>
For instance, with "piglit/arb_shading_language_include-api -auto -fbo" or
"piglit/shader_runner tests/spec/arb_shading_language_include/execution/replacement.shader_test -auto -fbo":
Direct leak of 66 byte(s) in 6 object(s) allocated from:
#0 0x7fa4b59050 in __interceptor_strdup (/usr/lib64/libasan.so.6+0x59050)
#1 0x7f9a098fe0 in validate_and_tokenise_sh_incl ../src/mesa/main/shaderapi.c:3383
#2 0x7f9a0a43e8 in _mesa_NamedStringARB ../src/mesa/main/shaderapi.c:3547
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21553>
This changes instances of d3d12_varying_info to d3d12_varying_info*,
significantly reducing the size of the d3d12_shader_key,
d3d12_gs_variant_key, and d3d12_tcs_variant_key.
Associated changes to key fill, compare, hashing, and gs and tcs variant
maps significantly reduce the amount of time spent clearing and
comparing memory.
The biggest win here is not having to re-zero _or_ re-fill varyings in
d3d12_fill_shader_key, validate_geometry_shader_variant, and
validate_tess_ctrl_shader_variant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21527>
When the option is enabled, lower memory barriers to the
unified nir_intrinsic_scoped_barrier.
The translation of the following is based on
https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_gl_spirv.txt
- memoryBarrier()
- memoryBarrierBuffer()
- memoryBarrierImage()
- memoryBarrierShared()
- groupMemoryBarrier()
Also use scoped barrier for the memory counterparts of the GLSL
(control) barrier() when the option is enabled. The execution
part of a (control) barrier() remains using the old intrinsic.
For memoryBarrierAtomicCounter() there's no corresponding
nir_var_atomic_counter mode. Since atomic counters are lowered
to SSBOs, use the nir_var_mem_ssbo mode in the scoped barrier
instead.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3339>
autotune already divides draw-cost by # of draws, but only increments
the draw-cost once per multi-draw. We could either _also_ account for
draw-cost by multiply by # of draws for treat multi-draw as a single
draw. The latter saves an integer multiply per draw.
Fixes a performance regression triggered by transition from GMEM to
sysmem rendering.
This reverts commit 6bfee9e669.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21543>
It's started rendering something different again, with a similar sort of
bad rendering to what's linked in the bug report (though this time it's a
'P' that became a white square). Commit range 65b62db0..964323fe has
nothing particularly likely in it, so I expect this is some sort of cache
flushing fail or something.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21565>
If a dual source blend colour is never written, src1 will be null and it will be
invalid to dereference it. src1 is dereferenced both for the f2fN instruction
but also if a dual blend factor is used... even if the latter isn't strictly
valid, segfaulting in the NIR pass seems a lot meaner than blending with zero.
The referenced commit hosed Asahi, causing anything that used blending to crash.
Panfrost is unaffected since it always supplies a dual colour due to our crude
construction of blend shaders.
Fixes: 8313016543 ("nir/lower_blend: Consume dual stores")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21544>
RGP traces need a dump of shader code in order to display ISA and
instruction trace. Previously, this was read back from GPU at trace
creation time. However, for future changes that implements upload shader
to invisible VRAM, the upload destination will be a temporary staging
buffer and will be only accessible during shader creation.
To allow dumping in such cases, copy the shader code to a separate buffer
at creation time, if thread tracing is enabled.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21513>
some drivers may provide this in heaps that get used by non-staging resources,
so avoid extra copies in that case
unlike the previous attempt at this optimization, this utilizes the screen-based
context for thread-safe transfers, which should avoid races/crashes
fix#8171
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21452>
G13 does not support sampler descriptor LOD biasing, so this needs to be lowered
to shader code for APIs that require this functionality. Add an option to do
this lowering while doing our other backend texture lowerings. This generates
lod_bias_agx texture instructions which the driver is expected to lower
according to its binding model.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Add a new texture opcode that returns the LOD bias of the sampler. This will be
used on AGX to lower sampler LOD bias to txb and friends. This needs to be a
texture op (and not a new intrinsic) to handle both bindless and bindful
samplers across GL and Vulkan in a uniform way.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Now that we're working on lowered I/O, passing in the dual source blend colour
via a sideband doesn't make any sense. The primary source blend colours are
implicitly passed in as the sources of store_output intrinsics; likewise, we
should get dual source blend colours from their respective stores. And since
dual colours are only needed by blending, we can delete the stores as we go.
That means nir_lower_blend now provides an all-in-one software lowering of dual
source blending with no driver support needed! It even works for 8 dual-src
render targets, but I don't have a use case for that.
The only tricky bit here is making sure we are robust against different orders
of store_output within the exit block. In particular, if we naively lower
x = ...
primary color = x
y = ...
dual color = y
we end up emitting uses of y before it has been defined, something like
x = ...
primary color = blend(x, y)
y = ...
Instead, we remove dual stores and sink blend stores to the bottom of the block,
so we end up with the correct
x = ...
y = ...
primary color = blend(x, y)
lower_io_to_temporaries ensures that the stores will be in the same (exit)
block, so we don't need to sink further than that ourselves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21426>
Power-of-two SW stack sizes are prone to causing collisions in the
hashing function used by the L3 to map memory addresses to banks,
which can cause stack accesses from most DSSes to bottleneck on a
single L3 bank. Fix it by padding the SW stack stride by a single
cacheline if it was a power of two. This has been reported by Felix
DeGrood to improve Quake2 RTX performance by ~30% on DG2-512 in
combination with other RT patches Lionel Landwerlin has been working
on.
Many thanks to Felix DeGrood for doing much of the legwork and
providing several iterations of Q2RTX performance counter dumps which
eventually prompted me to consider the hash collision theory and
motivated this patch, and for providing additional performance counter
dumps confirming that there is no longer an appreciable imbalance in
traffic across L3 banks after this change.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21461>
Do this by removing the compatibility table and only using hard coded shaders
when present. The hard coded shaders, along with the hard coding framework
itself, can be dropped once the compiler is capable of compiling the hard coded
shaders. In the meantime we don't want to risk regressing things that we know
work because we temporarily can't test them.
This restriction is being dropped now as the new compiler framework has been
merged and we want to make use of it so it can be developed further.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21495>
After 16 entries, we fall back to the previous logic that used a hash
map to link the resource's state per context.
Preventing hash map churn by cheaply tracking up to 16 context's worth
of states per resource significantly reduces CPU cost in
find_or_create_state_entry
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21528>
Now that turnip can support multiple kernel-mode drivers in a single
build, re-work the meson option to have a single list of KMDs, rather
than special options to enable kgsl for turnip or virtio for gallium.
It is temporarily a bit awkward as gallium does not yet support kgsl
and turnip does not yet support virtio. But both of those are planned
or in-progress, so long term a single list is the most sensible option.
TODO freedreno/drm support to build with only virtio support.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21394>
1) Allow the two different entrypoints for drm vs non-drm (kgsl) to
coexist.
2) Split the generic drm related device initialization from the msm
specifics. This will simplify adding support for additional drm
based kernel mode drivers (ie. virtgpu)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21394>
Because kgsl sync primitives are not drm_syncobj, the kgsl kernel
support needs the ability to patch in it's own entrypoints related
to fences, etc. The current entrypoint table magic using weak syms
won't work if we are building both kgsl and drm support into one
binary, so switch to runtime patching in the kgsl specific entry-
points.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21394>
Seems this document has moved since last we updated this link. But
instead of chasing the exact CDN link, let's link to the document on
Intel's website. There's both a download-link there, as well as the
ability to read the document online.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21448>
Since cull_mask is only one byte, we can trivially store it in the same
register as the flags. This leaves us with a 2% performance gain in
Quake II RTX:
Totals from 7 (14.00% of 50) affected shaders:
VGPRs: 720 -> 688 (-4.44%)
CodeSize: 213052 -> 212980 (-0.03%); split: -0.05%, +0.02%
MaxWaves: 67 -> 70 (+4.48%)
Instrs: 39429 -> 39394 (-0.09%); split: -0.15%, +0.06%
Latency: 1096258 -> 1096943 (+0.06%); split: -0.05%, +0.11%
InvThroughput: 230661 -> 222963 (-3.34%); split: -3.42%, +0.08%
VClause: 1208 -> 1206 (-0.17%); split: -0.25%, +0.08%
Copies: 5321 -> 5269 (-0.98%); split: -1.22%, +0.24%
Branches: 1903 -> 1902 (-0.05%)
PreVGPRs: 650 -> 645 (-0.77%)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21470>
Fix a build error with -Dvulkan-beta=true:
../src/gallium/drivers/zink/zink_screen.c: In function ‘zink_internal_create_screen’:
../src/gallium/drivers/zink/zink_screen.c:2764:20: error: ‘struct zink_device_info’ has no member named ‘have_KHR_portability_subset’
2764 | if (screen->info.have_KHR_portability_subset) {
| ^
../src/gallium/drivers/zink/zink_screen.c:2765:60: error: ‘struct zink_device_info’ has no member named ‘portability_subset_feats’
2765 | screen->have_triangle_fans = (VK_TRUE == screen->info.portability_subset_feats.triangleFans);
| ^
Fixes: e02cdb397e ("zink: prefer vulkan_core.h over vulkan.h")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21491>
now that the mem type is passed directly to pb, there have to be enough
slabs to allocate all the mem types (not heaps), so create memoryTypeCount
slabs to allow this
fixes#8369
Fixes: f6d3a5755f ("zink: zink_heap isn't 1-to-1 with memoryTypeIndex"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21526>
Indeed, buffer_in_ram could be reallocated by fixup_vertex()
which triggers this issue.
For instance, with "piglit/gl-1.0-dlist-materials -auto -fbo":
==28392==ERROR: AddressSanitizer: heap-use-after-free on address 0x607000010024 at pc 0x7f3f416fcf18 bp 0x7f3f33d12800 sp 0x7f3f33d127f8
WRITE of size 4 at 0x607000010024 thread T6
#0 0x7f3f416fcf17 in _save_Materialfv ../src/mesa/vbo/vbo_save_api.c:1405
#1 0x7f3f418199de in _mesa_unmarshal_Materialfv src/mapi/glapi/gen/marshal_generated0.c:5006
#2 0x7f3f413c6863 in glthread_unmarshal_batch ../src/mesa/main/glthread.c:65
#3 0x7f3f4124d368 in util_queue_thread_func ../src/util/u_queue.c:309
#4 0x7f3f41391eba in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#5 0x7f3f4c619c6b in start_thread glibc-2.35/nptl/pthread_create.c:442
#6 0x7f3f4c69e1fb in __clone3 (/lib64/libc.so.6+0x10c1fb)
0x607000010024 is located 20 bytes inside of 80-byte region [0x607000010010,0x607000010060)
freed by thread T6 here:
#0 0x7f3f4f093b48 in __interceptor_realloc (/usr/lib64/libasan.so.6+0xb1b48)
#1 0x7f3f416e5b0c in grow_vertex_storage ../src/mesa/vbo/vbo_save_api.c:417
#2 0x7f3f416e69bc in fixup_vertex ../src/mesa/vbo/vbo_save_api.c:1266
#3 0x7f3f416fb13e in _save_Materialfv ../src/mesa/vbo/vbo_save_api.c:1405
#4 0x7f3f418199de in _mesa_unmarshal_Materialfv src/mapi/glapi/gen/marshal_generated0.c:5006
#5 0x7f3f413c6863 in glthread_unmarshal_batch ../src/mesa/main/glthread.c:65
#6 0x7f3f4124d368 in util_queue_thread_func ../src/util/u_queue.c:309
#7 0x7f3f41391eba in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#8 0x7f3f4c619c6b in start_thread glibc-2.35/nptl/pthread_create.c:442
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21419>
Now that all color blend bits are dynamic, emit_cb_state() is doing
almost nothing and half of that is wrong.
In the case that color write enable is dynamic, at the time the pipeline
state is emitted, it sees all the color attachments as having write
disabled and stores the WriteDisabled bit for each channel.
When all dynamic state is flushed, we have the right values already but
the values recorded into the command buffer get ORed with the ones
stored in the pipeline, and so WriteDisabled tag along when they
shouldn't.
Since all disabled color attachments are handled already when dynamic
state is flushed, there's no point in doing so at pipeline creation
time too. And since the only other thing done by emit_cb_state() is
writing three hardcoded values, they might as well be taken care of in
the same place as everything else.
Fixes CTS from the future:
dEQP-VK.pipeline.*.extended_dynamic_state.*.color_blend_equation_*dynamic*
dEQP-VK.pipeline.*.extended_dynamic_state.*.color_blend_all_*
Fixes: fc3fd7c69e (anv: dynamic color write mask)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21509>
For testing, the conformant behavior can be enabled by setting
conformant_trunc_coord to true manually and running this to enable
the conformant behavior in hw:
umr -w *.*.regTA_CNTL2 0x40000
The layer index rounding and TRUNC_COORD resetting workarounds can disabled
in the shader compiler.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
In config.c there are multiple copies of the code checking for VA_FORMAT_RT_*, this can lead
to confusion and is hard to maintain without knowing to change the code in all the places.
This commit extracts out the duplicated code into a function that checks format support
for a given profile and entrypoint, then this function is called from several places that
had the copies of this code in vlVaCreateConfig/vlVaGetConfigAttributes.
Please also note that after this change, all entrypoints/profiles will be checked for all
formats in the pipe_screen: YUV420/YUV420_10/YUV422/RGB32/YUV400/YUV444
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21466>
Before this change some formats like YUV420, YUV422 and RGB32 are hardcoded as supported
in VaCreateConfig/VaGetConfigAttributes. This is not always the case, different gallium
drivers and hardware will support different formats. The frontend should delegate the support
check call by using the is_video_format_supported(...) function from pipe_screen.
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21466>
The scaling needs to be ubo * MAX_INLINABLE_UNIFORMS, not
ubo * PIPE_MAX_CONSTANT_BUFFERS, otherwise accesses beyond buffer size
will result for ubo >= 4 (and we'd also access the wrong values later
for other non-zero ubo indices).
Fixes: a7696a4d98 ("lavapipe: Fix bad array index scale factor in lvp_inline_uniforms pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21506>
Make the `error` variable static to prevent a clash with
the glibc error() function when LTO is used.
Fixes the LTO build.
Otherwise, it'll fail in the linking phase with a conflict:
```
[411/2321] Linking target src/util/process_test
FAILED: src/util/process_test
c++ -o src/util/process_test src/util/process_test.p/tests_process_test.c.o -flto -Wl,--as-needed -Wl,--no-undefined -Wl,--fatal-warnings -Wl,--start-group src/util/libmesa_util.a src/util/format/libmesa_format.a src/util/libmesa_util_sse41.a src/c11/impl/libmesa_util_c11.a subprojects/perfetto/libperfetto.a /usr/lib/x86_64-linux-gnu/libz.so -pthread -lm -ldl /usr/lib/x86_64-linux-gnu/libunwind.so -Wl,--end-group
mold: error: symbol type mismatch: error
>>> defined in /tmp/process_test.SLc9I6.ltrans0.ltrans.o as STT_OBJECT
>>> defined in /lib/x86_64-linux-gnu/libc.so.6 as STT_FUNC
```
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21511>
We had some special cases before when we could actually get some IFs on
R300 with VDPAU. Now that VDPAU is gone and everything goes through
ntt, we don't have to worry anymore. Remove the complicated logic and
just always transform KILL into KIL none.-1
No shader-db change on RV530 or RV370.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21503>
There is no UVD and the mpeg2 shader-based decoding is broken and doesn't
lead to CPU savings anyway. VDPAU output works, but there is no real
benefit so just disable VDPAU altogether so we can clean the backend a
bit and also open a way to potentially drop the mpeg2 deconding altogether
from the fronted.
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20524>
The workaround address is used as a source for push constants when
there's no resource available, that address must be 32B aligned.
This fixes invalid address being used for buffers in
3DSTATE_CONSTANT_* packets.
Now that intel_debug_write_identifiers() already add the padding,
there's no need to include extra "+ 8" to the offset.
Thanks to Xiaoming Wang that contributed to find and fix this issue.
Fixes: 2a4c361b06 ("iris: add identifier BO")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21479>
I have multiple examples where this register is too large by one
when comparing to the ROQ read/write pointers in CP_ROQ_*_STAT and the
ROQ data itself, as if it includes the dword most recently read too. I
have an example where it's off by 2 compared to the read pointer, but
the read pointer is also off by 1 itself judging by the SQE program
counter, so that may just be them not getting synchronized. This
off-by-one was getting in the way of figuring out exactly IB2 was being
processed in the next commit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19551>
These don't correspond to the a3xx *_STAT registers, which we're about
to add so we need to rename them. The closest analogue is CP_CSQ_AVAIL,
although the sense is inverted (and we're not sure what the low 16 bits
are about). Also, the a3xx distinction between CSQ and STQ doesn't exist
anymore so don't use these outdated terms.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19551>
This is based on the new approach of having a descriptor set
addresses table in memory. To handle dynamic offsets provided on
vkCmdBindDescriptorSets() we duplicate the set with dynamic
descriptors, apply the offsets, and write the new bo's address
into the table. There are better ways of handling dynamic
descriptors but this implementation won't require many/if any
changes in the compiler code.
The descriptor set itself doesn't allocate and reserve space for
the dynamic descriptors since they would all be collected together
when creating the pipeline layout. While copying the descriptor
set we allocate extra space at the end for the dynamic primaries
and secondaries to account for that.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21391>
From the Haswell PRM Vol. 2b, 3DSTATE_WM::Pixel Shader Kill Pixel:
"This bit is required to be ENABLED in the following situations:
- The API pixel shader program contains "killpix" or "discard"
instructions, or other code in the pixel shader kernel that can
cause the final pixel mask to differ from the pixel mask received
on dispatch.
- A sampler with chroma key enabled with kill pixel mode is used by
the pixel shader.
- Any render target has Alpha Test Enable or AlphaToCoverage Enable
enabled.
- The pixel shader kernel generates and outputs oMask."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19876>
Tigerlake PRM: Volume 2c: Command Reference: Registers Part 2 - Registers M through Z
RCU_MODE :: Compute Engine Enable
This bit indicates if Compute Engine (a.k.a Dual Context or Multi
Context) is enabled or not. This bit must be treated as global
control for enabling and disabling of compute engine. Hardware
allocates required resources for the compute engine based on this
bit.
....
HW reserves 4KB of URB space...
Right now no gen12 platform has Dual Context enabled in kernel side,
exposing a compute engine but that can change, so here adding
has_compute_engine to intel_device_info and only reserving URB space
if compute engine is available.
While at it also fixing the error path when pb_slabs_init() fails.
Bspec: 46034
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21031>
The script is broken, and nobody noticed so it wasn't used much.
Meson has had support for printing the options by pointing to the source
dir for a while (not sure the exact version though) so I think we can
just recommend users do that.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21469>
Fixes: 5b205ef413
r600: Store nir shaders serialized to save memory
Direct leak of 4096 byte(s) in 1 object(s) allocated from:
#0 0x7faf89c3bb48 in __interceptor_realloc (/usr/lib64/libasan.so.6+0xb1b48)
#1 0x7faf7be5981d in grow_to_fit ../src/util/blob.c:67
#2 0x7faf7be5a538 in grow_to_fit ../src/util/blob.c:49
#3 0x7faf7be5a538 in blob_reserve_bytes ../src/util/blob.c:177
#4 0x7faf7be5a538 in blob_reserve_uint32 ../src/util/blob.c:190
#5 0x7faf7d248a8c in nir_serialize ../src/compiler/nir/nir_serialize.c:2109
#6 0x7faf7df4fdbb in r600_pipe_shader_create ../src/gallium/drivers/r600/r600_shader.c:401
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21443>
It's not connected up to anything at the moment, and even if I do enable
it for crocus HSW it only shaves 3 instructions off of one particular VS
in an old synthetic benchmark, not affecting anything else in shader-db.
I don't think anyone will care to ever fix or port this to NIR, let's just
retire it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21353>
If the view's seqno increments, it needs to happen *before* the tex cache
key is constructed. Normally this happens when the sampler views are
bound. But if the texture backing a current sampler view is rebound we
need to handle this before the cache lookup.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21408>
Since 13cb41f666 PIPE_BIND_SHARED was used to allocate driver internal
video buffers. These buffers are never shared, but the intent was to
get non-suballocated buffers and SHARED was used as an indirect flag.
This commit switches to PIPE_BIND_CUSTOM which isn't used anywhere else,
and is now translated as "no suballocation".
The main benefit here is that this allows these buffers to set
use_reusable_pool to true reducing the CPU overhead a lot.
For instance, running the following command on my system:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi \
-i tears_of_steel_1080p.mov -an -c:v h264_vaapi output.mp4
takes 35 sec with this commit vs 45 sec without.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21416>
As we support VK_EXT_image_drm_format_modifier, we could receive
VK_IMAGE_ASPECT_MEMORY_PLANE_0/1/2_BIT_EXT flags.
Fixes several tests like this:
dEQP-VK.drm_format_modifiers.create_explicit_modifier.*
when using CTS 1.3.5.0
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21463>
We have specialized lowering passes dealing with most of that already:
1. gl_nir_lower_samplers_as_deref
2. nir_lower_samplers
3. nir_lower_cl_images
If we need more than that, those passes can deal with following deref
chains as well.
We _might_ need to improve nir_lower_cl_images a bit for more complex
kernels, but CL also doesn't allow indirect images, so we are always able
to optimize the entire deref chain away.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20161>
Some of the OpenCL tests are flaky, because they just take that long.
Builtins can generated really complex code and if we are unlucky they can
timeout.
Proper support for functions would also solve the issue, probably, but for
now increase the deqp-runner timeout so it's less of an annoyence.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20161>
There's just two places where we need any of the WSI specific vulkan
includes, the rest of Zink should do just fine with vulkan_core.h. So
let's include the win32-specific header explicitly in those two places,
and reduce the need for WSI specifics inside zink itself. Kopper
handles the rest of the WSI integration.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21441>
- tcinv - Likely Texture Cache Invalidate (unverified)
- icinv - Mostly sure that it is Instruction Cache Invalidate
- dccln - Data Cache Clean
- dcinv - Data Cache Invalidate
- dcflu - Data Cache Flush
The emission of these instructions were not observed in the wild.
TODO: find out the difference between .shr and .all modes of
dccln, dcinv, dcflu.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14419>
When establishing a texture transfer map, if there is any pending changes on the
texture, instead of trying direct map with DONTBLOCK first, just
use the upload buffer path.
Fixes piglit tests gen-teximages, arb_copy_images-formats
Cc: mesa-stable
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21393>
When an EGLImage is created from a 2D texture and used for texture sharing,
the texture surface might not have been created with the SHARED bind flag.
To allow these surfaces for sharing, this patch sets the USAGE SHARED bit
for surfaces that can be potentially used for sharing even when the SHARED
bind flag is not originally set. Instead of unconditionally enabling the
SHARED bind flag for all surfaces and unnecessarily bypass the surface cache
optimization, this patch only enables the USAGE SHARED bit for surfaces
that also have the RENDER TARGET bind flag.
When the surface handle is inquired and if the surface is currently
marked as cachable, we will need to unset the cachable bit so
the surface handle will not be recycled again.
This patch fixes an assertion in svga_resource_get_handle() when the
EGL_MESA_image_dma_buf_export extension is used in webgl benchamrk running
from firefox in Fedora 37.
Cc: mesa-stable
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21393>
This hack caused problems with some dx9 tests before (due to mipgen
test using nearest filter sampling with tex coords exactly between two
texels hence being extremely sensitive to arithmetic inaccuracies),
and we can no longer distinguish this by using pixel_offset to not get
it enabled. But to pass other tests we don't really need the hack when
there's texture sampling involved anyway.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21407>
This avoids waiting for instance_data which can improve performance:
vk_ray_tracing_ao_KHR_app: 0.2% (The TLAS has 2 instances)
Quake II RTX: 1%
Control: 1%
We also have to shuffle around some code to avoid increasing VGPR usage.
That leaves us with the following stats:
Quake II RTX:
Totals from 7 (14.29% of 49) affected shaders:
CodeSize: 165612 -> 165716 (+0.06%)
Instrs: 31446 -> 31460 (+0.04%)
Latency: 596709 -> 554292 (-7.11%)
InvThroughput: 121998 -> 113327 (-7.11%)
VClause: 596 -> 587 (-1.51%)
Copies: 4664 -> 4646 (-0.39%)
PreVGPRs: 620 -> 639 (+3.06%)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21421>
The added continue_list corresponds to the SPIR-V
Continue Construct and serves as a converged control-flow
construct and is executed after each continue statement
and before the next iteration of the loop body.
Also adds validation rules for loops with Continue Construct
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
Blend states can require masking colour. Currently, this is handled by
nir_lower_blend, which lowers masks to a read-modify-write operation as required
on Mali hardware. However, our "tilebuffer store" instruction supports a write
mask, allowing us to write only a subset of channels to the tilebuffer. It's
more efficient to use that than to emit pointless tilebuffer loads.
Note that even without tilebuffer loads, non-opaque masks don't work with opaque
pass types. Here, we handle this with a translucent pass type, which gets HSR
to do the right thing and is consistent with the pass type used previously.
However, it's a bit heavy handed -- Apple manages to use an opaque pass type
with masking but with some unknown HSR fields twiddled. IMO reverse-engineering
those details shouldn't block this because this gets us closer to optimal (just
not all the way there) and is strictly better than what we had before.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21431>
Masked stores may result in undefs after optimization. Rather than call
lower_undef_to_zero late (but get no benefit), we may as well handle ourselves
to prepare for proper undef support down the line.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21431>
Rather than the backend. This way we can handle non-constant offsets as well as
constants with a single code path (with the constant offset code subsumed as a
special case via NIR's constant folding). This nets us dynamic offset support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21264>
For pvr_setup_descriptor_mappings_new() there will be quite a few
variables of which the value depend on the stage so rather than
having all that selection in the `switch` at the beginning of the
function the helper macro provides a compact selection in the
desired scope.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21331>
Previously UBOs and various buffers, as well as the native
descriptor sets were DMAed into the shared registers. This added
complexity in allocating the registers and various other places.
We also ended up being in situations were we wouldn't know the size
of a buffer by the time the shaders were being compiled. It would
be possible to determine the size by inspecting the shader but
that would introduce more complexity in the compiler.
To get things working sooner, avoid extra complexity for
now, a different approach was devised.
The driver will write the addresses of the currently bound
descriptor sets into a device buffer. The device buffer is referred
to as the descriptor set addrs table. The dev addr of the table is
written into a shared register. To access the buffers the shader
will first get the address of the descriptor set from the in memory
table. Then get the primary descriptor from the descriptor set. And
finally access the in memory buffer with the address it read from
the descriptor. Essentially there's three level of indirection and
all the buffers are in memory. The shader will know what offset the
primary descriptor is located based on the descriptor set layout.
The descriptor set address could have been written into the shareds
directly but that would require extra handling on the compiler side
so opted to just write the table address instead.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21331>
This commit sets up the infrastructure to introduce the new
descriptor set approach while keeping the old paths so the
hard coded apps are still operational. The old paths will be
removed once the compiler can compiler shaders for those apps
and the driver-compiler interface is fully flushed out.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21331>
This commit changes the pipeline layout, desc. set layout,
and desc. set layout binding to keep track of shader stage usage
with a mask of enum pvr_stage_allocation instead of
VkShaderStageFlags.
This commit also makes renames the relevant fields to
'shader_stage_mask' to make the naming uniform across stucts.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21331>
agx_preprocess_nir runs once per shader, whereas agx_optimize_nir runs once per
variant. That means we want to do as much work as possible in agx_preprocess_nir
to make shader variants as cheap as possible to compiler. So, move our standard
suite of lowering and optimizing to the preprocess loop, leaving just a single
(easy) trip through the optimizer for simple variant processing.
Plus, we can remove variables when preprocessing, since we no longer use
variables anywhere. We remove them to reduce the RAM and disk cache footprint of
shader variants.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21104>
This pass works either early or late, so run it late. It creates some
nir_variables as a side effect, which is weird, but it doesn't matter because
the AGX backend doesn't look at variables and the metadata and lowered I/O
intrinsics are all correct.
This is the last step to moving I/O lowering (and hence shader preprocessing) to
CSO create time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21104>
Rob added these new helpers a while back, which freedreno and radeonsi
both share. We should use them too. The new helpers use variables and
system value intrinsics, so we can drop the explicit binding table
creation and just use the normal paths.
Because we have to rewrite the system value uploading anyway, we drop
the scrambling of the default tessellation levels on upload, and instead
let the compiler go ahead and remap components like any normal shader.
In theory, this results in more shuffling in the shader. In practice,
we already do MOVs for message setup. In the passthrough shaders I
looked at, this resulted in no extra instructions on Icelake (SIMD8
SINGLE_PATCH) and Tigerlake (8_PATCH). On Haswell, one shader grew by
a single instruction for a pittance of cycles in a stage that isn't a
performance bottleneck anyway. Avoiding remapping wasn't so much of an
optimization as just the way that I originally wrote it. Not worth it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20809>
When a shader has a comparison with the subgroup invocation id,
we can use a constant instead, saving a VALU instruction.
When the constant can't be represented as a 64-bit literal,
use the s_bfm_b64 instruction to generate it instead, which
is still a win.
Fossil DB stats on GFX11:
Totals from 300 (0.22% of 134913) affected shaders:
CodeSize: 2223052 -> 2214336 (-0.39%); split: -0.43%, +0.04%
Instrs: 430216 -> 429882 (-0.08%); split: -0.14%, +0.06%
Latency: 5881180 -> 5878181 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 731846 -> 729293 (-0.35%)
Copies: 31662 -> 31847 (+0.58%); split: -0.03%, +0.61%
Branches: 8241 -> 8100 (-1.71%)
PreVGPRs: 15788 -> 15786 (-0.01%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20843>
glthread takes care of all uploads, so it's OK to leave uploaded VBOs
bound. The only thing that will be wrong is the bound vertex buffer
returned by glGet, but the only case when that would be wrong is when
an app that doesn't use VBOs queries the current VBO. That never happens.
However, this adds code to unbind all internal VBOs for the case when
glthread is abruptly disabled (e.g. for GL_DEBUG_OUTPUT_SYNCHRONOUS).
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20624>
This will be needed for lowering DrawIndirect in glthread, which is
needed if non-VBO vertex arrays are present.
This only adds the drawid parameter in glthread's draw_arrays and
draw_elements functions, and implements where needed.
New GL API functions are added because we want to use separate
DISPATCH_CMD_* enums for draws with DrawID, so that we don't increase
the memory footprint of draws in glthread batches if drawid == 0.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20624>
u_vbuf does this too. This is the last big missing piece to stop using
u_vbuf.
If the vertex range to upload is much larger than the draw vertex count and
if all attribs are not in VBOs, convert glDrawElements to glBegin/End.
This is a path that makes the Cogs game go from 1 FPS to ~197 FPS. There is
no change in FPS because u_vbuf does this, but it will be disabled.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20624>
MultiDrawElementsUserBuf is changed to mean the same thing as
glMultiDrawElementsBaseVertex, but "gl_buffer_object *index_buffer" is
passed via a parameter instead of using the bound GL_ELEMENT_ARRAY_BUFFER.
This skips binding and unbinding the index buffer around every draw
where glthread uploads indices.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20624>
DrawElementsUserBuf is changed to mean the same thing as
glDrawElementsInstancedBaseVertexBaseInstance, but "gl_buffer_object *
index_buffer" is passed via a parameter instead of using the bound
GL_ELEMENT_ARRAY_BUFFER.
This skips binding and unbinding the index buffer around every draw
where glthread uploads indices.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20624>
The main goal is to never fail to upload non-VBO vertex arrays.
When glthread synchronized, it didn't upload vertices, expecting st/mesa
to do that. This keeps the required sync, and then upload vertices
in glthread.
Also, reorder the code and remove goto statements. This is pretty much
a rewrite.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20624>
Since the dirty range started out as 0..0, you would have 0..VBend as the
new dirty range on the first draw, and if your VB was >32b then you'd
flush every time you used it. Instead, if there's no existing dirty range
then just set it to our new VB's range.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21370>
Since the dirty range started out as 0..0, you would have 0..VBend as the
new dirty range on the first draw, and if your VB was >32b then you'd
flush every time you used it. Instead, if there's no existing dirty range
then just set it to our new VB's range.
Perf results with zink+anv on my CFL:
sauerbraten: +24.8182% +/- 0.602077% (n=5)
portal-2-v2.trace: +4.64289% +/- 0.285285% (n=5)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21370>
Check if the FS output color comes from an FS input. If so, don't tag
the shader as linear. See code comments for more details.
During testing I added extra counters to check the number of times
linear shaders were used to be sure we're not accidentally disallowing
too many shaders. Things looked good with our in-house mksReplay test
suite.
This fixes some OpenGL CTS test failures with llvmpipe.
Signed-off-by: Brian Paul <brianp@vmware.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7489
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21340>
In OpenGL, FRAG_RESULT_COLOR implicitly broadcasts to every render target. Our
existing lower_blend code (somewhat arbitrarily) aliases to the the first render
target's format and blend settings. That said, I don't think that works if
different render targets have different settings -- or blend with their
different destinations -- though I don't have relevant spec text right now.
The actual reason this works is that all users of this pass either call
nir_lower_fragcolor first (panfrost, asahi) or don't have FRAG_RESULT_COLOR as
part of their API (panvk, soon agxv). Unless/until we actually have a use case
for nir_lower_blend with gl_FragColor, assert that gl_FragColor is lowered first
so we don't need to worry about this imaginary case.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
This turns the early pass into a late pass, which is important because it
depends on the shader key and therefore should be called by the driver instead
of the compiler preprocessing. It's also simpler this way.
The shader key work is waiting for review in another merge request. In the mean
time, this patch will let us run blend lowering early for blend shaders on
Midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
This is a form of lowered I/O, it needs I/O semantics so we can know the
location to store to instead of passing via a sideband.
Over in !20906, we will use the BASE to lower blend shader with multisampling in
NIR instead of passing the number of samples and framebuffer format along a
sideband to the Midgard compiler. That's not needed for this series (this patch
was cherry-picked to avoid regressions in the lower_blend changes) but it's good
to model the full form of the I/O lowered intrinsic here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
These track the same information in a slightly different way. Since
nir_loop_variable::init_src is visible outside this module, it cannot
be eliminated.
As an intentional side effect, induction variables with constant
initializers will now have their nir_loop_induction_variable::init_src
field point to the load_const source. Previously this pointer would be
NULL.
v2: Update unit tests and commit message. Remove the now unused ind_var
variable in find_trip_count.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
These track the same information in a slightly different way. Since
nir_loop_variable::update_src is visible outside this module, it cannot
be eliminated.
This leads to some nice simplification in find_trip_count. Previously
this code only had access to the ALU instruction that performs the
increment. It had to "search" the parameters to determine which (if any)
was the constant. With this change, this code has access to the
nir_alu_src of the ALU instruction that performs the increment. It no
longer needs to search the parameters for the constant. It's either the
supplied nir_alu_src or nothing.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
As an intentional side effect, induction variables with constant
increments will now have their nir_loop_induction_variable::update_src
field point to the load_const source. Previously this pointer would be
NULL.
v2: Update unit tests and commit message.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
A couple basic tests for loops with the exit condition after the
increment. In compiler literature, the optimization that moves the exit
condition from the top to the bottom is called "loop inversion."
v2: Pass parameters to loop_builder_invert using a struct. Add a comment
describing the loop being constructed to loop_builder_invert. Both
suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
Inspired heavily by the work by Yevhenii Kolesnikov in the original
versions of !3445.
v2: Pass parameters to loop_builder using a struct. Add a comment
describing the loop being constructed to loop_builder. Both suggested by
Caio.
v3: mscv C++ designated initializer lolz.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
This is to make dEQP-VK.api.buffer.basic.size_max_uint64 pass on
android.
The test creates a buffer of size UINT64_MAX and makes sure the memory
requirement for the buffer is sane. It fails because our memory
requirement is "align64(UINT64_MAX, 16)" which is 0 after overflow.
The test checks maintenance4's maxBufferSize and is skipped normally.
But the extension can be disabled on an android build.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21346>
while separate shaders requires i/o blocks to match between stages,
there are two tricky cases:
* sparse location specification
* variables are required to match in type by location
the first item means user locations must increment if a slot is not used
the second item means that e.g., a mat3x2 can match three vec2 variables
in matching slots
fix both of these cases now
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21383>
Known unsound code.
So far I'm not convinced transaction elimination is doing us much good. Even in
synthetic glmark style benchmarks this seems to be a few % hit at most. Given
that transaction elimination is unsound by design, and that panfrost's
implementation is buggy in several places and getting it right (up to the
unsoundness of the hardware feature itself) would take actual engineering
effort, and the priority is making glamor work... disabling is the obvious
choice here.
For now, we leave the code but gate it behind a env var
flag (PAN_MESA_DEBUG=crc) rather than defaulting to enabled unless
PAN_MESA_DEBUG=nocrc is set. This way, we can still experiment with it if we
need that data ("what performance could we gain if we had this feature,
unsoundness be damned?"). That said, I'm not really ok with having unsoundness
on my devices, y'know? Back of the napkin math suggests that it's not unlikely
that somebody has hit a transaction elimination collision in the wild with the
DDK.
Boils down to values.
Closes: #8113
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21258>
The blend state is emitted from the command buffer when the FS uses
an epilog (either compiled from a lib with GPL or compiled on-demand).
This shouldn't change anything but it will allow to disable using a
PS epilog when the fragment shader doesn't write any color outputs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21341>
This change is wrong for two reasons:
- it hangs most of the time maybe, because changing PSTATE when the
application is running is broken somehow
- it increases the time between triggering and generating the capture
considerably, because there is a delay for changing PSTATE
This restores previous logic where PSTATE is set to profile_peak at
logical device creation. Though, it also re-introduces an issue with
multiple logical devices (kernel returns -EBUSY) but this will be
fixed in the next commit.
This fixes GPU hangs when trying to record RGP captures on my NAVI21.
Note that profile_peak is only required for some RDNA2 chips (including
VanGogh).
Cc: mesa-stable
This reverts commit 923a864d94.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21222>
Propagating the indirect load to more instructions would result
in more address load instructions. This would (a) remove the advantage
of eliminating one move, and (b) introduce more latency, because between
address load and use two cycles must pass.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21357>
This shader originates from Granite 3D engine and has been adapted
to be used with Open GL and some GLSL ES specifics.
GLSL ES adaptation:
- remove Vulkan specifics: EXT_samplerless_texture_functions usage,
specialization constants, push constant usage
- inline bitextract.h
- always DECODE_8BIT and hardcode error color (for now)
- port to GLSL ES, required some type changes, explicit type
conversions and setting up precisions for types
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19886>
Using 8191.875 seems to big for the hardware to correctly render wide
rectangular lines. This can also be reproduced with AMDVLK by forcing
rectangularLines = True, and fixed by reducing the maximum size as well.
Other drivers seem to expose that value.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21287>
In the upcoming header update, vk_layer.h starts including vulkan_core.h
instead of vulkan.h. This will break this layer as it needs a couple of
window-system extension #defines.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
VK_LAYER_EXPORT is going away in the next Vulkan header update. We
already have a PUBLIC macro in util/macros.h which does the same thing.
Unlike VK_LAYER_EXPORT, it should work in Windows too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
This switches us to using get_all_required() for figuring out which
enum types we care about and then carefully filtering every value as
needed. We also add a number field to Extension so we keep all the
extension XML parsing in one place.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
We now use get_all_required() to get all required commands and use that
to filter instead of doing it manually. Also, we can pull entrypoint
extension etc. information from the requirements struct. Finally, we
also have to filter the actual commands themselves as well as arguments
per-API because there may be multiple versions or variants depending on
the API being used.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
This adds an Extension.from_xml() helper for doing the parsing so we can
re-use it in other code. We also improve filtering of extensions. The
Vulkan XML schema is changing to make the supported attribute a comma-
separated list. This is to allow for vulkansc to also exist in the XML
schema.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
This generally gives us better results and doing it here in nir will
also allow us to remove more glsl optimisation calls that do a similiar
thing for us.
(Updated shader-db results by idr.)
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20246333 -> 20240715 (-0.03%)
instructions in affected programs: 235253 -> 229635 (-2.39%)
helped: 425 / HURT: 114
total cycles in shared programs: 891730115 -> 891631113 (-0.01%)
cycles in affected programs: 37347925 -> 37248923 (-0.27%)
helped: 952 / HURT: 692
total spills in shared programs: 7072 -> 6716 (-5.03%)
spills in affected programs: 505 -> 149 (-70.50%)
helped: 7 / HURT: 0
total fills in shared programs: 9897 -> 8511 (-14.00%)
fills in affected programs: 1674 -> 288 (-82.80%)
helped: 7 / HURT: 0
total sends in shared programs: 1053685 -> 1053411 (-0.03%)
sends in affected programs: 2821 -> 2547 (-9.71%)
helped: 30
HURT: 2
LOST: 13
GAINED: 13
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18149157 -> 18147271 (-0.01%)
instructions in affected programs: 204630 -> 202744 (-0.92%)
helped: 294 / HURT: 121
total cycles in shared programs: 939488196 -> 939508444 (<.01%)
cycles in affected programs: 36394777 -> 36415025 (0.06%)
helped: 718 / HURT: 620
total sends in shared programs: 1005426 -> 1005152 (-0.03%)
sends in affected programs: 2821 -> 2547 (-9.71%)
helped: 30 / HURT: 2
LOST: 2
GAINED: 2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19715>
This is based on brw_nir_lower_mem_access_bit_sizes() but ended up being
substantially different. While the core concepts are all the same, the
brw_* version made a lot of Intel-specific assumptions. The new version
takes a callback which takes a number of bytes of data and an alignment
pair and returns a bit size and number of components to load/store.
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21232>
This makes the pass significantly faster cutting execution time
by around 30% in the cts test
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
This 30% improvement is in addition to all the improvements from
the proceeding patches.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>
The fix in 947f7b452a introduced an extra loop over the copies
array to find the correct entry in the case it had been moved.
The problem is these loops can be iterated over millions of times
so lets simply update the entry pointer in the case we change its
location in the array.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>
We loop, effectively, over two stacks: ready and to_do and finish only
when both are empty. In the case where ready is empty, we pull one off
of to_do, add a copy to a temporary, and push it onto the ready stack.
Previously, we assumed that we would never get to the temporary copy
case if to_do has exactly one entry because that would imply that there
was only one copy left which means there can't possibly be a cycle to
break. This was true until c7fc44f9eb ("nir/from_ssa: Respect and
populate divergence information") which changed things such that
temporary copies sometimes get added in the case where a convergent
value is copied both to convergent and divergent destinations.
This patch adjusts our loop iteration to always attempt to clear the
ready stack before checking if there's anything left on the to_do stack.
I also added an assert to make the exit condition more clear.
Fixes: c7fc44f9eb ("nir/from_ssa: Respect and populate divergence information")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8037
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21315>
There is an optimization in the parallel copy algorithm where, after a
copy has been performed, we can treat the destination as the new source
for future copies of the same source. In particular, consider the
following parallel copy: A -> B, C -> A, A -> C. In this case, after we
have done the A -> B copy, we can make note that the value in A is now
in B and emit the sequence: A -> B, C -> A, B -> C. This allows us to
resolve the swap cycle between A anc C without allocating a temporary
register because we know B is also a copy of A.
When one of the registers involved is convergent and the other is
divergent, this optimization is problematic because, while convergent to
divergent copies are fine, we can't re-use the divergent copy in later
copies if any of those copies are to a convergent variable. We could,
but it would require a read_first_invocation which would get messy. In
In c7fc44f9eb ("nir/from_ssa: Respect and populate divergence
information"), we attempted to deal with this by limiting the rename
optimization to the case where the divergence matched.
The problem is that we did the re-name part whenever the divergence
matched but only marked it as ready if the thing being copied was a
destination. (We actually left two instances of loc[a] = b, one which
always happened and one which only happened if we also wanted to flag
the source as being ready to use as a destination.) While this
technically doesn't cause any problems, it may result in more inter-mov
dependencies which hurts instruction scheduling. For example, if we had
the parallel copy A -> B, A -> C, A -> D, we now end up emitting the
sequence A -> B, B -> C, C -> D which has many more data hazards between
instructions caused by the constant shuffling.
This commit restores the original logic in which we only perform the
rename optimization if the rename would free up a register we will later
use as a destination. This isn't entirely optimal as it still doesn't
prove that there is a cycle involved first, but it should lead to a
reduction in unnecessary dependencies.
No shader-db changes on SKL or DG2
Fixes: c7fc44f9eb ("nir/from_ssa: Respect and populate divergence information")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21315>
It is a pretty common pattern to allocate a non-zero sequence # for
lightweight checking if an object is the same, changed, for use in cache
keys, etc. (And also pretty common to forget to handle the rollover
zero case.) Add a helper for this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
PROG state mostly just disables various LRZ related flags, which can
be handled as a simple mask. The exception is ztest mode, which is
either overriden by PROG state, or we use the all 1's value (which
isn't valid from hw standpoint) to signal that it needs to be computed
at draw time, which fortunately fits in with the bitmask approach.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
This was not doing any actual resource tracking, just updating
gmem_reason. And furthermore, a6xx+ doesn't care about the bits
it was setting. So move this to per-gen backend for the gens that
need it, and avoid setting FD_DIRTY_RESOURCE when FD_DIRTY_BLEND
is set.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
If a resource invalidate is triggered by a different ctx (potentially on
a different thread) simply flag that the tex state needs invalidation,
but defer handling it to the ctx that owns the tex state.
This will let us remove atomic refcnt'ing on the tex state, and more
importantly atomic refcnt'ing on the fd_ringbuffer (as this was the one
special case where rb's could be accessed from multiple threads).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
MESA_VK_ABORT_ON_DEVICE_LOSS=1 \
TU_DEBUG_STALE_REGS_RANGE=0x00000c00,0x0000be01 \
TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass \
./app
To pinpoint the reg causing a failure reducing regs range could be
used for bisection. Some failures may be caused by multi-reg combination,
in such case set 'inverse' flag which would change the meaning of reg
range to "do not stomp these regs".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21226>
Since the Collabora LAVA update related to the downtime from
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21119, the
LAVA logs from Collabora continued to use the hack for older versions
which digested some control characters, such as carriage returns acting
as newlines, which made it necessary to recover from split lines to make
Gitlab sections work in job logs as expected.
Collabora's LAVA instance now gives a more raw log output. It is
necessary to pay attention to newlines at the end of each log message,
which may cause double newlines when printed with Python built-in
`print` function. I decided to remove the repeating `\n` from the
received log messages to make them transparent to LogFollower users.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8242
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21325>
LavaFarm is a class created to handle the different types of LAVA farms
and their tags in Mesa CI. Since specific jobs may require different
types of LAVA farms to run on, it is essential to determine which farm
the runner is running on to configure the job correctly.
LavaFarm provides an easy-to-use interface for checking the runner tag
and returning the corresponding LAVA farm, making it simple for Mesa CI
to configure jobs appropriately. By adding tests for LavaFarm, the team
can ensure that this class is functioning as expected, allowing for the
smooth execution of Mesa CI jobs on the correct LAVA farm.
The tests ensure that get_lava_farm returns the correct LavaFarm value
when given invalid or valid tags and that it returns LavaFarm.UNKNOWN
when no tag is provided. The tests use Hypothesis strategies to generate
various labels and farms for testing.
Example of use:
```
from lava.utils.lava_farm import LavaFarm, get_lava_farm
lava_farm = get_lava_farm()
if lava_farm == LavaFarm.DUMMY:
# Configure the job for the DUMMY farm
...
elif lava_farm == LavaFarm.COLLABORA:
# Configure the job for the COLLABORA farm
...
elif lava_farm == LavaFarm.KERNELCI:
# Configure the job for the KERNELCI farm
...
else:
# Handle the case where the LAVA farm is unknown
...
```
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21325>
Use requirements.txt and requirements-test.txt to organize better Python
dependencies related to LAVA.
Now LAVA tooling can use recent and fixed library versions.
And test-related libs will not trigger container rebuilding anymore.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21325>
base_mip_width is used in si_compute_copy_image when the
SI_IMAGE_ACCESS_BLOCK_FORMAT_AS_UINT flag is used.
width = tex->surface.u.gfx9.base_mip_width;
This will be incorrect if we don't adjust it. For instance,
with a 260x256 image, surf_pitch and base_mip_width are
320 before surf_pitch is updated to be 192.
Both need to match, or computing the width from base_mip_width
leads to incorrect result.
Cc: mesa-stable
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21253>
We don't need it for event blits. It also does not support fast clears
which makes it slower.
For event blits, blob has
VK_FORMAT_D16_UNORM -> FMT6_16_UNORM
VK_FORMAT_X8_D24_UNORM_PACK32 -> FMT6_Z24_UNORM_S8_UINT
VK_FORMAT_D32_SFLOAT -> FMT6_32_FLOAT
VK_FORMAT_S8_UINT -> FMT6_8_UINT
VK_FORMAT_D24_UNORM_S8_UINT -> FMT6_Z24_UNORM_S8_UINT
VK_FORMAT_D32_SFLOAT_S8_UINT -> FMT6_32_FLOAT + FMT6_8_UINT
and always sets RB_BLIT_INFO:DEPTH. It is unclear what
RB_BLIT_INFO:DEPTH is for but we set it anyway.
Improves "glmark2 -b refract" on angle by 15-20% on a618/a635.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8218
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21116>
ralloc is not thread-safe, so we can't use dev->memctx for allocating
context-specific things without locking. On top of that, we always
need to explicitly clean up pools anyway since we need to unref the BOs,
so there is no point to using a memctx.
And since pools need to be explicitly cleaned up, the meta cache code
needs explicit cleanup, so add that and drop memctx from there too.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21348>
VK_KHR_present_id and VK_KHR_present_wait depend on VK_KHR_swapchain
being present, which is not present at least on Android/KGSL.
Fixes:
src/vulkan/util/vk_extensions.h:450: void assert_device_extensions_requirements(
const struct vk_device_extension_table *, const struct vk_instance_extension_table *):
assertion "!device_ext->KHR_present_id || device_ext->KHR_swapchain" failed
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21345>
We can just add the flags to the kopper interface, since it's private to
Mesa. This gets us depth/stencil invalidation on swapbuffers, which is
critical for tiler performance.
glmark2-es2 -b texture (windowed) goes from 1650 to 1930 fps on
zink+turnip with ZINK_DEBUG=rp.
Part of #7321 (we're still a little behind freedreno's 2180 fps)
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21317>
I tried to drop the swapBuffers path, but it turns out it's being taken by
softpipe/llvmpipe, and the tests are passing. The piglit egl-copy-buffers
test even passes on zink, but you end up with a bad display because of an
un-preserved back buffer.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21317>
This shouldn't be doing a swapBuffers, that's not what this function is
supposed to do. But also, we shouldn't be doing this from zink, which the
swap was introduced for, because we don't implement the extension. Cleans
up some strangeness from 3c4be122cc ("egl: implement more hooks for
swrast")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21317>
If we include vulkan.h, we risk including the WSI bits as well, which we
don't need here. Only trouble can follow from including these where
they're not needed.
So let's include vulkan_core.h in these places instead.
Fixes: b39958a3a1 ("anv,nir: Move the ANV YCbCr lowering pass to common code")
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21185>
VAEncCodedBufferType is used for reading back encoded data.
Mapping it for read instead of write speeds up reading
the data on CPU.
On radeonsi this will result in VRAM copy to staging buffer
in cached GTT, making the CPU read much faster.
Signed-off-by: David Rosca <nowrep@gmail.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20376>
llvm::None was deprecated and builds started failing with
error: ‘None’ is not a member of ‘llvm’
Instead of using the temporarily available include in ADT which would
add a deprecation warning to the build, directly replace llvm::None with
the recommended std::nullopt
This change takes only effect with LLVM 17 or newer.
Reference: d4f38ef288/llvm/include/llvm/ADT/None.h
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21125>
The dst type could be either 16b or 32b.
Fixes validation failure in dEQP-VK.subgroups.* tests which deal with
16b types.
validation fail: (type_size(instr->cat6.type) <= 16) == !!((instr->dsts[0])->flags & IR3_REG_HALF)
-> for instruction: MESA: info: 0023:0000:000: ldc.offset0.base0 hssa_23 (wrmask=0x3), ssa_1, ssa_22
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21316>
Before dropping this function, handle the two callers of this function:
* The call in iris_blorp.c is redundant. The required cache flushes are
already handled by the callers of blorp functions. Delete this.
* The call in iris_resolve.c is still providing a benefit because it
calls iris_emit_buffer_barrier_for internally. Inline the needed
barrier.
Cc: 23.0 <mesa-stable>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21303>
Memory accesses can get corrupted when there's a disagreement between:
* the aux-mode of existing cache lines for a surface and
* the aux-usage in that surface's RENDER_SURFACE_STATE object
We have already prevented hardware from seeing this conflict for
rendering operations, but due to how the L3 is shared among multiple
clients in gfx12 (e.g., sampler engine, render engine, etc.), we need to
expand the scope of the existing solution. Now, before any access of a
compressible resource, we make sure to flush the prior aux-mode from the
caches.
The majority of changes here refactor things for use in a new function,
flush_previous_aux_mode. The remaining change calls that function from
within iris_resource_prepare_access.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6558
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7625
Cc: 23.0 <mesa-stable>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21303>
This patch causes us to print "con" and "div" for registers as well as
SSA defs. We print it on both register declarations, and destinations.
The latter isn't strictly necessary, but it is handy to be able to see
e.g. a convergent value being assigned to a divergent register without
having to constantly refer back to definitions that might be much
earlier in the program. I originally printed it for sources as well,
but that got to be a bit wordy, so I dropped that.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21299>
When mesh shader is spawned on a different slice than the originating
task shader, then input task urb handle can come from a different
slice, so masking this information off will load data from the current
slice, instead of the one where real data are.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21007>
This reverts commit 2dfebf3487.
It causes GPU hangs in piglit tests like
spec@glsl-1.20@execution@clipping@vs-clip-vertex-enables, for reasons I'm
totally unclear on. The commit was not necessary, because the frontend
lowering already handles disabled clip planes by storing 0.0 to the
corresponding clipdist array element in that shader variant. Add a note
to that effect.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21298>
While I'm having a hard time stabilizing most of the test list on this HW
due to the clip-enable GPU hangs leaking into random other tests, these
have been consistent in the last 4 runs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21298>
The shader key structure is quite large and memsetting it to zero to be
able to create or often simply find an existing shader is responsible
for a large portion of CPU usage during benchmarks.
This change is more surgical about what, when, and how things get
cleared.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21247>
Keeping around copies of the BVHs in CPU memory can cause issues with
Applications creating a large amount of acceleration structures (Control).
This commit adds back the old path of copying acceleration structures
while still keeping the deferred, possibly more accurate path around.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20700>
Add support to pandecode for Mali architecture v10, featuring the new command
stream frontend (CSF). This replaces the "job chain" with a new Command
Execution Unit (CEU) that runs a domain-specific assembly language. That
requires us to refactor pandecode substantially, splitting out JM-only code from
shared JM/CSF common code, and adding new CSF-only decode routines to
disassemble and interpret CSF command streams and pretty-printing the
data structures hit.
This is of course impossible to do properly, since the CEU is pretty easily
Turing-complete and hence subject to the halting problem. But we implement some
simple heuristics to follow jumps that are just good enough for the simple
command streams emitting by both the DDK and Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20837>
We're not supposed to call the GENX(pandecode_jc) routines (e.g.
pandecode_jc_v7), since it's an internal interface that expects the caller to
take a lock first. Instead we're supposed to call the non-GenXML pandecode_jc
entrypoint which does the locking properly. Fixes assertion failures when
tracing with recent pandecode:
deqp-vk: ../src/util/simple_mtx.h:142: simple_mtx_assert_locked: Assertion `mtx->val' failed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21257>
Passes the ext_external_objects-memory-object-api-errors piglit:
./bin/ext_external_objects-memory-object-api-errors
Mesa: User error: GL_INVALID_VALUE in glTexStorageMem1DEXT(memory=0)
PIGLIT: {"subtest": {"1D texture" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTexStorageMem2DEXT(memory=0)
PIGLIT: {"subtest": {"2D texture" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTexStorageMem3DEXT(memory=0)
PIGLIT: {"subtest": {"3D texture" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTextureStorageMem1DEXT(memory=0)
PIGLIT: {"subtest": {"1D texture direct state access" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTexureStorageMem2DEXT(memory=0)
PIGLIT: {"subtest": {"2D texture direct state access" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTextureStorageMem3DEXT(memory=0)
PIGLIT: {"subtest": {"3D texture direct state access" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTexStorageMem2DMultisampleEXT(memory=0)
PIGLIT: {"subtest": {"2D texture ms" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTexStorageMem3DMultisampleEXT(memory=0)
PIGLIT: {"subtest": {"3D texture ms" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTextureStorageMem2DMultisampleEXT(memory=0)
PIGLIT: {"subtest": {"2D texture ms direct state access" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glTextureStorageMem3DMultisampleEXT(memory=0)
PIGLIT: {"subtest": {"3D texture ms direct state access" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glBufferStorageMemEXT(memory == 0)
PIGLIT: {"subtest": {"buffer storage" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glNamedBufferStorageMemEXT(memory == 0)
PIGLIT: {"subtest": {"buffer storage direct state access" : "pass"}}
Mesa: User error: GL_INVALID_ENUM in glGetUnsignedBytevEXT(pname=0xffffffff)
PIGLIT: {"subtest": {"unsigned-byte-v-bad-enum" : "pass"}}
Mesa: User error: GL_INVALID_ENUM in glGetUnsignedBytei_vEXT(pname=0xffffffff)
PIGLIT: {"subtest": {"unsigned-byte-i-v-bad-enum" : "pass"}}
Mesa: User error: GL_INVALID_VALUE in glGetUnsignedBytei_vEXT(pname=GL_DEVICE_UUID_EXT)
PIGLIT: {"subtest": {"unsigned-byte-i-v-bad-value" : "pass"}}
Signed-off-by: Yusuf Khan <yusisamerican@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19405>
previously descriptor buffers were sized to allow for 25,000 descriptors
this is a great number.
but in some scenarios it's overkill, and it's theoretically possible that
it might be underkill in others (citation needed), so add some handling
for both cases to save small amounts of vram on average and not crash
in the distant future when hypercomputers try running drawoverhead
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21246>
usually this is handled by zink_compiler_assign_io() for full pipelines,
where locations are compacted and variables are eliminated, but separate
shaders still need to have "correct" locations set, which can be achieved
by relying on 'location' instead of the (failed) attempt by the frontend
to set 'driver_location' with nir_assign_io_var_locations()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21246>
the descriptor count (buffer size) calculated for buffers was based
on drawoverhead throughput, which is the fastest descriptors can be changed
at the cpu level. these cases demonstrate the maximum speed that ANY
descriptor can be changed, which means that changing multiple types in
a given cmdbuf will, at best, be the same throughput
thus, instead of allocating a separate buffer for each type, only a single
buffer needs to be allocated, and all descriptors can be bound to this buffer
this should reduce descriptor vram usage by ~80%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21246>
this makes the code more methodical, readable, and correct, fixing a
number of issues along the way:
* inaccurate write_bind_count incrementing
* inaccurate barrier_access write unsetting
* inefficient partial rebinds
* leaking texel buffers
also add some comments to make this clearer
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21229>
In every current driver, the fd we get back from the screen is the fd we
gave to screen_create() three lines above (or a dup() thereof, which we
consider to be the same since we look inside it for the file description
instead).
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20180>
Pointed out by GCC:
In function ‘load_text_file’,
inlined from ‘standalone_compile_shader’ at ../src/compiler/glsl/standalone.cpp:491:38,
inlined from ‘main’ at ../src/compiler/glsl/main.cpp:98:45:
../src/compiler/glsl/standalone.cpp:358:17: error: ‘free’ called on pointer ‘block_195’ with nonzero offset 48 [-Werror=free-nonheap-object]
358 | free(text);
| ^
In function ‘ralloc_size’,
inlined from ‘load_text_file’ at ../src/compiler/glsl/standalone.cpp:352:31,
inlined from ‘standalone_compile_shader’ at ../src/compiler/glsl/standalone.cpp:491:38,
inlined from ‘main’ at ../src/compiler/glsl/main.cpp:98:45:
../src/util/ralloc.c:117:18: note: returned from ‘malloc’
117 | void *block = malloc(align64(size + sizeof(ralloc_header),
| ^
Fixes: a9696e79fb ("main: Close memory leak of shader string from load_text_file.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21215>
Drop the unused ctx parameter, to match the main Mesa code.
Fixes ODR violation flagged by -Wodr with LTO enabled:
../src/mesa/main/shaderobj.h:74:1: error: ‘_mesa_reference_shader_program_data’ violates the C++ One Definition Rule [-Werror=odr]
74 | _mesa_reference_shader_program_data(struct gl_shader_program_data **ptr,
| ^
../src/compiler/glsl/standalone_scaffolding.cpp:76:1: note: type mismatch in parameter 1
76 | _mesa_reference_shader_program_data(struct gl_context *ctx,
| ^
../src/compiler/glsl/standalone_scaffolding.cpp:76:1: note: ‘_mesa_reference_shader_program_data’ was previously declared here
../src/compiler/glsl/standalone_scaffolding.cpp:76:1: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
Fixes: 717a720e9c ("mesa: drop unused context parameter to shader program data reference.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21215>
In case there is no dirty state that requires resource tracking we
can skip taking the screen lock. Indirect draw and index buffer are
a special case, but we can inexpensively check if they are already
referenced by the batch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21202>
A stackless (or at least using allocated memory for stack) version
might be nice but for now this works around some games compiling
large shaders and hitting stack overflows.
CC: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21231>
This also removes the loop so opt_remove_cast_cast() will only optimize
cast(cast(x)) and not cast(cast(cast(x))). However, since nir_opt_deref
walks instructions top-down, there will almost never be a tripple cast
because the parent cast will have opt_remove_cast_cast() run on it.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21252>
Draw states were not disabled after a dynamic renderpass which
spans several command buffers, the next renderpass if started in
the same command buffer wouldn't emit the full draw state,
since TU_CMD_DIRTY_DRAW_STATE was not set by previous renderpass.
The issue could be observed when corrupting all regs at cmdbuf start in:
dEQP-VK.dynamic_rendering.primary_cmd_buff.random.seed7_geometry
Fixes: cb0f414b2a
("tu: Add support for suspending and resuming renderpasses")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21148>
From the Haswell PRM Vol. 7, "IEEE Floating Point Mode":
"Single precision (F, Float) denorms are flushed to sign-preserved
zero on input and output of any floating-point mathematical
operation."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20232>
RGP requires shaders to be uploaded consecutively inside the same
buffer object. Otherwise, either it makes the driver generating
huge traces (ie. in GiB) or it fails to load traces at all. Hopefully,
this will be improved soon when AMDGPU drivers will have GPL support.
To workaround this, the driver relocates graphics shaders in the same
buffer object when a pipeline is created. Then at draw time, it
overwrites SPI_SHADER_PGM_xxx registers to make sure SQTT can match
between emitted and exported shaders. It's a bit suboptimal because
graphics shaders are uploaded twice but it's the best solution I found.
This will allow to implement GPL caching without breaking capturing
shaders with RGP.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21078>
The shaders were uploaded consecutively to fit a RGP constraint but
this was more like a workaround. This upload path doesn't work well for
graphics pipeline library and it was the main blocker for GPL caching.
This commit breaks capturing shaders with RGP if the offset between
shaders is too big. Next commit should fix it by using shaders reloc.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21078>
The Vulkan Specification states about possible return values from
vkAcquireNextImageKHR:
* VK_NOT_READY is returned if timeout is zero and no image was
available.
* VK_TIMEOUT is returned if timeout is greater than zero and less than
UINT64_MAX, and no image beae available within the time allowed.
That is, if info->timeout is larger than zero, the function must return
VK_TIMEOUT instead of VK_NOT_READY if no image became available before
the timeout elapsed.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21190>
The Vulkan CTS version in Mesa CI is so old that a bunch of tests
are broken, but it's expected.
This runs +283939 tests and the overall VKCTS execution time increased
from ~23 minutes to ~26 minutes (+~13%) on my Threadripper 1950X.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21214>
While this change helps with few shaders, the main benefit is
that it allows to unroll loops comming from nine+ttn on vec4
backends. D3D9 REP ... ENDREP type loops are unrolled now already,
LOOP ... ENDLOOP need some nine changes that will come later.
r300 RV530 shader-db:
total instructions in shared programs: 132481 -> 132344 (-0.10%)
instructions in affected programs: 3532 -> 3395 (-3.88%)
helped: 13
HURT: 0
total temps in shared programs: 16961 -> 16957 (-0.02%)
temps in affected programs: 88 -> 84 (-4.55%)
helped: 4
HURT: 0
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Partial fix for: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8102
Partial fix for: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7222
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21038>
The graphics pipeline library implementation in RADV has been
improved considerably lately.
There is still a bit of work for caching individual libraries
and optimized (LTO) pipelines but I think overall it seems good
enough to stop reporting it as experimental and suboptimal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21213>
CP_REG_TEST (or any command that reads registers) is slow on a618
(gen1). Since SQE can early return, we don't necessarily need
emit_conditional_ib in fd6_emit_tile.
We still CP_REG_TEST twice for load and store when there is no clear.
Not sure if we can simply drop emit_conditional_ib instead?
glmark2 score goes from 943 to 1067.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21208>
This invalid memory access is a consequence of wrong assumptions,
for instance:
"prog->sh.data is NULL if it's ARB_fragment_program"
This issue is triggered with piglit/fp-formats -auto -fbo:
==9747==ERROR: AddressSanitizer: heap-use-after-free on address 0x007f7c812d90 at pc 0x007f833c09f8 bp 0x007fd7eca750 sp 0x007fd7eca768
READ of size 4 at 0x007f7c812d90 thread T0
#0 0x7f833c09f4 in st_get_sampler_views ../src/mesa/state_tracker/st_atom_texture.c:109
#1 0x7f833c0b48 in update_textures ../src/mesa/state_tracker/st_atom_texture.c:266
#2 0x7f82b2d120 in st_validate_state ../src/mesa/state_tracker/st_util.h:128
#3 0x7f82b2d120 in prepare_draw ../src/mesa/state_tracker/st_draw.c:88
#4 0x7f82b2de64 in st_draw_gallium ../src/mesa/state_tracker/st_draw.c:141
#5 0x7f83105940 in _mesa_draw_arrays ../src/mesa/main/draw.c:1202
#6 0x7f8d5fa5cc in piglit_draw_rect_from_arrays piglit/tests/util/piglit-util-gl.c:711
#7 0x7f8d5fac34 in piglit_draw_rect_custom piglit/tests/util/piglit-util-gl.c:833
#8 0x4019e0 in piglit_display piglit/tests/shaders/fp-formats.c:67
#9 0x7f8d643fc4 in run_test piglit/tests/util/piglit-framework-gl/piglit_fbo_framework.c:52
#10 0x401624 in main piglit/tests/shaders/fp-formats.c:39
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21175>
On gen3+, there are 32 predicate bits instead of 1.
I set out to see why CP_REG_TEST (and others commands that read
registers) is slower on gen1 but could not find anything. Since the
blob seems to use multiple predicate bits, let's keep them documented.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21206>
some games/apps (e.g., DOOM2016) compile+link shaders in one context
and then use them in another, expecting that the compiled shaders
will be reused. vulkan has pipeline (library) objects, which are not
specific to shaders but are in theory representing the shaders being used
thus, pipeline (library) objects need to be reusable for any case where
a shader can be reused
to handle this:
* extract pipeline library cache to a refcounted object
* store these objects on the screen
* make them owned by shaders
separable programs are slightly different since they'll use their own
fastpath, thus making their library caches owned by the programs to avoid
polluting the optimized caches
fixes#8264
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21223>
While making the function public, rename it to
nir_collect_src_uniforms. The old name makes it sound like it's just a
query that doesn't have side effects. That is, however, not the case.
This is step 4 in an attempt to unify a bunch of nir_inline_uniforms.c
and lvp_inline_uniforms.c code.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21179>
We lost power in a storm, and these ones didn't come back afterwards. I
suspect I need a new PSU. And maybe some surge protection for the future.
:(
I've left the CI code in place for some day when I hopefully swap out the
power supplies.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21205>
Add a function to create and cache the compute programs that will be
used to transcode ASTC to DXT5.
Note that the error paths in st_create_context_priv may actually lead to
segfaults if hit. I've been able to work around them by 1) moving them
further down and 2) returning early from st_glFlush if st->pipe is NULL.
I don't know if that's the right solution however.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19827>
1. Drop the commented out includes. Shader caching is disabled if those
are found.
2. Replace the active includes with "%s". Later on, we'll construct the
final strings with vasprintf. One downside to doing this is that the
glsl file extensions are no longer true. These files are now
templates.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19827>
These compute shaders are from the MIT-licensed GPU compressor, Betsy.
I have included copyright headers, inlined the __sharedOnlyBarrier macro
definition from the "UavCrossPlatform_piece_all.glsl" header when
applicable, and made the following changes to support GLES:
* Conditionally disable the const keyword in the BC3 shaders
* Make the params uniform in the BC4 shader uint2
* Avoid implicit data type conversions in the BC3 shaders
* Use constructors for array initialization in the BC1 shader
* Add precision qualifiers to the BC3 shaders
* Output to an rgba16ui image for the BC1 and BC4 shaders
* Set the version of the BC3 shaders to 310 es
Ref: https://github.com/darksylinc/betsy/tree/cc723dcae9
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19827>
apps/games using separate shader objects end up passing the separable
shaders to the link_shader hook individually, which is still not ideal for
zink's usage since the more optimal path is to have all the shaders and create
a RAST+FS GPL stage that can run all the inter-stage io handlers
it IS technically possible to handle this for simple VS+FS pipelines using
GPL, however, but it's kinda gross. such shaders now use descriptor buffer
to create their own pipelines/layouts/descriptors async, and then a "separable"
variant of the gfx program can be created by fast-linking these together
the "separable" gfx program can't handle shader variants, but it can do basic
pipeline caching for PSO state changes, which makes it flexible enough to sorta
kinda maybe handle the most basic cases of separate shader objects
descriptor buffer is used because having to create and manage a separate architecture
for sets/pools/templates is too nightmarish even for me
this is, at best, a partial solution, but it's the best the vulkan api can
currently do
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21197>
I found this useful in lining up some perfetto traces between zink+anv and
iris, and understanding what was going on in them. Also it's a demo of
being able to insert annotations for work in the command stream, which I
suspect we'll want more of.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20657>
For zink, we want to know if we should pass command stream markers down to
the underlying driver, but we don't have our own trace context we're
recording trace events with. We definitely want those markers if the
underlying driver is going to be doing perfetto tracing, or is requesting
marker tracing. So, create an interface for querying those flags before
they get copied down to an actual u_trace_context.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20657>
Functions that are in hot paths will have a different treatment to
support i915 and Xe KMD.
Each KMD will have an anv_kmd_backend that will have the hot path
functions set, this way we can avoid branch prediction misses.
Other functions will gradually be moved to anv_kmd_backend.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20948>
As we continue to refactor the code base to support Xe KMD here I'm
dropping anv_gem_create() and unifying all graphics memory allocation
calls to anv_gem_create_regions().
anv_gem_create_regions() will call DRM_IOCTL_I915_GEM_CREATE_EXT
for integrated platforms too only leaving DRM_IOCTL_I915_GEM_CREATE
calls to kernel versions that do not support
DRM_IOCTL_I915_GEM_CREATE_EXT.
This can be detected by devinfo->mem.use_class_instance as
DRM_I915_QUERY_MEMORY_REGIONS uAPI landed in the same kernel version
as DRM_IOCTL_I915_GEM_CREATE_EXT.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20948>
Small depth/stencil textures were using linear tiling, but depth/stencil
attachments cannot use linear tiling for sysmem rendering.
Fixes:
KHR-GL45.geometry_shader.layered_framebuffer.stencil_support
KHR-GL45.geometry_shader.layered_framebuffer.depth_support
Signed-off-by: Amber Amber <amber@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21046>
non-strictLine Vulkan drivers use either parallelogram or bresenham
rasterization for default line modes.
This method of rasterisation produces close enough results that it
in practice is GL/GLES spec compliant (at least cts wise).
Don't emit a feature missing warning for this case.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20985>
Now that we can do a blocking wait on an fd_fence (which the suballoc
heap already depended on) we can just move the fence wait into core
leaving the backend cpu_prep() implementation only needing to care
about implicit sync on shared buffers.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21143>
It is pretty easy to just cache the stateobj with the hwcso (since
unlike 3d, there is only a single shader state) and re-emit it by
pointer when it changes, now that the CS state doesn't depend on the
grid info.
This also moves immed consts into the PROG state, so they are only
updated when the PROG state is dirty. And splits user consts and
driver param consts, so they are only re-emit when needed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21127>
Only GL_UNSIGNED_INT_2_10_10_10_REV was handled, add
GL_UNSIGNED_INT_10_10_10_2 & GL_UNSIGNED_INT_10_10_10_2_OES.
This makes sure that if the Gallium driver doesn't support the exact
corresponding format, another 10 bpc format is tried before an 8 bpc one
as a fallback.
Fixes the mutter test cogl-test-offscreen-texture-formats with iris.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21021>
The condition was inverted, causing compilation to be actually skipped when
a noop FS is used and straight emitting the pipeline from the default
initialized struct.
Fixes: 3eb97b9d33 ("radv: skip compilation when possible with GPL fast-linking")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21166>
Extend RADV lowering of the load_cull_any_enabled_amd intrinsic to
take into account the number of primitives in the current workgroup.
Workgroups that have less than 16 triangles are considered "small"
and will disable shader culling. Note that LLPC does the same,
but it checks the number of vertices not primitives.
The primary intention of this change is to eliminate the need to
check the draw size in radv_cmd_buffer, but this is actually
beneficial to larger draw calls too, specifically this may improve
the performance of the last workgroup of larger draws too.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20980>
Since largeRing has been enforced, there's no need to do renderer
submission to fill the exp features. So we move it back after ring has
been initialized. Meanwhile, vn_renderer_submit_simple_sync is
intentionally left there to be re-used soon for server ping purpose.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21153>
This change requires the %destructor directive which is supported
by bison or yacc, yacc needs to be compiled with the back-tracking
functionality.
This issue could be checked with the following piglit programs:
glsl-invalid-asm-01, glsl-invalid-asm-02 or vp-bad-program
Direct leak of 5 byte(s) in 1 object(s) allocated from:
#0 0x7f8dc89050 in __interceptor_strdup (/usr/lib64/libasan.so.6+0x59050)
#1 0x7f83791cbc in handle_ident ../src/mesa/program/program_lexer.l:129
#2 0x7f83791cbc in _mesa_program_lexer_lex ../src/mesa/program/program_lexer.l:312
#3 0x7f8377e8d8 in yylex ../src/mesa/program/program_parse.y:289
#4 0x7f8377e8d8 in yyparse src/mesa/program/program_parse.tab.c:2124
#5 0x7f83788c14 in _mesa_parse_arb_program ../src/mesa/program/program_parse.y:2584
#6 0x7f8377371c in _mesa_parse_arb_fragment_program ../src/mesa/program/arbprogparse.c:82
#7 0x7f8372d42c in set_program_string ../src/mesa/main/arbprogram.c:402
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21150>
The use of FMT6_8_8_8_8_UNORM for z24s8/z24x8 is for blit src. Make
that clear by moving the logic from fd6_texture_format to the newly
added blit_format_texture. Add a comment on why this is simpler than in
fdl6_view_init.
This should have no functional change in practice.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21117>
this was including the generated tcs bits, which was likely to be wrong
and thus break optimal key hashing, requiring more pipelines
it also wasn't setting the optimal key value correctly during precompile,
which meant the wrong hash value was used and the precompiled libs were never
actually accessible
cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21169>
failing to set this yields patterns like
* bind fs
* bind samplerviews
* draw
* bind fs2
* ~~unbind samplerviews~~ (eliminated)
* draw
the eliminated unbinding of samplerviews between draws also eliminates a descriptor update,
triggering various artifacts in certain corner cases (like DOOM2016 shadows)
it's possible to manage the updating during shader binding, but the detection is a bit more
complex, and the cpu overhead from maintaining the current codepath with an
extra pipe_context::set_sampler_views (et al) isn't high enough to warrant further investigation
at this time
fixes#8252
Fixes: 153af03b94 ("gallium: Add cap to request state validation for all dirty state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21176>
At vkQueueSubmit time, for each batch with timeline semaphores to
signal, append cmd_buffers with feedback cmds to update the counter
value in its respective feedback slot.
Since multiple signals on the same semaphore could be pending at the
same time across batches/vkQueueSubmits, src slots and commands are
allocated on demand. These src slots can be reused after they've been
signaled (if the current semaphore counter is greater/equal than the
src value) and are cleaned up on vkDestroySemaphore.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20500>
Unlike fence feedback, commands to update timeline semaphore feedback
slots can't be fully pre-recorded because of the counter value input
for signaling timeline semaphores. To avoid fully recording commands
during vkQueueSubmit, pre-record commands that write a counter value
from a feedback "src" slot to the feedback "dst" slot. Then at
vkQueueSubmit, parse the signal semaphores and write the signal counter
value in the feedback src slot and append the command that writes from
that feedback src slot offset to the command buffer associated with the
signal semaphore.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20500>
Refactor into the following stages:
- prepare: Does an initial pass setting vn_queue_submission fields
and fixing up semaphores.
- alloc_storage: based on fields (including counts) from prepare,
calculate and allocate the amount of temporary storage needed.
- setup_batches: perform any modifications on the submission
batches using the allocated temporary storage.
- cleanup: free any temporary storage used.
Currently, only fence feedback needs alloc_storage and setup_batches
to append fence feedback to the submission but this slow will also
be utilized by upcoming timeline semaphore feedback.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20500>
This hardware bug is the result of a control flow optimization present
in Gfx8-9 meant to prevent the ELSE instruction from disabling all
channels and update the control flow stack only to have them
re-enabled at the ENDIF instruction executed immediately after it.
Instead, on Gfx8-9 an ELSE instruction that would normally have ended
up with all channels disabled would pop off the last element of the
stack and jump directly to JIP+1 instead of to the ENDIF at JIP,
skipping over the ENDIF instruction. In simple cases this would work
okay (though it's actual performance benefit is questionable), but in
cases where a branch instruction within the IF block (e.g. BREAK or
CONTINUE) caused all active channels to jump outside the IF
conditional, the optimization would break the JIP chain of "join"
instructions by skipping the ENDIF, causing the block of instructions
immediately after the ENDIF to execute with all channels disabled
until execution reaches the reconvergence point.
This issue was observed on SKL in the
dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.0.38
test in combination with some Vulkan binding model changes Lionel is
working on. In such cases the execution with all channels disabled
was leading to corruption of an indirect message descriptor, causing a
hang.
Unfortunately the hardware bug doesn't provide a recommended
workaround. In order to fix the problem we point the JIP of an ELSE
instruction to the instruction immediately before the ENDIF -- However
that's not expected to work due to the restriction that JIP and UIP
must be equal if and only if BranchCtrl is disabled -- So this patch
also enables BranchCtrl, which is intended to support join
instructions within the "ELSE" block, which in turn disables the
optimization described above, which in turn causes us to execute the
instruction immediately *before* the ENDIF with all channels disabled
-- So in order to avoid further fallout from executing code with all
channels disabled we need to insert a NOP before ENDIF instructions
that have a matching ELSE instruction.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20921>
A few lines earlier uni_offsets is accessed with ubo scaled by
PIPE_MAX_CONSTANT_BUFFERS:
if (uni_offsets[ubo * PIPE_MAX_CONSTANT_BUFFERS + i] == offset)
Found by inspection.
Looking at the before and after NIR code for
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays,
using the correct indexing appears to enable the pass to inline an
additional uniform. My guess is that when a uniform is used more than
once, the first loop wouldn't find the offset recored in the table
because it was recorded at the wrong location.
Fixes: d23a9380dd ("lavapipe: implement extreme uniform inlining")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21144>
Point sprite coordinates in general need to be inverted,
not just the texcoords converted to point sprite.
Move point coord y inversion out to its own pass.
Fixes GTF-GL46.gtf21.GL2FixedTests.point_sprites.point_sprites
with FBO dEQP surface.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21050>
Currently, we postpone binning syncs until we record draw calls
and can validate if any of them require accessing protected
resources in the binning stage, however, if the draw calls are
recorded in a secondary command buffer and the barriers have
been recorded in the primary command buffer, we won't apply the
binning sync in the secondary when we record the draw calls
and so we must apply it when we execute the secondary in the
primary.
Fixes flakyness in:
dEQP-VK.api.command_buffers.record_many_draws_secondary_2
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21162>
The CLE parser in the sim will read this many bytes for each instruction
in a CL, so we should ensure we have at least that many bytes available
in the BO when reading the last instruction, otherwise we can trigger
a GMP violation. It is not clear whether this behavior applies to real
hardware too.
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21162>
Fixes: 46b099e3
("meson: Ignore unused variables in release builds")
46b099e3 has some issues:
- it doesn't enable unused variables warning on release builds
with assertions enabled;
- it doesn't disable unused variables warning on debug builds
with assertions disabled;
- it doesn't disable unused variables warning when building
with MSVC and assertions are disabled regardless of buildtype,
see #8147. 3/4 regressions reported there have this limitation
alone as root cause.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21154>
This adds support for H264 decode on VCN hardware.
It uses the full DPB method, and relies on the application
to allocate an arrayed texture for the DPB to be stored into.
RADV_PERFTEST=video_decode is required to enable this.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20388>
The decoder context needs to know what engine it's associated with.
Nowadays, we have render, compute, blitter, even video engines being
used from the same driver. Rather than trying to have a single decoder
and thwacking the engine field back and forth between calls, we make
one per queue family, and stash a pointer in anv_queue for easy access.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21149>
In general we should only call it once, and then we should avoid to
call any lowering that introduce back copies. So far we were tracking
that manually out of the nir shader on several places.
Ideally we would like to add a nir_validate rule, but right now there
are some exceptions to this rule. For example right now the Intel
compiler calls nir_lower_io_to_temporaries as part of linking
tess_ctrl/mesh/task sahders.
One option would be to allow drivers to reset the value, but for now
let's not add that validation rule.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19338>
Some jobs failed during the re-enablement of Collabora's LAVA farm.
The trace job radeonsi-stoney-traces:amd64 produced some traces with
almost unnoticeable lighting spread difference, so I updated all the
traces.
Now the test spec@ext_texture_lod_bias@lodbias is failing after running
a couple of times.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20903>
according to spec, the maximum number of acquired images can be calculated with
swapchain_size - VkSurfaceCapabilitiesKHR::minImageCount + 1
the previous calculation was both wrong and occurring in the wrong place,
so this corrects both issues
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21095>
GL Spec says that imageLoad from incomplete images must return 0.
This is not really spec compliant as for proper behavior nullDescriptor
and robustImageAccess2 is needed.
A workaround for lack of either of these requires a shader variant.
Clearing the null surface and hoping the app doesn't write to the image
is closer to spec, while avoiding a shader recompile.
KHR-GL46.shader_image_load_store.incomplete_textures tests this.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21135>
simple way to reproduce this is to run these 4 together:
KHR-GL46.gpu_shader5.images_array_indexing
KHR-GL46.shader_image_load_store.advanced-allMips
KHR-GL46.shader_image_load_store.advanced-sso-simple
KHR-GL46.shader_image_load_store.incomplete_textures
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21134>
We need to pad VkBuffers to ensure we don't read beyond a page boundary.
An alternative to this approach would be to allocate an additional virtual
page when binding memory to the buffer, and to map this to the first
physical address, so both the first and last virtual page point to the same
physical location. This would be less expensive in terms of memory usage,
but more complex and invasive, hence the simpler approach has been taken
for now.
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21101>
This allows us to communicate to the back-end that we don't actually
know if the framebuffer is multisampled or not. No drivers set anything
but ALWAYS/NEVER and we still have a few ALWAYS/NEVER assumptions but
those should be asserted.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
This allows for the possibility that we may not know at compile time if
sample shading is enabled through the API. While we're here, also
document exactly what this bit means so we don't confuse ourselves.
v2: Fixup coarse pixel values (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
Whenever one of them is BRW_SOMETIMES, we depend on dynamic flag pushed
in as a push constant. In this case, we have to often have to do the
calculation both ways and SEL the result. It's a bit more code but
decouples MSAA from the shader key.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
If we have multiple clip planes, rather than emit multiple discards we can just
OR together the discard criteria. Then a nir_opt_algebraic rule kicks in to
optimize out the flt/.../flt/ior/.../ior into fmin/.../fmin/flt, generating
much less code at the end.
Written while debugging an unrelated issue with the clip lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21103>
Scalarizing preambles in NIR isn't really necessary, we can do it more
efficiently in the backend. This makes the final NIR a lot less annoying to
read; the backend IR was already nice to read thanks to all the scalarized moves
being copypropped. Plus, this is a lot simpler.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21122>
Move the decision of "can I copyprop this uniform?" from copyprop to a
standalone lowering pass. This is more straightforward and will enable the next
patch. This has the side effect of sinking load_preamble instructions, for a
nice reduction in register pressure. Instruction count increase is from
rematerializing some moves, which should be more than balanced out by the
reduced register pressure.
total instructions in shared programs: 1523285 -> 1523317 (<.01%)
instructions in affected programs: 1148 -> 1180 (2.79%)
helped: 0
HURT: 13
HURT stats (abs) min: 1.0 max: 4.0 x̄: 2.46 x̃: 2
HURT stats (rel) min: 0.69% max: 7.69% x̄: 3.65% x̃: 2.61%
95% mean confidence interval for instructions value: 1.78 3.14
95% mean confidence interval for instructions %-change: 2.16% 5.15%
Instructions are HURT.
total bytes in shared programs: 10444532 -> 10444724 (<.01%)
bytes in affected programs: 7386 -> 7578 (2.60%)
helped: 0
HURT: 13
HURT stats (abs) min: 6.0 max: 24.0 x̄: 14.77 x̃: 12
HURT stats (rel) min: 0.63% max: 7.14% x̄: 3.40% x̃: 2.48%
95% mean confidence interval for bytes value: 10.68 18.85
95% mean confidence interval for bytes %-change: 2.02% 4.78%
Bytes are HURT.
total halfregs in shared programs: 419444 -> 416434 (-0.72%)
halfregs in affected programs: 27080 -> 24070 (-11.12%)
helped: 634
HURT: 0
helped stats (abs) min: 1.0 max: 30.0 x̄: 4.75 x̃: 2
helped stats (rel) min: 2.90% max: 54.55% x̄: 13.13% x̃: 8.51%
95% mean confidence interval for halfregs value: -5.08 -4.41
95% mean confidence interval for halfregs %-change: -14.03% -12.23%
Halfregs are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21122>
Instead of printing the raw location number, which is pretty hard to interpret,
let's print the name of the location. Example output:
vec4 16 ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=0,
component=0, dest_type=float16 /*144*/, io location=VARYING_SLOT_VAR0 slots=1
mediump /*8388768*/)
One of the "regressions" from moving to purely lowered I/O with all variables
removed is a lack of debuggability, since otherwise these location strings don't
show up anywhere in the printed shader! By contrast this should make the lowered
I/O nice to read like the early I/O.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21091>
create_shader_state passes ownership of the NIR to the driver, so we need to
free it when we destroy the shader CSO later. Use ralloc to manage this in a
uniform way between graphics and compute. Strategy from Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21062>
This adds the basic scaffolding for compute kernels. There's a bit of churn to
make sure we don't need to hang onto the kernel NIR, since it's never used for
anything else except looking up the shader stage.
The compute kernels aren't actually wired up here, but they do get compiled.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21062>
Implement custom border colours, as required by OpenGL's CLAMP_TO_BORDER and
Vulkan with customBorderColor. This uses an extended sampler descriptor, which
has space for the custom border values. The trouble is that the border must be
packed into an internal interchange format that depends on the original format
in a complex way. That said, we're not solving NP-complete problems here, and it
passes the tests (dEQP-GLES31.functional.texture.border_clamp.* and piglit
texwrap).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20570>
Vulkan spec 18.8. Primitives Generated Queries:
When a generated primitive query for a vertex stream is active,
the primitives-generated count is incremented every time a
primitive emitted to that stream reaches the transform feedback
stage, whether or not transform feedback is active.
We can see the order of stages in chapter 27 Fixed-Function
Vertex Post-Processing, which shows that the transform feedback
stage is before rasterization (and therefore culling).
Conclusion is that culled primitives should be included
in the primitives generated query.
This commit makes sure to emit the primitives generated query
code before culling and uses the input primitive count passed
to the current wave instead of the exec mask after culling.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21037>
Match iadd(x, #y). The format shift will get constant-folded away and, if y
is sufficiently small, the constant will be inlined by the AGX backend
optimizer. This gets rid of piles of 64-bit arithmetic from lowering UBOs. It
probably doesn't matter for perf since that's happening in preamble shaders but
it *is* noisy.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21108>
The offset is in vec4s, not words (unlike the component). This doesn't matter
right now since we get everything lowered (offset -> 0) but it will come up if
we implement clip distances natively (instead of lowering in FS).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21097>
These implement gl_ClipDistance in hardware, avoiding the fragment shader
lowering. Unfortunately, they can't be disabled on a per-plane basis and they
can't be interpolated, so using them for OpenGL would still require a bunch of
extra lowering steps. Still, we should document the hardware and the caveats.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21097>
This works around bugs in a LOT of applications, since fp16 texture coordinates
are almost never appropriate even though it's a valid implementation of the GLES
spec. It also doesn't seem to matter for perf.
Code from the Bifrost compiler which implements the same workaround for slightly
different reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21082>
This speeds up glReadPixels. Instead of reading from the write-combined
framebuffer and converting colours on the CPU, this blits on the GPU to a
writeback staging resource with the colour conversion for free, and memcpies
from the writeback staging resource on the CPU.
In general, due to textures being write combined and tiled/compressed by default
by staging resources being linear writeback, blit-based texture transfer should
win out (you were going to blit anyway), particularly when format conversion is
involved
33% reduction in wall clock time for grim at 4K. No change in deqp-gles2
runtime.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21063>
We only need 4 byte alignment, not 8 bytes. This isn't a big difference in
practice, but it probably reduces padding in some cases. More importantly, it
corrects our XML to match what the hardware actually does, which is great.
(There is exactly enough room for a 40-bit address with 4 byte alignment.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
Check the size explicitly, instead of just implicitly in the GenXML pack: it is
the responsibility of the caller to split up larger uploads. While this is
nominally more complicated, agx_usc_uniform is called in the draw hot path
whereas the actual splitting decision can usually be done at compile-time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
There's a corner case where 3D textures have extra padding compared to 2D
arrays. We need to communicate that to ail.
Fixes
dEQP-GLES3.functional.texture.specification.texstorage3d.size.3d_32x16x64_4_levels.
That test now uses the same layout as Metal.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
it's annoying to validate this at runtime since it has to happen during draw,
but storing the "usable" ds3 mode separately from the pipeline state should
be a reasonable enough compromise for perf here...hopefully
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21100>
apparently an actual null descriptor is illegal here, and it's wasted cpu
anyway, so just cache the dummy surface on init and use that data when
fbfetch isn't active but the layout requires it
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21100>
When skiavk is the default system ui renderer, venus icd gets preloaded
into Zygote. However, Zygote access to render node is normally denied by
selinux except for legacy bootanimation purpose. This change fixes venus
icd loading to avoid invoking cros gralloc driver loading by moving the
perform op outside, so that we still get the memory footprint win.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21107>
Instead of generating num_components*simdwidth scattered stores, if
there's no indirect then we can just look up the pointer to the
base_offset and do a simd store there.
dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i64vec4 goes
from 30s to ~2s.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21084>
We first do an incomplete check for whether the linker supports
--gc-sections, then potentially add C and C++ arguments assuming that it
works, then later do a complete check to see if it actually works and
use --gc-sections. This means we can end up putting functions and data
in separate sections when we can't gc them.
Combine the checks, do less work, and be more accurate.
fixes: f51ce21e4e
("meson: Drop adding -Wl,--gc-sections to project c/cpp arguments.")
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21083>
This should be way faster to compile by not spamming so many loops at
LLVM, and faster to execute if LLVM didn't figure out what that loop
meant.
It looks vector reduce ops aren't really a thing, just a convenience in
the IR. We should be able to do better by counting zeroes in the
exec_mask != 0 result.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21001>
We don't need to deref invoc inside -- invoc is uniform in active
channels, so we can find our first active invocation in the loop, and then
dereference invocation once outside.
50 -> 46 seconds on
dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i16vec4
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21001>
This pass needs to run early (because it depends on early I/O), but it doesn't
actually need the shader key. Why not? If we overestimate the number of render
targets, extra store_output intrinsics will be generated, but they will be
deleted by AGX tilebuffer lowering later.
Note we'll probably want something smarter than this for fragment epilogues in
the future to avoid piles of unnecessary moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21065>
Add a second NIR pass for lowering point/texture coordinate replacement (i.e.
point sprites). Why a second one? The current pass works on derefs/variables,
which is good for drivers that don't lower I/O at all (like Zink, where the pass
originates). However, it is problematic for hardware drivers: the inputs to this
pass depend on the shader key, so we want to run the pass as late as possible to
minimize the cost of building/compiling the associated shader variants. In
particular, we need to be able to lower point sprites after lowering I/O if we
would like to lower I/O when preprocessing NIR.
The logic for early lowering and late lowering is considerably different (the
late lowering is a lot simpler), so I've split this out into a second pass
rather than trying to weld them together into one.
This pass will be used on Asahi, which currently uses the early pass. It may be
useful for other drivers as well. (Actually, it's been shipping on Asahi for a
little while now, just hasn't been sent upstream yet.)
Tested with Neverball.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Asahi Lina <lina@asahilina.net>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21065>
The face culling algorithm should have been disabled for
conservative overestimation because it already
(mistakenly) removed some close-to-zero area triangles.
Now that the driver disables it in that case,
let's always remove zero-area triangles.
This only costs +2 SALU instructions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20987>
We were aiming for very square tiles, but it's actually better for us to
reduce the number of different bins so you take fewer trips through the
geometry and keep the caches hotter. Example changes to aztec ruins on
angle:
3x3 tiles of 352x352 to 4x2 tiles of 256x512
4x5 tiles of 256x224 to 5x4 tiles of 224x256
17x11 tiles of 160x128 to 14x11 tiles of 192x128
12x7 tiles of 224x224 to 7x11 tiles of 384x128
12x8 tiles of 224x192 to 7x11 tiles of 384x128
11x6 tiles of 256x256 to 12x5 tiles of 224x288
11x7 tiles of 256x224 to 7x9 tiles of 384x160
8x4 tiles of 352x352 to 6x5 tiles of 448x288
and minecraft:
3x3 tiles of 352x352 to 4x2 tiles of 256x512
12x6 tiles of 256x256 to 3x23 tiles of 1024x64
12x7 tiles of 256x224 to 8x9 tiles of 384x160
FPS changes:
VK aztec ruins normal: 1.12478% +/- 0.213393% (n=67)
ANGLE manhattan_31: +1.42813% +/- 0.893332% (n=7).
ANGLE minecraft: no change (n=21)
ANGLE google_maps: +6.80618% +/- 2.40857% (n=4)
ANGLE trex_200: no change (n=11)
ANGLE pubg: no change (n=21)
Fixes: #8160
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21004>
Submitting a batch with the first command buffer with the simultaneous
bit set followed by a command buffer without the bit set gets past the
check and triggers this assert attempting to chain them:
../src/intel/vulkan/anv_batch_chain.c:1147: anv_cmd_buffer_chain_command_buffers: Assertion `num_cmd_buffers == 1' failed.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21056>
etna_resource_from_handle() is called for each plane of a multiplanar
resource, so there is no point in looping over all planes to do the
renderonly scanout import. In fact that will cause us to lose track
of the scanout imports from later planes when the earlier planes are
redoing the import, overwriting the pointer to the allocated
renderonly_scanout struct.
Drop the loop and just do the import for the current plane.
Fixes: 826f95778a ("etnaviv: always try to create KMS side handles for imported resources")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20993>
Otherwise, a noop FS will be always compiled during linking if not
provided by the application and that is too slow for fast-linking.
This should be improved to use a global noop FS but it's really tricky
because NIR linking doesn't do anything when the next stage is unknown,
and hence doesn't remove unused varyings.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21042>
compute programs can be reused across contexts, which means storing any
pointers directly like this is going to lead to desync and crash
instead, make this like regular descriptor templates and calculate the offset
from the current context to ensure that everything works as it should
fixes#8201
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21020>
So we're not tied to the macOS or Linux UAPIs and are not translating awkwardly
from one to the other when creating BOs. They're not quite equivalent -- macOS
doesn't include writeback information in this flag field, and Linux doesn't have
a executable flag. (Maybe we should add one, though? Then we can enforce W^X.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
On sway+xwayland, both explicit and implicit modifiers are advertised.
While dri3proto says nothing about it, zwp_linux_dmabuf_v1 says
A compositor that sends valid modifiers and DRM_FORMAT_MOD_INVALID for
a given format supports both explicit modifiers and implicit
modifiers.
"glmark2 -b build:model=bunny --fullscreen" goes from 468 to 598fps on
a618 @ 2160x1440.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20892>
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages. As such, we do the following math to produce the
address for each load:
base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
base@64 <- pack_64_2x32_split(base_lo, base_hi)
addr@64 <- iadd(base@64, u2u64(offset@32))
On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer. These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address. We
always place the base address at a 4GB address. So the constant data
always lives in a buffer entirely contained within a 4GB region, which
means any offsets from the start of the buffer cannot possibly affect
the high bits.
So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
On anv, INSTRUCTION_STATE_POOL_MIN_ADDRESS is 8GB, so the high bits are
always 0x2. We don't even need to patch that portion of the address and
can just use an immediate value. We do still need to pack, however.
fossil-db on Icelake indicates the following for affected shaders:
Instrs: 10830023 -> 10750080 (-0.74%)
Cycles: 1048521282 -> 1046770379 (-0.17%); split: -0.33%, +0.16%
Subgroup size: 103104 -> 103112 (+0.01%)
Send messages: 570886 -> 570760 (-0.02%)
Loop count: 14428 -> 14429 (+0.01%)
Spill count: 14246 -> 14244 (-0.01%); split: -0.06%, +0.04%
Fill count: 22802 -> 22794 (-0.04%); split: -0.04%, +0.01%
Scratch Memory Size: 654336 -> 662528 (+1.25%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages. As such, we do the following math to produce the
address for each load:
base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
base@64 <- pack_64_2x32_split(base_lo, base_hi)
addr@64 <- iadd(base@64, u2u64(offset@32))
On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer. These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address. We
always place the base address at a 4GB address. So the constant data
always lives in a buffer entirely contained within a 4GB region, which
means any offsets from the start of the buffer cannot possibly affect
the high bits.
So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
On iris, IRIS_MEMZONE_SHADER is at [0, 4GB) so the high bits are always
zero. We don't even need to patch that portion of the address and can
simply use u2u64 to promote the 32-bit add result to a 64-bit value
where the top bits are 0.
shader-db on Icelake indicates that this:
- Helps instructions: -1.13% in 135 affected programs
- Helps spills/fills: -4.08% / -4.18% in 4 affected programs
- Gains us 1 SIMD16 compute shader instead of SIMD8
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
They got accidentally disabled entirely, so they didn't block merge, but
once they re-enable then they'll block us again. The problem was that I
moved allow_failure to a .performance-rules section, but we only ever
inherit the rules from that location, not the rest of yml.
This is basically a revert of 67547a04b6 ("ci: Move the performance
jobs' allow_failure:true to the gl rules."), though I still keep the
allow_failure in a more common location with comments, since perf jobs are
a huge trap.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21002>
Complicated CFG and lots of SALU can cause this to take an extremely long
time to finish.
Fixes
dEQP-VK.graphicsfuzz.cov-value-tracking-selection-dag-negation-clamp-loop
and Monster Hunter Rise demo compile times.
fossil-db (gfx1100):
Totals from 57 (0.04% of 134574) affected shaders:
Instrs: 170919 -> 171165 (+0.14%)
CodeSize: 860144 -> 861128 (+0.11%)
Latency: 961466 -> 961505 (+0.00%)
InvThroughput: 127598 -> 127608 (+0.01%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8153
Fixes: 5806f0246f ("aco/gfx11: workaround VALUPartialForwardingHazard")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20941>
u_vector_add() don't keep the returned pointers valid.
After the initial size allocated in u_vector_init() is reached it will
allocate a bigger buffer and copy data from older buffer to the new
one and free the old buffer, making all the previous pointers returned
by u_vector_add() invalid and crashing the application when trying to
access it.
This is reproduced when running
dEQP-VK.synchronization.signal_order.timeline_semaphore.* in DG2 SKUs
that has 4 CCS engines, INTEL_COMPUTE_CLASS=1 is set and of course
perfetto build is enabled.
To fix this issue here I'm moving the storage/allocation of
struct intel_ds_queue to struct anv_queue/iris_batch and using
struct list_head to maintain a chain of intel_ds_queue of the
intel_ds_device.
This allows us to append or remove queues dynamically in future if
necessary.
Fixes: e760c5b37b ("anv: add perfetto source")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20977>
The offsets table stores offsets of a buffer (such as cmdstream) that
we've already dumped. The suballoc pool results in more suballocated
cmdstream allocated from a single backing buffer, meaning that we need
to increase the size of this table.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20975>
To fix a hypothetical issue:
v0 = start_linear_vgpr
if (...) {
} else {
use_linear_vgpr(v0)
}
v0 = phi
We need a p_end_linear_vgpr to ensure that the phi does not use the same
VGPR as the linear VGPR.
This is also much simpler.
fossil-db (gfx1100):
Totals from 1195 (0.89% of 134574) affected shaders:
Instrs: 4123856 -> 4123826 (-0.00%); split: -0.00%, +0.00%
CodeSize: 21461256 -> 21461100 (-0.00%); split: -0.00%, +0.00%
Latency: 62816001 -> 62812999 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 9339049 -> 9338564 (-0.01%); split: -0.01%, +0.00%
Copies: 304028 -> 304005 (-0.01%); split: -0.02%, +0.01%
PreVGPRs: 115761 -> 115762 (+0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20621>
this extension was added for the purpose of emulating the GL ext,
and using it is reasonably straightforward
the only (somewhat) invasive part is modifying the renderpass/dynamic hashes
to have samplecounts in the key, but this is also not too much work
now only fbfetch requires real renderpasses, and everything else is dynamic
fixes#7559
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20883>
Whenever a single file mesa-db cache hits max size limit, a half of cache
is evicted and the cache file is defragmented. The downside of this eviction
strategy is that it causes high disk IO usage during eviction if mesa-db
cache file size is large.
In order to mitigate this downside, we will split mesa-db into multiple
part such that only one part will be evicted at a time. Each part will be
an individual single file mesa-db cache, like a DB shard. The new multipart
mesa-db cache will merge the parts into a single virtual cache.
This patch introduces two new environment variables:
1. MESA_DISK_CACHE_DATABASE_NUM_PARTS:
Controls number of mesa-db cache file parts. By default 50 parts will be
created. The old pre-multipart mesa-db cache files will be auto-removed
if they exist, i.e. Mesa will switch to the new DB version automatically.
2. MESA_DISK_CACHE_DATABASE_EVICTION_SCORE_2X_PERIOD:
Controls the eviction score doubling time period. The evicted DB part
selection is based on cache entries size weighted by 'last_access_time' of
the entries. By default the cache eviction score is doubled for each month
of cache entry age, i.e. for two equally sized entries where one entry is
older by one month than the other, the older entry will have x2 eviction
score than the other entry. Database part with a highest total eviction
score is selected for eviction.
This patch brings x40 performance improvement of cache eviction time using
multipart cache vs a single file cache due to a smaller eviction portions
and more optimized eviction algorithm.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20256>
In order to ease writing mesa-db eviction unit tests, stop accounting
mesa-db cache file header size during checking whether cache file reached
the size limit. This change ensures that older unit tests will keep working
whenever cache header version/size will change.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20256>
If you can longer find chadversary or chadv on the interwebs, then
search for linyaa or versalinyaa.
Egg-crAcked-By: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Egg-Cracked-By: Faith Ekstrand <faith@gfxstrand.net>
Egg-Cracked-By: Lyude Paul <lyude@redhat.com>
Egg-Cracked-By: Wann
Egg-Cracked-By: Zach Lesher
Egg-Cracked-By: 初音ミク
Acked-by: Daniel Stone <daniels@collabora.com>
Allow the loading process to affect driconf option matching without
changing the behavior throughout mesa common code or leaking the name of
the loading process to logs, artifact storage, or in sub-thread naming,
as can be the case with the broader MESA_PROCESS_NAME override.
This new MESA_DRICONF_EXECUTABLE_OVERRIDE takes higher precedence over
MESA_PROCESS_NAME in the case where both are set.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20779>
Also deprecate GALLIUM_PROCESSS_NAME in favor of MESA_PROCESS_NAME,
while maintaining existing functionality for use cases relying on
GALLIUM_PROCESSS_NAME.
GALLIUM_PROCESSS_NAME takes higher precedence over MESA_PROCESS_NAME in
the case where both are set.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20779>
meson tests sharing a binary (and deviating in their env/args) will
produce temporary logs to the same directory, which is assumed to exist
only for the duration of a single test. This is problematic when running
tests in parallel, as one test may remove the directory before the
other(s) finish, causing a test flake.
This appends the each test's pid to the output directory to enforce
uniqueness and avoid the race.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20779>
Rely on the kernel for video encoder/decode capabilities where possible,
since there might be special cases for some devices. Otherwise, fallback
to the older logic for older kernels.
v2: Made the macro lines shorter and added a comment to explain (David)
v3: Undo deleting some logic (Ruijing)
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20969>
Allowing longer writes reduces the number of send messages needed
to support unaligned 4-component writes.
Note: nothing currently generates 8-component writes, so this change
makes "second_mask" code path in emit_urb_direct_writes and
emit_urb_indirect_writes_mod dead.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20858>
nir_opt_preamble is now aware of the internal uniforms we insert, so it can use
the whole uniform file available to it. This lets us push more (all?) uniform
loads in Dolphin ubershaders to the preamble.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
To comply with The Ekstrand Rule.
AGX has a large number of "uniform registers" available. These may be loaded
with arbitrary ranges of GPU memory by the driver, or they can be written by the
preamble shader. Currently, the compiler runs nir_opt_preamble on the first half
of the uniform file, and then translates NIR sysvals to moves from the second
half of the uniform file, passing back a uniform->sysval map for the GL driver
to respect. This has (at least) two issues:
* Since nir_opt_preamble runs before gathering sysvals, it has to assume the
maximum number of sysvals are pushed, which can prevent it from moving some
computation to the preamble due to running out of partitioned uniform registers.
This is a problem for Dolphin's ubershaders, though it's unclear how much it
matters for Dolphin perf.
* This violates The Ekstrand Rule and apparently will be a problem for our
Vulkan driver. I'm just a compiler+GL girl, so I wouldn't know.
To fix this, we invert the order of operations. At the end of this series, we
instead lower NIR system values to NIR load_preamble instructions in the GL
driver. The compiler just translates directly to uniform registers reads. The
Vulkan driver will need its own version of this code, but maybe it can do
something clever and descriptor set aware.
This means that there will already be some load_preamble instructions when
nir_opt_preamble runs, so I've made minor changes to nir_opt_preamble to handle
that gracefully. This is a bit lazy... The alternative is to introduce a
`load_uniform_agx` intrinsic which `load_preamble` gets lowered to trivially.
But that's another pass over the IR (and due to AGX's shader variant hell I'm
sensitive to backend compile time) and it would be more complicated than what's
implemented here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Ella Stanforth <ella@iglunix.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
Some backends may wish to reserve early uniforms for internal system values, and
use the remaining space for preamble storage. In this case, it's convenient to
teach nir_opt_preamble about a reserved offset. It's logical to treat the output
*size instead of an in/out variable that nir_opt_preamble adds to. This requires
a slight change to the consumers to zero the input.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by-(with-sparkles): Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
Zink creates all libaries with CREATE_RETAIN_LINK_TIME_OPTIMIZATION,
then it first creates unoptimized pipelines and it enqueues optimized
pipelines in the background with CREATE_LINK_TIME_OPTIMIZATION.
If a pipeline is linked without CREATE_LINK_TIME_OPTIMIZATION, the
driver should import binaries instead of retained NIR shaders. This
was broken because RADV wasn't compiling binaries at all in presence
of CREATE_RETAIN_LINK_TIME_OPTIMIZATIONS. Now, it always compiles
binaries in libraries but can also retain NIR if requested.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8150
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21008>
Our hardware requires that we write to URB using full vec4s at aligned
addresses. It gives us an ability to mask-off dwords within vec4 we don't
want to write, but we have to know their positions at compile time.
Let's assume that:
- V represents one dword we want to write
- ? is an unitinitialized value
- "|" is a vec4 boundary.
When we want to write 2-dword value at offset 0 we generate 1 write message:
| V1 V2 ? ? |
with mask:
| 1 1 0 0 |
When we want to write 4-dword value at offset 2 we generate 2 write messages:
| ? ? V1 V2 | V3 V4 ? ? |
with mask:
| 0 0 1 1 | 1 1 0 0 |
However if we don't know the offset within vec4 at *compile time* we
currently generate 4 write messages:
| V1 V1 V1 V1 |
| 0 0 1 0 |
| V2 V2 V2 V2 |
| 0 0 0 1 |
| V3 V3 V3 V3 |
| 1 0 0 0 |
| V4 V4 V4 V4 |
| 0 1 0 0 |
where masks are determined at *run time*.
This is quite wasteful and slow.
However, if we could determine the offset modulo 4 statically at compile time,
we could generate only 1 or 2 write messages (1 if modulo is 0) instead of 4.
This is what this patch does: it analyzes the addressing expression for
modulo 4 value and if it can determine it at compile time, we generate
1 or 2 writes, and if it can't we fallback to the old 4 writes method.
In mesh shader, the value of offset modulo 4 should be known for all outputs,
with an exception of primitive indices.
The modulo value should be known because of MUE layout restrictions, which
require that user per-primitive and per-vertex data start at address aligned
to 8 dwords and we should statically always know the offset from this base.
There can be some cases where the offset from the base is more dynamic
(e.g. indirect array access inside a per-vertex value), so we always do
the analysis.
Primitive indices are an exception, because they form vec3s (for triangles),
which means that the offset will not be easy to analyse.
When U888X index format lands, primitive indices will use only one dword
per triangle, which means that we'll always write them using one message.
Task shaders don't have any predetermined structure of output memory, so
always do the analysis.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20050>
Coded buffer will only be read on CPU, setting
PIPE_USAGE_STAGING instead of PIPE_USAGE_STREAM
makes the CPU reads much faster.
On 6700XT this reduces the CPU copy by around
3ms to 0.3 ms on average while under high GPU
load - real-time game streaming.
Signed-off-by: David Rosca <nowrep@gmail.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20989>
We have a common pain point with fractional CTS coverage, where the test
list changes on a CTS uprev or board load rebalancing, so you get a
different subset of tests run. The dev updates the list of xfails (a
pain), but also we end up with xfails left behind that aren't tested any
more and don't reflect reality.
For some drivers (tu, freedreno, zink-anv) we have manual jobs available
for curious devs to look at the current state of the CTS, but without
anyone having to keep the full xfails updated during uprevs, you don't
necessarily know what to do with the results you get on your MR.
So, let's introduce nightly testing for the tests that aren't guaranteed
green by Marge. With that, Someone (possibly me? sigh) can review the
nightly results and push up updates for full-run xfails so everyone can be
on the same page other than a day or so of delay. We also have some hope
for automated tooling to do this thanks to what Collabora has been working
on for automated CI uprev MR generation.
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20950>
ac/surface puts the raw pip_bank_xor there, which needs the extra
shift for the actual tile_swizzle.
(I think long term we should refactor this in ac/surface but for
now lets fix like radeonsi to avoid race conditions.)
CC: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20979>
If we hit the race condition of looking up an already imported BO that
is in the process of being destroyed, the handle will be GEM_CLOSE'd,
meaning that the handle that we just got from the kernel is probably not
valid. So in this case we should retry.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20961>
The RESOURCE_CREATE_BLOB ioctl can carry a ccmd payload, similarly to
EXECBUF. But we need to preserve the order of buffered execbuf cmds
which haven't been flushed to the guest kernel yet, rather than let the
CREATE_BLOB payload jump to the head of the queue. Otherwise, for ex,
the host could see the guest requesting an iova that has not yet been
(from it's perspective) released.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20961>
This introduces radv_compute_pipeline_compile() which is used for
compute and ray tracing pipelines. I think it's better than having a
single function for compiling everything, and that will allow us to do
more cleanups.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20943>
The Vulkan spec says:
"VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT specifies
that pipeline creation will fail if a compile is required for
creation of a valid VkPipeline object; VK_PIPELINE_COMPILE_REQUIRED
will be returned by pipeline creation, and the VkPipeline will be
set to VK_NULL_HANDLE."
Given the implementation is expected to set the pipeline to
VK_NULL_HANDLE, it's unecessary to handle pipeline feedback.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20943>
"If this texture instruction has a nir_tex_src_texture_offset source,
then the texture index is given by texture_index + texture_offset."
This fixes the failures for:
spec@arb_arrays_of_arrays@execution@sampler@fs-nested-struct-arrays-nonconst-nested-array
spec@arb_gl_spirv@execution@uniform@sampler2d-nonconst-nested-array
Signed-off-by: Amber Amber <amber@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20954>
list_is_linked() isn't the right function to use in order to check if
the BO is on a cache bucket or the zombie list, as this checks if the
next pointer of the list isn't NULL. This is always the case with the
BO list item as it's always initialized, so the next pointer points to
the list head itself when the BO isn't on any list.
Use list_is_empty() to check if the BO is actually linked into one
of the deferred destroy lists.
Fixes: 1b1f8592c0 ("etnaviv: drm: properly handle reviving BOs via a lookup")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20940>
Fixes 3DMark Wild Lands. Otherwise, we'd end up loading a DXIL shader
that had invalid linkage with another shader in the pipeline. We can
only load a DXIL shader if it's being linked against the same before
and after as a previous compilation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20962>
The anv_execbuf_add_bo() function already sets the offsets for the
exec_objects. Since we're always using softpin and never using
relocations all these objects should have non-changing offsets, all
set during anv_bo creation and never changed. Not only we should not
change these offsets, we definitely don't change them between
anv_execbuf_add_bo() and this loop we're removing.
Previously, we'd have the offset set as -1 for BOs that had never been
submitted when we were not using softpin.
Notice that with games we can have several hundreds of BOs in this
array.
This loop was added by:
c5f7e1f5b4 ("anv: Delete relocation support from batch submission")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20885>
This is a simple helper for minimizing the storage requirements of
control streams. It discards all information required only while
building the control stream and returns just the status and the list of
BOs backing the control stream. The first BO in the list is the start
of the control stream.
Especially for small, deterministically sized control streams, there's
no sense in lugging around an entire builder structure once it's built.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20932>
This game seems to incorrectly set the render area and since we switched
to full dynamic rendering, the framebuffer dimensions is no longer used.
Forcing the render area to be the framebuffer dimensions restore the
previous logic and it fixes rendering issues.
Fixes: c7d0d328d5 ("radv: Set the window scissor to the render area, not framebuffer")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20900>
Check the need for emitting prefetch before calling si_emit_cache_flush
to mask a possible cache miss delay and always inline radv_emit_prefetch_L2.
Either change alone is not significant but together they increase
drawcall throughput by 8% on i5-2500.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20877>
The JIT generated FS shader has logic to obey sample mask when:
multisample is enabled, or multisample is disabled but FS writes sample
mask and pipe_rasterizer_state::no_ms_sample_mask_out.
However it did not handle the case where multisample was disabled, FS
did not write sample_mask, and sample mask was zero. Instead it relied
upon the setup to discard the primitives, but that went away with commit
da5840f3.
We could restore the discard on zero mask behavior, but we would again
blurring the semantics of rasterization discard. Instead this change
adds logic to primitive setup to cull the primitives when sample mask is
zero.
Fixes: da5840f3 ("llvmpipe: Faithfully honour pipe_rasterizer_state::rasterizer_discard flag")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20934>
The Meson docs points out that it's better to use the files() function
when referring to files in the source tree than manually constructing
paths like this. Let's follow that advice, and get some neat cleanups.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20907>
The meson.build_root() method has been deprecated, so let's switch to
meson.project_build_root(), which usually means the same thing. The case
where it doesn't do the same thing is if Mesa is a subproject to some
other project, but in that case I believe we want the build root of Mesa,
not of the parent project anyway.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20907>
RRGetScreenInfo re-probes connector status, which may result in an EDID
transfer for every output, which according to Adam Jackson can be on the
order of 100ms for a single EDID block. So our previous implementation
of this eglGetMscRateANGLE was blocking for excessive periods of time
instead of being a quick query of the refresh rate like users expect.
This changes our eglGetMscRateANGLE implementation from using
RRGetScreenInfo to RRGetScreenResourcesCurrent and RRGetCrtcInfo.
This obtains the same monitor info without re-probing connectors.
Fixes a severe performance regression in Chromium WebGL performance.
While we're re-implementing the extension, we also implement proper
multi-monitor support: if there are multiple active CRTCs, we determine
which contains the largest portion of the surface, as specified in the
EGL_ANGLE_sync_control_rate extension.
We also now report fractional refresh rates correctly rather than
rounding to the nearest Hz.
Fixes: 4752655649 ("egl/x11: implement ANGLE_sync_control_rate")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6996
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7038
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20665>
Commit 582bf4d9 turned on write-combining for most (all?) memory
allocations. This caused a fairly large performance drop in some of
our VMware tests (application traces, such as Windows Metro Paint).
This patch adds a third memory type configuration: DEVICE_LOCAL,
HOST_VISIBLE, HOST_COHERENT. This is uncached. Then, in
anv_AllocateMemory() we only use write-combining for this uncached
type. This memory type is found in the Intel Windows Vulkan driver.
And according to
https://asawicki.info/news_1740_vulkan_memory_types_on_pc_and_how_to_use_them
uncached memory correlates to write-combined memory.
This fixes our performance regression (and actually produced the
fastest ever results for our test suite).
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20770>
Handle lookup (for example PRIME_FD_TO_HANDLE) must be synchronized with
GEM_CLOSE, otherwise re-import can race with bo_del path, resulting in
the handle of the newly (re)imported BO getting closed. Now that the
finalize step has been decoupled, fixing this is mostly just deleting
code.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
The complexity around batching up handle closing is simply to allow the
virtgpu to back up ccmd's to the host (because virtio/virtgpu is pretty
inefficient when it comes to lots of small msgs to the host, and it is
common that when we are deleting BOs, we delete a lot of them at the
same time. But that will make the locking fix in the next commit
impossible (without nested locks). So let's flip this around and do the
step that virtgpu wants to batch up first, before we get into closing
GEM handles, etc.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
When importing from a GEM name or dmabuf fd, we can race with the final
unref of the same BO, in which case we can get a hit in the handle
table for an fd_bo that another thread is about to free(). Detect and
handle this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
This was a merge conflict from the Win32 WSI DXGI swapchain changes.
I missed moving a new line of code that was added when rearranging
things for using the common helpers.
Fixes: cfa260cd ("dzn: Use common physical device list/enumeration helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20944>
VKD3D-Proton DXBC f32 to f16 conversion implements a float conversion using PackHalf2x16.
Because the spec does not specify a rounding mode, it emits a sequence to ensure
D3D-like behaviour for infinity.
When we know the current backend has pack_half_2x16_rtz_split,
we can eliminate the extra sequence.
Fossil DB stats on GFX11:
Totals from 835 (0.62% of 134913) affected shaders:
VGPRs: 49368 -> 49224 (-0.29%)
CodeSize: 5341956 -> 5124564 (-4.07%)
Instrs: 1024062 -> 987041 (-3.62%)
Latency: 6530956 -> 6465120 (-1.01%); split: -1.01%, +0.00%
InvThroughput: 908189 -> 870253 (-4.18%)
VClause: 18704 -> 18702 (-0.01%); split: -0.02%, +0.01%
SClause: 33406 -> 33284 (-0.37%); split: -0.38%, +0.01%
Copies: 67440 -> 65992 (-2.15%); split: -2.15%, +0.00%
Branches: 18498 -> 18465 (-0.18%)
PreSGPRs: 38409 -> 38331 (-0.20%)
PreVGPRs: 44089 -> 43834 (-0.58%)
Note, some fossils are from before this pattern was added to VKD3D-Proton,
so the above may not reflect real-world impact.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.
The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.
No shader-db changes on TGL & DG2.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
The code emitted by lp_build_fpstate_set to reset the FP state could be
jumped over when the write mask was zero, leading to denormals not being
flushed to zero.
Spotted by Roland Scheidegger.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20901>
It's legal to create a library with FRAGMENT_OUTPUT_INTERFACE and with
all CB states as dynamic, in this case the PS epilog should be dynamic.
This fixes a bunch of regressions while running Zink/RADV CTS with
RADV_PERFTEST=gpl.
Zink is the final boss.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20882>
For the common case where we're emitting packet we don't need to
update the cl_out pointer and then store the result in cl->next,
we can directly update cl->next.
This shows a small improvement in vkoverhead's scores for basic
draw tests.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20897>
Fix: Acquire/release should have one valid access/sync and one set
to none.
Workaround: D3D doesn't like simultaneous access resources leaving
COMMON layout, nor does it like setting UAV/RTV access bits for the
COMMON layout.
Use UNDEFINED -> UNDEFINED layout transitions, where the access bits
just aren't validated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20919>
Float controls are emitted as function attributes on the entrypoint.
These function attributes are not the standard build-in LLVM kind, but
are strings, which the DXIL backend didn't know how to emit. So, this
change adds string attribute support and uses it for fp32 ftz/preserve.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20919>
Allow the compiler to assume that the shader always has full subgroups,
meaning that the initial EXEC mask is -1 in all waves (all lanes enabled).
This assumption is incorrect for ray tracing and internal (meta) shaders
because they can use unaligned dispatch.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
The shifting was off-by-one compared to how it is done in the kernel. Also, excess_length needs to be casted to uint64_t to prevent zeroing everything except the 5 LSBs.
Fixes: abf3bcd6 ("radv: Add RMV resource tracking")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20820>
If you're only affecting one or a couple of drivers, it would be nice if
your pipeline buttons on the web UI weren't full of manual run buttons for
all the other drivers.
This is a bunch of duplicated lines, but less than it could have been now
that we have !references.
In some of these cases (i915g, nouveau, etnaviv), we have no non-manual
jobs for those drivers, so I could have just rewritten the original
"driver-rules" to "driver-manual-rules". I decided to keep things
consistent between drivers, though, because this is all esoteric enough to
readers already without making different drivers' rules look different.
Fixes: #4891
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17445>
The texture format code relies on a python-generated atlas of structs
that describe a lookup table for texture swizzling. Many of these
texture formats contain the index "6" used for this lookup. The 6th
index just so happens to represent a "don't care" value, however the
out of bounds read is still best to be avoided. The address sanitizer
finds this issue pretty immediately but it only shows up on big endian
because the textures don't need this on little.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20846>
The ACP entries created by copy propagation to track the implied
copies of LOAD_PAYLOAD instructions don't model the behavior of
LOAD_PAYLOAD correctly, since (as of 41868bb682) header
moves are implicitly retyped to UD and the destination of non-header
copies implicitly uses the same type as the corresponding source, even
though the ACP entries created for such copies could incorrectly
represent a type conversion, which can lead to mis-optimization of the
program.
According to Marcin, this fixes the func.mesh.ext.workgroup_id.task.q0
crucible test.
Fixes: 41868bb682 ("i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction")
Reported-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18980>
ChromeOS still uses Python 3.6, but the glsl2spirv script uses module
'__future__.annotations', introduced in Python 3.7. Fix the build by
removing module, but otherwise preserve the type annotations.
Fixes: 949c3b55db ("util/glsl2spirv: add type annotations")
Acked-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20237>
this is incredibly stupid, but KHR-GL46.texture_view.coherency does all
kinds of rasterization discard draws with fb attachments bound as images,
and there's no other sane way to catch it dynamically
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20891>
Here adding kmd_type parameter to
intel_gem_read_render_timestamp(), intel_gem_can_render_on_fd() and
intel_gem_supports_protected_context().
Those 3 functions will have Xe implementations, the other functions
in intel_gem.h will not be called by Xe code paths so not adding
kernel_driver_type to it.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20773>
This appears to have been an oversight. AV1 Main profile requires
support for both 8bit and 10bit, and so we should always declare
support for the YUV420_10 RT Format. This support then cascades
into supporting the appropriate surface formats and meets expectations
of vaapi clients (especially ffmpeg based) on how to detect support
for these formats.
Note that the commit [0b02db3007] was also made with the intention of
fixing this problem, but it does so in a non-idiomatic way. With that
change, there is still no declared YUV420_10 RT Format, and instead
the P010 surface format is reported under the YUV420 RT Format. This
is not going to work with all vaapi clients. I recommend that this
commit be reverted.
Signed-off-by: Philip Langdale <philipl@overt.org>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20870>
For the same reason why we used to have USE_LIBBACKTRACE with the old
Android makefiles, allow to build Mesa without linking to it.
In recent VNDK versions, libbacktrace isn't available.
When building without linking libbacktrace, for some reason some symbols
related to C++ exception handling are exposed. Allow them in the symbols
check script.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Sergi Blanch Torné <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20019>
`count` should not be incremented before the check, because it causes
the modifiers array to be filled starting from position 1 instead of 0.
This bug causes one less format modifier to be available than would
otherwise be expected, which could then lead to a dmabuf query failing
in situations where a supported modifier wouldn't be advertised.
It also causes garbage data to be advertised as a modifier in position 0
of the array, although this is not very likely to cause issues.
Fixes: 2a1217513 ("panfrost: Implement panfrost_query_dmabuf_modifiers")
Cc: mesa-stable
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20879>
gfx11 renamed PRIM_GRP_SIZE to VERTS_PER_SUBGRP but another change was
was missed.
Update our code based on PAL's UniversalCmdBuffer::CalcGeCntl function
(especially useVgtOnchipCntlForTess being false for gfx11).
Fixes: 25a66477d0 ("radeonsi/gfx11: register changes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20728>
Give a name to dma-buf. This name appears in /sys/kernel/debug/dma_buf/bufinfo
and could be useful to debug dma-buf:
Dma-buf Objects:
size flags mode count exp_name ino name
00606208 00000002 00080007 00000003 drm 00192014 2321705-glxgears
The name is only added to non-shared buffer, to avoid overwriting
an existing name when exporting an imported buffer (otherwise all
dma-buf will pretend to be created by XWayland).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20728>
This is needed when switching away from GPGPU mode. See the previous
commit for anv. This is not likely to make a practical difference for
iris because it never switches back and forth between modes like anv.
Fixes: 172e0b0ebf ("iris: Update PIPELINE_CONTROL flush when switching pipeline mode in TGL+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20774>
See the comments in emit_apply_pipe_flushes(). Flushing HDC is not
sufficient in GPGPU mode, and we need to set the untyped data port flush
bit as well.
Fixes many dEQP-VK failures with INTEL_COMPUTE_CLASS=1 on Alchemist.
Fixes: 1067ec90a5 ("anv: Update PIPELINE_CONTROL flush when switching pipeline mode in TGL+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20774>
This patch removes below shadow reg optimization. This is done for
Vega64 battlefield 1 crash when shadow regs enabled.
+ reset only dirty states with buffers in si_pm4_reset_emitted()
+ various draw states in si_begin_new_gfx_cs()
v2: remove first_cs parameter from si_pm4_reset_emitted() (Marek Olšák)
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18301>
In case of mcbp, shadowed_regs is initialized early in radv_queue_init()
function by submitting the command buffer. The command buffer is submitted in
radv_init_shadowed_regs_buffer_state() function. When RADV_DEBUG=noibs is used
radv_amdgpu_winsys_cs_submit_sysmem() function is used to submit command buffer.
radv_amdgpu_winsys_cs_submit_sysmem() crashes here because initial_preamble_cs
is NULL. This patch fixes the radv_amdgpu_winsys_cs_submit_sysmem() function
to support NULL initial_preamble_cs.
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18301>
Let's make sure we aren't introducing new validation failures as
development proceeds. Basically, we record the current set of known
validation failures from the CTS, and for any validation failure we have
the layer log it and abort.
I had started encoding xfails from piglit, but it turns out that piglit
and the validation layer fight about the teardown process, producing
use-after-frees.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20756>
Otherwise, the access chain you emitted last time may not dominate the
current use.
Fixes the following validation failure in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.bits_unique_per_sample.multisample_texture_2:
UNASSIGNED-CoreValidation-Shader-InconsistentSpirv(ERROR / SPEC):
msgNum: 7060244 - Validation Error: [
UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle =
0x55cf3cea2c60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 |
SPIR-V module not valid: ID '67[%67]' defined in block '23[%23]' does
not dominate its use in block '31[%31]'
Fixes: 8899f6a198 ("zink: fix gl_SampleMaskIn spirv generation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20756>
Fixes validation failures:
Test case 'dEQP-GLES31.functional.android_extension_pack.shaders.es32.extension_directive.oes_sample_variables'..
MESA: error: Validation Error: [
UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle =
0x563a1838b790, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 |
SPIR-V module not valid: [VUID-StandaloneSpirv-Flat-04744] Fragment
OpEntryPoint operand 31 with Input interfaces with integer or float type
must have a Flat decoration for Entry Point id 4.
%gl_SampleId = OpVariable %_ptr_Input_uint Input
Test case 'KHR-GL46.shader_ballot_tests.ShaderBallotAvailability'..
MESA: error: Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x5558e12f17e0, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: [VUID-StandaloneSpirv-Flat-04744] Fragment OpEntryPoint operand 28 with Input interfaces with integer or float type must have a Flat decoration for Entry Point id 4.
%gl_SubgroupLocalInvocationId = OpVariable %_ptr_Input_uint Input
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20756>
DXIL has no concept of subgroup mask ops, relative
shuffle ops, and everything is scalar.
Most wave broadcast ops support i1 overloads, except
for quad swap operations. Go figure. Use lower_bit_size
to promote those to i32 instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20801>
the code here was all expecting the VkPipelineStageFlags bitfield expansions,
but u_foreach_bit() gives the actual bit, so implicit feedback loops were never
actually being detected
instead, convert back to the bitfield at the top of the loop so the value works
as expected
Fixes: 9ba0657903 ("zink: make implicit feedback loop application stricter")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20863>
Sometimes you want to diff 2 runs with INTEL_DEBUG=bat, but a tiny
allocation change can mess quite badly with offsets printed in the
decoding, making it hard to look at the diff with meld.
Fortunately our decoder can avoid printing offsets. We just need a
variable to specify that.
We still use the defaults specified by the driver but you can turn
things on/off with :
INTEL_DECODE=+color,-offsets,-floats INTEL_DEBUG=bat ./my_app
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20874>
I stumbled on this when I inserted some suboptimal lowering code after all
optimizations. Adding certain subset of optimizations after my lowering code
actually avoided this bug, so I think it's not possible to hit this on upstream.
Let's fix this for the next person generating suboptimal code...
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20857>
Allocates VRAM in display GPU in case of prime. Then the dma_buf is imported
into prime GPU.
v4: add image tag to __DRIimage (Marek Olšák)
v3: move display fd opening to separate commit (Pierre-Eric)
image_format_to_fourcc() non-static to seperate commit (Pierre-Eric)
v2: close query fds after linear_copy buffer import (Marek Olšák)
use image_format_to_fourcc() from loader_dri3_helper.c (Marek Olšák)
Signed-off-by: Yogesh Mohanmarimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13422>
the image_format_to_fourcc() function will be used from
egl/wayland hence make it non-static. Also move the function
into loader_dri_helper.c from loader_dri3_helper.c since
loader_dri3_helper library depends on xcb which will make
egl wayland depend on xcb indirectly.
v2: add loader tag to extern image_format_to_fourcc() (Marek Olšák)
V3: move image_format_to_fourcc to loader_dri_helper.c
Signed-off-by: Yogesh Mohanmarimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13422>
Initialize dri_screendisplay_gpu variable in struct laoder_dri3_drawable.
Also make dri_screen_display_gpu variable as input parameter to function
loader_dri3_drawable_init() since dri_screen variable is initialized this way.
This also helps to avoid duplicate initializing dri_screen_display_gpu
in glx and egl code.
Signed-off-by: Yogesh Mohanmarimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13422>
Wa_14015360517 mentions situations where HW produces invalid
occlusion query results when "Pixel Shader Does not write to RT"
bit is set.
"When Pixel Shader Kills Pixel is set, SW must perform a dummy render
target write from the shader and not set this bit, so that Occlusion
Query is correct."
Another situation is when writing to UAV or to NULL render target.
Patch sets field as 'must be zero' to discourage possible use of it.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20849>
These enums have been removed/deprecated long time ago from desktop
GL. Here we remove them from modern GL while still allow for compat
contexts (~old apps). This change matches proprietary drivers and makes
our behaviour same within CTS with some tests.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20719>
While we are already ensuring we allocate at least 8192 bytes should
this not be the first allocation and our allocations are typically just
a few bytes, multilayered framebuffers with large numbers of layers may
require more space than that in a single allocation.
Fixes: 3325950648 ('v3dv: increase BO allocation size when growing CLs')
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20871>
according to GL spec, incomplete shadow samplers should return 0
this is technically possible for drivers to do using a RGBA texture in
the sense that somehow it's been working, but it's broken at the gallium-level
for what drivers should be expecting to see in such circumstances given
that such scenarios have been binding a RGBA texture to use with shadow samplers
instead, we can give drivers a fallback Z32 texture to avoid format/sampler
mismatches and complying with expected behavior
see also KHR-GL46.incomplete_texture_access.sampler for driver-specific testing
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20817>
when doing legacy depth texture mode sampling, it's necessary to keep
another view that has the right (R in component 0) swizzle so that depth
values can actually be returned in cases where it would otherwise be
a constant value due to swizzling
this also allows zink_sampler_view::shadow_needs_shader_swizzle to be removed
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20598>
this is clunky because of how big the swizzle data block is,
but the gist of it is the data block is stored onto the shader module key
after all the other data, and then it gets manually hashed/compared in
relevant cases
it's gross, but so is this functionality
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20598>
this enables passing a zink_fs_shadow_key to the compiler to manually
apply a swizzle other than R/R/R/R to depth texture results
currently no data is passed, so the previous splatting behavior is preserved
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20598>
I've been running into failures with tests like :
dEQP-VK.robustness.robustness2.bind.notemplate.rgba32i.unroll.nonvolatile.uniform_buffer_dynamic.no_fmt_qual.len_4.samples_1.1d.frag
With the load_global_const_block_intel NIR intrinsic, you can load a
vec8/vec16 with a predicate. The predicate is correctly uniformized to
feed into the SEND instruction's flag register.
The problem is that a series of optimization first remove the
find_live_channel and then changes the broadcast into a simple MOV
instruction, on the assumption that the first channel is always active
if there is not control flow. This is correct.
But after that the cmod optimzation will remove this instruction :
mov.nz.f0.0(16) null:D, vgrf16+0.0<0>:D NoMask
because it seems to be equivalent to :
cmp.g.f0.0(16) vgrf16:D, vgrf12:D, 63d
In this case vgrf16 is the predicate to the load block SEND
instruction. Since the execution mask is different between both, some
of the channels of the SEND instruction end up not being loaded or
loaded with the wrong predication and we end up with incorrect UBO
data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20852>
GL robust buffer access applies to texelFetch with out of bounds LODs.
imageRobustAccess2 guarantees this, but imageRobustAccess does not.
Therefore, the txf robustness lowering pass from earlier is used
to provide this guarantee and support ARB/KHR robust_buffer_access_behavior.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20808>
unlocks were incorrectly added to paths using dri2_egl_display() as
well as those using dri2_egl_display_lock()
pthread_mutex_unlock() when unlocked is documented by posix as
being undefined behaviour. On OpenBSD pthread_mutex_unlock() will call
abort(3) if this happens.
Fixes: f1efe037df ("egl/dri2: Add display lock")
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20712>
Strangely, this was somehow compiling with GCC but my futile efforts to
build mesa with msan caused me to find clang refusing to compile because
of this. Unknown how many bugs this could fix or how GCC did manage to
find "config" in scope but it's fairly obvious that this is the correct
parameter that should be used.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20830>
It was wrong. For example, if the draw vertex count was 10 and the upload
vertex count was 150, u_vbuf wouldn't unroll the draw and would instead
memcpy 150 vertices. This fixes that case.
Fixes: 068a3bf0d7 - util: move and adjust the vertex upload heuristic equation from u_vbuf
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20824>
DXIL doesn't have a "subgroup ID" or "num subgroups" construct,
so add lowering to construct them. Subgroup ID is done using
once-per-subgroup atomics on a workgroup-shared variable, and
then broadcasting that (using read_first_invocation) to the other
threads. Num subgroups is just a division with the workgroup size.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20777>
this is an artifact of very old code before the dynamic state was set
for all graphics pipelines
now the checks only cause blend constants to not be updated, which triggers
bugs and validation failures
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20799>
The problematic sequence of draws is pretty rare. But there are a small
handful of games which do not exhibit the problematic sequence and for
which invalidating LRZ on draws with blend plus depthwrite enabled hurts
performance slightly. This driconf option enables opting in to the
previous behavior.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20708>
Writing depth with blend enabled means we need to invalidate LRZ,
because the written depth value could mean that a later draw with
depth enabled (where we would otherwise write LRZ) could have
fragments which don't pass the depth test due to this draw. For
example, consider this sequence of draws, with depth mode GREATER:
draw A:
z=0.1, fragments pass
draw B:
z=0.4, fragments pass
blend enabled (LRZ write disabled)
depth write enabled
draw C:
z=0.2, fragments don't pass
blend disabled
depth write enabled
Normally looking at the state in draw C, we'd assume we could
enable LRZ write. But this would cause early-z/lrz to discard
fragments from draw A which should be visible due to draw B.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8065
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20708>
offset field in kgsl_command_object is NOT used by KGSL, so
we should offset directly to iova.
Fixes weird hangs on KGSL. E.g. fixes the hang in:
dEQP-VK.memory.pipeline_barrier.transfer_dst_storage_texel_buffer.1024
cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20795>
There was a VUID-VkImageViewCreateInfo-image-04739 in the Vulkan 1.3
spec that said:
If image was created with the
VK_IMAGE_CREATE_BLOCK_TEXEL_VIEW_COMPATIBLE_BIT flag and format is a
non-compressed format, viewType must not be VK_IMAGE_VIEW_TYPE_3D
That VUID has since been removed, and when a view of a 3D image is
created, with put the depth into the array_len, so it won't be always 1.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20803>
For a depth-only-with-discard pipeline, spi_shader_col_format needs to be
fixed up to a single channel export, or otherwise discard will not work.
Since col_format can change depending on the dynamic state, precompute the
need for this workaround on pipeline creation and apply it when emitting
prolog states.
Fixes: eb07a11b8f ("radv: add support for compiling PS epilogs on-demand")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20704>
Some D3D12 drivers, like my PC's AMD driver, don't like using a
dynamic index to load from a constant buffer that's bound via
root constants. Instead, just go ahead and load the full set of
vertex data and just bcsel which one to use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20778>
Better error handling is more reliable.
Options:
-L, follow location
--retry, number of retries
--retry-all-errors, does not fail on ALL errors, that's why there is -f
-f, fail fast with no output at all on server errors
--retry-delay, make curl sleep this amount of time before each retry
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20788>
This is a weird way to do queries, but in multiview, each query
takes up N slots, where N is the number of views. D3D doesn't do
it that way, and only has one result, which fortunately is a valid
way to do Vulkan queries. We just need to take care to zero out
the other view results, and make sure they get "signaled" when
the cmdbuf is submitted.
Note that it is invalid in D3D to use ResolveQueryData on query
slots that have never actually been begun/ended, so we zero out
the data by copying zeroes into the buffer. This probably could
be optimized but oh well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
For draws, when we're emulating multiview, we need to loop them
and set up the right sysval. For clears, we always need to loop.
When not emulating, we also need to set up the right view instance
mask.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
D3D's view instancing is an optional feature, and even when it's
supported, it only goes up to 4 views, where Vulkan requires a
minimum of 6 supported views. So, we need to have a path for handling
the cases where we can't use the native feature.
In this mode, pass the view ID as a runtime var. The caller is then
responsible for looping the draw calls and filling out the constant
buffer value correctly for each draw. When we get to the last pre-rast
stage, we'll additionally want to write out gl_Layer to select the
right RTV array slice. Lastly, for the fragment shader, if there's
any input attachments, those get loaded using the RTV slice instead
of the view ID. RTV slice input into the PS is done with a signature
entry (which must be output from the previous stage) rather than a
system value.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
This adds support for D3D12-native view instancing to the compiler.
Essentially, it's just the ability to load SV_ViewID (dx.op.viewID),
set the right capability, and fill out some more PSV data. Note that
the PSV data is currently garbage. Ideally, we'd fill out a proper
input -> output and viewID -> output dependency table, but AFAIK
this is only used to enforce D3D API validation, and drivers ignore
it, so it's less critical to get it right.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x01 has been incidently the
correct memory type index, but isn't guaranteed to be, which is why it
hasn't caused issues yet
Fixes: f9515d93 ("zink: allocate/place memory using memoryTypeIndex directly")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20264>
internal_format is the "real" format of a resource, and the "real" format
of imported resources is the external-facing format, not the pipe format
this ensures the correct format is available for internal ops, such as nplanes queries
Fixes: 2e2775c11b ("zink: fix PIPE_RESOURCE_PARAM_NPLANES with format modifier")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20753>
When the base array layer of the image view is > 0, addrlib computes
the offset (in HwlComputeSubResourceOffsetForSwizzlePattern) which is
then added to the base VA in RADV. But if the driver doesn't reset
the base array layer, the hw will compute incorrect addressing
(ie. base array will be added twice). This also matches AMDVLK.
This fixes a VM fault followed by a GPU hang on RDNA2 when trying
to join a multiplayer game with medium settings in Halo Infinite.
Fixes: 98ba1e0d81 ("radv: Fix mipmap views on GFX10+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20761>
The ideal case for performance is to have a single buffer for
all display list. The caveat is that large buffers are less
likely to be freed because they're refcounted: it only takes
1 user (diplay list) to keep it in VRAM.
This lowers VRAM usage when replaying the trace attached
of the trace attached to !6140 from 5.5 GB to about 1.8 GB.
Viewperf snx performance isn't affected.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6140
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20748>
grow_vertex_storage may call wrap_filled_vertex, which will
trigger the assert incorrectly because the new size will be
smaller than 'new_size' but it's correct because
'vertex_store->used' has been reset to 0.
Fixes: a08baaff97 ("vbo/dlist: fix indentation in vbo_save_api.c")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20748>
These are handled identically in almost all cases. There is one place
in the legacy surface lowering that was obtaining the bitsize from the
opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8)
for that and works just fine. If we just do that in the legacy lowering
too, then we don't need this plethora of opcodes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
These are basically identical save for:
- shared has surface hardcoded to SLM rather than an SSBO index
- shared has to handle adding the 'base' const_index (SSBO have none)
- the NIR source index for data is shifted by one
It's not worth copy and pasting the entire function for this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
These are now basically identical to their non-float counterparts. The
only thing that differed was the opcode checking to determine which
operands existed. Now that we have a unified opcode enum and a helper
for the number of data operands, we can just use that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation. Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor. No need for a separate opcode.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines. We have to translate one way or
another, and using the modern set makes sense going forward.
One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.
The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:
x <- load.ugm // Atomic read sequenced-before the barrier
y <- fence.ugm
z <- fence.tgm
wait(y, z)
w <- load.tgm // Read sequenced-after the barrier
In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.
v2: Apply the workaround regardless of whether the NIR barrier
intrinsic specifies multiple storage classes or a single one,
since an acquire barrier is required to order subsequent requests
relative to previous atomic requests of unknown storage class not
necessarily specified by the memory scope information of the
intrinsic.
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
In this usual case, do a quick check to avoid generating 5 useless instructions
(mov/vec4 instructions). They'll get copypropped but that creates more work for
the optimizer and nir/lower_blend runs in a hot variant path on both Asahi and
Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
nir/lower_blend asserts:
assert(nir_intrinsic_write_mask(store) ==
nir_component_mask(store->num_components));
For the special blend shaders used in Panfrost, this holds. But for arbitrary
shaders coming out of GLSL-to-NIR (as used with Asahi), this does not hold. In
particular, after nir_opt_undef runs, undefined components can be trimmed.
Concretely, if we have the shader:
gl_FragColor.xyz = foo;
Then this becomes in NIR
gl_FragColor = vec4(foo.xyz, undef);
and then opt_undef will give the store_deref a wrmask of xyz but 4 components.
Then lower_blend asserts out.
Found in a gfxbench shader on asahi.
Closes: #6982
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
Per the spec.
Fixes arb_color_buffer_float-render on both Panfrost and Asahi (before/after
reproduced on Mali-T860 and AGX G13 respectively). Without that patch, that test
fails the assertion:
arb_color_buffer_float-render: ../src/compiler/nir/nir_lower_blend.c:259: nir_blend_logicop: Assertion `util_format_is_pure_integer(format)' failed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
The upper bits start correctly, there's no need to clear them as long as we keep
them zero'ed by using ixor with a valid bit mask instead of inot.
Makes the code generated for logic op slightly less ridiculous. I'm joking. It's
still ridiculous but I'm not in the mood to fix up the Midgard compiler and it's
just a little ALU for a feature almost nothing uses.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
Particularly constant colours, but also (more obscurely) SNORM.
Fixes arb_color_buffer_float-render with SNORM framebuffers. Issue affects both
Asahi and Panfrost (the latter after we start advertising EXT_render_snorm).
v2: Check the blend factor to avoid unnecessary clamps. This avoids regressing
blend shader code quality on Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com> [v1]
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
In this case we have 4 components but the value of the fourth component
is undefined. Apply the fixup we already have.
Fixes
dEQP-GLES3.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.0
on Asahi. That test blend with DST_ALPHA with its RGB565 attachment,
which is fine if RGB565 is preserved, but Asahi is demoting that
format to RGBX8 which means -- after lowering the tilebuffer access --
we blend with an ssa_undef.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
This patch is comprised of three main changes:
- Add a "shim" for GDI, since Xbox doesn't expose this library
- New framebuffer file, to support the Xbox "windowing" system
- Implement a custom WndProc hook for Xbox, since SetWindowsHookEx isn't supported either
Other than that, it's similar to the previous Xbox commits which mostly disable Win32-specific logic.
Co-authored-by: Ethan Lee <flibitijibibo@gmail.com>
Co-authored-by: David Jacewicz <david.jacewicz@protonmail.com>
Co-authored-by: tieuchanlong <tieuchanlong@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19022>
Three reasons for that:
0. The operation we're doing here is actually a reallocation.
1. The newer code is, IMHO, easier to read.
2. Realloc has this property where sometimes, when possible, it will
expand your array without moving it somewhere else, so it doesn't
need to copy the memory contents, returning the original pointer
back to you. I did some analysis and while that case is not common,
it does happen sometimes in real world applications (I could see it
happening in Shootergame and Aztec Ruins, but not Dota 2), so we're
able to save a few CPU cycles.
v2: Rebase.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>
This is the only code path where we don't run anv_execbuf_finish() in
case anv_execbuf_add_bo() fails. While there is not a bug in the
current tree, I recently made an (uncommitted) modification that
started leaking memory and made me realize the lack of cleanup here.
If we had anv_execbuf_finish() being called upon error like we're
going to have after this patch my modification wouldn't have caused
the memory leak.
I think it's much safer and future-proof if we're able to operate
under the assumption that whatever is allocated and set to anv_execbuf
will be dealt with upon failure of anything else related to it, so
functions that fail should only be required to free pointers not yet
assigned to anv_execbuf.
The dEQP-VK 'alloc_callback_fail' tests should exercise this code
path. The one I was specifically using here is:
dEQP-VK.api.object_management.alloc_callback_fail.device_group
v2: Rebase.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>
Because anv_execbuf_add_bo_bitset() calls anv_execbuf_add_bo(), which
can fail if its memory allocations fail.
I have seen dEQP tests exercising memory allocation failures during
anv_execbuf_add_bo(), but I don't think the path coming from
add_bo_biset() was specifically exercised. Anyway, add the error check
just in case.
v2: Rebase.
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>
In anv_execbuf_add_syncobj(), we try to not create or use
exec->syncobj_values if we don't need to. But when we figure we're
going to need it (i.e., when timeline_value is not zero), then we
create exec->syncobj_values with vk_zalloc, which means every previous
value is set to zero, as it should be. This is all correct.
The problem starts when we add a 16th element. In this case we double
exec->syncobj_array_length and realloc the buffer by using vk_alloc
and copying the old array to the new one. After that, we write the
timeline_value to the array only if it's not zero, and that's the
problem: since we just used vkalloc and memcpy, we don't have any
guarantees that the new array will be zero after the 16th element, and
if timeline_value is zero we write nothing to that position.
Once we start using exec->syncobj_values we have to commit to using
it, so the "if (timeline_value)" check near the end of the function
has to be changed to "if (exec->syncobj_values)", so we actually set
elements after the 16th to zero when they need to be zero. Another
approach to fix this would be to memset the new elements once we
double syncobj_array_length.
In practice, I couldn't find any application or deqp test that used
more than 3 elements in exec->syncobj_array_length, and we need more
than 16 elements in order to be able to reproduce the bug, so I'm not
aware of any real-world bug that goes away with this patch. This issue
was found while reading code.
If we craft a little Vulkan program that submits a ton of timeline and
binary semaphores on vkQueueSubmit, then waits for them, we get the
following error without this patch:
MESA: error: ../../src/intel/vulkan/anv_batch_chain.c:1910: execbuf2 failed: Invalid argument (VK_ERROR_DEVICE_LOST)
v2: Rebase.
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>
Commit e07c5467 ("v3dv/format: use XYZ1 swizzle for three-component formats")
removes the only code that handled the clamp_to_transparent_black_border
variable. Therefore, the variable can be deleted, as it is not currently
being used.
Fixes: e07c5467 ("v3dv/format: use XYZ1 swizzle for three-component formats")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20746>
ARB programs are logically separate, and Mesa will happily mix and match them.
We need to alert backends of this fact, by setting nir->info.separate_shader.
Otherwise, backends may link shaders invalidly.
Fixes fp-abs-01 on Bifrost. (We don't use separate_shader for anything on
Valhall, so the issue doesn't appear there.)
Compare 151aa19c21 ("ttn: Set nir->info.separate_shader"), which fixed a
similar issue with TGSI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20688>
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x7fb6224340 in calloc (/usr/lib64/libasan.so.6.0.0+0xa4340)
#1 0x7facfdd5a0 in u_transfer_helper_create ../src/gallium/auxiliary/util/u_transfer_helper.c:580
#2 0x7facf2e09c in lima_resource_screen_init ../src/gallium/drivers/lima/lima_resource.c:935
#3 0x7facf23af4 in lima_screen_create ../src/gallium/drivers/lima/lima_screen.c:746
#4 0x7fac83ed30 in kmsro_drm_screen_create ../src/gallium/winsys/kmsro/drm/kmsro_drm_winsys.c:124
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20764>
Note that as we are enabling the feature, we need to set the
VK_FORMAT_FEATURE_2_STORAGE_READ_WITHOUT_FORMAT_BIT_KHR for any format
that supports STORAGE_IMAGE_BIT, from spec:
"An implementation that supports
VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT for any format from the given
list of formats and supports shaderStorageImageReadWithoutFormat
must support VK_FORMAT_FEATURE_2_STORAGE_READ_WITHOUT_FORMAT_BIT
for that same format if Vulkan 1.3 or the
VK_KHR_format_feature_flags2 extension is supported."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20744>
Needed to support Vulkan feature shaderStorageImageReadWithoutFormat.
With that enabled we could receive a NONE format on a load image. For
those we treat them as 32-bit formats, that would mean that the
lowering would not need to do any format-specific unpacking.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20744>
We don't use a base workgroup ID for BLOCS. It needs to be lowered, or
else we'll assert fail when compiling the compute shader.
(Note for stable: this patch doesn't fix a bug in 4abdecce22
specifically, but rather is a missing patch that needed to go along with
the rest of MR 20068, on whichever branches it exists on.)
Fixes: 4abdecce22 ("iris: Lower load_base_workgroup_id to zero")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20750>
The ANV bug this was meant to represent has been long fixed, and the
workaround has just been a proxy for EXT_depth_clip_control for a while
now.
Let's simplify things a bit, by removing this flag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20740>
if a resource is bound as both a framebuffer attachment and a sampler but
isn't actually used as a sampler, it's just a framebuffer attachment, and it
should retain its layout as a framebuffer attachment without any barriers
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20674>
in many cases, a texture may be bound as both a framebuffer attachment
and a samplerview without the latter actually being used by a shader
this avoids unnecessary feedback loop tagging, which should improve
perf and avoid spurious warning messages on drivers that don't support
the feedback loop ext
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20674>
We cannot change this, as it has already been communicated to app
partners. Also this breaks chrome's GPU quirk matching (which in some
cases is non-gpu-related, but when all you have is a hammer, everything
looks like a nail).
Fixes: 9c1fbc076a ("Return 'Mesa' for GL_VENDOR for community drivers")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20757>
The video session and video session parameters objects can have a common
base class the drivers can inherit from if needed.
This creates code to parse the h264/h265 parameter sets into common
structs.
v2: add h265 VPS, add a macro for FIND/ADD generations, changes the API
to make generation easier.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20389>
This allows support for SSBOs/images in all shader stages. And also,
unlike the bindful IBO state, does not introduce a dependency on the
program state. With bindless descriptors, SSBO and image fetch lowered
to isam can re-use the same descriptor. This will let us remove the
TEX state dependency on PROG state (in a following cleanup commit).
Note, this does not yet switch the pipe caps to reflect that we can
support SSBOs/images in other shader stages.. because ir3 still tells us
nibo>0 even though we are using bindless and that triggers an assert in
the build_ibo() path. Probably we want ir3 to be more clever.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20687>
This will be used when we switch over to lowering image/SSBO to
bindless.
Note that it also starts using CP_SET_DRAW_STATE in the compute path.
Subsequent cleanup will switch texture and eventually other state over
as well (which will make more sense when we get more clever than
emitting all state for every compute grid, but for now simplifies
re-using the same code between 3d and compute).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20687>
Pre-bake IBO descriptor sets at the time that images/SSBOs are bound,
and re-use the pre-baked descriptors at draw time when we emit state.
This starts putting in place the state tracking we'll use when switching
over to bindless IBO state, without yet changing the shaders (lowering
to bindless) or changing the actual state emitted (other than switching
to use the storage descriptor for image reads via isam, like tu does).
Note that this even pre-bakes the iova into the descriptor, rather than
relying on OUT_RELOC() to do the bo tracking, so we need to manually
attach the bo to the ring. But we already require FD_BO_NO_HARDPIN for
a6xx. This makes the state emit a straight memcpy, and will simplify
things when it comes to generating the bindless descriptor set (which
due to the desc_size field in the low bits of the BINDLESS_BASE regs
would be awkward to construct as a ring rather than a bo).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20687>
Add support to lower IBO (image/SSBO) and fb-read to use bindless
descriptors. This will be used by a6xx to avoid having to merge image
and SSBO state into a single compact IBO descriptor, and also simplify
enabling image and SSBO support for additional shader stages (since each
stage can use it's own descriptor set).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20687>
../src/freedreno/ds/fd_pps_driver.cc:656:44: error: comparison of integer expressions of different signedness: '__gnu_cxx::__alloc_traits<std::allocator<int>, int>::value_type' {aka 'int'} and 'const unsigned int' [-Werror=sign-compare]
656 | assert(d->assigned_counters[i] < g->num_counters);
cc1plus: all warnings being treated as errors
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20660>
Popular cases in this group recently:
1 dEQP-GLES3.functional.fbo.blit.conversion.r16ui_to_r16ui
1 dEQP-GLES3.functional.fbo.blit.conversion.r16ui_to_rgb10_a2ui
1 dEQP-GLES3.functional.fbo.blit.conversion.rgb5_a1_to_rgb5_a1
3 dEQP-GLES3.functional.fbo.blit.conversion.rgba4_to_r32f
4 dEQP-GLES3.functional.fbo.blit.conversion.rgb565_to_rgba8
5 dEQP-GLES3.functional.fbo.blit.conversion.rgba4_to_rg16f
There's pretty clearly something common with blitting from 16-bit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20759>
pruning old swapchains is challenging because there's no way to definitively
know when to destroy them without VK_EXT_swapchain_maintenance1 which isn't
supported yet
initially, I handled it by only pruning on shutdown and whenever a new swapchain
was created since those are both safe points, but this leads to scenarios where
a dead swapchain can exist for the entire lifetime of an application
if the swapinterval is changed
to avoid such ballooning, check whether the current swapchain has ever presented
on each present queue and then prune based on this
fixes#7529
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20588>
In the original change I noticed that missing robustness on swkms seemed
to be an oversight, since it was enabled on sw-non-kms, so I exposed the
ext based on the underlying pipe query. However it turns out that there
is a dri_screen flag for allowing robust contexts that exists to do error
checking for GLX, which was under an !swkms check. So we would expose the
ext, but then throw an error if you tried to create one.
Fixes: e6285ea55f ("egl: Replace the robustness DRI2 ext check with a pipe cap query.")
Closes: #8066
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20679>
From the Vulkan spec, the WAIT flag on vkCmdCopyQueryPoolResults only
serves to increase the first synchronization scope to include query end
commands, but either way, the synchronization scope only includes
commands that occur earlier in submission order. In other words, we
don't need to enforce queue ordering, a pipeline barrier is all that's
needed.
Fixes deadlocks in the timestamp.misc_tests.two_cmd_buffers_primary test.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20617>
D3D considers the rasterizer enabled if there's a pixel shader *or* if
depth is enabled, since you can do depth-only rendering. After parsing
shaders, if we find that there was supposed to be a pixel shader, but
we removed it because there was no output position, disable depth too.
Also, store this info in the cache, since we might not even load the
nir shaders if we'd seen this pipeline before.
Fixes dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20617>
The compute path does this save/restore dance with the current batch, so
various things called to emit state can assume ctx->batch is the current
thing. But during resource tracking, which could have flushed what was
previously the current batch. Fixes a problem that surfaces in the next
patch when we stop just flushing batches for all the barriers.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20575>
The host can't assign more than 32 locations explicitly, and we
exhaust this already when we handle patches and generics. So
drop the separable flag in cases when we have other IO that
uses generated names that will have to be matched by name.
v2: skip tests for VS input and FS outputs
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20738>
With that we get the correct base offset when accessing image arrays.
This is required if there a various images with different access
specifiers, because only with the correct base offset the host driver is
able to pick the right array.
Fixes GL-CTS: KHR-GL43.shading_language_420pack.binding_image_array
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19980>
Some drivers may encode constant offsets in the instruction, so
make it possible for the drivers to request lowering the atomic
uniform offset into the range_base variable of the intrinsic.
v2: drop patch to use build-in array offset evaluation, it makes
problems with zink, and update the code accordingly
v3: always initialize range base
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19980>
Add the intrinsic range_base value to the image intrinsics and add
the option to store the image array offset into range_base instead
of adding it to the image array index if the driver requests it.
v2: Always initialize range_base
v3: fix for bindless intrinsics
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19980>
When querying capabilities or creating views using a scoped aspect
mask, we want to return the format for the correct single-channel
format, but when actually creating the resource (aspect mask 0),
we want to use the typeless format, since the single-channel formats
don't report multisampling support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20614>
Only add SV_SampleIndex if there exists a sample-rate var that has either flat
interpolation or centroid (and therefore can't force sample rate implicitly),
unless there is also a sample-rate var that doesn't have those properties.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20614>
There's VK tests that have mismatching interpolation specifiers between FS
and the previous stage. For structs, that resulted in different types, which
breaks DXIL validation.
We could link the shaders and have that overwrite the interpolation field from
the previous shader, but we could also just not care and always use float.
I don't see any regressions from that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20614>
NIR has a fsin instruction that takes an argument in radians. Midgard instead
has an fsinpi argument that takes an argument in multiples of pi. So, we had a
NIR pass that would change fsin(x) to fsin(x / pi) and then map fsin to fsinpi
in the backend.
But that's invalid! In NIR, the opcode fsin is well-defined. fsin(x) means
something very different than fsin(x / pi). They won't usually be equal. The
transform fsin(x) -> fsin(x / pi) is fundamentally unsound.
It did work before, by accident. Most NIR passes don't care about the semantics
of ALU instructions. fsin(x) and fsin(x / pi) are both well-defined but
fundamentally different NIR shaders. So while rewriting is wrong -- the NIR we
get out is not equivalent to the NIR we put in, and the Midgard ops we generate
are not equivalent to the NIR -- but if we don't run any passes that care about
the definition of fsin the two wrongs will cancel out to make a right.
However, some NIR passes do care about the definitions of ALU instructions,
instead of treating them as named black boxes. In particular, constant folding
(nir_opt_constant_fold) evaluates ALU instructions when their inputs are
constants, according to the definition in nir_opcodes.py. So our little charade
will only work if we don't call nir_opt_constant_fold, or if all the fsin
instructions have non-constant inputs. At the beginning of this series, that is
the case. With the later scalarization change, that's no longer the case, and
the unsoundness translates to real failing tests rather than a quibble of NIR's
semantics.
To mitigate, we define a new NIR opcode with the semantics we want and translate
fsin(x) = fsin_mdg(x / pi), where that equivalence does hold mathematically. So
the new translation is sound and doesn't rely on lucky pass ordering.
This matches the approach already used for AMD and AGX, which have fsin_amd and
fsin_agx opcodes respectively.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>
We already do this in NIR since a04aa4bc08
and 3f97306b95 so there is no effect
for the mesa state tracker now that it can not emit TGSI any more.
This leaves only nine when RADEON_DEBUG=use_tgsi is set. D3D9 however
requires that sin and cos inputs already have the proper range.
This is super important when the nine shader uses relative adressing
and therefore needs all 256 constants we have. If we add our extra
constants for the fixup, we get over the limit and fail compilation.
v2: vertex shaders only
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18933>
Depths 24 and 30 happen to have uniform bpc but 16 does not. Pull the
real channel width out of the format description instead. This is still
a bit ignorant of channel order though.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20673>
Currently etnaviv disables TS on all MC1.0 GPUs, since the TS unit
doesn't properly take into account the linear window offset with
MC1.0, creating address aliases on MMUv1 that aren't properly dealt
with.
MMUv2 however doesn't have a linear window, so we can safely enable
TS on those GPUs.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20552>
Add a new environment varible
MESA_DISK_CACHE_READ_ONLY_FOZ_DBS_DYNAMICE_LIST that specifies a text
file containing a list of RO fossilize caches to load. The list file
is modifiable at runtime to allow for loading RO caches after
initialization unlike MESA_DISK_CACHE_READ_ONLY_FOZ_DBS.
The implementation spawns an updater thread that uses inotify to monitor
the list file for modifications, attempting to load new foz dbs added to
the list. Removing files from the list will not evict a loaded cache.
MESA_DISK_CACHE_READ_ONLY_FOZ_DBS_DYNAMIC_LIST takes an absolute path.
The file must exist at initialization for updating to occur.
File names of foz dbs in the list file are new-line separated and take
relative paths to the default cache directory like
MESA_DISK_CACHE_READ_ONLY_FOZ_DBS.
The maximum number of RO foz dbs is kept to 8 and is shared between
MESA_DISK_CACHE_READ_ONLY_FOZ_DBS_DYNAMIC_LIST and
MESA_DISK_CACHE_READ_ONLY_FOZ_DBS.
The intended use case for this feature is to allow prebuilt caches
to be downloaded and loaded asynchronously during app runtime.
Prebuilt caches be large (several GB) and depending on network
conditions would otherwise present extended wait time for caches
to be availible before app launch.
This will be used in Chrome OS.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19328>
When loading multiple RO foz dbs, if a db fails to load, continue trying
to load other RO foz dbs instead of destroying the foz cache.
Preserve destroying the foz cache and not preceding to load RO caches
if the RW cache fails to load.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19328>
opaque should not be set when logicops are enabled, that needs blending
even on Bifrost. Fixes is for when I believe the bug became possible to hit.
The logical error is older.
Fixes Piglit logicop tests again.
Fixes: d849d9779a ("panfrost: Avoid blend shader when not blending")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20685>
Unlike literally every other mesa/st emulation, for some inexplicable reason we
need to pretend to support the CAP and then set a different EMULATE cap instead
of the emulation keying off the lack of support for the CAP. Set the CAPs
accordingly so we get NV_primitive_restart (with emulation of non-fixed
indices).
This gets Mesa to advertise GL 3.1 on Mali-G57 as intended.
Fixes: 30c14f54cf ("panfrost: Disable PIPE_CAP_PRIMITIVE_RESTART on v9")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20702>
Future changes to nir_lower_blend cause fsat(reg.yx) instructions to be
generated, which correspond to "FCLAMP.v2f16 x.h10" pseudoinstructions. These
get their swizzles lowered, but we forgot to clear the swizzle out, so we end up
with extra swap (cancelling out the intended swizzle).
Fix the lowering logic.
Fixes: ac636f5adb ("pan/bi: Use FCLAMP pseudo op for clamp prop")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20683>
Original patches wrote by Ella Stanforth.
Alejandro Piñeiro main changes (skipping the small fixes/typos):
* Reduced the list of supported formats to
VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM and
VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, that are the two only
mandatory by the spec.
* Fix format features exposed with YCbCr:
* Disallow some features not supported with YCbCr (like blitting)
* Disallow storage image support. Not clear if really useful. Even
if there are CTS tests, there is an ongoing discussion about the
possibility to remove them.
* Expose VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT, that is
mandatory for the formats supported.
* Not expose VK_FORMAT_FEATURE_2_MIDPOINT_CHROMA_SAMPLES_BIT. Some
CTS tests are failing right now, and it is not mandatory. Likely
to be revisit later.
* We are keeping VK_FORMAT_FEATURE_2_DISJOINT_BIT and
VK_FORMAT_FEATURE_2_MIDPOINT_CHROMA_SAMPLES_BIT. Even if they
are optional, it is working with the two formats that we are
exposing. Likely that will need to be refined if we start to
expose more formats.
* create_image_view: don't use hardcoded 0x70, but instead doing an
explicit bit or of VK_IMAGE_ASPECT_PLANE_0/1/2_BIT
* image_format_plane_features: keep how supported aspects and
separate stencil check is done. Even if the change introduced was
correct (not sure about that though), that change is unrelated to
this work
* write_image_descriptor: add additional checks for descriptor type,
to compute properly the offset.
* Cosmetic changes (don't use // for comments, capital letters, etc)
* Main changes coming from the review:
* Not use image aliases. All the info is already on the image
planes, and some points of the code were confusing as it was
using always a hardcoded plane 0.
* Squashed the two original main patches. YCbCr conversion was
leaking on the multi-planar support, as some support needed
info coming from the ycbcr structs.
* Not expose the extension on Android, and explicitly assert that
we expect plane_count to be 1 always.
* For a full list of review changes see MR#19950
Signed-off-by: Ella Stanforth <estanforth@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19950>
Although for any buffer created by the user, or any API that can be
called by the user (like GetDeviceBufferMemoryRequirements) the
alignment is V3D_NON_COHERENT_ATOM_SIZE, there are internal uses of a
buffer that could require a fine-grained alignment (like when used as
a alias for a image, that has different alignment requirements).
Note that an alternative would have created a
v3dv_buffer_init_with_alignment (or similar name), but this option
seemed easier.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19950>
We set the different data being freed to NULL after freeing it, and
checks for NULL before freeing it.
This fixes several double free crash with v3dv, when running OOM wsi
tests, like for example:
dEQP-VK.wsi.xlib.swapchain.simulate_oom.composite_alpha
Although note that only one person got those on a new fresh install of
the Raspbian OS, so this problem was rare.
Fixes: 5b13d74583 ("vulkan/wsi/drm: Break create_native_image in pieces")
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20695>
Constructs like
out vec4 fs_out;
....
fs_out = vec4(...);
if (fs_out.w < alpha_test_value)
discard;
lead to initial nir that reads from fs_out, even though we don't actually
do a framebuffer fetch, and later nir passes will eliminate that direct
read from the output variable. As given in the commit message of 1124bee4
we are actually only interested in the framebuffer fetch, so set the
property only when an output is used for fbfetch reads.
v2: Iterate over all variables (Jason)
Fixes: commit 1124bee4ba
glsl/nir: Set sample_shading if a FS output ever shows up as an rvalue
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20694>
De-duplicate GPU_VERSION/VK_DRIVER and add different jobs that can be
extended for limozeen vs kingoftown runners in order to de-duplicate the
DEVICE_TYPE/DTB/RUNNER_TAG variables. This should simplify moving jobs
between runners to load-balance.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20715>
This isn't allowed for the same reason that AFBC of regular luminance-alpha
isn't allowed (and will raise DATA_INVALID_FAULTs). Reorder the checks to
ensure these formats are checked.
Fixes Piglit texwrap GL_EXT_texture_sRGB-s3tc.
Fixes: 476be5cb27 ("panfrost: Don't use texture format swizzles on v7")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20686>
renderdoc won't work with zink in frontends that aren't dri,
so ZINK_RENDERDOC should be used to specify start:end frames
to ensure that the vulkan command stream is captured
this is not a renderdoc issue: there are no frame boundaries in rusticl
or gallium-nine, so there is no possible way that renderdoc could
determine when/how to split frames
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20651>
Per Yiwei:
"For vn_QueueSubmit and other exposed Vulkan entry points, we keep the
original Vulkan variable namings. If within the same function you need
to use struct vn_queue *queue, then we prefix a _ to the args in the
exposed entry points, so it becomes VkQueue _queue.
For all other places:
VkObject obj_handle
struct vn_object *obj
The obj in this file can be queue, fence, sem, event, cmd, dev, etc."
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20088>
Refactor the QueueSubmit functions to share a common function differing
in the vkQueueSubmit/vkQueueSubmit2 call with differences with
VkSubmitInfo/VkSubmitInfo2 handled in the
vn_queue_submission_prepare_submit().
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20088>
Instead of calling an additional QueueSubmit for fence feedback, append
a SubmitInfo batch for fence feedback. This does require copying the
submitted batches to a larger buffer with an additional slot for the
fence feedback batch.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20088>
In preparation for adding additional dependency requirements for
external sync fd support.
Move vn_physical_device_init_external_* so external sync fd support can
be retrieved earlier. Then move sync2 disabling to
vn_physical_device_get_passthrough_extensions and 1.3 downgrading
to vn_physical_device_init_properties.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20088>
Commit introduces a fix that allows for gbm to be built with an empty
backend. There are situation especially in a Yocto/OE cross compilation
environment where you want to build with an empty backend. The particular
situation is as such:
The mesa-gl recipe is the preferred provider for virtual/libgbm, virtual/libgl,
virtual/mesa, etc... But the x11 DISTRO_FEATURE in't included this leads to build
errors such as:
| /../../../ld: src/gbm/libgbm.so.1.0.0.p/main_backend.c.o: in function `find_backend':
| backend.c:(.text.find_backend+0xa4): undefined reference to `gbm_dri_backend'
| /../../../ld: src/gbm/libgbm.so.1.0.0.p/main_backend.c.o:(.data.rel.ro.builtin_backends+0x4):
undefined reference to `gbm_dri_backend'
| collect2: error: ld returned 1 exit status
Issue should be replicable by setting -Ddri3=disabled and -Dgbm=enabled
Add fix to bypasses compilation issue by excluding gbm dri backend. If
HAVE_DRI || HAVE_DRIX not specified.
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Signed-off-by: Vincent Davis Jr <vince@underview.tech>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20447>
... if you have EXT_framebuffer_sRGB. The extension just relaxes an
error check that we're already not performing, and sRGB rendering
implies sRGB texture support, and mipmap generation would need it to be
a valid render format. So advertise it if EXT_framebuffer_sRGB works.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19476>
This was just all kinds of wrong. Note that the comment implying "this
is a workaround for old XQuartz" is on the "not apple" side of the
ifdef. Anyway. xserver didn't start sending GLX_DRAWABLE_TYPE in the
fbconfig until:
commit 8cde0af3c57f0375ba8ba77af9fdf74b79d9496d
Author: Kristian Høgsberg <krh@redhat.com>
Date: Wed Apr 2 19:06:40 2008 -0400
Send the GLX_EXT_texture_from_pixmap attributes to the client.
So we can remove this default from the fbconfig path. But we preserve it
for the GLXGetVisualConfigs path, because that is specified not to send
GLX_DRAWABLE_TYPE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20549>
First, move a few early-out checks up since they don't need to take the
GLX lock. Second, move garbage collecting deleted contexts up to
immediately after they are unbound. This fixes a memory leak, albeit a
difficult one to hit, in the case where you switch away from a deleted
context but switching to the new one errors out. In that case we would
leak the deleted context, since it's been unbound from all threads and
there's no longer an XID for it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20549>
If we pass a physical 2D texture descriptor we should also pass 2
coords. Otherwise it just uses the random content in the second
register which ends up funny sometimes.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20696>
If the previously emitted graphics pipeline uses the value A for
col_format_non_compacted and the new bound graphics pipeline uses B.
At bind time, radv_cmd_state::col_format_non_compacted will be set to
B and the rbplus flag will be dirtied. But if there is no draws and a
new graphics pipeline is bound with the same value as A, the next
draw will emit the rbplus state with B instead of A.
This can be basically triggered with meta operations after drawing
because the driver saves/restores the bound pipeline.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8073
Fixes: 11469f7553 ("radv: copy the non-compacted color format at pipeline bind time")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20692>
The results.csv file of a full piglit run is about 6 MB.
Given how seldomly this file is being used, and the fact that it cannot
be viewed directly in gitlab's artifact page anyway.
Let's compress the file using zstd, and enjoy a ~90% reduction in size
at the cost of probably less than 500ms of compression time on a slow
device, and 55ms on the CI machines in the valve farm.
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20669>
The results.csv file can get ridiculously big for a vkcts run (up to
135MB). Given how seldomly this file is being used, and the fact that
it cannot be viewed directly in gitlab's artifact page anyway.
Let's compress the file using zstd, and enjoy a ~95% reduction in size
at the cost of probably less than 1 second of compression time on even
the slowest of the devices in CI (which would use sharing), and about
150ms on the CI machines in the Valve farm.
Suggested-by: Daniel Stone <daniels@collabora.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20669>
load_preamble is intended to be almost free (costing at most a move), and it
does not have special bounds checking requirement, so it's ok to select with it.
With this, drivers that use nir_opt_preamble together with a late call to
peephole_select can optimize sequences like:
if (x) {
<uniform-on-uniform calculation>
} else {
<different uniform-on-uniform calculation>
}
to simply
bcsel(x, <uniform register 0>, <uniform register 1>)
rather than emitting needless control flow / branching over some moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20597>
multiplying by the array size is always wrong for this case, and not
doing so allows for some simplification and better inlining, though
the output results are identical
the one corner case is clip/cull distance, which need special handling
since they're arrays with vec semantics
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20678>
This is heavily copy-pasted from a patch of Ian Romanick, including the
commit message.
Previously, this pass always generated fcsel for bcsel. This was the
only place that generate fcsel, so various drivers assumed (and needed!)
that src0 was a Boolean with 0.0 or 1.0 as the only values.
Specifically, many DX9 / GL_ARB_vertex_program platforms lack a CMP
instruction in vertex shaders. In those cases, they would use LRP to
implement fcsel. The bummer is that many plaforms have a real fcsel
instruction, and those platforms would benefit from other places
generating that opcode.
Instead of leaving assumptions in drivers about the sources of an opcode
that they can't really support, allow them to control the way the
lowering pass translates bcsel. Two flags are used to control this:
- If the driver sets has_fused_comp_and_csel in nir_options, fcsel_gt
will be used. Since the Boolean value is 0.0 or 1.0, this is
equivalent to fcsel.
- If the parameter has_fcsel_ne is set, fcsel will be used. This is the
old path.
- Otherwise, the lowering pass assumes we're on a crufty, old DX9 vertex
program, and it emits flrp.
With this, the assumptions about src0 of fcsel in NTT can be removed.
If a platform can't handle fcsel, it should ensure that the lowering
pass won't generate it.
No change in shader-db.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
DX9 PS 1.x / GL_ARB_fragment_program shaders that have been converted to
GLSL are littered with patterns like
ps_r1.x = ((ps_t0.y >= 0.0) ? ps_r1.x : ps_c1.y);
This is because CMP is a fundamental opcode in those earlier shading
languages. i915 supports this opcode natively, but there's no way to
get it directly into the backend. Instead, NIR and NTT generate some
combination of fcsel and sge and hope for the best.
i915
total instructions in shared programs: 49032 -> 48897 (-0.28%)
instructions in affected programs: 4173 -> 4038 (-3.24%)
helped: 39
HURT: 0
total temps in shared programs: 2795 -> 2790 (-0.18%)
temps in affected programs: 22 -> 17 (-22.73%)
helped: 5
HURT: 0
total const in shared programs: 4976 -> 4967 (-0.18%)
const in affected programs: 203 -> 194 (-4.43%)
helped: 9
HURT: 0
GAINED: shaders/trine/fp-13.shader_test FS
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.