Previously, calling vkCmdCopyQueryPoolResults with the
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT flag set the query result
field in the buffer to 0 if unavailable and the query result if
available. This was a misunderstanding of the Vulkan spec, and this
commit corrects the behavior to emitting a separate available
result in addition to the query result.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
Previously, calling vkGetQueryPoolResults with the
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT flag set the query result
field in *pData to 0 if unavailable and the query result if
available. This was a misunderstanding of the Vulkan spec, and this
commit corrects the behavior to writing a separate available result
in addition to the query result.
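
A minimal sketch of the corrected layout, assuming 64-bit results and a
single result value per query. write_query_result() is a hypothetical
helper for illustration, not the driver's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes one query into dst as 64-bit values and
 * returns how many values that query occupies in the output. */
static size_t
write_query_result(uint64_t *dst, uint64_t result, bool available,
                   bool with_availability)
{
   size_t n = 0;

   assert(dst != NULL);

   /* The result slot is only written when the query is available. */
   if (available)
      dst[n] = result;
   n++;

   /* Availability is a separate trailing value, not folded into the
    * result slot. */
   if (with_availability)
      dst[n++] = available ? 1 : 0;

   return n;
}
```

With the availability flag set, each query occupies two values here: the
result followed by a non-zero/zero availability word.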
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
When setting the vertex buffers, lima calls
util_set_vertex_buffers_mask() to reference and copy buffers. That
function already offsets dst by start_slot internally, so lima should not offset
the destination address again.
This was discovered when comparing with other drivers, and fixed by
removing the extra offset in lima_set_vertex_buffers().
This fixes draws that get translated in u_vbuf, because u_vbuf adds
extra vertex buffers when translating.
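
The double offset can be sketched as follows. copy_buffers() is a
simplified stand-in for a helper that applies start_slot itself, and
struct vb and MAX_VB are made up for the example:

```c
#include <assert.h>

#define MAX_VB 8

struct vb { int resource_id; };

/* Stand-in for a helper that offsets dst by start_slot internally. */
static void
copy_buffers(struct vb *dst, const struct vb *src,
             unsigned start_slot, unsigned count)
{
   assert(start_slot + count <= MAX_VB);
   dst += start_slot;
   for (unsigned i = 0; i < count; i++)
      dst[i] = src[i];
}

/* The caller passes the base of the slot array. Passing
 * slots + start_slot here would apply the offset twice. */
static void
set_vertex_buffers(struct vb *slots, const struct vb *src,
                   unsigned start_slot, unsigned count)
{
   copy_buffers(slots, src, start_slot, count);
}
```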
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3620>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3620>
The gmem state is split out now, so it does not require synchronization.
But gmem rendering still accesses vsc state from the context.
TODO maybe there is a better way? For gens that don't do vsc resizing,
this is probably easier.. but for a6xx there isn't really a great
position for more fine grained locking. Maybe it doesn't matter since
in practice the lock shouldn't be contended.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3503>
For the moment, everything is I915_EXEC_RENDER, so this isn't necessary.
But even should that change, I don't think we want to handle multiple
engines in this manner.
Nowadays, we have batch->name (IRIS_BATCH_RENDER, IRIS_BATCH_COMPUTE,
possibly an IRIS_BATCH_BLIT for blorp batches someday), which describes
the functional usage of the batch. We can simply check that and select
an engine for that class of work (assuming there ever is more than one).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3613>
For all VMEM instructions, the resource constant is now
in operands[0]. For MIMG instructions, the sampler shares
operands[1] with write data in case this instruction writes memory.
Moving the VADDR to be the last operand for MIMG is the first step to
support Navi NSA encoding.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3602>
In a NIR generated using SPIR-V initializers to variables, copy
propagation can end up transforming
vec1 32 ssa_33 = deref_var &@1 (shared mat2x4)
vec1 32 ssa_35 = mov ssa_33
vec1 32 ssa_7 = deref_cast (mat2x4 *)ssa_35 (shared mat2x4) /* ptr_stride=0 */
into
vec1 32 ssa_33 = deref_var &@1 (shared mat2x4)
vec1 32 ssa_7 = deref_cast (mat2x4 *)ssa_33 (shared mat2x4) /* ptr_stride=0 */
Before the optimization, the "head" of a path of deref that uses ssa_7
will be the cast. After, it will be the variable in ssa_33. Since
the types are the same, this is a trivial cast that would be picked up
by nir_opt_deref.
If we need to compare such deref-chain after optimization with another
deref-chain for the same variable, the compare function will get
confused by the cast in the middle.
One alternative would be to add nir_opt_deref to places that compare
derefs, but that might not scale well, so skip the trivial casts when
generating the paths instead.
Motivated by the discussion in
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047#note_383660.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3420>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3420>
process_block() will use this to determine the register demand before
the current instruction. Previously, it was filled with zeroes, which
could result in process_block() only using the register demand after
the current instruction.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
Otherwise, code like this will be broken:
   loop {
      if (...) {
         break;
      } else {
         break;
      }
   }
The continue_or_break block doesn't have any logical predecessors but it's
a logical predecessor of the header block. This liveness error breaks the
spiller in init_live_in_vars() (under "keep variables spilled on all
incoming paths") and eventually creates garbage reloads.
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
The operand isn't fixed to exec, which can mess up the spiller. This also
adds a new situation where a phi is needed.
Fixes dEQP-VK.ssbo.layout.random.descriptor_indexing.2 and an assertion
when compiling a Detroit: Become Human shader.
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
A shader might require vgpr spilling but not require sgpr spilling. In
that case, the spiller lowers the sgpr target by 5 which could mean sgpr
spilling is then required. Then the vgpr target has to be lowered to make
space for the linear vgprs. Previously, space wasn't made for the linear
vgprs.
Found while testing the spiller on the pipeline-db with a lowered limit.
Fixes: a7ff1bb5b9 ('aco: simplify calculation of target register pressure when spilling')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
Because softpin block pools are made up of a set of BOs with different
maps, it was possible for a single state to end up straddling blocks.
To fix this, we pass a contiguous size to anv_block_pool_grow and it
ensures that the next allocation in the pool will have at least that
size.
We also add an assert in anv_block_pool_map to ensure we always get
contiguous maps. Prior to the changes to anv_block_pool_grow, the unit
tests failed with this assert. With this patch, the tests pass.
This was causing problems on Gen12 where we allocate the pages for the
AUX table from the dynamic state pool. The first chunk, which gets
allocated very early in the pool's history, is 1MB which was enough that
it was getting multiple BOs. This caused the gen_aux_map code to write
outside of the map and overwrite the instruction state pool buffer which
led to GPU hangs.
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We intentionally throw away all but one BT block but then we set
cmd_buffer->bt_block to ANV_STATE_NULL instead of the one we hung on to.
This causes the command buffer to immediately re-emit STATE_BASE_ADDRESS
the first time a BT is needed for no good reason.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When an error condition occurs during tu_create_cmd_buffer(), the
cmd buffer has already been added to a pool, so the cleanup code should
remove it.
Fixes a crash (assert in tu_device::tu_bo_finish()) in dEQP tests:
dEQP-VK.api.object_management.max_concurrent.command_buffer_primary
dEQP-VK.api.object_management.max_concurrent.command_buffer_secondary
due to pool attempting to destroy an invalid command buffer.
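
A rough sketch of the cleanup pattern, using a tiny hand-rolled list
rather than Mesa's list helpers. All names below are hypothetical and the
fail_init flag only simulates an error during creation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *prev, *next; };

static void list_init(struct node *head) { head->prev = head->next = head; }
static bool list_empty(const struct node *head) { return head->next == head; }

static void
list_add(struct node *item, struct node *head)
{
   item->prev = head;
   item->next = head->next;
   head->next->prev = item;
   head->next = item;
}

static void
list_del(struct node *item)
{
   item->prev->next = item->next;
   item->next->prev = item->prev;
   item->prev = item->next = NULL;
}

struct cmd_pool { struct node cmd_buffers; };
struct cmd_buffer { struct node pool_link; };

static int
create_cmd_buffer(struct cmd_pool *pool, struct cmd_buffer *cmd,
                  bool fail_init)
{
   assert(pool && cmd);
   list_add(&cmd->pool_link, &pool->cmd_buffers);

   if (fail_init) {
      /* Undo the registration so the pool never sees a half-built
       * command buffer when it is destroyed later. */
      list_del(&cmd->pool_link);
      return -1;
   }
   return 0;
}
```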
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3572>
I find warnings to be very disruptive to my workflow (using emacs's "go to
next error" feature), and I periodically have to go clean up other
people's drivers to get back to finding my own warnings in the noise. I
know I'm not the only one doing something like this.
We don't want to enable -Werror by default in builds, since it means that
end users will have builds spuriously fail based on what compiler version
and opt flags they have compared to what the devs are using. However, it
is quite easy to have CI ensure that we at least don't introduce warnings
on the compiler version that it uses.
For now I've just enabled it on meson-i386 to cover a bunch of Mesa core
and get us started on ratcheting up warnings-cleanliness in the tree,
without me having to fix up all the drivers at once.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3539>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3539>
In f132e0fddf, I attempted to allow BLORP to do CCS_E copies by using
the UNORM formats instead. However, the old BLORP bit-cast code could
only handle RGBA formats and asserted on anything other than UINT
formats. The reason we didn't catch this is because it only comes up on
Gen12 platforms which aren't in our normal CI yet.
Fixes: f132e0fddf "intel/blorp: Add support for CCS_E copies with..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3593>
Currently, fetching the partial results (VK_QUERY_RESULT_PARTIAL_BIT)
of an unavailable occlusion query via vkGetQueryPoolResults can
return invalid values. anv returns slot.end - slot.begin, but in the
case of unavailable queries, slot.end is still at the initial value
of 0. If slot.begin is non-zero, the occlusion count underflows to
a value that is likely outside the acceptable range of the partial
result.
This commit fixes vkGetQueryPoolResults by always returning 0 if the
query is unavailable and the VK_QUERY_RESULT_PARTIAL_BIT is set.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
A cmdstream of size zero is invalid. But this can appear in various
places where we emit a pointer to state. This doesn't show up with
newer kernels (newer than v5.0) which use "softpin", but on earlier
kernels can result in:
[drm:msm_ioctl_gem_submit [msm]] *ERROR* invalid cmdstream size: 0
Since the pointer value doesn't matter in these cases, the easy solution
is just to not emit a cmds table entry in this case.
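
Roughly, the idea looks like this. struct cmd_entry and
build_cmds_table() are hypothetical and do not mirror the real submit
structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cmds table entry, for illustration only. */
struct cmd_entry {
   uint32_t offset;
   uint32_t size;
};

/* Copies entries into the table, dropping zero-sized ones, and returns
 * the number of entries actually emitted. */
static size_t
build_cmds_table(struct cmd_entry *out, const struct cmd_entry *in,
                 size_t count)
{
   size_t n = 0;

   assert(out != NULL && in != NULL);
   for (size_t i = 0; i < count; i++) {
      if (in[i].size == 0)
         continue; /* a zero-sized cmdstream is invalid, skip it */
      out[n++] = in[i];
   }
   return n;
}
```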
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2805>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2805>
This reverts commit b60f5cbc15.
This fixes dmesg errors and X freezes:
[ 29.543096] amdgpu 0000:0c:00.0: No GEM object associated to handle 0x00000009, can't create framebuffer
[ 29.543103] amdgpu 0000:0c:00.0: No GEM object associated to handle 0x00000009, can't create framebuffer
v2: (Francisco Jerez)
- Drop vec4 changes.
- Handle explicit acc0 operand and implicit one.
- Make sure instruction is SIMD16, predication is off and default mask
control set to true.
v3: (Francisco Jerez)
- Clear accumulator only when it's written.
- Use BRW_MASK_DISABLE instead of true.
- Use correct width for brw_acc_reg().
- Fix last_inst_offset.
v4: (Francisco Jerez)
- Don't check for last instruction for accumulator write.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
Until now, embedded constants were all printed as 32-bit integers or
floats, but the compiler can pack constants of different types if
several instructions with different reg_mode and native type refer to
the constant register. Let's implement something smarter so users don't
have to do a manual conversion when looking at a trace.
Note that 8-bit constants are not decoded yet, as we're not sure how
the writemask is encoded in that case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3536>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3536>
We were applying row pitch constraint of CCS surfaces to linear
surfaces. But CCS is only supported in linear tiling under some
condition (more on that in the following commit). So let's drop that
requirement for now.
Fixes a bunch of crucible asserts where the byte size of a linear image
is expected to be similar to the byte size of a buffer for the same
extent in the following category:
func.miptree.r8g8b8a8-unorm.aspect-color.view-2d.*download-copy-with-draw.*
v2: Move restriction to isl_calc_tiled_min_row_pitch()
v3: Move restriction to isl_calc_row_pitch_alignment() (Jason)
v4: Update message (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 07e16221d9 ("isl: Round up some pitches to 512B for Gen12's CCS")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3551>
Gen12 does not support RENDER_SURFACE_STATE::SurfaceArray = true &&
RENDER_SURFACE_STATE::Depth = 0. SurfaceArray can only be set to true
if Depth >= 1.
We work around this limitation by adding the max(value, 1) snippet in
the shaders on the 3 components for texture array sizes.
Tested on Gen9 with the following Vulkan CTS tests :
dEQP-VK.image.image_size.2d_array.*
v2: Drop debug print (Tapani)
Switch to GEN:BUG instead of Wa_
v3: Fix dEQP-VK.image.image_size.1d_array.* cases (Lionel)
v4: Fix dEQP-VK.glsl.texture_functions.query.texturesize.* cases
(Missing tex_op handling) (Lionel)
v5: Missing break statement (Lionel)
v6: Fixup comment (Tapani)
v7: Fixup comment again (Tapani)
v8: Don't use sample_dim as index (Jason)
Rename pass
Simplify control flow
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v7)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Some of the smaller bit-size formats which support CCS_E don't have a
UINT representative in their compression class. However, we should be
able to use UNORM just fine and still get bit-exact copies. We just
have to do a conversion to/from UNORM when we bitcast.
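
A small CPU-side sketch of why the round trip is bit-exact, assuming an
8-bit channel. The helper names are made up, and the actual conversion
happens in the BLORP shader rather than in C:

```c
#include <assert.h>
#include <stdint.h>

/* Integer channel value -> normalized float, as a UNORM read would give. */
static float
unorm8_to_float(uint8_t u)
{
   return (float)u / 255.0f;
}

/* Normalized float -> integer channel value, rounding to nearest. */
static uint8_t
float_to_unorm8(float f)
{
   assert(f >= 0.0f && f <= 1.0f);
   return (uint8_t)(f * 255.0f + 0.5f);
}
```

Every one of the 256 possible channel values survives
float_to_unorm8(unorm8_to_float(u)) unchanged, which is what makes the
UNORM path usable for copies.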
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3554>
Those were renamed/merged some time ago but it turns out that
ppir_op_undef can't be shared.
It was being used for undefined ssa operations and for read-before-write
operations that may happen to e.g. uninitialized registers (non-ssa)
inside a loop.
We really don't want to reserve a register for the undef ssa case, but
we must reserve and allocate a register for the uninitialized register case
because when it happens inside a loop it may need to hold its value
across iterations.
This dummy node might be eliminated with a code refactor in ppir in case
we are able to emit the write and allocate the ppir_reg before we emit
the read. But until such a major refactor happens, we need to keep this
code to avoid apparent regressions with the new liveness analysis
implementation.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3502>
nir can output writes to dead registers when expanding vec4 operations
to non-ssa registers. In that case, some components of the vec4 may be
assigned but never read. These are also not currently removed by a nir
dead code elimination pass as they are not ssa.
In order to prevent regalloc from allocating a live register for this
operation, an interference must be assigned to it during liveness
analysis.
This workaround may be removed in the future if the assignments to dead
components can be removed earlier in ppir or nir.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3502>
The previous way we were attempting to handle AUX tables on TGL-LP was
very GL-like. We used the same aux table management code that's shared
with iris and we updated the table on image create/destroy. The problem
with this is that Vulkan allows multiple VkImage objects to be bound to
the same memory location simultaneously and the app can ping-pong back
and forth between them in the same command buffer. Because the AUX
table contains format-specific data, we cannot support this ping-pong
behavior with only CPU updates of the AUX table.
The new mechanism switches things around a bit and instead makes the aux
data part of the BO. At BO creation time, a bit of space is appended to
the end of the BO for AUX data and the AUX table is updated in bulk for
the entire BO. The problem here, of course, is that we can't insert the
format-specific data into the AUX table at BO create time.
Fortunately, Vulkan has a requirement that every TILING_OPTIMAL image
must be initialized prior to use by transitioning the image from
VK_IMAGE_LAYOUT_UNDEFINED to something else. When doing the above
described ping-pong behavior, the app has to do such an initialization
transition every time it corrupts the underlying memory of the VkImage
by using it as something else. We can hook into this initialization and
use it to update the AUX-TT entries from the command streamer. This way
the AUX table gets its format information, apps get aliasing support,
and everyone is happy.
One side-effect of this is that we disallow CCS on shared buffers.
We'll need to fix this for modifiers on the scanout path but that's a
task for another patch. We should be able to do it with dedicated
allocations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
All they do now is take a size, align, and flags and figure out which
heap to allocate in. All of the actual code to deal with the BO is in
anv_allocator.c. We want to leave anv_vma_alloc/free in anv_device.c
because it deals with API-exposed heaps so it still makes sense to have
it there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
This commit moves it in with all the other cache invalidation operations
as if it were done by PIPE_CONTROL even though it's a pair of register
writes. This means we only have to write the GFX_AUX_TABLE_BASE_ADDR
register once at device initialization instead of every invalidate.
Invalidates are now a single LRI instead of two.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
This breaks add_mapping() into three pieces:
1. get_aux_entry() adds AUX-TT pages as needed and returns the
L1 entry index, L1 entry address, and L1 entry map.
2. gen_aux_map_format_bits_for_isl_surf() computes the format-
specific information that goes in the AUX-TT entry.
3. add_mapping() is now a much dumber function that just adds the
requested mapping with the requested format bits.
This lets us break out some additional helpers in the API which we want
to use for more direct AUX-TT management in ANV.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Pass down stencil data from the subpass attachment like we do
elsewhere. Only stencil attachments will make use of it.
Fixes warnings like
../src/intel/vulkan/genX_cmd_buffer.c: In function ‘cmd_buffer_begin_subpass’:
../src/intel/vulkan/genX_cmd_buffer.c:4656:41: warning: ‘target_stencil_layout’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4656 | att_state->current_stencil_layout = target_stencil_layout;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>
Previously, we set aux_usage=ISL_AUX_USAGE_NONE when we really meant
CCS_D. This sort-of made sense before we had anv_layout_to_aux_usage.
However, in our more modern aux
tracking model, all aux usage goes through anv_layout_to_* and we're
better off making the meaning of anv_image::planes[]::aux_usage be
AUX_USAGE_NONE if and only if there is no compression.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3556>
The instruction encoding for SENDS changed on Gen12 and it now supports
embedding the entire extended message descriptor in the instruction if
it's an immediate. Stop falling back to doing an indirect SEND just
because we had something in [15:12] of ex_desc.ud.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
This commit makes two changes:
1. We set pending_pipe_bits instead of emitting PIPE_CONTROL directly
for the flush at the end of cmd_buffer_begin_subpass.
2. Because BLORP ops such as vkCmdClearAttachments may come in the
middle of a render pass, we have to also flag the need for a cache
flush after the blorp op.
Fixes: 185630c6bc "anv/blorp: Do the gen11 BTI flush"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
../src/panfrost/pandecode/decode.c: In function ‘pandecode_compute_fbd’:
../src/panfrost/pandecode/decode.c:789:35: warning: taking address of packed member of ‘struct mali_compute_fbd’ may result in an unaligned pointer value [-Waddress-of-packed-member]
789 | pandecode_u32_slide(num, s->unknown ## num, ARRAY_SIZE(s->unknown ## num))
| ~^~~~~~~~~
../src/panfrost/pandecode/decode.c:800:9: note: in expansion of macro ‘SHORT_SLIDE’
800 | SHORT_SLIDE(1);
| ^~~~~~~~~~~
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3543>
It's required to insert 1 wait state if the dst VGPR of any v_interp_*
is followed by a read with v_readfirstlane or v_readlane to fix GPU
hangs on GFX6. Note that v_writelane_* is apparently not affected.
This hazard isn't documented anywhere but AMD confirmed it.
This fixes a GPU hang with the texturemipmapgen Sascha demo on GFX6.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3533>
Unlike on an immediate-mode renderer, Turnip only renders tiles on
vkCmdEndRenderPass. As such, we need to track all queries that were
active in a given render pass and defer setting the available bit
on those queries until after all tiles have rendered.
This commit adds a draw_epilogue_cs to tu_cmd_buffer that is
executed as an IB at the end of tu_CmdEndRenderPass. We then emit
packets to this command stream that update the availability bit of a
given query in tu_CmdEndQuery.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
Mostly a translation of freedreno's implementation of glEndQuery for
GL_SAMPLES_PASSED query objects with a slight modification to set the
availability bit of the query bo (slot->available) if the query was
not ended inside a render pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
General structure is inspired by anv's implementation in genX_query.c.
We define a packed struct that tracks sample count at the beginning of
the query and at the end; the result of the occlusion query is then
slot->end - slot->begin.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
Most places we actually know the usage and can provide it. There are
two exceptions to this:
1. We pass 0 into get_blorp_surf_for_anv_image when we use
ANV_IMAGE_LAYOUT_EXPLICIT_AUX because anv_layout_to_aux_usage is
never actually called so it doesn't matter.
2. We pass 0 into anv_layout_to_aux_usage in transition_color_buffer.
However, the coming commits which will begin using the usage
parameter only care about depth.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
Rather than looking at the aux usage, we look at the isl_aux_state which
provides us with more detailed information. This commit adds a couple
helpers to isl which let us quickly determine if we have valid depth/hiz
on the initial layout and if we need valid depth/hiz for the final
layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
When copying to an RGB surface, we treat it as an R-only surface of three
times the width, which may end up being larger than the maximum size
supported by the hardware and so it hits the shrink path. This forced
both source and destination surfaces to be shrunk, even though it's not
necessary for the former, and may even hit some assertions in some
cases, such as the surface being compressed.
Fixes several tests under dEQP-VK.api.copy_and_blit.core.image_to_image.dimensions.*
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3422>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3422>
This allows removing superfluous s_cselect instructions
that come from turning booleans into 64-bit vector condition.
v2 by Daniel Schürmann:
- Make the code massively simpler
v3 by Timur Kristóf:
- Fix regressions, make it work in wave32 mode
- Eliminate extra moves by not always using the SCC definition
- Use s_absdiff_i32 for uniform XOR
- Skip the transformation for uncommon or invalid instructions
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3450>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3450>
For GS copy shaders, whether we want to do exports is conditional. By
explicitly marking the end blocks, we can mark an IF's then branch as an
export block and ensure that's where the assembler inserts null exports.
v6: only fixup exports in the end block, like before
v8: simplify some code
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
v2: implement GFX10
v3: rebase
v7: rebase after shader args MR
v8: fix gs_vtx_offset usage on GFX9/GFX10
v8: use unreachable() instead of printing intrinsic
v8: rename output_state to ge_output_state
v8: fix formatting around nir_foreach_variable()
v8: rename some helpers in the scheduler
v8: rename p_memory_barrier_all to p_memory_barrier_common
v8: fix assertion comparing ctx.stage against vertex_geometry_gs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
This is not always ->rgbBits, because there are cases where that could
be 32 but we're (legally) bound to a depth-24 pixmap. The important
thing to have match here is the actual server-side notion of depth. You
can look this up (at modest expense) from the xlib visual info if the
fbconfig has a visual. But it might not, so if not, fetch it (at
slightly greater expense) from XGetGeometry. Do this at GLX drawable
creation so you don't have to do it on the SwapBuffers path.
Apparently this fixes glx/glx-swap-singlebuffer, which is unintentional
but quite pleasant.
Fixes: mesa/mesa#2291
Fixes: 90d58286 ("drisw: Fix and simplify drawable setup")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
This gets a lot of the hard code converted over to the new macros,
resulting in (I feel) much more readable code with
LESS_SHOUTING_ABOUT_THE_REG(). I decided to consistently put the reg on
its own line, so that all the register names line up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
This introduces some minor unpacking of the temporary fd_reg_pair structs
to code that previously was packing a whole register field.
In the pack wrapper in tu_cs.h, I added some explanatory docs, dropped the
relocs handling since we don't need it, and removed the extra regs[] in
the __ONE_REG() macro (which was causing gcc's optimizer to fall on its
face in my release build).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
Sometimes you want to zero out an address by supplying a NULL BO, but
without this we would end up only emitting one dword. Increases size of
fd6_gmem.o by .8%, though it's not clear to me why (no obvious terrible
codegen happening)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
legalize is computing a lot of state that goes in the variant, let's just
store it directly instead of passing pointers around. This leaves
max_bary in place, which is doing some surprising work (overwriting the
original total_in in some cases).
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3494>
A few hash_table users roll their own integer hash functions which
call _mesa_hash_data to perform the hashing which ultimately calls
into XXH32 with a dynamic key length. When using small keys with a
constant size the hash rate can be greatly improved by inlining
XXH32 and providing it a constant key length, see:
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
Additionally, this patch removes calls to _mesa_key_hash_string and
makes them instead call _mesa_hash_string directly, matching the new
integer hash functions.
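
The constant-key-length idea can be sketched like this. FNV-1a is used
here purely as a short, self-contained stand-in for XXH32, and
hash_bytes()/hash_u32_key() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic byte hash taking a run-time length (FNV-1a as a stand-in). */
static inline uint32_t
hash_bytes(const void *data, size_t len)
{
   const uint8_t *bytes = data;
   uint32_t h = 2166136261u;

   assert(data != NULL);
   for (size_t i = 0; i < len; i++) {
      h ^= bytes[i];
      h *= 16777619u;
   }
   return h;
}

/* Specialized wrapper: the key length is a compile-time constant, so
 * once hash_bytes() is inlined the compiler can unroll the loop instead
 * of handling an arbitrary length at run time. */
static inline uint32_t
hash_u32_key(const void *key)
{
   return hash_bytes(key, sizeof(uint32_t));
}
```

The wrapper returns the same value as the generic call; only the code
the compiler can generate for it differs.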
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
For most key sizes, xxhash outperforms fnv1a's hash rate substantially (bug
2153). In particular, the V3D driver hashes multiple ~200 byte keys as part
of the shader cache lookup which can easily eat up 10-20% of the runtime on
the Raspberry Pi. Swapping over to xxhash drops this to ~1% of the runtime.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
When the amdgpu_screen_winsys uses the same FD as the amdgpu_winsys
(which is always the case for the first amdgpu_screen_winsys), we can
just use bo->u.real.kms_handle.
v2:
* Also only create the kms_handles hash table if the
amdgpu_screen_winsys fd is different from the amdgpu_winsys one.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3202>
The assumption being that KMS handles are only retrieved for relatively
few BOs, so hash tables should be efficient both in terms of performance
and memory consumption.
We use the address of struct amdgpu_winsys_bo as the key and its
kms_handle field (the KMS handle valid for the DRM file descriptor
passed to amdgpu_device_initialize) as the hash value.
v2:
* Add comment above amdgpu_screen_winsys::kms_handles (Pierre-Eric
Pelloux-Prayer)
v3:
* Protect kms_handles hash table with amdgpu_winsys::sws_list_lock
mutex.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3202>
This introduces:
- nir_texop_fragment_mask_fetch (fetch a fragment mask from a
compressed multisampled color surface)
- nir_texop_fragment_fetch (fetch a color fragment for a
particular sample at corresponding fragment mask index).
These two texture operations are necessary for implementing
SPV_AMD_shader_fragment_mask.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3304>
Pretty straightforward: Port texture descriptor code from freedreno, fill
in alignment limits from closed vk, and tu_cmd_buffer.c was already
uploading the texture descriptor.
This doesn't implement storage texel buffers (required in the compute
pipeline) yet, since those will need an IBO descriptor for the store path.
Still, making the load path be connected to the texture descriptor won't
hurt.
Part of #2237
Fixes dEQP-VK.binding_model.shader_access.primary_cmd_buf.uniform_texel_buffer.*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3522>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3522>
I noticed that we can do better for these kinds of comparisons while
working on the lowering for iadd_sat@64 and isub_sat@64. This
eliminated 11 instructions from the fs-addSaturate-int64.shader_test.
My hope is that this will improve the run-time of int64 tests on Ice
Lake. I have no data to support or refute this.
Unsurprisingly, no changes on shader-db.
v2: Condition the min and max patterns with nir_lower_minmax64.
Suggested by Caio. Very long discussion in the MR. :)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Driver supports integer multiplication between a 32-bit integer and a
16-bit integer. If the second operand is 32 bits wide, the upper 16 bits
are ignored, and the low 16 bits are sign-extended as necessary.
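
A rough C model of the semantics described above, assuming the unsigned
flavor zero-extends and the signed flavor sign-extends the low 16 bits. The
sketch_* names are hypothetical, and results are returned as raw 32-bit
patterns to avoid signed-overflow concerns in C.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned flavor: only the low 16 bits of b participate. */
static inline uint32_t
sketch_umul_32x16(uint32_t a, uint32_t b)
{
   return a * (b & 0xffffu);
}

/* Signed flavor: the low 16 bits of b are sign-extended first. */
static inline uint32_t
sketch_imul_32x16(uint32_t a, uint32_t b)
{
   uint32_t lo = b & 0xffffu;

   if (lo & 0x8000u)
      lo |= 0xffff0000u;   /* copy bit 15 into the upper half */
   return a * lo;
}

/* Usage example; each assertion is expected to hold. */
static inline void
sketch_mul_32x16_example(void)
{
   assert(sketch_umul_32x16(3u, 0x00010002u) == 6u);          /* upper half ignored */
   assert(sketch_imul_32x16(3u, 0x0000ffffu) == 0xfffffffdu); /* 3 * -1 */
}
```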
Iris will eventually enable this. Not sure about other drivers.
v2: Add default value to u_screen.c. Suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add missing SpvOpUCountTrailingZerosINTEL case to switch in
vtn_handle_body_instruction. Remove stray semicolon in
vtn_nir_alu_op_for_spirv_opcode. Use umin instead of umax for
SpvOpUCountTrailingZerosINTEL "lowering" in vtn_handle_alu.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Remove smashing type to D for nir_op_irhadd. Caio noticed it was
odd, and removing it fixes an assertion failure in the crucible
func.shader.averageRounded.int64_t test (because the source should be
W).
v3: Emit BRW_OPCODE_MUL directly for nir_op_umul_32x16 and
nir_op_imul_32x16. Suggested by Curro.
v4: Smash types of MUL instruction generated for nir_op_umul_32x16 and
nir_op_imul_32x16. With this change, I get the same assembly now as I
did with v2.
v5: Remove support for pre-Gen7. The integer multiply path was
incorrect, and, since the extension isn't enabled pre-Gen7, there's no
way to test it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Add a big comment explaining the [IU]SUB_SAT lowering. Suggested by
Caio.
v3: Use get_fpu_lowered_simd_width in get_lowered_simd_width. Suggested
by Ken on IRC.
v4: Fix a typo in a comment. Noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Move the check to fs_visitor::lower_integer_multiplication.
Previously, in the cases where lowering was skipped, the original
instruction was removed by fs_visitor::lower_integer_multiplication.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_usub_sat64 flag that only applies to the 64-bit
version of the nir_op_usub_sat instruction.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_hadd64 flag that only applies to the 64-bit versions
of the instructions.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
uctz isn't added because it will be implemented in the GLSL path and the
SPIR-V path using other pre-existing instructions.
v2: Avoid signed integer overflow for uabs_isub(0, INT_MIN). Noticed by
Caio.
v3: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN).
I tried the previous method in a small test program with -ftrapv, and it
failed.
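
For illustration, one way to compute an absolute difference without signed
overflow is to subtract in unsigned arithmetic. This is a sketch of the
general technique, not necessarily the fix that landed; the function name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t
sketch_uabs_isub(int32_t a, int32_t b)
{
   /* Converting to unsigned before subtracting makes the arithmetic modular,
    * so no signed overflow can occur, even for (0, INT32_MIN). */
   return a > b ? (uint32_t)a - (uint32_t)b
                : (uint32_t)b - (uint32_t)a;
}

/* Usage example; each assertion is expected to hold. */
static inline void
sketch_uabs_isub_example(void)
{
   assert(sketch_uabs_isub(0, INT32_MIN) == 0x80000000u);
   assert(sketch_uabs_isub(-3, 4) == 7u);
}
```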
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Re-write iadd64_saturate and isub64_saturate to avoid undefined
overflow behavior. Also fix copy-and-paste bug in isub64_saturate.
Suggested by Caio.
v3: Avoid signed integer overflow for abs_sub(0, INT_MIN). Noticed by
Caio.
v4: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN).
I tried the previous method in a small test program with -ftrapv, and it
failed.
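
As a sketch of what overflow-free saturating 64-bit add and subtract can look
like in C, the versions below check the bounds before doing the arithmetic.
The names are hypothetical and this is not claimed to match the code in the
tree.

```c
#include <assert.h>
#include <stdint.h>

static inline int64_t
sketch_iadd64_saturate(int64_t a, int64_t b)
{
   /* Compare against the limit minus b, which itself cannot overflow. */
   if (b > 0 && a > INT64_MAX - b)
      return INT64_MAX;
   if (b < 0 && a < INT64_MIN - b)
      return INT64_MIN;
   return a + b;
}

static inline int64_t
sketch_isub64_saturate(int64_t a, int64_t b)
{
   if (b < 0 && a > INT64_MAX + b)
      return INT64_MAX;
   if (b > 0 && a < INT64_MIN + b)
      return INT64_MIN;
   return a - b;
}

/* Usage example; each assertion is expected to hold. */
static inline void
sketch_saturate64_example(void)
{
   assert(sketch_iadd64_saturate(INT64_MAX, 1) == INT64_MAX);
   assert(sketch_isub64_saturate(0, INT64_MIN) == INT64_MAX);
}
```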
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Gen4/5's rounding instructions operate differently than later Gens'.
They all return the floor of the input and the "Round-increment"
conditional modifier answers whether the result should be incremented by
1.0 to get the appropriate result for the operation (and thus its
behavior is determined by the round opcode; e.g., RNDZ vs RNDE).
Since this requires a second instruction (a predicated ADD) that
consumes the result of the round instruction, the round instruction
cannot write its result directly to the (write-only) message registers.
By emitting the ADD in the generator, the backend thinks it's safe to
store the round's result directly to the message register file.
To avoid this, we move the emission of the ADD instruction to the NIR
translator so that the backend has the information it needs.
I suspect this also fixes code generated for RNDZ.SAT but since
Gen4/5 don't support GLSL 1.30 which adds the trunc() function, I
couldn't write a piglit test to confirm. My thinking is that if x=-0.5:
    sat(trunc(-0.5)) = 0.0
But on Gen4/5 where sat(trunc(x)) is implemented as
    rndz.r.f0  result, x             // result = floor(x)
                                     // set f0 if increment needed
    (+f0) add  result, result, 1.0   // fixup so result = trunc(x)
then putting saturate on both instructions will give the wrong result.
    floor(-0.5) = -1.0
    sat(floor(-0.5)) = 0.0
    // +1 increment would be needed since floor(-0.5) != trunc(-0.5)
    sat(sat(floor(-0.5)) + 1.0) = 1.0
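
The arithmetic above can be reproduced with a small C model. This is only a
software sketch of the suspected behaviour, assuming the flag is set exactly
when floor(x) differs from trunc(x); it does not run on or emulate real
hardware, and all names are hypothetical.

```c
#include <assert.h>

/* Saturate to [0, 1]. */
static inline float
sketch_sat(float v)
{
   return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* floor() for small magnitudes only, so the sketch does not need libm. */
static inline float
sketch_floor(float x)
{
   assert(x > -1.0e6f && x < 1.0e6f);
   float f = (float)(int)x;   /* the cast truncates toward zero */
   return f > x ? f - 1.0f : f;
}

/* Reference result: saturate applied once, after the full trunc(). */
static inline float
sketch_sat_trunc_reference(float x)
{
   float result = sketch_floor(x);
   if (x < 0.0f && result != x)
      result += 1.0f;
   return sketch_sat(result);
}

/* Model of saturate on both the round and the predicated ADD. */
static inline float
sketch_sat_on_both(float x)
{
   float floored = sketch_floor(x);
   int f0 = x < 0.0f && floored != x;   /* increment needed for trunc */
   float result = sketch_sat(floored);

   if (f0)
      result = sketch_sat(result + 1.0f);
   return result;
}
```

For x = -0.5 the reference gives 0.0 while the model with saturate on both
instructions gives 1.0, matching the hand calculation.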
Fixes: 6f394343b1 ("nir/algebraic: i2f(f2i()) -> trunc()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2355
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Each instruction bundle can contain up to 16 constant bytes. The meaning
of those bytes is instruction dependent: it depends on the instruction
native type (int, uint or float) and the instruction reg_mode (8, 16, 32
or 64 bit). Those different layouts can be exposed as a union to
facilitate constants manipulation.
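
A minimal sketch of such a union, assuming 4-byte float and 8-byte double.
The type name and field names are hypothetical; the union added by this
change may be laid out or named differently.

```c
#include <assert.h>
#include <stdint.h>

/* 16 constant bytes viewed through each native type and register mode. */
union sketch_bundle_constants {
   uint8_t  u8[16];
   int8_t   i8[16];
   uint16_t u16[8];
   int16_t  i16[8];
   uint32_t u32[4];
   int32_t  i32[4];
   uint64_t u64[2];
   int64_t  i64[2];
   float    f32[4];
   double   f64[2];
};

/* Every view must cover exactly the 16 bytes a bundle can hold. */
static_assert(sizeof(union sketch_bundle_constants) == 16,
              "a bundle holds 16 constant bytes");
```

Writing through one member and reading through another (for example storing
0x3f800000 in u32[0] and reading f32[0]) reinterprets the same bytes, which
is what makes the union convenient for constant manipulation.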
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
On platforms without mincore(), _eglPointerIsDereferencable()
currently just checks whether p != NULL. This is not sufficient:
In the Wayland platform code (i.e., in get_wl_surface_proxy()),
_eglPointerIsDereferencable() is called on the version field
of `struct wl_egl_window` which is 3 on current versions of
Wayland. This causes a segfault when trying to dereference p.
Fix this behavior by assuming that the first page of the
process is never dereferencable.
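
The fallback check can be sketched like this. The 4096-byte page size is an
assumption of the sketch (the text above does not state a size), and the
function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of the never-mapped first page. */
#define SKETCH_FIRST_PAGE_SIZE 4096u

static inline bool
sketch_pointer_is_dereferencable(const void *p)
{
   /* Rejects NULL and small integers stored in pointer-sized fields. */
   return (uintptr_t)p >= SKETCH_FIRST_PAGE_SIZE;
}

/* Usage example: the value 3 mirrors the version field mentioned above. */
static inline void
sketch_pointer_example(void)
{
   int local = 0;

   assert(!sketch_pointer_is_dereferencable((const void *)(uintptr_t)3));
   assert(sketch_pointer_is_dereferencable(&local));
}
```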
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3103>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3103>
Specifically, execution size, register file, and register type. I did
not add validation for vertical stride and width because I don't believe
it's possible to have an otherwise valid instruction with an invalid
vertical stride or width, due to all of the other regioning
restrictions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
16-bit immediates need to be replicated through the 32-bit immediate
field, so we should never see one that isn't.
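
As an illustration, the replication rule amounts to the two halves of the
32-bit field being equal. The helpers below are hypothetical and only sketch
that check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the high and low 16-bit halves carry the same value. */
static inline bool
sketch_imm16_is_replicated(uint32_t imm)
{
   return (imm >> 16) == (imm & 0xffffu);
}

/* Builds the replicated 32-bit form of a 16-bit immediate. */
static inline uint32_t
sketch_replicate_imm16(uint16_t value)
{
   return ((uint32_t)value << 16) | value;
}

/* Usage example; each assertion is expected to hold. */
static inline void
sketch_imm16_example(void)
{
   assert(sketch_imm16_is_replicated(sketch_replicate_imm16(0xbeef)));
   assert(!sketch_imm16_is_replicated(0x00001234u));
}
```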
This does happen however in the fuzzer unit test, so returning false
allows the fuzzer to reject this case.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Previously we were sharing tables between generations that were nearly
identical (i.e., Gen8 3-src adds HF support) and used a small bit of
code to handle the differences. This is kind of a mess if you want to
reject 64-bit types on platforms that don't support 64-bit types, so
split the tables, allowing each generation's table to list exactly what
it supports.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Two of the tests emit instructions with MRF destinations, and MRFs
aren't present on Gen7+. I think we were just lucky that this didn't
cause a problem earlier since we were running the tests on Gen7-9.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Since the platforms don't support align1 3-src instructions, the
contents of these operands are not going to be meaningful. Just don't
print them to avoid hitting some assertions in brw_inst functions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
When there's only one hardware thread (i.e. the dispatch width is greater
than or equal to the workgroup size), there's no need to use a barrier to
ensure all the invocations reach the same point in the shader, because
they are already running lock-step.
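
The condition can be written down as a one-line predicate. This is a sketch
with a hypothetical name, not the check as it appears in the compiler.

```c
#include <assert.h>
#include <stdbool.h>

/* A barrier is only needed when the workgroup spans more than one hardware
 * thread, i.e. when it has more invocations than the dispatch width. */
static inline bool
sketch_workgroup_needs_barrier(unsigned dispatch_width,
                               unsigned workgroup_size)
{
   assert(dispatch_width == 8 || dispatch_width == 16 ||
          dispatch_width == 32);
   return workgroup_size > dispatch_width;
}
```

For example, a workgroup of 16 invocations needs no barrier at SIMD16 but
still needs one at SIMD8.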
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18361 -> 18339 (-0.12%)
sends in affected programs: 904 -> 882 (-2.43%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 2.44 x̃: 2
helped stats (rel) min: 0.84% max: 21.43% x̄: 7.82% x̃: 2.67%
95% mean confidence interval for sends value: -3.31 -1.58
95% mean confidence interval for sends %-change: -14.67% -0.97%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for the DeusEx shader, which has
a workgroup size of 16 but is compiled as SIMD8 on BDW.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
When there's only one hardware thread (i.e. the dispatch width is greater
than or equal to the workgroup size), there's no need to synchronize shared
memory access (SLM) since all the requests from a single thread are
already synchronized. In such case, we just add a scheduling fence.
To be able to identify that case for all platforms, move the handling
of platforms prior to Gen11 (which don't have a separate SLM fence)
after the optimization.
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18395 -> 18361 (-0.18%)
sends in affected programs: 938 -> 904 (-3.62%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 3.78 x̃: 4
helped stats (rel) min: 1.56% max: 26.32% x̄: 10.33% x̃: 2.60%
95% mean confidence interval for sends value: -4.85 -2.71
95% mean confidence interval for sends %-change: -19.12% -1.54%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for the DeusEx shader, which has
a workgroup size of 16 but is compiled as SIMD8 on BDW.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Like a SHADER_OPCODE_MEMORY_FENCE but doesn't generate any
assembly code.
Will be used when the compiler shouldn't reorder certain instructions
but there's no need to generate code for the HW to do it -- as the
ordering will be guaranteed by other means.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
The closed GL driver doesn't use UBWC on any storage images. It does tile
mostly (skipping tiling on writeonly images, it seems), but for freedreno
we've been enabling tiling in all cases and it's fine. We do need to
disable UBWC, as tests fail otherwise and just plugging in the equivalent
UBWC regs like we were setting up a texture isn't enough.
Fixes dEQP-VK.image.atomic_operations.*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
So far this doesn't handle the texture state-based storage image access
loads, and doesn't support descriptor arrays (same as SSBOs). The texture
side is more tricky, since we have another remapping table to work around.
This is enough to get some of dEQP-VK.image.atomic_operations.* working.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
The exception is the texel/gather offsets and stream output
components, which will not be exposed since we don't expose the
corresponding GLSL version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3493>
Just make it be all SSBOs then all storage images. The remapping table
was there to make it so that the big gap present from gallium's atomic
lowering would get cleaned up, but that's no longer the case. The table has
made it very hard to support Vulkan storage images, so it's time for it to
go.
This does mean that an SSBO/IBO that is only loaded (or size-queried) will
now occupy a slot in the table where it wouldn't before. This seems like
a minor cost compared to being able to drop this much logic.
With the remapping table gone, SSBO array handling for turnip just falls
out.
Fixes many array cases of
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_buffer.*
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca> (turnip)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
The arguments passed in were:
- prog->info.num_ssbos
- prog->nir->info.num_ssbos
- arbitrary values for standalone compilers
The num_ssbos should match between the prog's info and prog->nir's info
until this lowering happens.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
We carve out half the SSBO space for atomics, and we were just binding
them way up there. freedreno was then using a remapping table to map the
sparse buffer index back down, since space in the descriptor array is a
shared resource that may limit parallelism. That remapping table
generated inside of the ir3 compiler is getting thoroughly in the way of
implementing vulkan descriptor sets.
We will be able to get rid of the freedreno's remapping table, and
hopefully save shared resources on other hardware, by packing the atomics
tightly above the SSBOs (like i965 does). We already rebind the shader
buffers on program change if either the old or new program has SSBOs or
ABOs, so this doesn't necessarily increase the program state change cost
(the only cost increase I can come up with is if you're using the same
atomic counter without rebinding it across changes of programs with
varying SSBO counts, meaning it would now bounce around index space).
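
To make the index change concrete, here is a sketch contrasting the two
schemes. max_buffers is a hypothetical total slot count and both function
names are made up for illustration.

```c
#include <assert.h>

/* Old scheme as described above: atomics start at the midpoint of the
 * buffer index space, leaving a gap above the SSBOs actually in use. */
static inline unsigned
sketch_sparse_atomic_index(unsigned max_buffers, unsigned binding)
{
   assert(binding < max_buffers / 2);
   return max_buffers / 2 + binding;
}

/* Packed scheme: atomics start right after the program's SSBOs. */
static inline unsigned
sketch_packed_atomic_index(unsigned num_ssbos, unsigned binding)
{
   return num_ssbos + binding;
}
```

With the packed scheme the same atomic counter binding lands on index 3 for
a program with three SSBOs and on index 5 for one with five, which is the
"bouncing around index space" cost mentioned above.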
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
Gallium arbitrarily (it seems) put atomics below SSBOs, resulting in a
bunch of extra index management, and surprising shader code when you would
see your SSBOs up at index 16. It makes a lot more sense to see atomics
converted to SSBOs appear as magic high numbers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
There's a lot going on here (it's a ton of commits squashed together
since otherwise this would be impossible to review...)
1. We have a fast path for linear->tiled for whole (aligned) tiles, but we
have to use a slow path for unaligned accesses. We can get a pretty
major win for partial updates by using this slow path simply on the
borders of the update region, and then hit the fast path for the
tile-aligned interior. This does require some shuffling.
2. Mark the LUTs constant, which allows the compiler to inline them,
which pairs well with loop unrolling (eliminating the memory accesses
and just becoming some immediates.. which are not as immediate on
aarch64 as I'd like..)
3. Add fast path for bpp1/2/8/16. These use the same algorithm and we
have native types for them, so may as well get the fast path.
4. Drop generic path for bpp != 1/2/8/16, since these formats are
generally awful and there's no way to tile them efficiently and
honestly there's not a good reason to either. Lima doesn't support any
of these formats; Panfrost can make the opinionated choice to make them
linear.
5. Specialize the unaligned routines. They don't have to be fully
generic, they just can't assume alignment. So now they should be nearly
as fast as the aligned versions (which get some extra tricks to be even
faster but the difference might be negligible on some workloads).
6. Specialize also for the size of the tile, to allow 4x4 tiling as well
as 16x16 tiling. This allows compressed textures to be efficiently tiled
with the same routines (so we add support for tiling ASTC/ETC textures
while we're at it)
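
The border/interior split from item 1 can be sketched as below for a 16x16
tile. The sketch only counts how many pixels each path would handle; the
struct and function names are hypothetical and no actual tiling is done.

```c
#include <assert.h>

#define SKETCH_TILE 16u   /* tile edge in pixels */

struct sketch_split {
   unsigned fast_pixels;   /* pixels in the tile-aligned interior */
   unsigned slow_pixels;   /* pixels on the unaligned borders */
};

static inline struct sketch_split
sketch_split_update(unsigned x, unsigned y, unsigned w, unsigned h)
{
   /* Round the start up and the end down to tile boundaries. */
   unsigned x0 = (x + SKETCH_TILE - 1) / SKETCH_TILE * SKETCH_TILE;
   unsigned y0 = (y + SKETCH_TILE - 1) / SKETCH_TILE * SKETCH_TILE;
   unsigned x1 = (x + w) / SKETCH_TILE * SKETCH_TILE;
   unsigned y1 = (y + h) / SKETCH_TILE * SKETCH_TILE;
   struct sketch_split s = { 0, w * h };

   assert(w > 0 && h > 0);
   /* Only a non-empty aligned interior can take the fast path. */
   if (x1 > x0 && y1 > y0) {
      s.fast_pixels = (x1 - x0) * (y1 - y0);
      s.slow_pixels = w * h - s.fast_pixels;
   }
   return s;
}
```

An update at (3, 5) of size 40x40 has an aligned interior spanning 16..32 in
both axes, so 256 of its 1600 pixels can take the fast path.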
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Having VERTEX_BUFFER_STATE.BufferSize greater than the size of
a bound vertex buffer allows shader to read uninitialized vertex
attributes from BO, instead of allowing hardware to return zeroes
on out-of-bounds access.
OpenGL spec "6.4 Effects of Accessing Outside Buffer Bounds" says:
"Robust buffer access can be enabled by creating a context with robust access
enabled through the window system binding APIs. When enabled, any command
unable to generate a GL error as described above, such as buffer object accesses
from the active program, will not read or modify memory outside of the data
store of the buffer object and will not result in GL interruption or termination.
Out-of-bounds reads may return values from within the buffer object or zero
values."
Fixes three webgl tests:
conformance/rendering/out-of-bounds-array-buffers.html
conformance2/rendering/out-of-bounds-index-buffers-after-copying.html
conformance2/rendering/element-index-uint.html
See #1996
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Amend the fails and skips lists based on lists from Andreas Baierl,
shard mali400 job across two devices since it takes close to 10min
and rename jobs to lima-mali400-test and lima-mali450-test.
Also don't set MESA_GLES_VERSION_OVERRIDE=3.0 for lima since we don't support
GLES 3.0 and lower DEQP_PARALLEL to 3 for jobs on H3.
Keep mali400 jobs disabled atm since they take too much time to complete
and we also get some inexplicable failures in dEQP-GLES2.functional.default_vertex_attrib.*
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
deqp-runner.sh uses it to determine whether we split job across multiple
devices and if we do what's the node index.
With this change we now can set 'parallel: N' in job description if we want
to split the job.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
When VK_DESCRIPTOR_TYPE_SAMPLER is provided, it doesn't need to be
counted as a buffer count. Otherwise it leads to mismatch of allocated
buffer size, hitting VK_ERROR_OUT_OF_POOL_MEMORY finally.
Fixes: c39afe68f0
Also fixes amber tests:
./tests/cases/address_modes_float.amber
./tests/cases/address_modes_int.amber
./tests/cases/magfilter_linear.amber
./tests/cases/magfilter_nearest.amber
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Having to always pull the physical device from the instance has been
annoying for almost as long as the driver has existed. It also won't
work in a world where we ever have more than one physical device. This
commit adds a new field called "physical" to anv_device and switches
every location where we use device->instance->physicalDevice to use the
new field instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Only non-indexed triangle lists and strips are supported. This increases
performance if there is something to cull.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
- Use conservative late alloc when the number of CUs <= 6.
- Move the late alloc GS register to the GS shader state, so that it can be
tuned for NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
The value is not changed. I just use a different way to compute it.
The value will vary with NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This decreases VGPR usage and will allow us to merge some IF blocks
in shaders.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
We didn't use the correct LDS pointer, though it probably doesn't matter,
because I think that nothing else is using LDS here.
This commit makes it consistent with all other esgs_ring use.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Some formats, in particular YCbCr formats and ASTC have additional
restrictions. We already whack ASTC formats to RGBA32_UINT because the
hardware doesn't allow LINEAR with ASTC. However, we need to fix YCbCr
formats as well because they come with alignment restrictions that we
can't guarantee are satisfied. We're using blorp_copy to do the copies
so we may as well just stomp formats for everything.
Fixes: b24b93d584 "anv: enable VK_KHR_sampler_ycbcr_conversion"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
This fixes a crash in LZDoom where over 16 shader variants are needed
for a few shaders in some maps, and should also save a few kilobytes
of RAM as most of the time only one or two variants of the 8 previously
allocated are actually needed.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Previously, the array bo_access->readers was only cleared when there
were no unsignaled fences, which in some situations never happened.
That resulted in the array having thousands of NULL pointers, but only
a handful of active readers.
With this patch, all the unsignaled readers are moved to the front of
the array, effectively building a new array only containing the active
readers in-place. This results in the readers array usually only having
a couple of elements.
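
The in-place compaction can be sketched as follows. struct sketch_fence is a
hypothetical stand-in for the driver's fence type, and a real implementation
would presumably also drop its reference on the entries it discards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fence {
   bool signaled;
};

/* Moves every reader whose fence has not signaled to the front of the
 * array, preserving order, and returns the new reader count. */
static inline size_t
sketch_compact_readers(struct sketch_fence **readers, size_t count)
{
   size_t kept = 0;

   assert(readers != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      struct sketch_fence *fence = readers[i];

      /* Skip NULL slots and fences that have already signaled. */
      if (fence != NULL && !fence->signaled)
         readers[kept++] = fence;
   }
   return kept;
}
```

Since kept never runs ahead of i, each write lands on a slot that has
already been examined, so no second array is needed.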
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
The dl-tag isn't a neat tool for defining sub-headings, it's a semantic
tool for defining definitions and their meaning. Let's use normal
sub-headings instead.
To make the last few paragraphs stand out from the above, let's add a
sub-heading for those as well.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
This lets us move PC_PRIMITIVE_CNTL into the rasterizer stateobj, rather
than unconditionally emitting it directly in the cmdstream on every
draw.
This also starts adding some tracking about previous draw state, so that
following patches can limit some of the register writes we currently
emit on every draw.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
The overhead does seem to matter when you have a high enough # of draw
calls that affect few bins/pixels, because these writes would happen
unconditionally (ie. not part of a state-group).
Possibly we could keep these if we moved them into a state-group so the
register writes would be no-ops on bins with no geometry. OTOH I
usually end up adding in a WFI when using the scratch reg values to
track down a crash. (So add a WFI to mitigate the annoyance of needing
to use a debug build to get scratch regs to locate the position of a
crash/hang in the cmdstream.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
This involves permuting the registers of barycentric vectors to have
the standard X[0-n] Y[0-n] layout at NIR translation time.
Barycentrics are converted to the format expected by the PLN
instruction in the lower_barycentrics() pass run after the
optimization loop.
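
As a rough sketch of the layout conversion, assume the PLN layout
interleaves X and Y in groups of 8 channels (X[0-7] Y[0-7] X[8-15] Y[8-15]
and so on). That granularity is an assumption of this sketch, and the
function below is a hypothetical CPU-side model rather than the lowering
pass itself.

```c
#include <assert.h>

#define SKETCH_PLN_GROUP 8u   /* assumed interleave granularity in channels */

/* src holds n X values followed by n Y values (the standard layout); dst
 * receives the same 2*n values interleaved per group of 8 channels. */
static inline void
sketch_to_pln_layout(const float *src, float *dst, unsigned n)
{
   assert(n % SKETCH_PLN_GROUP == 0);
   for (unsigned c = 0; c < n; c++) {
      unsigned group = c / SKETCH_PLN_GROUP;
      unsigned lane = c % SKETCH_PLN_GROUP;
      unsigned base = group * 2 * SKETCH_PLN_GROUP;

      dst[base + lane] = src[c];                         /* X component */
      dst[base + SKETCH_PLN_GROUP + lane] = src[n + c];  /* Y component */
   }
}
```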
Main reason is correctness of SIMD32 fragment shaders. The
shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used
during NIR translation are busted for SIMD32. This leads to serious
corruption at present with INTEL_DEBUG=do32, especially on Gen11+
where these helpers are hit more frequently due to the lack of a
hardware PLN instruction.
Of course one could have chosen to fix those helpers instead, but
there is another far more subtle issue that was reported during review
of the SIMD32 fragment shader codegen changes: The SIMD splitting pass
currently handles SIMD32 barycentric vectors as if they had the
standard X[0-n] Y[0-n] layout, even though they are interleaved for
the PLN instruction, which causes incorrect execution masks to be
applied to the MOVs unzipping barycentric vectors in cases where a
LINTERP instruction occurs under non-uniform control flow.
I'm not aware of any conformance regressions due to the latter issue
at present, but for our peace of mind let's move the conversion to the
PLN layout into the lower_barycentrics() pass run after
lower_simd_width().
This leads to the following shader-db improvements (including SIMD32
shaders) in combination with the previous back-end preparation changes
-- Without them (especially the copy propagation changes) this would
lead to a massive number of regressions. On ICL:
total instructions in shared programs: 20662316 -> 20466903 (-0.95%)
instructions in affected programs: 10538474 -> 10343061 (-1.85%)
helped: 68775
HURT: 6
total spills in shared programs: 8938 -> 8748 (-2.13%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8965 -> 8663 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 146
GAINED: 43
On SKL:
total instructions in shared programs: 18725867 -> 18614912 (-0.59%)
instructions in affected programs: 3876590 -> 3765635 (-2.86%)
helped: 27492
HURT: 2
LOST: 191
GAINED: 417
On SNB:
total instructions in shared programs: 14573613 -> 13980646 (-4.07%)
instructions in affected programs: 5199074 -> 4606107 (-11.41%)
helped: 29998
HURT: 0
LOST: 21
GAINED: 30
Results are somewhat less impressive but still significant without
SIMD32 fragment shaders enabled. On ICL:
total instructions in shared programs: 16148728 -> 16061659 (-0.54%)
instructions in affected programs: 6114788 -> 6027719 (-1.42%)
helped: 42046
HURT: 6
total spills in shared programs: 8218 -> 8028 (-2.31%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8953 -> 8651 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 0
GAINED: 3
On SKL:
total instructions in shared programs: 14927994 -> 14926738 (-0.01%)
instructions in affected programs: 168850 -> 167594 (-0.74%)
helped: 711
HURT: 2
On SNB:
total instructions in shared programs: 10770538 -> 10734403 (-0.34%)
instructions in affected programs: 2702172 -> 2666037 (-1.34%)
helped: 17818
HURT: 0
All of the hurt shaders are either spilling slightly more or emitting
additional NOP instructions due to the SIMD16 POW workaround for
Gen8-9 combined with differences in scheduling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal is to represent barycentrics with the standard vector layout
during optimization and particularly SIMD lowering. Instead of
emitting the barycentric layout conversions at NIR translation time,
do it later as a lowering pass. For the moment this is only applied
to PI messages, but we'll give the same treatment to LINTERP
instructions too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to change the layout of barycentric vectors, which will
involve permuting the GRFs of barycentrics fetched from the thread
payload. Make room for this in a function separate from the generic
fetch_payload_reg(), since the permutation will only be applicable to
barycentric vectors. This allows simplifying fetch_payload_reg(),
since there was no need for handling multiple-component payload
registers except for barycentrics.
This causes some minor shader-db noise due to the new helper emitting
a LOAD_PAYLOAD instruction unconditionally, but it will be cleaned up
shortly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This prevents regressions on SNB due to the redundant MOVs lying
around in cases where fetch_payload_reg() returns a VGRF (currently
only in SIMD32 but soon in pretty much all cases). The MOVs can't be
register-coalesced due to their source being a FIXED_GRF, and they
can't be copy-propagated either due to the unlit centroid workaround
partial writes. They can be copy-propagated just fine into a SEL
instruction though.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs: 13996898 -> 14001982 (0.04%)
instructions in affected programs: 197461 -> 202545 (2.57%)
helped: 0
HURT: 1251
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is mainly meant to avoid shader-db regressions on SNB as we start
using VGRFs for barycentrics more frequently. Currently the
aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16
mode barycentric vectors are typically 4 GRFs. This is not a problem
on Gen4-5, because on those platforms all VGRF allocations are
pair-aligned in SIMD16 mode. However on Gen6 we end up using either
the fast or the slow path of LINTERP rather non-deterministically
based on the behavior of the register allocator.
Fix it by repurposing aligned_pairs_class to hold PLN-aligned
registers of whatever the natural size of a barycentric vector is in
the current dispatch width.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs: 13983257 -> 14527274 (3.89%)
instructions in affected programs: 1766255 -> 2310272 (30.80%)
helped: 0
HURT: 11608
LOST: 26
GAINED: 13
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids regressions on SNB due to the bank conflict mitigation
pass moving a VGRF-allocated barycentric vector to a misaligned
location, which would prevent the PLN instruction from being used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we would hardcode fs_visitor::delta_xy barycentrics to be
allocated from aligned_pairs_class on hardware with PLN source
alignment restrictions (pre-Gen7). Instead allocate any registers
consumed by LINTERP from aligned_pairs_class, even if some barycentric
vector had ended up in a temporary.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs: 13983257 -> 14527274 (3.89%)
instructions in affected programs: 1766255 -> 2310272 (30.80%)
helped: 0
HURT: 11608
LOST: 26
GAINED: 13
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is particularly useful in cases where register coalesce is
unlikely to succeed because the LOAD_PAYLOAD isn't a plain copy --
E.g. when a LOAD_PAYLOAD is shuffling the contents of a barycentric
vector in order to transform it into the PLN layout.
This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series. On SKL:
total instructions in shared programs: 18596672 -> 18976097 (2.04%)
instructions in affected programs: 7937041 -> 8316466 (4.78%)
helped: 39
HURT: 67427
LOST: 466
GAINED: 220
On SNB:
total instructions in shared programs: 13993866 -> 14202963 (1.49%)
instructions in affected programs: 7611309 -> 7820406 (2.75%)
helped: 624
HURT: 52943
LOST: 6
GAINED: 18
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where a LOAD_PAYLOAD instruction copies a single block of
sequential GRF registers into the destination (see
is_identity_payload()), splitting the block copy into a number of ACP
entries (one for each LOAD_PAYLOAD source) is undesirable, because
that prevents copy propagation into any instructions which read
multiple components at once with the same source (the barycentric
source of the LINTERP instruction is going to be the overwhelmingly
most common example).
Technically it would also be possible to do this for VGRF sources, but
there is little benefit from that since register coalesce already
covers many of those cases -- There is no way for a block of
FIXED_GRFs to be coalesced into a VGRF though.
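The "single block of sequential GRF registers" condition can be pictured
with a small sketch. It is a deliberately simplified model, not the real
is_identity_payload(): sources are reduced to plain register numbers, and
the helper name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified check: true when the register numbers of the
 * LOAD_PAYLOAD sources form one contiguous ascending block, so that the
 * whole copy could be tracked as a single ACP entry rather than one
 * entry per source.  The real helper also has to consider strides,
 * types and register files, which this model ignores. */
static bool sources_are_sequential(const unsigned *regs, int count)
{
    if (count <= 0)
        return false;

    for (int i = 1; i < count; i++) {
        if (regs[i] != regs[0] + (unsigned)i)
            return false;
    }
    return true;
}
```

With sources g2, g3, g4, g5 the check passes and one block entry is
enough; with g2, g4 it fails and per-source entries are still needed.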
This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series. On SKL:
total instructions in shared programs: 18595160 -> 18828562 (1.26%)
instructions in affected programs: 13374946 -> 13608348 (1.75%)
helped: 7
HURT: 108977
total spills in shared programs: 9116 -> 9106 (-0.11%)
spills in affected programs: 404 -> 394 (-2.48%)
helped: 7
HURT: 9
total fills in shared programs: 8994 -> 9176 (2.02%)
fills in affected programs: 898 -> 1080 (20.27%)
helped: 7
HURT: 9
LOST: 469
GAINED: 220
On SNB:
total instructions in shared programs: 13996898 -> 14096222 (0.71%)
instructions in affected programs: 8088546 -> 8187870 (1.23%)
helped: 2
HURT: 66520
total spills in shared programs: 2985 -> 2961 (-0.80%)
spills in affected programs: 632 -> 608 (-3.80%)
helped: 2
HURT: 0
total fills in shared programs: 3144 -> 3128 (-0.51%)
fills in affected programs: 1515 -> 1499 (-1.06%)
helped: 2
HURT: 0
LOST: 0
GAINED: 4
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be useful for eliminating redundant copies from the FS
thread payload, particularly in SIMD32 programs. For the moment we
only allow FIXED_GRFs with identity strides in order to avoid dealing
with composing the arbitrary bidimensional strides that FIXED_GRF
regions potentially have, which are rarely used at the IR level
anyway.
This enables the following commit allowing block-propagation of
FIXED_GRF LOAD_PAYLOAD copies, and prevents the following shader-db
regressions (including SIMD32 programs) in combination with the
interpolation rework part of this series. On ICL:
total instructions in shared programs: 20484665 -> 20529650 (0.22%)
instructions in affected programs: 6031235 -> 6076220 (0.75%)
helped: 5
HURT: 42073
total spills in shared programs: 8748 -> 8925 (2.02%)
spills in affected programs: 186 -> 363 (95.16%)
helped: 5
HURT: 9
total fills in shared programs: 8663 -> 8960 (3.43%)
fills in affected programs: 647 -> 944 (45.90%)
helped: 5
HURT: 9
On SKL:
total instructions in shared programs: 18937442 -> 19128162 (1.01%)
instructions in affected programs: 8378187 -> 8568907 (2.28%)
helped: 39
HURT: 68176
LOST: 1
GAINED: 4
On SNB:
total instructions in shared programs: 14094685 -> 14243499 (1.06%)
instructions in affected programs: 7751062 -> 7899876 (1.92%)
helped: 623
HURT: 53586
LOST: 7
GAINED: 25
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This involves indexing the ACP tables used internally by
fs_copy_prop_dataflow::setup_initial_values() by reg_space() instead
of register number. Both are nearly equivalent for virtual GRFs
(barring the single bit of entropy lost in the hash), and this makes
handling FIXED_GRFs straightforward.
Because we're only going to support FIXED_GRFs for the source of a
copy, this change is only strictly necessary during the second pass
that checks for source interference, but we also apply the same change
to the first pass for consistency.
Note that this shouldn't change the behavior of the copy propagation
pass until we start inserting FIXED_GRF entries into the ACP. Even
then FIXED_GRF writes are extremely rare so this change will hardly
ever have an effect, but they aren't completely non-existent so we
need to handle them for correctness.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reworks the current fs_inst::is_copy_payload() method into a
number of classification helpers with well-defined semantics. This
will be useful later on in order to optimize LOAD_PAYLOAD instructions
more aggressively in cases where we can determine it's safe to do so.
The closest equivalent of the present fs_inst::is_copy_payload()
method is the is_coalescing_payload() helper introduced here.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where LOAD_PAYLOAD is provided a pair of contiguous registers
as header sources, try to use a single SIMD16 instruction in order to
initialize them. This is unlikely to affect the overall cycle count
of the shader, since the compressed instruction has twice the issue
time, except due to the reduced pressure on the instruction cache.
Main motivation is avoiding instruction-count regressions in
combination with the following copy propagation improvements, which
will allow the SIMD16 g0-1 header setup emitted for framebuffer writes
to be copy-propagated into its LOAD_PAYLOAD, leading to the emission
of two SIMD8 MOV instructions instead of a single SIMD16 MOV.
Reverting this commit on top of the copy propagation changes would
lead to the following shader-db regressions on SKL and other
platforms:
total instructions in shared programs: 14926738 -> 14935415 (0.06%)
instructions in affected programs: 1892445 -> 1901122 (0.46%)
helped: 0
HURT: 8676
Without the following copy propagation changes this doesn't have any
effect on shader-db on Gen7+, because we would typically set up the FB
write header with a separate SIMD16 MOV that isn't currently
copy-propagated into the LOAD_PAYLOAD, so the individual SIMD8 MOVs
resulting from LOAD_PAYLOAD lowering would get register-coalesced away
under normal circumstances. However that wasn't the case for MRF
LOAD_PAYLOAD destinations on Gen6 and earlier, because register
coalesce only kicks in for GRFs, leaving a number of redundant SIMD8
MOVs lying around. On SNB this leads to the following shader-db
improvements:
total instructions in shared programs: 10770538 -> 10734681 (-0.33%)
instructions in affected programs: 2700655 -> 2664798 (-1.33%)
helped: 17791
HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Images with modifiers come with restrictions:
1. They have to be simple 2D images right now
2. They need to have a sensible format (not compressed, multi-plane, or
non-power-of-two)
3. If a CCS modifier is being requested, they have to actually support
CCS_E and be CCS-compatible with any other formats the client may
wish to use for image views.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
Ensure the hardware cursor is disabled when we set the mode for a
VkDisplayKHR object. The extension doesn't expose any mechanisms to
program the hardware cursor, so we need to ensure it is hidden.
Currently, it seems like X is responsible for disabling the cursor
before handing over the lease. But that seems a little frail, and we
should be disabling the cursor ourselves so it works correctly
independently of how the lease was prepared for us.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>
When the swr driver is in use, it prints a detected-architecture
message to standard error. This can be harmful when swr is used
in multi-node environments.
The message can be re-enabled by setting the SWR_PRINT_INFO
environment variable to 1.
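A minimal sketch of gating such a message on the environment variable
follows. Only the variable name SWR_PRINT_INFO comes from the text above;
the helper names and the message wording are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treat only the exact string "1" as enabled. */
static bool flag_value_enabled(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Print the architecture message only when the user opted in.
 * The message text here is made up for illustration. */
static void report_architecture(const char *arch_name)
{
    if (flag_value_enabled(getenv("SWR_PRINT_INFO")))
        fprintf(stderr, "SWR detected %s\n", arch_name);
}
```

Running with SWR_PRINT_INFO unset keeps standard error quiet, which is
the behavior wanted on multi-node setups.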
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
The current logic considers that the nir_intrinsic_component(store_intr)
encodes the source components start, but it actually encodes the
destination one. Source component offset adjustment is taken care of in
install_registers_instr(), when offset_swizzle() is called.
This fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.45
when PAN_MESA_DEBUG=deqp (looks like exposing GLES3 features has an
impact on the varyings layout).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
The pre-tag right before is a block-level tag, which means it implicitly
terminates the paragraph. So there's no paragraph to close after this.
Instead, move the paragraph-closing before the pre-tag, to explicitly
close the paragraph.
Fixes: 41b3eb08d9 "docs: update meson docs for windows"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>
Fixes the following valgrind error:
Invalid read of size 16
at 0x28F458A1: si_set_sampler_view_desc (in radeonsi_drv_video.so)
by 0x28F4657E: si_set_sampler_views (in radeonsi_drv_video.so)
by 0x28D62BF5: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Address 0x18142a10 is 0 bytes inside a block of size 48 free'd
at 0x48369AB: free (vg_replace_malloc.c:540)
by 0x28D62D51: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Block was alloc'd at
at 0x4837B65: calloc (vg_replace_malloc.c:762)
by 0x28EFB2EC: si_create_sampler_state (in radeonsi_drv_video.so)
by 0x28D62C30: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Fixes: 69430d7e59 ("va: use a compute shader for the blit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2321
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Introduce separate helper functions to set the blendfactor bits.
Lima uses bits 0-2 for the type, bit 3 sets the inverted function
and bit 4 is set if alpha is used.
alpha_src_factor and alpha_dst_factor don't need the alpha bit, so
they are masked with 0xf. There is only room for 4 bits anyway.
If alpha_src_factor is PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE, we need
to change it to PIPE_BLENDFACTOR_ONE first.
This is exactly what the blob does and we pass all
dEQP-GLES2.functional.fragment_ops.blend.* tests now.
Better than the blob btw...
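The bit layout described above can be sketched as follows. The function
names are hypothetical and the numeric type codes are left to the caller,
since the actual values lima uses are not stated here. The
SRC_ALPHA_SATURATE substitution is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a blend factor: bits 0-2 hold the type, bit 3 selects the
 * inverted function and bit 4 is set when alpha is used. */
static uint8_t pack_blend_factor(unsigned type, bool invert, bool alpha)
{
    return (uint8_t)((type & 0x7u) |
                     (invert ? 1u << 3 : 0u) |
                     (alpha ? 1u << 4 : 0u));
}

/* The alpha factors only have room for 4 bits, so the alpha bit is
 * dropped by masking with 0xf. */
static uint8_t pack_alpha_blend_factor(unsigned type, bool invert, bool alpha)
{
    return pack_blend_factor(type, invert, alpha) & 0xfu;
}
```

For example, type 5 with both flags set packs to 0x1d as a color factor
and to 0x0d as an alpha factor.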
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
This patch changes lower_to_cssa to be much more conservative
about assumptions which phi operands might interfere.
Previously, this pass wasn't exhaustive and could miss some corner cases.
v2: remove optimizations to find better insertion points as it's hard
to guarantee that they are always correct and have overall no benefit.
Fixes: 0b8216b2cd ('aco: Lower to CSSA')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>
Some applications explicitly call glTex[ture]Parameteri[v] to set
GL_TEXTURE_MAX_LEVEL and GL_TEXTURE_BASE_LEVEL before uploading any
texture data. Core Mesa initializes MaxLevel to 1000, so if it isn't
that, we know they've set it. (We check for < TEXTURE_MAX_LEVELS to
avoid hardcoding that value, however.)
If MaxLevel - BaseLevel > 0, then the app is trying to tell us that
this texture is going to have multiple miplevels. In that case, go
ahead and allocate the space for it.
Avoids many resource_copy_region calls at texture finalization time
in the Civilization VI benchmark.
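The heuristic can be sketched roughly as below. The constant is a
stand-in for core Mesa's level limit and the function name is
hypothetical; only the two conditions are taken from the description
above.

```c
#include <stdbool.h>

/* Stand-in for the core Mesa limit; the value is an assumption. */
#define SKETCH_TEXTURE_MAX_LEVELS 15

/* True when the application has explicitly set a MaxLevel (anything
 * below the limit, since the default is 1000) and that range covers
 * more than one miplevel. */
static bool app_requested_mipmaps(int base_level, int max_level)
{
    bool app_set_max_level = max_level < SKETCH_TEXTURE_MAX_LEVELS;

    return app_set_max_level && (max_level - base_level) > 0;
}
```

With the default MaxLevel of 1000 the check fails and the existing
allocation guess is used; with BaseLevel 0 and MaxLevel 4 it succeeds
and the full miptree can be allocated up front.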
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>
No stride / no attributes means that nothing is being written to the
buffer. However it might still prevent primitives from being written out
to the other buffers. Disabling it entirely seems to fix it.
Fixes GTF-GL45.gtf30.GL3Tests.transform_feedback.transform_feedback_overflow
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The existing liveness analysis in ppir still ultimately relies on a
single continuous live_in and live_out range per register and was
observed to be the bottleneck for register allocation on complicated
examples with several control flow blocks.
The use of live_in and live_out ranges was fine before ppir got control
flow, but now it ends up creating unnecessary interferences as live_in
and live_out ranges may span across entire blocks after blocks get
placed sequentially.
This new liveness analysis implementation generates a set of live
variables at each program point; before and after each instruction and
beginning and end of each block.
This is a global analysis and propagates the sets of live registers
across blocks independently of their sequence.
The resulting sets optimally represent all variables that cannot share a
register at each program point, so can be directly translated as
interferences to the register allocator.
Special care has to be taken with non-ssa registers. In order to
properly define their live range, their alive components also need to be
tracked. Therefore ppir can't use simple bitsets to keep track of live
registers.
The algorithm uses an auxiliary set data structure to keep track of the
live registers. The initial implementation used only trivial arrays,
however regalloc execution time was then prohibitive (>1minute on
Cortex-A53) on extreme benchmarks with hundreds of instructions,
hundreds of registers and several spilling iterations, mostly due to the
n^2 complexity to generate the interferences from the live sets. Since
the live register sets are only a very sparse subset of all registers at
each instruction, iterating only over this subset allows it to run very
fast again (a couple of seconds for the same benchmark).
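The block-level part of such a global analysis can be sketched as a
classic backward dataflow fixed point. This is a simplified model under
stated assumptions: it uses one bit per register in a 32-bit mask, which
is exactly what ppir cannot do because it also tracks live components,
and all struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SUCCS 2

struct sketch_block {
    uint32_t use;                  /* registers read before being written */
    uint32_t def;                  /* registers written in the block */
    int succs[SKETCH_MAX_SUCCS];   /* successor block indices, -1 if unused */
    uint32_t live_in;
    uint32_t live_out;
};

/* Iterate until no set changes:
 *   live_out(b) = union of live_in(s) over all successors s
 *   live_in(b)  = use(b) | (live_out(b) & ~def(b))
 * The result does not depend on the order the blocks are laid out in. */
static void sketch_liveness(struct sketch_block *blocks, int count)
{
    bool changed = true;

    while (changed) {
        changed = false;
        for (int i = count - 1; i >= 0; i--) {
            struct sketch_block *b = &blocks[i];
            uint32_t out = 0;

            for (int s = 0; s < SKETCH_MAX_SUCCS; s++) {
                if (b->succs[s] >= 0)
                    out |= blocks[b->succs[s]].live_in;
            }

            uint32_t in = b->use | (out & ~b->def);
            if (in != b->live_in || out != b->live_out) {
                b->live_in = in;
                b->live_out = out;
                changed = true;
            }
        }
    }
}
```

Registers that are simultaneously set in the same live set interfere.
The per-instruction sets described above refine this further inside
each block.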
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
There are some cases in shaders using control flow where the varying load
is cloned to every block, and then the original node is left orphan.
This is not harmful for program execution, but it complicates analysis
for register allocation as there is now a case of writing to a register
that is never read.
While ppir doesn't have a dead code elimination pass for its own
optimizations and it is not hard to detect when we cloned the last load,
let's remove it early.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
After flushing batches, iris_fence_flush() asks the kernel whether
each batch's last_syncpt has already signalled or not. (The idea is
that either the compute or render batch may not have actually had any
work queued up, so last_syncpt there might have been signalled a long
time ago.) If it's already completed, we don't bother to record it.
A strange corner is the case of repeated flushes. For example, we
might flush for some reason, and hit a glFlush(), and hit SwapBuffers.
It's possible for all the batches to have been flushed previously, -and-
for them to have actually completed. In this case, we'll see that there
are no syncobjs to wait on, and record fence->count == 0.
This works fine internally - fence_finish can see count == 0 and realize
that it doesn't need to wait, for example. But when working with native
FDs, we may be asked to export a fence with count == 0. So we need an
actual synchronization primitive we can hand off. Because all of the
relevant batches had been signalled when creating the fence, we want the
new dummy fence to be signalled as well.
So we just make a signalled syncobj and export it.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Currently the schedule_program implementation being used is picked
at compile time, which on the Android platform means that the
bifrost compiler & scheduler is used for all targets, including
midgard based hardware.
This commit disambiguates between the two schedule_program functions.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
get_nir_image_intrinsic_image() was incorrectly mutating the value held
by the register which holds the intrinsic's first source (image index).
If this happened to be the register for an SSA def which is also used
elsewhere in the program, this meant that we would clobber that value
in subsequent uses.
Note that this only affects i965, because neither anv nor iris use the
binding table start sections, so nothing is ever added here.
Fixes KHR-GL46.compute_shader.resources-max on i965 with Eric Anholt's
MR !3240 applied. That MR reorders SSBOs and ABOs, so that test uses
image 0 and SSBO 0, causing this code to brilliantly add binding table
index 45 to both the image (correct) and the SSBO (bzzt, wrong!).
Fixes: 09f1de97a7 ("anv,i965: Lower away image derefs in the driver")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404>
Vulkan 1.2 introduces some new structures to get the properties and
features of a device from extensions that were promoted to core in 1.1
and 1.2. This commit implements the new property queries and makes all
of the corresponding extension queries map to them.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Vulkan 1.2 introduces some new structures to get the properties and
features of a device from extensions that were promoted to core in 1.1
and 1.2. This commit implements the new feature queries and makes all
of the corresponding extension queries map to them.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
This is required for the subgroupBroadcastDynamicId feature that was
added in Vulkan 1.2.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2 (Jason Ekstrand):
- Add duplicate hooks for both the 1.2 and KHR versions of
vkCmdDraw[Indexed]IndirectCount.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It doesn't really support any Vulkan properly yet so why not claim 1.2?
This was an easier way of fixing the build than trying to roll it
forward to a later version of ANV's entrypoint generator scripts.
They were causing trouble with Marge Bot: The project settings require
that the pipeline succeeds before a merge request (MR) can be merged,
otherwise Marge doesn't wait for the pipeline to succeed before merging
an MR assigned to her. But Marge can't start manual jobs, so she would
always time out waiting for pipelines with manual jobs.
To avoid this, use these rules:
* Run the pipeline by default for MRs and main project branches changing
any files affecting it.
* For other MRs, run a single dummy job which always succeeds.
* Don't run any jobs for main project branch changes (e.g. from an MR
having been merged) not affecting the pipeline.
* Allow jobs to be started manually on branches of forked projects, as
before.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3361>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3361>
To implement NIR-to-TGSI, we need to be able to get the size of the
uniform variable for the TGSI declaration, not just the
.driver_location. With its location in mesa/st, drivers couldn't link
to it from nir-to-tgsi.
This feels like a common enough function to want, so let's share it in
the core compiler.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3297>
The only bit that gallium varied on was handling of bindless. We can
retain previous behavior for count_attribute_slots() by passing in
"true" (though I suspect this is just giving a silly answer to a silly
question), and delete our recursive function from mesa/st.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3297>
PhysicalStorageBuffer is lowered to nir_var_mem_global, and
SPIR-V 1.5rev1 in section "3.25. Memory Semantics <id>" says
UniformMemory
Apply the memory-ordering constraints to StorageBuffer,
PhysicalStorageBuffer, or Uniform Storage Class memory.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3322>
When possible, get rid of an s_not when all it does is invert the SCC,
and its successor s_cbranch / s_cselect can be inverted instead.
Also modify some parts of instruction_selection to take advantage of
this feature.
Example:
s2: %3900, s1: %3899:scc = s_andn2_b64 %0:exec, %406
s2: %3902 = s_cselect_b64 -1, 0, %3900:scc
s2: %407, s1: %3903:scc = s_not_b64 %3902
s2: %3906, s1: %3905:scc = s_and_b64 %407, %0:exec
p_cbranch_z %3905:scc
Can now be optimized to:
s2: %3900, s1: %3899:scc = s_andn2_b64 %0:exec, %406
p_cbranch_nz %3900:scc
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Previously all booleans needed an s_and with exec when they were turned
into a scalar condition. However, this is not needed for uniform booleans.
v2 by Daniel Schürmann:
- Make the code more readable
v3 by Timur Kristóf:
- Fix regressions, make it work in wave32 mode
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
By adding an extra instruction, we can replace the operands of
the s_cselect_b64, which allows it to get picked up by the
optimizer when it looks for uniform booleans.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
As pointed out by Boris, what we were calling PAN_LINEAR depth textures
was in fact u-interleaved tiled (!), but we never noticed since we
flipped the flag used for sampling, leading to all sorts of fun bugs
when attempting to directly access depth textures from the CPU. Which
begs the question -- if what we called LINEAR was tiled, how do we
actually render linear depth textures? It turns out the flags for AFBC
form a mali_block_format 2-bit code just like their render-target
counterparts, so we can render to any of the above.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
Fixes: b390ff3517 ("intel/fs: Add support for SLM fence in Gen11")
Fixes: e142061399 ("intel/fs: Implement scoped_memory_barrier")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The assertion was failing.
Fixes: 363b4027fc - radeonsi: put up to 5 VBO descriptors into user SGPRs
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Using needs_vop3 check was flawed because it would only combine the
literal if the first operand is the literal. If the second or third
operand is the literal, then needs_vop3 will be true and the literal will
not be combined.
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 782051 -> 782051 (0.00 %)
VGPRS: 630048 -> 630048 (0.00 %)
Spilled SGPRs: 195 -> 195 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 54743740 -> 54585548 (-0.29 %) bytes
Max Waves: 67340 -> 67340 (0.00 %)
Instructions: 10182030 -> 10182030 (0.00 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 701990 -> 699590 (-0.34 %)
VGPRS: 566632 -> 566784 (0.03 %)
Spilled SGPRs: 218 -> 218 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 49173564 -> 49007856 (-0.34 %) bytes
Max Waves: 59650 -> 59612 (-0.06 %)
Instructions: 9315135 -> 9293330 (-0.23 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
We could create VALU instructions which read two sgprs, but only if isel
created an instruction which already read one of them.
This change is in a separate patch from the apply_sgprs() rewrite so that
it can be tested if the rewrite affected anything.
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1756 -> 1708 (-2.73 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 312 -> 300 (-3.85 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1784 -> 1736 (-2.69 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 319 -> 307 (-3.76 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
On some platforms (like Win64), unsigned long is 32-bit, so the first
cast doesn't do anything, and the compiler complains about an implicit
cast to a smaller type. So let's cast to a uintptr_t instead first,
as that's large enough on all platforms.
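A minimal illustration of the pattern, assuming the value being narrowed
is a pointer that ends up in a 32-bit integer; the function name is
hypothetical.

```c
#include <stdint.h>

/* Going through uintptr_t first makes the pointer-to-integer step
 * lossless on every platform; the narrowing to 32 bits is then an
 * explicit integer-to-integer cast.  Casting through unsigned long
 * instead would already truncate on LLP64 targets such as Win64. */
static uint32_t pointer_low_bits(const void *ptr)
{
    return (uint32_t)(uintptr_t)ptr;
}
```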
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I get warnings on MSVC for these implicit casts. Let's use explicit
casts instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We currently initialize this float-array with double-literals. Some
compilers generate warnings for this, so let's switch these to
float-literals instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The lower 8 bits of unknown_1_3 seem to be min_lod, the remaining
4 bits plus miplevels are max_lod, and min_mipfilter seems to be the
lod bias. All are fixed-point values with a 4-bit integer part and a
4-bit fraction; lod_bias also has a sign bit.
Blob also seems to do some magic with lod_bias if min filter is nearest --
it adds 0.5 to lod_bias in this case. Same story when all filters are
nearest and mipmapping is enabled, but in this case it subtracts 1/16
from lod_bias.
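As an illustration of the 4.4 fixed-point layout described above, here
is a sketch of encoding an unsigned LOD value. The helper name and the
round-to-nearest choice are assumptions, and the signed lod_bias field
is not covered:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a non-negative LOD as unsigned 4.4 fixed
 * point (4 integer bits, 4 fraction bits), clamped to the field range. */
static uint8_t lod_to_fixed_4_4(float lod)
{
   if (lod < 0.0f)
      lod = 0.0f;
   if (lod > 255.0f / 16.0f)
      lod = 255.0f / 16.0f;
   /* Scale by 16 and round to the nearest representable step. */
   return (uint8_t)(lod * 16.0f + 0.5f);
}
```

With this encoding a LOD of 1.5 becomes 0x18 and anything at or above
15.9375 saturates at 0xff.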
Fixes 134 dEQP tests in dEQP-GLES2.functional.texture.*
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3359>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3359>
This stops using driGetRendererString() in favor of a simple snprintf().
This should have the same functionality on 64-bit systems, but drops
a "x86/MMX/SSE2" suffix on 32-bit systems. (People shouldn't be using
the GL_RENDERER string to check for CPU features...)
We also use gen_get_device_name() instead of PCI ID list munging.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3371>
This reverts commit 4cda61f11e for now,
as it appears to break i965 CI (32,000+ failures). Rob and I suspect
we need to do the equivalent of 1c6a2efa06
on i965 - we are doing nir_lower_tex and brw_nir_lower_resources in the
wrong order and that's likely triggering this condition. Once we fix
that, we should put this patch back.
This doesn't change behavior, but makes the code a bit easier to read.
Both values are zero, but I somehow swapped the logical meaning of them
when initializing.
This is needed to implement the EXT_EGL_image_storage spec:
"If <target> is GL_TEXTURE_2D, then the resultant texture must have a
sized internal format which is colorspace and size compatible with the
dma-buf. If the GL is unable to determine such a format, the error
INVALID_OPERATION is generated."
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Check various parts of the EXT_EGL_image_storage spec, and add a
new vfunc for drivers implementing it.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The major differences between EXT_EGL_image_storage and
EGLImageTargetTexture2DOES are:
(1) The texture target is made immutable
(2) EXT_EGL_image_storage supports non-2D targets.
We can reuse EGLImageTargetTexture2D and FreeTextureImageBuffer
for (1) pretty easily. For (2), let's just not support the
complicated targets. Let's reuse aspects of the
EGLImageTargetTexture2DOES implementation.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Now that both GLSL and SPIR-V are adding shared and tcs_patch barriers
(as appropriate) prior to the nir_intrinsic_barrier, we don't need to do
it ourselves in the back-end. This reverts commit
26e950a5de01564e3b5f2148ae994454ae5205fe.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
The GLSL barrier() intrinsic does an implicit shared memory barrier in
compute shaders and an implicit TCS patch output barrier in tessellation
control shaders. We'd like NIR's barrier intrinsic to just be a control
flow barrier and not have memory implications. To satisfy this, we need
to add an extra memory barrier in front of each nir_intrinsic_barrier.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
As per the Vulkan memory model, the proper translation of GLSL barrier()
is an OpControlBarrier with a scope of Workgroup and semantics of
Acquire, Release, and WorkgroupMemory. Older versions of GLSLang gave
an OpControlBarrier with semantics of None so we need to patch it up on
those versions.
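A sketch of the patch-up, under some assumptions: the enum and function
names are hypothetical, the bit values are the MemorySemantics values
from the SPIR-V specification, "Acquire, Release" is expressed with the
combined AcquireRelease bit, and the caller is assumed to already know
whether the module came from an affected GLSLang version:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory semantics bits, values as defined by the SPIR-V specification. */
enum {
   SEM_NONE             = 0x0,
   SEM_ACQUIRE_RELEASE  = 0x8,
   SEM_WORKGROUP_MEMORY = 0x100,
};

/* Hypothetical fix-up: a barrier() emitted by an old GLSLang with no
 * semantics gets the semantics the memory model expects. Everything
 * else is passed through unchanged. */
static uint32_t fixup_barrier_semantics(uint32_t semantics, bool old_glslang)
{
   if (old_glslang && semantics == SEM_NONE)
      return SEM_ACQUIRE_RELEASE | SEM_WORKGROUP_MEMORY;
   return semantics;
}
```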
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
This re-enables and fixes support for stencil buffer.
It fixes 365 stencil related deqp tests. All tests that use INCR, INCR_WRAP,
DECR and DECR_WRAP as a stencil op still fail, but they also fail with the
blob, so we may ignore that for now.
We still have dEQP-GLES2.functional.depth_stencil_clear.depth_stencil_masked
failing, which is strange because it's the only one out of the
depth_stencil_clear.* set.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
This field is for the primitive ID export to the fragment shader.
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can't be enabled for geometry shaders, for NGG streamout and
for vertex shaders that export the primitive ID. NGG passthrough
requires that LDS isn't used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Per the semi-recently-released NVIDIA docs, when this bit is not
enabled, then the result for RT[0] will be used. So if e.g. only a
single RT is drawn to and it's not RT[0], the results will not be
visible. Fixes
GTF-GL45.gtf33.GL3Tests.explicit_attrib_location.explicit_attrib_location_pipeline
which was failing due to a frag shader outputting only to location=2.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This corresponds to gl_PrimitiveID and gl_Layer. When both of these are
stored in a single AST.64 or AST.128 operation, then it appears as
though the whole store fails. Fixes the recently extended
glsl-1.50-transform-feedback-builtins piglit, and also
gtf30.GL3Tests.transform_feedback.transform_feedback_builtins.
The issue was reproduced on GM107 and GP108 but not GK208 nor GK104.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Perhaps in a future implementation, such events could be passed back to
the driver, or queried directly. However for now, this is required for
GL 4.3 robustness contexts.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The fix was found by Karol Herbst a long time ago, but it was unclear
why it helped or if it would create additional problems. This change
adds a comment that explains what's going on, and in the process also
normalizes the nv50 implementation to match.
The coordinates which are fed to gl_Position map directly to pixel
coordinates, since the viewport transform is disabled. If the
framebuffer is MSAA, then that doesn't affect the pixel coordinates at
all, it's just that each pixel has multiple samples.
Note that this makes it really clear that this approach is inappropriate
for EXT_framebuffer_multisample_blit_scaled, and also the 3d path will
fail terribly for direct copies. Thankfully the 2d path normally takes
care of this.
Fixes KHR-GL43.packed_depth_stencil.blit.depth32f_stencil8 as well as
scaling issues in a number of EXT_framebuffer_multisample-related piglit
tests (although they continue to fail due to inaccuracies).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
lima doesn't support alpha test, flat shading, two-sided color nor
clip planes. We can enable these caps when corresponding hw features
are implemented in the driver.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Fixes some of dEQP-GLES2.functional.polygon_offset.* tests and shadows in Q3A.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Apparently Mali4x0 doesn't do viewport clipping, so anything drawn outside the
viewport is still rendered. It looks like we need to use scissors to do the clipping.
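One way to picture clipping through the scissor is to intersect the
scissor rectangle with the viewport rectangle. The sketch below assumes
a hypothetical rect type with exclusive max coordinates; it is not the
driver's actual state handling:

```c
#include <assert.h>

struct rect { int minx, miny, maxx, maxy; };  /* max is exclusive */

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Hypothetical helper: restrict the scissor to the viewport rectangle. */
static struct rect clip_scissor_to_viewport(struct rect scissor,
                                            struct rect viewport)
{
   struct rect r = {
      imax(scissor.minx, viewport.minx), imax(scissor.miny, viewport.miny),
      imin(scissor.maxx, viewport.maxx), imin(scissor.maxy, viewport.maxy),
   };
   /* Collapse to an empty rectangle if the two do not overlap. */
   if (r.maxx < r.minx)
      r.maxx = r.minx;
   if (r.maxy < r.miny)
      r.maxy = r.miny;
   return r;
}
```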
Fixes most of dEQP-GLES2.functional.clipping.*; 6 of the 7 remaining failures
also fail on the blob. The remaining one [1] fails on many other gallium drivers.
[1] dEQP-GLES2.functional.clipping.triangle_vertex.clip_three.clip_neg_x_neg_z_and_pos_x_pos_z_and_neg_x_neg_y_pos_z
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Apparently it doesn't depend on the primitive type; the value
only depends on whether we specify the point size via a PLBU command --
bit 12 is set in that case.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
The state value of main_uniform_storage_index will be wrong for
add_parameter() when find_and_update_previous_uniform_storage()
finds a uniform if there is more than 1 uniform used in
multiple shader stages.
The new code is also simpler.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode,
as opposed to s2 (an SGPR-pair) in wave64 mode.
A couple of cases where this should have been fixed were omitted from
the previous patch by mistake.
Fixes: e0bcefc3a0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.
This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The liveness calculation done by the local CSE pass in order to prune
AEB entries whose sources are no longer live is currently inaccurate,
because the live intervals are calculated once at the beginning of the
pass, so they don't take into account any of the copy instructions
inserted by the CSE pass as it makes progress. However the IP counter
used in that calculation is based on the start_ip of the basic block,
which is updated automatically whenever any instructions are inserted
into the CFG. This causes the IP counter and liveness intervals to
get out of sync in programs with multiple basic blocks, causing the
CSE pass to toss AEB entries prematurely, which can lead to missed
optimization opportunities rather non-deterministically.
On BDW this leads to the following shader-db changes:
total instructions in shared programs: 14952488 -> 14951763 (-0.00%)
instructions in affected programs: 45416 -> 44691 (-1.60%)
helped: 40
HURT: 4
total spills in shared programs: 20989 -> 20970 (-0.09%)
spills in affected programs: 103 -> 84 (-18.45%)
helped: 3
HURT: 0
total fills in shared programs: 24981 -> 24926 (-0.22%)
fills in affected programs: 127 -> 72 (-43.31%)
helped: 3
HURT: 0
In addition it avoids a number of regressions in combination with some
of the optimization changes I'm working on for SIMD32, which would
have made CSE more effective, yet astonishingly caused it to be less
effective elsewhere in the program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For uniform sample ID, only the first channel of msg_data will be
initialized. We need to pass that component only to the SEND message
for SIMD lowering to unzip the descriptor source correctly.
Fixes several dozens of conformance test failures with SIMD32 fragment
shaders enabled, including:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.dynamic_sample_number.*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The problem occurred when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message. In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function. This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.
Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.
The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents invalid code from being emitted for ROR/ROL instructions in
SIMD32 shaders.
The problem can be reproduced with the following tests while forcing
SIMD32 to be used for fragment shaders:
piglit.shaders.glsl-rotate-left
piglit.shaders.glsl-rotate-right
However the issue could occur in production already with compute
shaders and a workgroup size large enough to trigger SIMD32 dispatch.
Fixes: 83fdec0f0d "intel/compiler: Enable the emission of ROR/ROL instructions"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The current implementation was broken for any integers between 2^24
and 2^30 (it would return zero for me on ICL). The reason is that for
such integers we wouldn't take the 'if (0 <= shiftCount)' early return
path, however 'shiftCount + 7' would be positive, leading to a
negative 'count' argument passed to __shift64RightJamming(), which
would give undefined results.
This reworks the affected conversion functions to use either
__shortShift64Left() or __shift64RightJamming() based on the sign of
the final shift count, which should avoid the problem. In addition
this should qualify as a clean-up/optimization -- This implementation
of the conversion functions translates to 7 instructions less than the
original on Intel hardware.
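To illustrate picking the shift direction from the sign of the shift
count, here is a self-contained C sketch of an unsigned 64-bit to
binary32 conversion. It is not the GLSL built-in: the function names are
hypothetical, and rounding is done with an explicit remainder comparison
instead of __shift64RightJamming():

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a non-zero 64-bit value (portable loop). */
static int clz64(uint64_t v)
{
   int n = 0;
   assert(v != 0);
   while (!(v & (UINT64_C(1) << 63))) {
      v <<= 1;
      n++;
   }
   return n;
}

/* Sketch: convert an unsigned 64-bit integer to IEEE-754 binary32 bits,
 * rounding to nearest even. */
static uint32_t u64_to_f32_bits(uint64_t a)
{
   if (a == 0)
      return 0;

   /* Shift that puts the leading 1 at bit 23, the implicit-one position. */
   int shift = clz64(a) - 40;
   uint32_t mant;

   if (shift >= 0) {
      /* Value fits in 24 bits: shift left, the result is exact. */
      mant = (uint32_t)(a << shift);
   } else {
      /* Value is too wide: shift right and round on the dropped bits. */
      int r = -shift;
      uint64_t rem = a & ((UINT64_C(1) << r) - 1);
      uint64_t half = UINT64_C(1) << (r - 1);
      mant = (uint32_t)(a >> r);
      if (rem > half || (rem == half && (mant & 1)))
         mant++;
   }

   /* mant still carries the implicit one at bit 23, so adding it bumps the
    * exponent field by one; a rounding carry into bit 24 bumps it again. */
   uint32_t exp = (uint32_t)(127 + 23 - shift);
   return ((exp - 1) << 23) + mant;
}
```

Neither branch ever sees a negative shift amount, which is the property
the rework is after.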
This fixes the 'KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot'
conformance tests on soft fp64 hardware with large enough subgroup
size (>16).
Fixes: d5cf6e92b4 "glsl: Add built-in functions to do uint64_to_fp32(uint64_t)"
Fixes: c9d333a6b7 "glsl: Add built-in functions to do int64_to_fp32(int64_t)"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
This patch prevents memory leak in get_version function in st_manager.c
This issue was found by valgrind:
16 bytes in 1 blocks are definitely lost in loss record 6 of 1,418
at 0x483CD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x63D9476: st_init_extensions (st_extensions.c:1679)
by 0x63B803B: get_version (st_manager.c:1271)
by 0x63B8124: st_api_query_versions (st_manager.c:1289)
by 0x63266EF: dri_init_screen_helper (dri_screen.c:583)
by 0x6321B12: dri2_init_screen (dri2.c:2110)
by 0x631AACC: driCreateNewScreen2 (dri_util.c:155)
by 0x5D58192: dri3_create_screen (dri3_glx.c:897)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
Fixes: eca8032f20 ("gallium: Add ARB_gl_spirv support")
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Previously, when cluster_size was set to 0, it always worked as if
the cluster size was 64. This commit fixes it in wave32 mode by
changing to work as if the cluster size was set to 32.
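The intended behavior can be sketched as follows, assuming a
hypothetical helper that receives the wave size explicitly:

```c
#include <assert.h>

/* Hypothetical helper: a cluster_size of 0 means "the whole wave", so it
 * resolves to 32 in wave32 mode and to 64 in wave64 mode. */
static unsigned effective_cluster_size(unsigned cluster_size,
                                       unsigned wave_size)
{
   assert(wave_size == 32 || wave_size == 64);
   return cluster_size ? cluster_size : wave_size;
}
```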
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We did not take into account that name can be NULL, so we could
dereference a NULL pointer in the strncmp() call.
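A minimal sketch of the guard, using a hypothetical prefix-matching
helper rather than the function touched by this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if name is non-NULL and starts with prefix. */
static bool name_has_prefix(const char *name, const char *prefix)
{
   /* Check for NULL first; passing NULL to strncmp() is undefined. */
   if (name == NULL)
      return false;
   return strncmp(name, prefix, strlen(prefix)) == 0;
}
```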
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We can only sample from the 24-bit packed format and can't render into it,
which causes chromium-based browsers to fail when they create an FBO with
GL_RGB format. Drop R8G8B8 altogether so mesa can promote it to RGBX format.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
There's no reason to hand-roll all of the memory re-allocation fall-back
code for compute shaders. It's just duplicated complexity. This also
makes it more clear in flush_compute_state where the
MEDIA_INTERFACE_DESCRIPTOR_LOAD command gets emitted relative to other
packets in the command stream.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Because Gen7 push constants are already relative to dynamic state base
address, they aren't really an address. It's deceptive to return an
address from the helper function. Instead, let's leave it as a
special-case in the gen7-11 helper; we don't need the helper for code
de-duplication for Gen7 anyway.
Fixes: 67d2cb3e93 "anv: Add get_push_range_address() helper"
Closes: #2323
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Add a debug flag to disable tiling. Note that it prevents lima from creating
tiled buffers, but it can still import them if a modifier is specified.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Use linear layout for shared buffers if modifier is not specified
and use linear layout when importing buffers with invalid modifier.
Fixes: 01a451b04d ("lima: handle DRM_FORMAT_MOD_INVALID in resource_from_handle()")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Setting up transitive conflicts between a full register and its two
half registers (eg r0.x and hr0.x and hr0.y) will make the half
registers conflict. They don't actually conflict and this prevents us
from using both at the same time.
Add and use a new ra helper that sets up transitive conflicts between
a register and its subregisters, except it carefully avoids the
subregister conflict.
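The idea can be shown with a toy conflict matrix. All names here are
hypothetical and the sketch leaves out the transitive propagation to
already-recorded conflicts that a real allocator helper would perform:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 8

/* Toy conflict matrix; a real allocator would use lists or bitsets. */
struct reg_conflicts {
   bool conflicts[NUM_REGS][NUM_REGS];
};

static void add_conflict(struct reg_conflicts *c, unsigned a, unsigned b)
{
   assert(a < NUM_REGS && b < NUM_REGS);
   c->conflicts[a][b] = true;
   c->conflicts[b][a] = true;
}

/* Hypothetical helper: the full register conflicts with each half, but
 * the two halves are deliberately not made to conflict with each other. */
static void add_full_half_conflicts(struct reg_conflicts *c, unsigned full,
                                    unsigned half0, unsigned half1)
{
   add_conflict(c, full, half0);
   add_conflict(c, full, half1);
}
```

With register 0 standing in for r0.x and registers 1 and 2 for hr0.x and
hr0.y, the two halves stay usable at the same time.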
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
The availability is not written at the location changed in
ee6fbb95a74d...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ee6fbb95a7 ("anv: Properly handle host query reset of performance queries")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use image_load_mip and image_store_mip respectively if the lod
parameter isn't zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
SPV_AMD_shader_image_load_store_lod allows using a lod parameter
with OpImageRead, OpImageWrite and OpImageSparseRead.
According to the specification, this parameter should be a 32-bit
integer. It is initialized to 0 when no lod parameter is found
during SPIR-V->NIR translation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a cleanup but also a fix for commit dd09f1d806. In case of
i965 we did not actually create hash for cached shader programs.
Fixes: dd09f1d806 "mesa/st/i965: add a ProgramResourceHash for quicker resource lookup"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When decoding using VDPAU, the _MaxLevel value becomes -1 due to
NumLevels being equal to 0 at a certain point, and decoding fails
due to an assertion later on.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
PIPE_CAP_MAX_VERTEX_BUFFERS already sets the maximum vertex_buffer_index.
There's no need to error on num_elements == 0 (if that can even happen).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes:
dEQP-GLES3.functional.draw.draw_arrays_instanced.*
dEQP-GLES3.functional.draw.draw_elements_instanced.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
When a user buffer (not a VBO) is used, e.g. via glVertexPointer,
one thread could free the memory before another thread had used it.
Instead of copying this memory into the driver, the simpler thing
is to block until the draw finishes.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Only the blocks that are reachable are inserted with an end_nop
instruction at the end.
When handling the Phi second pass, if the Phi has a parent block that
does not have an end_nop then it means this block is unreachable, and
thus we can ignore it, as the Phi will never come through it.
Fixes dEQP-VK.graphicsfuzz.uninit-element-cast-in-loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
e5167a9276 disabled SDMA for gfx8.
This caused 3 piglit arb_sparse_buffer tests (basic, buffer-data
and commit) to crash on GFX8.
Reported-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: e5167a9276 ("radeonsi: disable SDMA on gfx8 to fix corruption on RX 580")
From issue 10 of the OES_EGL_image_external_essl3:
A limited set of use-cases is enabled by making glBindImageTexture
accept external textures. Shaders can access such external textures
using the existing <image2D> sampler type.
Fixes: 02a6d901ee ("mesa: add OES_EGL_image_external_essl3 support")
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Our barrier instruction does not implicitly do a memory fence but the
GLSL barrier() intrinsic is supposed to. The easiest back-portable
solution is to just add the NIR barriers. We'll sort this out more
properly in later commits.
Cc: mesa-stable@lists.freedesktop.org
Closes: #2138
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes a hang on Raven with Resident Evil 2.
I did not find anything more restricted to fix it:
- Setting persistent_states_per_bin to 1 fixes it too,
but likely does an internal break on any descriptor set changes
too.
- Only breaking the batch when cb_target_mask changes does not fix
it (and looking at AMDVLK comments, I suspect the code in radeonsi
should really be doing a FLUSH_DFSM).
- Always doing a FLUSH_DFSM on shader switch helps, but that is more
often than this and I don't think we should be doing that when DFSM
is disabled.
- Also emitting the existing break on framebuffer change when DFSM is
disabled does not fix the issue.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2315
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
They're not available for Debian buster yet, so we have to use upstream
snapshot packages again.
In contrast to earlier, we now store the LLVM APT repository key in Git
instead of re-downloading it every time.
We need to be mindful that we don't clobber the shadow comparator.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darrayshadow_*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
The hardware separates face selection and array indexing, it looks like,
whereas Gallium smushes them together with some modulus fun. Let's fix
it so mipmapped 2D arrays work without regressing cubemaps.
Fixes dEQP-GLES3.functional.texture.filtering.2d_array.* among others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This is adapted from the GLSL IR code but doesn't need to
iterate over the IR. I believe this also fixes a potential bug in
the GLSL IR code which potentially counts the same output twice.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This will allow us to do some linking in NIR that was previously
done by the GLSL IR linker. To start with this just has calls for
linking atomics.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is pretty much a copy of link_check_atomic_counter_resources()
updated to work with the NIR linker.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
A NIR based glsl linking function will be too different to the
spirv version to bother attempting any sharing. So let's change
the name to be explicit.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The gl_nir_lower_buffers pass relies on recognizing the same literal
constants as the GLSL compiler so that constant buffer array indices
are constant in nir as well. Without this, get_block_array_index()
would see
vec1 32 ssa_723 = deref_var &const_temp@1 (function_temp int)
vec1 32 ssa_724 = load_const (0x00000001 /* 0.000000 */)
...
vec1 32 ssa_5 = deref_var &const_temp@1 (function_temp int)
vec1 32 ssa_6 = intrinsic load_deref (ssa_5) (0) /* access=0 */
vec1 32 ssa_7 = deref_var &blockB (ssbo BlockB[1])
vec1 32 ssa_8 = deref_array &(*ssa_7)[ssa_6] (ssbo BlockB) /* &blockB[ssa_6] */
instead of a literal 1, and ultimately generate the block name
BlockB[0]. That used to work, since before the previous commits
we'd compact the block binding points and names. Thus, there would
always be a BlockB[0].
Now, if an entry in a block array isn't used, we don't generate that
block name, which means that if entry 0 isn't used BlockB[0] isn't
present and then get_block_array_index() fails to find the block.
In most cases we would have dealt with this in the call to
st_nir_opts() in st_nir_link_shaders(), but in the num_shaders == 1
case (for example, compute) we would call gl_nir_lower_buffers()
before we lowered GLSL constants. Move that corner case up next to
where we call st_nir_link_shaders() so we call st_nir_opts() at the
same point in the flow for all shaders.
Fixes: dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When an SSBO array is used with the packed layout, the IR tree
and, as a result, the NIR tree will be incorrect.
In fact, the SSBO dereference indices won't
match the array size in some cases, like the following:
"layout(packed, binding=1) buffer SSBO { vec4 a; } ssbo[3];
out vec4 color;
void main() {
color = ssbo[2].a;
}"
After linking, the IR and then the NIR will have an SSBO array
definition with size 1, but the dereference will still have index 2,
and linked_shader->Program->sh.ShaderStorageBlocks
will contain just the SSBO with the name "SSBO[2]".
So this line should be removed, at least as a workaround for now,
to avoid an error like:
Failed to find the block by name "SSBO[0]"
Fixes: 810dde2a "glsl/nir: Add a pass to lower UBO and SSBO access"
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is needed to be in agreement with spec requirements:
https://github.com/KhronosGroup/OpenGL-API/issues/46
Piers Daniell:
"We discussed this in the OpenGL/ES working group meeting
and agreed that eliminating unused elements from the interface
block array is not desirable. There is no statement in the spec
that this takes place and it would be highly implementation
dependent if it happens. If the application has an "interface"
in the shader they need to match up with the API it would be
quite confusing to have the binding point get compacted.
So the answer is no, the binding points aren't affected by
unused elements in the interface block array."
v2: - 'original_dim_size' field moved above to keep
the struct packed better on 64-bit
- added a comment for 'total_num_array_elements' field
- fixed binding point calculations for SSBO arrays of arrays
( Ian Romanick <ian.d.romanick@intel.com> )
- fixed binding point calculations for non-packed SSBOs
v3:
- rename 'total_num_array_elements' to 'aoa_size'
( Jason Ekstrand <jason@jlekstrand.net> )
- rename 'boffset' to 'binding_stride'
( Alejandro Piñeiro <apinheiro@igalia.com> )
Fixes: 8cf1333b "glsl: link uniform block arrays of arrays"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532
Reported-By: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Take one step towards sharing code between the LAVA and non-LAVA jobs,
with the goals of reducing maintenance burden and use of computational
resources.
The env var DEQP_NO_SAVE_RESULTS allows us to skip the processing of the
XML result files, which can take a long time and is not useful in the
LAVA case as we are not uploading artifacts anywhere at the moment.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Same format code as UINT... might be different in how it's fed into a
shader but we'll deal with that when we get there.
Fixes dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.usigned_int2_10_10_10.components4_vec2_quads1
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Make it a lot more obvious what we're doing and fix more than a few
corner cases in the process.
Fixes
dEQP-GLES3.functional.buffer.map.write.render_as_index_array.pixel*, and
likely others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We use the lowering in nir_format_convert. There are native ops for this
so this is far from optimal and not remotely efficient but as with most
blend shader things right now, it's hard enough to get it working, so
let's focus on that for now. We'll make it fast later (once we have
GLES3 stable, we can start optimizing these things).
Fixes dEQP-GLES3.functional.fragment_ops.blend.fbo_srgb.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Changes the assert to match the comment above.
This assert was failing in some cases while running darkplaces.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The PP stream terminator size seems to be 4 words. It worked with a full
PP stream because we align the stream beginning to 32 bytes and the BO is
initialized with zeroes. But with a partial PP stream it sometimes breaks
if, for a new PP stream, we reuse a BO that has a non-zero value at this place.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
We don't need to reload and redraw some tiles if the framebuffer was not
cleared and the scissor test was enabled for some of the draws. This simple
optimization fixes cursor lag in X11.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This commit postpones PP stream generation till job is submitted.
Doing that this late allows us to skip reloading and redrawing tiles
that were not updated.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Drop the assert, as it is not necessary and was used incorrectly anyway.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Updated documentation renames "Anisotropic Algorithm" to "LOD Algorithm"
and adds a note for Gen9+ saying "The EWA Algorithm should only be
enabled for Anisotropic Filtering modes." and indicating that the extra
accuracy shouldn't be necessary for other modes, and comes at a cost.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Ken): Handle platforms without sampler support for HiZ
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2 changes]
This eliminates 50% of pixels (2M) rendered for a blit in CS:GO. This
accounts for 3% of pixels rendered in the game. Total GPU clocks for
the first 900 frames of CS:GO improve by 1%.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The shader code required to do this is int(sat(x) * UINT24_MAX) which
isn't really worth all the effort to avoid. Doing the format
conversion, on the other hand, prevents us from sampling with HiZ which
is something that we very much want on gen8-9 where we can.
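As a rough illustration of the arithmetic involved, the conversion can be sketched in C. The helper name depth_to_unorm24 and the truncating cast are assumptions made for this sketch; the real shader code may round differently.
```c
#include <stdint.h>

#ifndef UINT24_MAX
#define UINT24_MAX 0xffffffu
#endif

/* Hypothetical helper: saturate x to [0, 1], then scale to 24 bits.
 * NaN is mapped to 0 by the first comparison. */
uint32_t depth_to_unorm24(float x)
{
    float sat = !(x > 0.0f) ? 0.0f : (x > 1.0f ? 1.0f : x);

    /* Truncating conversion; 0.5 maps to 8388607, not 8388608. */
    return (uint32_t)((double)sat * UINT24_MAX);
}
```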
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves the descriptor based texture structs and their helpers
into the only user.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This moves the state based texture structs and their helpers
into the only user.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
There are only 13 bits available to store the line width, hence
it can't be larger than 8191
v2: Add Fixes tag
v3: - Unify the value for all r600 archs (Konstantin Kharlamov)
- Correct the value: the line width is emitted as a 12.4
fixed point value of 1/2 line width on r600-r700 and as
8 * line width on Evergreen and newer.
Fixes: 06bfb2d28f ("r600: fork and import gallium/radeon")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3286>
The tiled case is nonsensical for non-base mips, but Vulkan requires
that this function handle it, while not requiring it to return
anything useful. So we can basically return anything.
Correct tiled pitch and offset are still required for our own WSI and
in the future getting the layouts of images with DRM format modifiers.
Both don't have to deal with images with more than 1 level though.
Fixes: 824bd0830e "radv: return the correct pitch for linear mipmaps on GFX10"
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2301
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2304
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I copied and pasted some of the boilerplate but never the implementation.
For now, ASTC 5x5 is disabled and faked via uncompressed RGBA; let's
delete these remnants until such a time when we implement it properly.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Intel Gen9 hardware has some nasty restrictions where ASTC 5x5 formats
and color compression can't both live in the sampler cache at the same
time. To properly support it, we have to track which of those exist
in the cache and flush ASTC out or resolve away compression.
As far as I'm aware, very little uses ASTC 5x5 textures, so instead
of replicating all that for iris, we simply turn it off and rely on
the Gallium fallback mechanism to fake it via uncompressed RGBA.
This should avoid GPU hangs any time people use ASTC 5x5 with CCS.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This patch allows us to fake ASTC 5x5 specifically, while leaving the
other ASTC LDR formats with native support. I plan to use this in iris,
at least for the time being.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We already have a helper for this, so let's use that instead of rolling
our own version.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Paul Cercueil <paul@crapouillou.net>
Etnaviv also does the same thing, so let's try to avoid repetition here,
and use the same code for it as well.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Paul Cercueil <paul@crapouillou.net>
Not 100% sure if this matches the semantics, but it seems to pass the
tests, so it seems like an improvement.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
According to the description of VkGraphicsPipelineCreateInfo,
pViewportState, pMultisampleState, pDepthStencilState and
pColorBlendState must be ignored when rasterization is not enabled.
This avoids potentially invalid pointers being dereferenced when
rasterization is disabled. Tested with `demos_x64 VK_Parameter_Zoo`
from Renderdoc repository.
v2: Don't store the `raster_enabled` as part of anv_pipeline, just
query it from the create info. This avoids storing a state that's
only used during pipeline creation. (Jason)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2258
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch> [v1]
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For an indexed draw, the number of VS invocations is (ctx->max_index - ctx->min_index + 1),
so we have to use this number when calculating space for varyings, gl_Position and
gl_PointSize.
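A minimal sketch of that count in C, assuming a 16-bit index buffer. The helper below is hypothetical and is not lima's code; it only shows why the span between the smallest and largest index, rather than the number of indices, determines how much per-vertex storage is needed.
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns max_index - min_index + 1 for an index
 * buffer, i.e. the number of vertices the VS has to process. */
unsigned vs_invocations_for_indices(const uint16_t *indices, size_t count)
{
    if (count == 0)
        return 0;

    unsigned min_index = indices[0];
    unsigned max_index = indices[0];

    for (size_t i = 1; i < count; i++) {
        if (indices[i] < min_index)
            min_index = indices[i];
        if (indices[i] > max_index)
            max_index = indices[i];
    }

    return max_index - min_index + 1;
}
```
For example, three indices {5, 9, 7} still need room for five vertices (5 through 9).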
Fixes dEQP-GLES2.functional.buffer.write.use.index_array.array and
dEQP-GLES2.functional.buffer.write.use.index_array.element_array
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
All VkFoo structs are typedef'd to not need the struct keyword. Leaving
it in there is just extra characters and breaks Vulkan's aliasing when
stuff gets promoted to core versions. It's better to just never use
struct for VkFoo.
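To illustrate the aliasing problem, here is a small sketch using stand-in declarations. VkFoo and VkFooKHR are placeholders, not real Vulkan types, and read_foo is a made-up function.
```c
/* Stand-in for a core Vulkan struct declaration. */
typedef struct VkFoo {
    int value;
} VkFoo;

/* After promotion to core, the extension name survives only as a
 * typedef alias; there is no "struct VkFooKHR" tag any more. */
typedef VkFoo VkFooKHR;

/* Compiles with either name because no struct keyword is used. */
int read_foo(const VkFooKHR *foo)
{
    return foo->value;
}

/* By contrast, "const struct VkFooKHR *foo" would name a different,
 * incomplete struct type, and foo->value would fail to compile. */
```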
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, P016 was used for the decoding of 10-bit HEVC/H.265 encoded
videos, which worked fine for mpv and ffmpeg. GStreamer specifically looks
for P010, so this patch sets the default buffer type to P010 for HEVC
decoding.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3153>
texelFetch is a requirement for OpenGL 3.0, so this gets us a step
closer to GL 3.0 support.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
With VK_AMD_mixed_attachment_samples, the number of depth/stencil
samples isn't always equal to the number of color samples. Adjust
the number of Z samples when it's different but make sure to have
a consistent sample count if there are no depth/stencil attachments.
Also adjust the number of samples used for fragment shaders which is
the number of color samples if mixed attachment samples are used.
Only enabled on GFX8+ because it's untested on previous chips.
All dEQP-VK.pipeline.multisample.mixed_attachment_samples.* now pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3018>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3018>
Since 18a8c3f7f1 we don't create a driver CSO if there are any
incompatible elements, so only ask backends to delete it if it exists.
Fixes multiple CTS crashes in V3D.
Fixes: 18a8c3f7f1 ("u_vbuf: Only create driver CSO if no incompatible elements")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
i965 and iris use inputs_read/outputs_written for a shader stage to
determine the layout of input and output storage. Adjacent stages must
agree on the layout, so adjacent input/output bitfields must match.
This patch adds a new nir_shader_compiler_options::unify_interfaces
flag which asks the linker to unify the input/output interfaces between
adjacent stages.
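One plausible reading of "unify" is that both stages end up with the union of the two bitfields. The sketch below assumes exactly that and uses a hypothetical stage_info struct; the actual linker code may mask out some built-in slots or differ in other details.
```c
#include <stdint.h>

/* Hypothetical stand-in for the per-stage shader info. */
struct stage_info {
    uint64_t inputs_read;
    uint64_t outputs_written;
};

/* Assumed behavior: make the producer's outputs and the consumer's
 * inputs agree by giving both the union of the two bitfields. */
void unify_adjacent_stages(struct stage_info *prev, struct stage_info *next)
{
    uint64_t unified = prev->outputs_written | next->inputs_read;

    prev->outputs_written = unified;
    next->inputs_read = unified;
}
```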
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>
They need a very particular form; the naive way we did before is not
sufficient in practice, it doesn't look like. So let's follow the rough
structure of the blob's writeout since this is fixed code anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This still may not be perfect (in the sense that legal shaders might
still get cut off) but this fits how writeout is done with both Panfrost
and the blob, so it's good enough for what we need and allows MRT
shaders to be sanely disassembled.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's a long story... but we'd try to insert constants that weren't there
and end up clobbering fields in the bundle following the constant
array...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Blend shader size and location in memory is considerably constrained,
probably to facilitate optimizations (my guess is that blend shaders are
run strictly out of i-cache). We need to pack the blend shaders for each
RT of a single framebuffer together. The easiest way to do this is at
draw time which is not terribly efficient but will hold us over for now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't handle this format yet, but we will soon, and the abort in
pan_pack_color is possible even without exposing the format... Handling
this gracefully might not be required by the spec but let's not crash.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Remove the invert on arguments to branches, and invert the branch
condition instead. This saves one instruction per inverted argument.
Closes: #2088
Signed-off-by: Afonso Bordado <afonsobordado@az8.co>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The GCnano has only 4 vertex buffers instead of 16. This information
can be extracted from the GPU status registers and is already stored
in screen->specs.stream_count. Use PIPE_CAP_MAX_VERTEX_BUFFERS to
report this information and permit u_vbuf to reorganize the shaders
to fit.
This fixes the following dEQP on GCnano:
dEQP-GLES2.functional.shaders.conversions.vector_combine.float_float_float_float_to_vec4_vertex
This fixes all the other dEQP-GLES2.functional.shaders.conversions.*
which used to fail on GCnano.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3241>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3241>
Fixes 240 failing test cases in dEQP-VK.spirv_assembly which
were failing due to a bad s_ashr_i32 instruction. This commit
fixes the instruction format along with the definitions of the
instruction.
Fixes: 11f43caaec
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Two competing rules for defining u_format_table.c exist,
which is an error.
Additionally the more general rule lacks the inclusion of
format/u_format.csv.
Fixes: 882ca6dfb0 ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Since we have a separate blend shader for each render target, let's
simplify this structure and reduce the options memory footprint by 88%
or something goofy like that.
Should also enable separate blending per render target.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We need to actually work out the varying format on demand, rather than
assuming rgba32f.
Fixes dEQP-GLES3.functional.fragment_out.basic.int.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tessellation Control and Evaluation shaders are implementing
tessellation and require special handling of their inputs
and outputs.
TCS can write out not only per-vertex, but also per-patch
(per-primitive) attributes and tessellation factor values
that control the tessellator.
TES can read TCS outputs, plus must be fed with new
system values (tessellation coordinates) that are
outputs of the tessellator fixed function.
TCS can also contain calls to barrier() function (similar
to compute shaders).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
It has been unused for a while; let's just remove the abstraction.
Technically the hardware does support 32-bit job descriptors, but we
don't and we can't keep them from breaking so let's not pretend they
work.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
There's only one way to encode comparison functions in the command
stream, not two. It's just that the semantics for texture comparisons
are flipped from the semantics of stencil comparison. We can factor out
that flip to common Panfrost code, rather than tying it to a second
Gallium routine.
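A sketch of what such a shared flip could look like, assuming "flipped" means the two operands are swapped. The enum and function names are hypothetical and the enum values are not the hardware encoding.
```c
/* Hypothetical comparison functions; values are illustrative only. */
enum compare_func {
    FUNC_NEVER,
    FUNC_LESS,
    FUNC_EQUAL,
    FUNC_LEQUAL,
    FUNC_GREATER,
    FUNC_NOTEQUAL,
    FUNC_GEQUAL,
    FUNC_ALWAYS
};

/* Swap the roles of the two operands: a < b becomes b > a, and so on.
 * Symmetric functions (never, equal, not-equal, always) are unchanged. */
enum compare_func flip_compare_func(enum compare_func f)
{
    switch (f) {
    case FUNC_LESS:    return FUNC_GREATER;
    case FUNC_GREATER: return FUNC_LESS;
    case FUNC_LEQUAL:  return FUNC_GEQUAL;
    case FUNC_GEQUAL:  return FUNC_LEQUAL;
    default:           return f;
    }
}
```
Applying the flip twice returns the original function, so the same helper works in both directions.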
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Somehow we have native hardware for all of these. Suspected by staring
at the bit pattern; confirmed by poking in various texture wrap modes
into the textures mesa demo and seeing what happens.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's a relic from before we understood the varying builtins. It should
never actually come up if the builtins are decoded correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These are conventions by the blob (a convention we happen to follow).
They are not at all intrinsic to the hardware, so now that the
convention is implemented within the Midgard stack, these defines are
wholly unused. Remove them.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Updates radv Makefile.sources and fixes the following building error:
external/mesa/src/amd/vulkan/radv_shader.c:1122:
error: undefined reference to 'radv_declare_shader_args'
Fixes: 3b14336 ("ac/nir, radv, radeonsi: Switch to using ac_shader_args")
Fixes: 66c703b ("radv: Move argument declaration out of nir_to_llvm")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Updates amd Makefile.sources and fixes the following building errors:
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:338: error: undefined reference to 'ac_add_arg'
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:340: error: undefined reference to 'ac_add_arg'
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:341: error: undefined reference to 'ac_add_arg'
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:342: error: undefined reference to 'ac_add_arg'
Fixes: 9885af3 ("ac: Add a shared interface between radv, radeonsi, LLVM and ACO")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
RADV Android build rules are now getting the wrong vk_format.h
from src/vulkan/util include, the simplest way to fix is to add
src/amd/vulkan include prior to src/vulkan/util include
Fixes the following building errors:
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_radv_common_intermediates/vk_format_table.c:39:4:
error: use of undeclared identifier 'VK_FORMAT_LAYOUT_PLAIN'
...
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_radv_common_intermediates/vk_format_table.c:131:8:
error: use of undeclared identifier 'VK_FORMAT_TYPE_UNSIGNED'; did you mean 'UTIL_FORMAT_TYPE_UNSIGNED'?
{VK_FORMAT_TYPE_UNSIGNED, true, false, false, 4, 0}, /* x = a */
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Fixes: 3a28281 ("util: Add a mapping from VkFormat to PIPE_FORMAT.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Updates Makefile.sources and fixes the following building error:
In file included from external/mesa/src/vulkan/util/vk_format.c:24:
In file included from external/mesa/src/vulkan/util/vk_format.h:28:
external/mesa/src/util/format/u_format.h:33:10: fatal error: 'pipe/p_format.h' file not found
#include "pipe/p_format.h"
^~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 3a28281 ("util: Add a mapping from VkFormat to PIPE_FORMAT.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It shows up as a special (magic?) attribute. We could try to be clever
and only include the extra record if gl_VertexID is actually read, but
honestly that's just extra complexity for no good reason. Might as well
just always include it; this won't be a real bottleneck, I don't think.
Fixes dEQP-GLES3.functional.shaders.builtin_variable.vertex_id.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We have special records for these, put in a fixed location by convention
per the blob.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Just like varyings have special records for point coordinates (etc),
attributes have special records for vertex/instance ID. We can parse
these fairly easily, although they don't line up exactly with normal
attribute records.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Padded counts are numbers of the form:
n = (2k + 1) * 2^s
for k, s integers. Rather than explicitly store k and s separately and
then compute this formula on demand, it's much cleaner to store the
padded number itself, which is what you manipulate most of the time.
When you do need k,s it is easy to factor by noticing the bitwise
representation:
s = ctz(n)
k = n >> (s + 1)
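A minimal C sketch of the factoring above. The function names are made up for illustration, and __builtin_ctz is the GCC/Clang builtin standing in for ctz; this is not the panfrost code itself.
```c
#include <assert.h>

/* s = ctz(n): the number of trailing zero bits of the padded count. */
unsigned padded_count_shift(unsigned n)
{
    assert(n != 0); /* ctz(0) is undefined */
    return (unsigned)__builtin_ctz(n);
}

/* k = n >> (s + 1), written as two shifts so that s + 1 can never
 * reach the bit width of unsigned. */
unsigned padded_count_odd(unsigned n)
{
    unsigned s = padded_count_shift(n);
    return (n >> s) >> 1;
}

/* n = (2k + 1) * 2^s */
unsigned padded_count_make(unsigned k, unsigned s)
{
    return (2u * k + 1u) << s;
}
```
For n = 24 = 3 * 2^3 this gives s = 3 and k = 1, and padded_count_make(1, 3) rebuilds 24.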
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Slight bug with instancing. No harm done but let's get rid of the
pandecode warning, it's just noise.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The algorithm doesn't need to be tangled up in details about the
attribute records themselves. We'll need to compute magic divisors for
gl_InstanceID in a second.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
They don't need them; this will allow us to move the code into encoder/
which in turn will make the messy Gallium code less scary.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Let's follow the naming convention that panfrost command stream code is
organized by command stream structure.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These show up in some blend shaders. Let's use the shared lowering and
remove our own.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
GL ES 3.0 requires it to be higher, and stuff seems to work just fine.
Fixes: dEQP-GLES3.functional.implementation_limits.max_vertex_output_components
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We need to reshuffle to sync up the shadow coordinate temporary with the
cubemap coordinate temporary. Once that's in place, it's simple enough
(we load the shadow coordinate into .z like 2D).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
My latest divination spell has uncovered a pattern in the aether.
Although the swizzle is unaligned, its format is otherwise standard.
Document this, removing the old incorrect understanding of the swizzle
(which coincided on common special swizzles only).
Fixes dEQP-GLES3.functional.shaders.texture_functions.texelfetchoffset.sampler2d_fixed_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We zero the extra components anyway. Fixes
dEQP-GLES3.functional.shaders.texture_functions.texelfetch.sampler2d_fixed_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We may call it with sentinel values (~0 in particular) corresponding to
unused arguments; ignore these.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These days `ctx->inputs` is the split scalar input components and
`ir->inputs` is the full vecN. This got fixed in the load_input case,
but the load_interpolated_input case was missed.
Fixes: bdf6b7018c ("freedreno/ir3: re-work shader inputs/outputs")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Due to the succeeding break we would fall into some off-by-one errors.
These should be resolved now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
At the moment there's no need to actually count these but we do need a
placeholder for report.py to be happy.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We use these prefixes in panfrost shader-db and they need to match for
shader-db to be happy.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The blob uses COMPUTE jobs for some internal purposes. These are
essentially free but panfrost doesn't use them, so it messes up the
numbering. Just filter them out.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We support the same set of samples for integer color formats as for
non-integer. We've been advertising it wrong since before the initial
Vulkan 1.0 release. :-(
Fixes: d689745303 "vk/0.210.0: Rework device features and limits"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Some of the latest changes are causing the following build error on Android:
```
external/mesa3d/src/gallium/auxiliary/nir/nir_to_tgsi_info.c:403:6:
error: redefinition of 'nir_tgsi_scan_shader'
void nir_tgsi_scan_shader(const struct nir_shader *nir,
^
external/mesa3d/src/gallium/auxiliary/nir/nir_to_tgsi_info.h:37:20:
note: previous definition is here
static inline void nir_tgsi_scan_shader(const struct nir_shader *nir,
^
```
Include nir_to_tgsi_info.c and nir_to_tgsi_info.h into the build
only if LLVM is enabled.
Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2978>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2978>
We might get asked to pitch the storage on a buffer that already has
no meaningful contents. In this case, the existing buffer is as good
as a new one.
I was passing iris keys to brw_debug_key_recompile, leading to out of
bounds memory reads.
Fixes: 2e654db27a ("iris: Create smaller program keys without legacy features")
Fix build error after llvm-10 commit 5d986953c8b9 ("[IR] Split out
target specific intrinsic enums into separate headers").
../src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp:78:37: error: ‘x86_bmi_bextr_32’ is not a member of ‘llvm::Intrinsic’
{"meta.intrinsic.BEXTR_32", Intrinsic::x86_bmi_bextr_32},
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
The GPU presents the state of the hardware front_face in internal
register 0 (i0), the range of which is 0.0f..1.0f.
This patch assigns the fragment shader input to this internal register.
Moreover, based on the internal front_ccw state, the value of the i0
register is inverted accordingly using SET.EQ/SEQ.NE instruction before
being further processed in the shader. This mimics the operation of the
NIR compiler.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2868>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2868>
More vertex buffers are used than the hardware supports. In
principle, we only need to make sure that fewer vertex buffers are
used, and mark some of the latter vertex buffers as incompatible.
For now, mark all vertex buffers as incompatible.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2807>
This enables Mesa to work with Ingenic SoCs through the use of the
ingenic-drm modesetting driver along with the render-only drivers,
such as Etnaviv on the JZ4770 SoC.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This introduces new vec8 and vec16 instructions (which are the only
instructions taking more than 4 sources), in order to construct 8 and 16
component vectors.
In order to avoid fixing up the non-autogenerated nir_build_alu() sites
and making them pass 16 src args for the benefit of the two instructions
that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() has
nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is
used for the > 4 src args case).
v2 (Karol Herbst):
use nir_build_alu2 for vec8 and vec16
use python's array multiplication syntax
add nir_op_vec helper
simplify nir_vec
nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
use nir_build_alu for opcodes with <= 4 sources
v3 (Karol Herbst):
fix nir_serialize
v4 (Dave Airlie):
fix serialization of glsl_type
handle vec8/16 in lowering of bools
v5 (Karol Herbst):
fix load store vectorizer
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This assert causes testing tools such as shaderdb to abort on some test
cases. This is an unsupported feature and not a compiler bug. The
compilation error is already propagated correctly, so we can remove the
assert to allow testing tools to run to completion.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3176>
ppir has some code that operates on all ppir_src variables, and for that
uses ppir_node_get_src.
lod bias support introduced a separate ppir_src that is inaccessible by
that function, causing it to be missed by the compiler in some routines.
Ultimately this caused, in some cases, a bug in const lowering:
.../pp/lower.c:42: ppir_lower_const: Assertion `src != NULL' failed.
This fix moves the ppir_srcs in ppir_load_texture_node together so they
don't get missed.
Fixes: 721d82cf06 ("lima/ppir: add lod-bias support")
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3185>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3185>
The test here is testing whether either variable is non-zero.
While currently the test works fine, it's fragile. Replace it
with logical OR to avoid the fragility.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Currently piglit spec@arb_occlusion_query@occlusion_query_conform
spins forever as the resource status is never reset. See
etna_hw_get_query_result(..) for more details.
Fixes: 1456aa61cc ("etnaviv: Rework resource status tracking")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Marek Vasut <marex@denx.de>
Since we are using st_common_variant while creating the variant for a
vertex program, we can release the tokens created in st_create_vp_variant,
which are already stored in the respective states.
This fixes a memory leak found with piglit tests.
Fixes: bc99b22a30 ("st/mesa: use a separate VS variant for the draw module")
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Currently when lowering mod() we add an extra instruction so if
mod(a,b) == b then 0 is returned instead of b, as mathematically
mod(a,b) is in the interval [0, b).
But Vulkan spec has relaxed this restriction, and allows the result to
be in the interval [0, b].
This commit takes this into account to remove the extra instruction
required to return 0 instead.
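The sketch below shows one way the result can land exactly on b, assuming the common lowering a - b * floor(a / b) (the commit does not spell out the formula). All names are hypothetical, and floor_small is a limited stand-in for floor() so the example needs no math library.
```c
/* Floor for values that fit in a long; enough for this illustration. */
static float floor_small(float x)
{
    float t = (float)(long)x; /* truncates toward zero */
    return t > x ? t - 1.0f : t;
}

/* Assumed lowering: mod(a, b) = a - b * floor(a / b). The volatile
 * store forces the result to be rounded to single precision. */
float lowered_mod(float a, float b)
{
    volatile float r = a - b * floor_small(a / b);
    return r;
}

/* Same lowering plus the extra step that maps a result of b back to 0,
 * keeping the result in [0, b). */
float lowered_mod_fixup(float a, float b)
{
    float r = lowered_mod(a, b);
    return r == b ? 0.0f : r;
}
```
With a = -1e-10f and b = 1.0f, floor(a / b) is -1 and a + 1 rounds to 1.0f in single precision, so the plain lowering returns b while the fixup variant returns 0.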
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
Using a drm syscall layer faking a kernel driver:
==581460== Conditional jump or move depends on uninitialised value(s)
==581460== by 0x48A4C2B: close (drm-hooks.cpp:185)
==581460== by 0x5A815F1: dri3_alloc_render_buffer (loader_dri3_helper.c:1469)
==581460== by 0x5A82050: dri3_get_buffer (loader_dri3_helper.c:1827)
==581460== by 0x5A82662: loader_dri3_get_buffers (loader_dri3_helper.c:2028)
==581460== by 0x6C78109: intel_update_image_buffers (brw_context.c:1870)
==581460== by 0x6C77805: intel_update_renderbuffers (brw_context.c:1499)
==581460== by 0x6C7789D: intel_prepare_render (brw_context.c:1520)
==581460== by 0x6C773D4: intelMakeCurrent (brw_context.c:1341)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 069fdd5f9f ("egl/x11: Support DRI3 v1.1")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3152>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3152>
And add fields uncovered by looking at the firmware. I think this covers
all the memory, register, and scratch manipulation opcodes that exist on
A6xx, plus one additional nice find for Vulkan. It also describes a
previously unknown opcode and documents CP_WAIT_REG_MEM.
Note that the bits for the CP_REG_TO_MEM count, as well as the formula
for computing the actual count for both CP_REG_TO_MEM and CP_MEM_TO_REG,
are changed because the A630 SQE firmware actually does something
different. I haven't investigated older microcodes to see whether this
extends back to A5xx and A4xx, but the only non-A6xx uses of this
field result in the same bit-pattern when using the A6xx bit range and
formula, so it should be safe to change the definition universally.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3116>
For the sake of our testing infrastructure, disable this extension
for TGL until we can sort out a hang in Vulkan CTS.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the actual value to later be
reinterpreted as a negative number (under 16-bits).
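The kind of range check this calls for can be sketched as follows. The function is hypothetical and is not the i965 code; it only illustrates that the valid 16-bit range depends on the signedness of the source type.
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can an immediate be narrowed to a 16-bit
 * immediate of the same signedness without changing its value? */
bool imm_fits_in_16_bits(int64_t value, bool is_signed)
{
    if (is_signed)
        return value >= INT16_MIN && value <= INT16_MAX;

    return value >= 0 && value <= UINT16_MAX;
}
```
For instance, 40000 fits an unsigned 16-bit immediate but not a signed one, where the same bit pattern would read back as -25536; conversely, -3 fits only the signed form.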
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older
platforms that don't support MUL with 32x32 types and use vec4.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the actual value to later be
reinterpreted as a negative number (under 16-bits).
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms
that don't support MUL with 32x32 types, including ICL and TGL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With this patch, GCC generates vectorized code that does the comparisons
without converting the indices to 32-bit first.
This optimization makes the aforementioned function almost twice as fast
for ARM NEON, and should speed up vectorised code on other platforms.
Without vectorisation, the function is still a percent or two faster,
but slightly larger.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>
This fixes a bug with NGG that is probably harmless.
Basically, !is_monolithic makes the VS prolog emit
llvm.amdgcn.init.exec.from.input, which sets the EXEC mask to only enable
ES threads. In the NGG non-GS case, the GS threads <= ES threads, so it was
never an issue.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by
nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic.
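For reference, a plain C sketch of what the two operations compute for unsigned operands. The function names are illustrative and are not NIR API; the point is only that the 64-bit product is assembled from a low 32-bit multiply and a "high" multiply, which is why the high multiply still has to be lowered afterwards.

```c
#include <stdint.h>

/* Upper 32 bits of the full 32x32 product. */
static uint32_t
umul_high(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* 32x32 -> 64 multiply expressed as a low multiply plus a high multiply. */
static uint64_t
umul_2x32_64(uint32_t a, uint32_t b)
{
   uint32_t lo = a * b;            /* wraps modulo 2^32 */
   uint32_t hi = umul_high(a, b);
   return ((uint64_t)hi << 32) | lo;
}
```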
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
If for some reason the fence associated with an image doesn't signal,
we're likely in a device lost scenario and should report that error.
We can't really wait for a given amount of time because we could get a
timeout and that is not a valid error to report for vkQueuePresentKHR,
so just wait forever.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/830
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I'm honestly unsure what this is for, but it's needed on MFBD systems
for unknown reasons, at least when MRT is actually in use and then
sometimes without MRT (it fixes a blend shader issue on T760?)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
Epilogues are special fixed-function blocks, so they need special
handling for liveness analysis to work completely. This in turn fixes
RA issues for many shaders using MRT.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
The flow is considerably more complicated. Instead of one writeout loop
like usual, we have a separate write loop for each render target. This
requires some scheduling shenanigans to get right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
V3D can do indirect inputs so we don't need it. Also, the lowering
produces horrible if-ladder code that is particularly bad for geometry
shaders where inputs are always arrays and shader bodies usually have
a loop indexing into them.
This fixes a couple of geometry shader tests in CTS that would fail to
register allocate otherwise.
There are no changes in shader-db.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
With geometry shaders the number of emitted primitives is decided
at run time, so we cannot precompute it in the CPU and we need to
use the PRIMITIVE_COUNTS_FEEDBACK commands to have the GPU provide
the number like we do for the number of primitives written to
transform feedback. This may have a performance impact though, since
it requires a sync wait for the draw to complete, so we only do
it when geometry shaders are present.
v2: remove '> 0' comparison for pointer type (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.
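One way to keep the layer index within the allocated tile state is sketched below. This is an illustration only: the helper name is hypothetical, and falling back to layer 0 for out-of-range values is an assumption, not something the text above specifies.

```c
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: tile state exists for MAX2(num_layers, 1) layers,
 * so any layer index outside that range is redirected to layer 0
 * (the choice of layer 0 is an assumption). */
static uint32_t
sanitize_layer(uint32_t layer, uint32_t num_layers)
{
   uint32_t allocated = MAX2(num_layers, 1u);
   return layer < allocated ? layer : 0;
}
```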
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When doing layered rendering the binning stage will prepare per-tile
lists for each layer in the framebuffer, so we need to make sure
we allocate enough space for them.
We also need to emit the NUMBER_OF_LAYERS packet. This is required
even when the number of layers is only 1, otherwise the simulator
detects buffer overflows in the tile_state BO during some CTS test
cases involving layered FBOs.
When rendering, we need to emit commands for each layer of the
framebuffer separately and make sure we address the correct layers for
each one.
v2: fixed typo in comment (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
For layered rendering we need to emit per layer rendering commands
lists so we can end up requiring a fairly large buffer for this
if the number of layers is large enough.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
OES_geometry_shader introduced the concept of layered framebuffers.
Removing this assertion gets a bunch of CTS tests to pass. We will
also need layered images to implement layered rendering with geometry
shaders.
v2: fix typo in commit message (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we program an output size of 0 the simulator asserts. This was
not a problem until now because our VS would always have to
emit fixed function outputs, however, now that it can be paired
with a GS we can end up with a VS shader that no longer emits
any outputs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Geometry shaders can output many vertices and thus have higher VPM memory
pressure as a result. It is possible that too wide geometry shader dispatches
exceed the maximum available VPM output allocated, in which case we need
to reduce the dispatch width until we can fit the VPM memory requirements.
Supported dispatch widths for geometry shaders are 16, 8, 4, 1.
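The width reduction can be sketched as a loop over the supported widths. Everything except the width list is an assumption: the function names are hypothetical, and the linear cost model (sectors needed = sectors per lane times width) is only a stand-in for the real VPM computation.

```c
#include <stdint.h>

/* Hypothetical cost model: output sectors scale linearly with width. */
static uint32_t
gs_output_sectors(uint32_t sectors_per_lane, uint32_t width)
{
   return sectors_per_lane * width;
}

/* Returns the widest supported dispatch width whose output fits in
 * max_sectors, or 0 if not even the 1-way dispatch fits. */
static uint32_t
pick_gs_dispatch_width(uint32_t sectors_per_lane, uint32_t max_sectors)
{
   static const uint32_t widths[] = { 16, 8, 4, 1 };

   for (unsigned i = 0; i < sizeof(widths) / sizeof(widths[0]); i++) {
      if (gs_output_sectors(sectors_per_lane, widths[i]) <= max_sectors)
         return widths[i];
   }

   return 0;
}
```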
There is a limit in the number of VPM output sectors that can be used by a
geometry shader that we can meet by lowering the dispatch width at compile
time, however, at draw time we need to revisit this number and, together with
other elements that can contribute to total VPM memory requirements, decide
on a configuration that can fit the program into the available VPM memory.
Ideally, we also want to aim for not using more than half of the available
memory so that we can run a pair of bin and render programs in parallel.
v2: fixed language in comment and typo in commit log. (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
According to the documentation, the 1-way dispatch width is only supported
with geometry shaders.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is good enough to get basic GS workloads working, later patches will
improve this by adding instancing support, proper SIMD configuration, etc.
Notice that most of the TESSELLATION_GEOMETRY_SHADER_PARAMS fields are only
relevant when tessellation shaders are present. We do not support tessellation
yet, but we still need to fill in this tessellation state with default values
since our packing functions require some of these to have non-zero values.
v2:
- Add a comment in the code explaining why we fill in
tessellation fields (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Every code address starts at bit 3 (addresses must be 64-bit aligned),
with the first 3 bits used to specify threading and NaN propagation
parameters for the shader program.
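A small sketch of that encoding, assuming a 32-bit field. The helper names are hypothetical and the meaning of the individual flag bits is deliberately left opaque; only the "address from bit 3, parameters in bits 0..2" split comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define SHADER_FLAG_BITS 3u
#define SHADER_FLAG_MASK ((1u << SHADER_FLAG_BITS) - 1u)

/* Combine an 8-byte aligned code address with the 3 parameter bits. */
static uint32_t
pack_code_address(uint32_t addr, uint32_t flags)
{
   assert((addr & SHADER_FLAG_MASK) == 0); /* 64-bit aligned */
   assert(flags <= SHADER_FLAG_MASK);
   return addr | flags;
}

/* Recover the code address by masking off the parameter bits. */
static uint32_t
unpack_code_address(uint32_t packed)
{
   return packed & ~SHADER_FLAG_MASK;
}
```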
We generally skip "reserved" bits, however, doing this when the
reserved field is the last in a struct and it is large enough can
make us compute incorrect (smaller) struct sizes which can
lead to corrupt CLs. In particular, the "Tess/Geom Common Params"
struct has a reserved field at the end that is 8-bit, so if we
don't include this we compute a packet size that is 1 byte smaller
than it should, making the next packet we emit start 1 byte
earlier and therefore leading to incorrect CL data from that point
forward.
The name of one of the fields was not correct.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
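The layout can be pictured with a couple of offset helpers, counted in 32-bit slots. The names are hypothetical, and the sketch assumes that header slots are reserved for the maximum vertex count and that vertex data is tightly packed with a fixed size per vertex; neither detail is stated above.

```c
#include <stdint.h>

/* Slot 0 holds the general header; vertex headers follow it. */
static uint32_t
gs_vertex_header_slot(uint32_t vertex)
{
   return 1 + vertex;
}

/* Vertex data starts after the general header and all vertex headers
 * (assumes a fixed number of slots per vertex). */
static uint32_t
gs_vertex_data_slot(uint32_t max_vertices, uint32_t slots_per_vertex,
                    uint32_t vertex)
{
   return 1 + max_vertices + vertex * slots_per_vertex;
}
```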
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO lowering so it includes the GS stage (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
BEGIN_RING() could decide we can't fit the next packet in the current
cmdstream segment, and grow a new segment. So we need to grab ring->cur
*after* BEGIN_RING(), otherwise we are writing cmdstream past the end of
the previous segment.
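The ordering problem can be reproduced with a toy ring. This is not the freedreno ringbuffer API: struct toy_ring, toy_begin_ring() and toy_emit_pkt() are stand-ins invented for the sketch, with tiny fixed-size segments so that growth is easy to trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_DWORDS 4
#define MAX_SEGS   4

struct toy_ring {
   uint32_t segs[MAX_SEGS][SEG_DWORDS];
   unsigned seg;        /* index of the current segment */
   uint32_t *cur, *end; /* write pointer and end of the current segment */
};

static void
toy_ring_init(struct toy_ring *ring)
{
   ring->seg = 0;
   ring->cur = ring->segs[0];
   ring->end = ring->cur + SEG_DWORDS;
}

/* Stand-in for BEGIN_RING(): switch to a new segment if the packet
 * does not fit in the space left in the current one. */
static void
toy_begin_ring(struct toy_ring *ring, unsigned ndwords)
{
   assert(ndwords <= SEG_DWORDS);
   if ((size_t)(ring->end - ring->cur) < ndwords) {
      assert(ring->seg + 1 < MAX_SEGS);
      ring->seg++;
      ring->cur = ring->segs[ring->seg];
      ring->end = ring->cur + SEG_DWORDS;
   }
}

static void
toy_emit_pkt(struct toy_ring *ring, const uint32_t *dwords, unsigned n)
{
   toy_begin_ring(ring, n);

   /* Read ring->cur only now: toy_begin_ring() may have moved it. */
   uint32_t *p = ring->cur;
   for (unsigned i = 0; i < n; i++)
      p[i] = dwords[i];
   ring->cur = p + n;
}
```

Had the pointer been read before toy_begin_ring(), the second three-dword packet in a four-dword segment would have been written past the end of the first segment.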
Fixes: bdd98b892f ("freedreno: New struct packing macros")
Signed-off-by: Rob Clark <robdclark@chromium.org>
The current strategy using the suballocator with fixed size doesn't
scale and causes some programs with large number of vertices (like some
glmark2 scenes) to crash.
Change it to dynamically allocate a separate bo to accommodate an
arbitrary number of vertices.
This also fixes the buffer read/write flags for gp.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>
I feel really bad about this but this one test is flaking. I don't want
to do a mass revert (and bisection is extremely difficult with
nondeterministic/Heisenbugs), but it's Friday night and master needs to
pass. This commit should be reverted asap (once the flake is solved)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This can be used to start/stop statistics capturing from the command
line.
v3:
- Install script (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
By default, if an output_file is specified, the overlay layer will start
capturing data immediately. After this commit, when a control socket is
used, the capture starts disabled by default, and is only enabled when a
command ":capture=1;" is received.
When the capture is enabled, we might have already accumulated some
stats. To avoid capturing such noise, we discard and reset the fps and
stats, updating the display and capturing only data from that point on.
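A rough sketch of recognizing the command on the receiving side. Only the ":capture=1;" string comes from the description above; the helper name, the whole-buffer comparison, and the ":capture=0;" counterpart for stopping are assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true and sets *enable when buf holds a
 * capture command, false for anything else. */
static bool
parse_capture_command(const char *buf, bool *enable)
{
   if (strcmp(buf, ":capture=1;") == 0) {
      *enable = true;
      return true;
   }
   if (strcmp(buf, ":capture=0;") == 0) { /* assumed "stop" form */
      *enable = false;
      return true;
   }
   return false;
}
```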
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Add support for socket from which the overlay layer can receive
commands. This control socket can be useful to allow setting options
once the application is already running. For instance, triggering the
capture of fps data at a certain point.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Add some infrastructure to trace scheduler decisions. The next patch
will add some more traces, just splitting this out to reduce clutter.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Sometimes sched changes that are a win in terms of instruction count
and/or register pressure, are worse in real life, due to keeping varying
storage locked for too long. Add a shader-db stat to give this more
visibility.
Signed-off-by: Rob Clark <robdclark@chromium.org>
We'll need this so we can allocate a stack for the batch large enough
for all the jobs within it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The size of the scratchpad (as well as some tiler details) depends on the
contents of the batch, so we need to defer filling out the FBD
until after all draws are queued.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The only format that needs swizzle is R8 emulated with L8, so we can get
rid of the SWIZ(X, Y, Z, W) everywhere.
Note: R8G8 also had a swizzle, but it wasn't necessary.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This supports all sRGB formats, without having them in the format table.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The Vulkan spec doesn't say anything about vertex attribute alignment.
Fixes a test failure on GFX6 and a GPU hang on GFX10 with:
dEQP-VK.spirv_assembly.instruction.spirv1p4.entrypoint.tess_con_pc_entry_point
vkpipeline-db results on GFX10:
Totals from affected shaders:
SGPRS: 463772 -> 472972 (1.98 %)
VGPRS: 343208 -> 343752 (0.16 %)
Spilled SGPRs: 323 -> 336 (4.02 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 13806200 -> 14164472 (2.60 %) bytes
Max Waves: 84021 -> 83755 (-0.32 %)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2161
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Our current implementation of performance queries is fairly harsh
because it completely flushes and invalidates the 3d pipeline caches
at the beginning and end of each query. An argument can be made that
this is how performance should be measured but it probably doesn't
reflect what the application is actually doing and the actual cost of
draw calls.
A more appropriate approach is to just stall the pipeline at
scoreboard, so that we measure the effect of a draw call without
having the pipeline in a completely pristine state for every draw
call.
v2: Use end of pipe PIPE_CONTROL instruction for Iris (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was initially intended to fix issues with the query timings going
occasionally high.
It turns out there was a bug in the attribution of OA reports to our
context when parsing the OA data. This led to reports flagged with
other context IDs being included in our query results.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were passing cl->bo, which is NULL, so v3d_job_add_bo was a no-op.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Partial depth/stencil clear and skipping unused attachments.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't have an entry for cpp 128 in the tile_alignment table, but I don't
think the HW supports this at all (blob driver just doesn't have 8x msaa).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use a special format which allows sampling the stencil and set the correct
swizzle.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't have layered rendering and ir3 doesn't support this intrinsic, so
just set it to zero for now.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
It looks like the actual tile alignment requirement is less than 32x32, but
in some cases input attachment texture needs 64 alignment.
Reduced the h alignment to 16 to compensate and it seems to work fine.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use DIV_ROUND_UP and stop trying to increase the tile_count width/height
once tile_align_w/tile_align_h are reached.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
pColorBlendState is allowed to be NULL if subpass has >0 color attachments
but they are all unused.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Here we use the NIR based builder to add everything to the resource
list except for SSO packed varyings. Since the details of those
varyings get lost during packing we leave the special handling to
the GLSL IR pass for now. In order to do this we add some bools
to the build resource list functions.
Using the NIR based resource list builder gets us a step closer to
using a native NIR based linker. It should also be faster than the
GLSL IR builder, one because the NIR optimisations should mean we
add fewer entries due to better optimisations, and two because nir
gives us better lists to work with and we don't need to walk the
entire IR to find the resources.
Ack-by: Alejandro Piñeiro <apinheiro@igalia.com>
In a following commit we will use a NIR based builder to build the
OpenGL resource list, so we want to delay this call a little.
Ack-by: Alejandro Piñeiro <apinheiro@igalia.com>
This adds support for adding names of varyings to the resource list
which is required for us to use this function with the glsl linker.
Support for names is optional for spirv which is why it had not been
added yet.
This is mostly a copy of the GLSL IR code adapted to nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
In order to be able to implement a NIR based glsl linker we need to
build the program resource list with NIR. This change delays the
remapping so that a later commit can call the NIR based resource
list builder.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It's already enabled for all gallium drivers that support GLSL 1.40 or
above and we already support everything in our compiler on SNB+
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's conceptually independent from the upper part (which is not yet
understood, but for spilling generally remains equal to 0x1e).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Due to this issue we were using 4x the memory we should have for TLS,
which was messing up the size calculations. Oops!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
I'm not totally sure why this would *break* things, but it's certainly
not necessary and it does break things. Somehow this gives the RA more
freedom, fixing some spill issues.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like no_spill decisions to be class-specific -- spilling from
special register to a work register doesn't preclude also spilling that
work register to stack.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to spill two 128-bit values in the same bundle, since we
have two registers we can spill with. This improves the
register allocation flexibility in programs with heavy spilling, though
unfortunately it isn't sufficient (theoretically, 3.5 128-bit values can
be spilled from 3 vector units and 2 scalar units).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This has been unused since the beginning since it's broken. Let's toss
it so it doesn't get in the way of further fixes. Bigger fish to fry.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We do need some sort of a cost heuristic, but this one is just causing
spilling to behave worse on shaders I'm looking at, and I don't need
more noise in the spill implementation right now.
Get it working first. We can optimize this later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Let's not worry about spilling twice in a bundle; that's too
restrictive. We'll need to change the schedule itself -- unfortunately,
this can have second-order effects due to pipeline registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Instead of having a giant function for both, split into the two
subtasks so we can handle errors better.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We move it to the register allocator itself. It doesn't belong in
midgard_schedule.c!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Should be harmless, but UBSAN complains about it and fills the logs with
noise.
../src/mesa/state_tracker/st_manager.c:523:27: runtime error: member access within null pointer of type 'struct st_framebuffer'
#0 0xaad4e89c in st_framebuffer_reference ../src/mesa/state_tracker/st_manager.c:523
#1 0xaad4e89c in st_api_make_current ../src/mesa/state_tracker/st_manager.c:1091
#2 0xaab69e0e in dri_make_current ../src/gallium/state_trackers/dri/dri_context.c:301
#3 0xaab48fd2 in driBindContext ../src/mesa/drivers/dri/common/dri_util.c:581
#4 0xb682a122 in dri2_make_current ../src/egl/drivers/dri2/egl_dri2.c:1625
#5 0xb67f95a4 in eglMakeCurrent ../src/egl/main/eglapi.c:884
#6 0x4c2b0e in tcu::surfaceless::EglRenderContext::EglRenderContext(glu::RenderConfig const&, tcu::CommandLine const&) (/deqp/modules/gles2/deqp-gles2+0x29b0e)
#7 0x4c3302 in tcu::surfaceless::ContextFactory::createContext(glu::RenderConfig const&, tcu::CommandLine const&, glu::RenderContext const*) const (/deqp/modules/gles2/deqp-gles2+0x2a302)
#8 0x73a9b0 in glu::createRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::RenderConfig const&, glu::RenderContext const*) (/deqp/modules/gles2/deqp-gles2+0x2a19b0)
#9 0x73ad86 in glu::createDefaultRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::ApiType) (/deqp/modules/gles2/deqp-gles2+0x2a1d86)
#10 0x4c6a78 in deqp::gles2::Context::Context(tcu::TestContext&) (/deqp/modules/gles2/deqp-gles2+0x2da78)
#11 0x4c3ba0 in deqp::gles2::TestPackage::init() (/deqp/modules/gles2/deqp-gles2+0x2aba0)
#12 0x852fd8 in tcu::TestHierarchyIterator::next() (/deqp/modules/gles2/deqp-gles2+0x3b9fd8)
#13 0x829660 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x390660)
#14 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
#15 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
#16 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)
../src/mesa/state_tracker/st_atom.c:115:8: runtime error: member access within null pointer of type 'struct st_program'
#0 0xaae11a58 in check_program_state ../src/mesa/state_tracker/st_atom.c:115
#1 0xaae128f6 in st_validate_state ../src/mesa/state_tracker/st_atom.c:192
#2 0xaadc58c2 in prepare_draw ../src/mesa/state_tracker/st_draw.c:132
#3 0xaadc58c2 in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184
#4 0xabc4f924 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:816
#5 0xabc50240 in _mesa_DrawElements ../src/mesa/main/draw.c:970
#6 0x73ebd2 in glu::CallLogWrapper::glDrawElements(unsigned int, int, unsigned int, void const*) (/deqp/modules/gles2/deqp-gles2+0x2d4bd2)
#7 0x6d86b2 in deqp::gls::FragOpInteractionCase::iterate() (/deqp/modules/gles2/deqp-gles2+0x26e6b2)
#8 0x494d16 in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad16)
#9 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
#10 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
#11 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
#12 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
#13 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
UBSAN complained that when alpha was 255 and we shifted it 24 positions
to the left, it didn't fit in a signed int. That's because bitwise
operations automatically promote to signed int.
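A minimal illustration of the fix, not the actual pan_pack_color() code: the helper name and the component order are assumptions. The relevant part is the cast to an unsigned 32-bit type before shifting, so the uint8_t operand is never promoted to a signed int.

```c
#include <stdint.h>

/* Hypothetical packer for an 8-bit-per-channel color, alpha in the top
 * byte (the component order is an assumption). */
static uint32_t
pack_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   /* Without the casts, a would be promoted to int, and 255 << 24 is
    * not representable in a 32-bit signed int. */
   return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
          ((uint32_t)g << 8) | (uint32_t)r;
}
```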
../src/gallium/drivers/panfrost/pan_job.c:1130:64: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'
#0 0xacf953d6 in pan_pack_color ../src/gallium/drivers/panfrost/pan_job.c:1130
#1 0xacf953d6 in panfrost_batch_clear ../src/gallium/drivers/panfrost/pan_job.c:1204
#2 0xaae3226a in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:513
#3 0x4c3d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
#4 0x828cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
#5 0x8295f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
#6 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
#7 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
#8 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Should be harmless, but UBSAN complains about it and fills the logs with
noise.
../src/gallium/auxiliary/util/u_inlines.h:110:8: runtime error: member access within null pointer of type 'struct pipe_surface'
#0 0xaaccf186 in pipe_surface_reference ../src/gallium/auxiliary/util/u_inlines.h:110
#1 0xaaccf186 in util_copy_framebuffer_state ../src/gallium/auxiliary/util/u_framebuffer.c:105
#2 0xaabfb60e in cso_set_framebuffer ../src/gallium/auxiliary/cso_cache/cso_context.c:723
#3 0xaae195ce in st_update_framebuffer_state ../src/mesa/state_tracker/st_atom_framebuffer.c:207
#4 0xaae12316 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261
#5 0xaae31302 in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:438
#6 0x4c3d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
#7 0x828cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
#8 0x8295f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
#9 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
#10 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
#11 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's undefined behavior UBSAN complains about, so fixing this will
reduce the noise a bit.
../src/compiler/nir/nir_clone.c:710:4: runtime error: null pointer passed as argument 2, which is declared to never be null
#0 0xac781be4 in clone_function ../src/compiler/nir/nir_clone.c:710
#1 0xac781be4 in nir_shader_clone ../src/compiler/nir/nir_clone.c:740
#2 0xacf99442 in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:54
#3 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960
#4 0xaae326bc in set_fragment_shader ../src/mesa/state_tracker/st_cb_clear.c:135
#5 0xaae326bc in clear_with_quad ../src/mesa/state_tracker/st_cb_clear.c:335
#6 0xaae326bc in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:518
#7 0x494d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
#8 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
#9 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
#10 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
#11 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
#12 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
As found by UBSAN, it should be harmless but it's good to remove any UB
so the tool's output is useful.
../src/panfrost/midgard/midgard_schedule.c:1094:9: runtime error: index -1 out of bounds for type 'midgard_instruction *[6]'
#0 0xad047872 in schedule_block ../src/panfrost/midgard/midgard_schedule.c:1094
#1 0xad04d41a in schedule_program ../src/panfrost/midgard/midgard_schedule.c:1116
#2 0xad031f98 in midgard_compile_shader_nir ../src/panfrost/midgard/midgard_compile.c:2588
#3 0xacf9874e in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:68
#4 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960
#5 0xaae2596e in st_update_fp ../src/mesa/state_tracker/st_atom_shader.c:168
#6 0xaae12316 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261
#7 0xaadc58c2 in prepare_draw ../src/mesa/state_tracker/st_draw.c:132
#8 0xaadc58c2 in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184
#9 0xabc4f924 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:816
#10 0xabc50240 in _mesa_DrawElements ../src/mesa/main/draw.c:970
#11 0x73ebd2 in glu::CallLogWrapper::glDrawElements(unsigned int, int, unsigned int, void const*) (/deqp/modules/gles2/deqp-gles2+0x2d4bd2)
#12 0x6d86b2 in deqp::gls::FragOpInteractionCase::iterate() (/deqp/modules/gles2/deqp-gles2+0x26e6b2)
#13 0x494d16 in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad16)
#14 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
#15 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
#16 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
#17 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
#18 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tessellator defines its own fmin/fmax functions that conflict
with those defined in cmath header. Need to use legacy math.h
which was originally used in MS code.
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Global load/store instructions can't know if the offset is
out-of-bound because they don't use descriptors (no range).
Fix this by clamping the offset for arrays that are indexed
with a non-constant offset that's greater or equal to the array
size.
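In C terms, the clamp amounts to something like the following. The names and the table contents are made up for the sketch, and clamping to the last element is an assumption about what an out-of-bounds index should read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a dynamic element index to the last
 * valid element of an array with num_elems entries. */
static uint32_t
clamp_array_index(uint32_t index, uint32_t num_elems)
{
   assert(num_elems > 0);
   return index < num_elems ? index : num_elems - 1;
}

/* Example constant array read through a raw pointer, with no
 * descriptor range to catch a bad index. */
static const float constants[4] = { 0.0f, 1.0f, 2.0f, 3.0f };

static float
load_constant(uint32_t index)
{
   return constants[clamp_array_index(index, 4)];
}
```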
This fixes VM faults and GPU hangs with Dead Rising 4.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
Fixes: 71a6794200 ("ac/nir: Enable nir_opt_large_constants")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Since f9a3d9738b, temporary BO_WSI are definitely a thing, so the
assert is wrong.
Take that opportunity to expand a bit on an existing comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f9a3d9738b ("anv: Use BO fences/semaphores for AcquireNextImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
We appear to have got lucky that the only type of temporary fence
payload we could have was a syncobj and that would only happen when
the type of the permanent payload was also a syncobj.
This code was broken if that assumption changed and it did in commit
f9a3d9738b.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
This adds nir encoding for these. Generating them from libclc
was very expensive, and this is a lot simpler.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
There is an alignment issue doing this the other way: the
spec clearly says vload/store don't require alignment.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Neither Mutter nor KWin's Wayland compositors appear to use modifiers.
In the non-modifier case, iris was still trying to use Y-tiling for
scan-out surfaces, leading to this error:
(gnome-shell:7247): mutter-WARNING **: 09:23:47.787: meta_drm_buffer_gbm_new failed: drmModeAddFB failed: Invalid argument
We now fall back to the historical X-tiling for scanout buffers, which
ought to work for everyone, at lower performance. To regain that, we need
to ensure modifiers are actually supported in environments people use.
Fixes: fbf3124771 ("iris: Rework tiling/modifiers handling")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I forgot why that was required, but it still is the correct thing to do.
Hit it at some point when working on implementing more CL features.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Just like we already do in the llvm backend. The current constant buffer code
seems fundamentally flawed and right now we are thinking about how we want to
reimplement all of that.
But until that happens, just treat it as global memory and go on.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The caller is responsible for setting up the ubo_addr_format value because,
contrary to shared and global, it's not controlled by the spirv.
Right now clover's implementation of CL constant memory uses a 24/8 bit format
to encode the buffer index and offset, but that code is dead as all backends
treat constants as global memory to work around annoying issues within OpenCL.
Maybe that will change, maybe not. But just in case somebody wants to look at
it, add a toggle for this inside vtn.
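For readers unfamiliar with packed index/offset formats, here is a minimal Python sketch of a 24/8 split. The text does not say which field gets 24 bits, so the layout below (offset in the low 24 bits, buffer index in the high 8 bits) is purely an assumption, and `pack_const_ptr` / `unpack_const_ptr` are hypothetical names.

```python
OFFSET_BITS = 24  # assumption: low 24 bits hold the byte offset
INDEX_BITS = 8    # assumption: high 8 bits hold the buffer index
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def pack_const_ptr(index: int, offset: int) -> int:
    """Pack a buffer index and an offset into one 32-bit value."""
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("buffer index does not fit in 8 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 24 bits")
    return (index << OFFSET_BITS) | offset


def unpack_const_ptr(value: int):
    """Split a packed 32-bit value back into (index, offset)."""
    return value >> OFFSET_BITS, value & OFFSET_MASK
```

With this layout a single buffer is limited to 16 MiB of addressable offset, which is one reason a plain global pointer is simpler to work with.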
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The current code looks like a typo, and fails if the usage_mask
is for a y/z enabled input.
Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer
with llvmpipe/nir
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The previous fix worked when the second channel wasn't exposed, but
a couple of piglit tests have inputs with just the y/z chans, no x/w.
Partly Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer
with llvmpipe/nir
Fixes: 5363cda52b ("gallivm: add swizzle support where one channel isn't defined.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Including non-functional changes to get the value from the fd_reg_pair
in places.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
All developers now use gitlab, so don't confuse newcomers by suggesting
they might use the mailing list. We want everyone to use gitlab so
that patches get run through basic CI before they are merged.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support
to be actually usable. However, it doesn't strictly require that we
support any external handle types. We should be able to advertise 1.1
even on old kernels that don't have syncobj support.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we
don't, we only have BO waits and those aren't quite as nice. This
commit adds a flag to _anv_queue_submit to wait for the queue to drain
before returning. This gives us the behavior we need to implement
DeviceWaitIdle.
Fixes: 246261f0ad "anv: prepare the driver for delayed submissions"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise we may lower some fdot to fdph which is not implemented in pp.
Fixes #2126
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
nir_serialize uses nir_ssa_alu_instr_src_components in a few places to
determine how many components a src has, but that's not what this function
returns. It simply returns how many channels are used, which is still fine
for most of the code.
This was breaking code like this:
vec16 32 ssa_1 = intrinsic load_global
vec1 32 ssa_2 = fmax ssa_1.a, ssa_2.b
v2: make the 16bit encoding work for identity swizzles again
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is correct per the Vulkan spec format equivalence table.
Fixes: f36b52740a "radv/android: Add android hardware buffer queries."
Reviewed-by: Eric Anholt <eric@anholt.net>
Sometimes it's useful to get information about GPU faults in the
console, so it's synchronized with other messages.
This commit will cause Mesa to wait for completion and check if there
are any faults raised by the GPU.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A lot of the brw_*_prog_key fields are for emulating features on legacy
hardware that iris doesn't support. In particular, all of the texture
swizzle fields take up a lot of space. These dead fields make hashing
the shader keys more expensive than it ought to be.
We introduce iris-specific keys with only the information we need, and
translate them to brw keys when actually compiling new variants. This
way, key comparisons can use the small keys. The size reductions are:
VS: 328 bytes -> 8 bytes
TCS: 312 bytes -> 24 bytes
TES: 304 bytes -> 24 bytes
GS: 284 bytes -> 8 bytes
FS: 304 bytes -> 16 bytes
CS: 280 bytes -> 4 bytes
Scores for the Piglit drawoverhead microbenchmark case with a shader
program change improve by roughly 30%.
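The lookup pattern described above can be sketched in Python as follows. Everything here is hypothetical (the `SmallKey` and `FullKey` fields, `to_full_key`, and `VariantCache` are illustrative names, not the iris or brw structures); the point is only that cache lookups touch the compact key, and the large key is built when a variant actually has to be compiled.

```python
from typing import Dict, NamedTuple


class SmallKey(NamedTuple):
    """Hypothetical compact key: only what the driver actually varies."""
    program_id: int
    flags: int


class FullKey(NamedTuple):
    """Hypothetical stand-in for the large compiler key."""
    program_id: int
    flags: int
    legacy_swizzles: tuple  # emulation state the small key never needs


def to_full_key(key: SmallKey) -> FullKey:
    # Legacy fields get fixed defaults; they never differ between variants.
    return FullKey(key.program_id, key.flags, legacy_swizzles=(0,) * 32)


class VariantCache:
    def __init__(self) -> None:
        self.variants: Dict[SmallKey, str] = {}
        self.compiles = 0

    def get(self, key: SmallKey) -> str:
        variant = self.variants.get(key)  # hashes/compares the small key only
        if variant is None:
            full = to_full_key(key)       # expand only when compiling
            self.compiles += 1
            variant = f"variant(program={full.program_id}, flags={full.flags:#x})"
            self.variants[key] = variant
        return variant
```

Repeated lookups with the same small key hit the cache without ever constructing the large key.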
Reviewed-by: Eric Anholt <eric@anholt.net>
macOS does not have pthread_getcpuclockid.
src/util/u_thread.h:156:4: error: implicit declaration of function 'pthread_getcpuclockid' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pthread_getcpuclockid(thread, &cid);
^
Fixes: 4913215d14 ("util/u_thread: don't restrict u_thread_get_time_nano() to __linux__")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2171
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
This gets us shared non-UBWC layout code between gallium and turnip.
Until I fix up the rest of gallium to handle UBWC mipmapping, we do the
single-level UBWC setup in gallium as a fixup after layout.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We pass in all the parameters for setting up the layout, though freedreno
still sets a few of them up early (since it uses layout helpers in making
some decisions about the layout setup parameters that will be cleaned up
once krh's blitter work lands).
This lets us start using some of the fdl_* helpers and have more obviously
matching code between gallium and turnip. We can't yet use the fdl_* UBWC
helpers, since the gallium driver doesn't do UBWC mipmaps (which I'm
working on in another branch).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
It's the same logic for each of these being emitted, and I was about to
change the rsc->layout.* for UBWC.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We can just bake the UBWC-goes-first delta into the slices at setup time.
I did have to fix up the resource shadowing swap path to swap the slice
fields, as it was missing and regressed the format reinterprets otherwise.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.
iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone. So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
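A small Python model shows why a 32-bit offset is ambiguous here. The zone base addresses and state sizes below are made up for the example, and `offset_in_zone` is a hypothetical helper; only the 4GB zone size and the top-down allocation come from the description above.

```python
ZONE_SIZE = 1 << 32  # 4 GiB per memory zone, as described above

# Hypothetical zone base addresses, chosen only for the example.
SURFACE_ZONE_BASE = 1 * ZONE_SIZE
DYNAMIC_ZONE_BASE = 2 * ZONE_SIZE


def offset_in_zone(address: int, base: int) -> int:
    """32-bit offset of a 64-bit address relative to its zone base."""
    return (address - base) & 0xFFFFFFFF


# Two unrelated pieces of state, each allocated near the top of its zone.
surface_state = SURFACE_ZONE_BASE + ZONE_SIZE - 0x1000
dynamic_state = DYNAMIC_ZONE_BASE + ZONE_SIZE - 0x1000

# Keyed by offset: the second entry silently replaces the first.
sizes_by_offset = {}
sizes_by_offset[offset_in_zone(surface_state, SURFACE_ZONE_BASE)] = 64
sizes_by_offset[offset_in_zone(dynamic_state, DYNAMIC_ZONE_BASE)] = 256

# Keyed by full 64-bit address: both entries survive.
sizes_by_address = {surface_state: 64, dynamic_state: 256}
```

Looking up the surface state by offset would report the dynamic state's size, which is the kind of mix-up a full address avoids.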
Low-level implementation of the INTEL_performance_query APIs in the
Intel iris driver. Most of the functions and procedures defined here
are adapted from the i965 driver (brw_performance_query.c).
v2: - replace genX_init_performance_query with
iris_init_perfquery_functions which is gen version agnostic
- general code clean-up
v3: include gen_perf_gens.h as some of defines were moved to this new
header file
v4: - checking for kernel 4.13+ won't be needed here as Iris won't be
loaded anyway without DRM_SYNCOBJ that is enabled after Kernel
4.13.
- checking whether gen < 8 or is_cherryview won't be required as
well because those cases are screened in iris_screen_create.
v5: remove genX(init_performance_query)
v6: - remove oa_metrics_kernel_support as iris works only with kernel
4.18 and newer.
- use perf functions defined in separate file, iris_perf.h/c
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The configuration of the gen_perf vtable will be the same for
INTEL_performance_query and AMD_performance_monitor.
Initialize the table in a single routine that can be called from both
implementations.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
new state tracker APIs added for INTEL_performance_query
This extension is enabled if all vendor specific functions for it
exist.
v2: add st_cb_perfquery.* to the list of sources in Makefile
v3: minor code clean-up
v4: - add driver hooks for intel-performance-query apis
- add PIPE level performance counter and type enums that
match to OpenGL enums
- do conversion of pipe_perf_counter_type and
pipe_perf_counter_data_type enums to GL defines in state_tracker
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
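The register update amounts to a read-modify-write that sets bit 1 and preserves the rest. The Python sketch below uses constant names derived from the bit descriptions above (they are illustrative, not the names in the driver headers), and the default value is simply the three bits listed as set by default.

```python
# Bit positions taken from the list above; the names are illustrative.
URB_PARTIAL_WRITE_MERGING = 1 << 0
COLOR_Z_PARTIAL_WRITE_MERGING = 1 << 1
L3_DATA_PARTIAL_WRITE_MERGING = 1 << 2
TC_DISABLE = 1 << 3

# Default described above: bits 0, 2 and 3 set.
OBSERVED_DEFAULT = (URB_PARTIAL_WRITE_MERGING
                    | L3_DATA_PARTIAL_WRITE_MERGING
                    | TC_DISABLE)


def enable_color_z_merging(reg: int) -> int:
    """Set bit 1 and leave every other bit at its current value."""
    return reg | COLOR_Z_PARTIAL_WRITE_MERGING
```

Starting from the observed default of 0xd, the result is 0xf, and applying the function twice changes nothing further.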
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood. I didn't see any improvements
at out-of-the-box power management settings, however.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
TCCNTLREG contains additional cache programming settings. In
particular, there are several write combining controls we'd like to use.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This makes simple_mtx_destroy set the counter to an invalid canary
value and then makes lock/unlock assert that the value is legal.
That way, calling lock/unlock after destroy will assert fail,
rather than deadlocking or potentially even working.
This has caught real deadlocks in dEQP multithreaded tests (in st/mesa
shader variant zombie list handling), which have since been fixed.
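The canary idea can be modelled in Python as below. This is a sketch of the technique, not simple_mtx itself: `CheckedMutex` is a hypothetical class, the canary value -1 is an assumption, and an explicit exception stands in for the C assert.

```python
import threading

_DESTROYED = -1  # hypothetical canary; the real value is not stated here


class CheckedMutex:
    def __init__(self):
        self._lock = threading.Lock()
        self._state = 0  # 0 = unlocked, 1 = locked

    def _check_alive(self):
        # Any value other than 0 or 1 means the mutex was destroyed.
        if self._state not in (0, 1):
            raise AssertionError("mutex used after destroy")

    def lock(self):
        self._check_alive()
        self._lock.acquire()
        self._state = 1

    def unlock(self):
        self._check_alive()
        self._state = 0
        self._lock.release()

    def destroy(self):
        self._check_alive()
        self._state = _DESTROYED
```

A use-after-destroy now fails loudly at the call site instead of hanging or appearing to work.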
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
In particular, we need to invalidate the LRZ state when we cannot be
confident in what the Z state would be during rendering:
1) depth test modes not supported by LRZ
2) stencil test, which would require full rasterization and stencil
test in the binning pass (whereas LRZ normally just needs to
determine the min and max z value in an 8x8 quad)
Signed-off-by: Rob Clark <robdclark@chromium.org>
This was reverted needlessly because it was part of another series.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Maybe a finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.
Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This has no real effect other than the names of the temporary files in
the build folder.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't think this makes much difference, and a potential clear
following the initialization will overwrite HTILE anyways.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
PA_SC_AA_CONFIG might be updated when conservative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.
Found by inspection, doesn't fix anything known.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The new callback is called right before the flush is done to allow
users of st->flush to do some work after all the previous work has
been flushed.
This will be used by dri_flush in the next commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When using 3 planes, the sequence produces this chain:
plane0 -> plane2
This commit fixes this to produce:
plane0 -> plane1 -> plane2
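A minimal Python sketch of linking planes so that each one points at its successor. `Plane`, `link_planes` and `describe_chain` are hypothetical names used only to illustrate the intended chain; this is not the radeonsi code.

```python
class Plane:
    def __init__(self, name: str):
        self.name = name
        self.next = None  # following plane in the chain, if any


def link_planes(planes):
    """Link each plane to the one after it and return the head."""
    for current, following in zip(planes, planes[1:]):
        current.next = following
    return planes[0] if planes else None


def describe_chain(head) -> str:
    """Walk the chain from the head and join the plane names."""
    names = []
    while head is not None:
        names.append(head.name)
        head = head.next
    return " -> ".join(names)
```

Linking every later plane directly from the first one instead would overwrite the first plane's link each time and leave only the last plane reachable.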
Fixes: 86e60bc265 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
instead of keeping the IR indefinitely in st_vp_variant.
This trivially fixes Selection/Feedback/RasterPos for NIR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
gallivm receives these opcodes anyway because st_draw_feedback.c uses
shaders that were assembled for drivers, not llvmpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With 781a78 ("mesa: enable ARB_direct_state_access in compat for
GL3.1+"), it's possible to have DSA with GL3.1+.
FTL creates a GL3.1 compat context, but fails the
_mesa_has_geometry_shaders(..) check in frame_buffer_texture.
Bump the compat version to pass the check.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We were iterating over the entire 32-entry array each time, when we
can just use a bitset to know that we're only uploading from the first
entry normally.
Knocks ir3_emit_user_consts down from ~.5% of CPU to .1% on WebGL
fishtank.
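The general technique of walking only the set bits of a mask, rather than every slot of a fixed-size array, looks like this in Python. `iter_set_bits` and `upload_enabled` are hypothetical names for illustration; the driver itself is C and uses its own bitset helpers.

```python
def iter_set_bits(mask: int):
    """Yield the index of each set bit, lowest first."""
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit
        yield lowest.bit_length() - 1  # convert it to a bit index
        mask ^= lowest                 # clear it and continue


def upload_enabled(buffers, enabled_mask: int):
    """Visit only the slots whose bit is set in enabled_mask."""
    return [(i, buffers[i]) for i in iter_set_bits(enabled_mask)]
```

When only slot 0 is in use the loop body runs once, regardless of how many slots the array has.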
Reviewed-by: Rob Clark <robdclark@chromium.org>
The default is to not throw GL errors when drawing with mapped
buffers, but we were forcing it on for unclear reasons. Internally we
keep all our buffers mapped anyway, so it should be a no-op other than
reducing CPU overhead (.23% in a perf report for WebGL fishtank)
Reviewed-by: Rob Clark <robdclark@chromium.org>
u_decomposed_prims_for_vertices cannot support POLYGON, but POLYGON is
trivial to support as a special case directly (since we have the number
of vertices directly).
Fixes aborts in Panfrost in apps using GL_POLYGON.
Fixes: e881aa8c12 ("gallium/util: Add u_stream_outputs_for_vertices helper")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also removing the FIXME comments after matching the numbers with
updated documentation.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Assume that the resource is tiled if we get DRM_FORMAT_MOD_INVALID
in resource_from_handle() and we don't have RO.
Fixes: 8c12f4e5f2 ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This patch also disables AMD_shader_ballot on GFX7 by default if ACO is used.
Note that shader_ballot works correctly, but performance seems inferior.
To enable shader_ballot use RADV_PERFTEST=shader_ballot.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
ACO writes an unused 3rd operand for internal usage
which makes LLVM recognize it as an illegal instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
It's a very odd case to hit in the real world. However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.
Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.
Fixes: 3119b96bdf "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733 "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
With the addition of the planar formats helper, the
planar formats no longer have a valid block.bits field.
Calling util_format_get_blocksize therefore asserts.
Reorder the check to see if the format is supported
before doing the query to get the blocksize.
Fixes: 20f132e5ef ("gallium/util: add planar format layouts and helpers")
Signed-off-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Multi-planar surfaces are allowed to have modifiers. Don't require
DRM_FORMAT_MOD_INVALID in order to create a surface for each plane
defined by the format.
Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This format will be used to properly handle planar images with modifiers
in iris.
Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The commit noted below assumed and enforced that DRM_FORMAT_MOD_INVALID was the
only valid modifier for multi-planar imported images. Due to that, it
required that modifier on multi-planar images to:
1. Allow multiple planes.
2. Perform YUV format lowering and extent adjustments.
3. Use buffer_index to correctly map the given planes.
Fix these issues by removing or updating the code built on that
assumption.
Fixes: 2066966c10 ("gallium/dri2: Support creating multi-planar modifier images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of doing a dummy submit on the command buffer for the fence or a
dummy semaphore and trusting in implicit sync, this commit moves us to
take advantage of implicit sync and just use the WSI image BO as the
fence. Both semaphores and fences require a tiny bit of extra plumbing
to do this but the result is that we can get rid of a bunch of the extra
synchronization we're doing today.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In 83b943cc2f, we started making all VkDeviceMemory BOs resident all
the time. One unfortunate side-effect of this is that every
vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which
means that the X server or Wayland compositor, instead of waiting on the
last vkQueueSubmit to actually write the buffer, now waits on the last
vkQueueSubmit from that driver instance relative to whenever the
compositor's GL driver instance calls execbuf. This potentially leads
to a lot of extra synchronization that we didn't intend to have.
Instead, this commit makes it so that we leave WSI memory objects with
EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and
set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of
vkQueuePresent. This should hopefully result in tighter integration
with the compositor, lower latency, and better performance.
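The flag policy in the paragraph above can be summarised with a small Python model. `ExecFlag` and `wsi_bo_flags` are hypothetical names, and the enum values are placeholders rather than the kernel's EXEC_OBJECT_* bit values.

```python
import enum


class ExecFlag(enum.Flag):
    """Placeholder flags; not the kernel's actual bit values."""
    NONE = 0
    WRITE = enum.auto()
    ASYNC = enum.auto()


def wsi_bo_flags(is_present_submit: bool) -> ExecFlag:
    """Pick the flags for a WSI memory object in a given submission."""
    if is_present_submit:
        # Only the dummy execbuf at present time signals a write.
        return ExecFlag.WRITE
    # Ordinary submissions opt out of implicit synchronization.
    return ExecFlag.ASYNC
```

The compositor then only ever waits on the submission that was made as part of presenting the image.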
Testing with DOOM 2016, this seems to reduce latency by at least a frame
if not two and makes the game much more responsive. Testing was,
however, subjective, so we don't have any hard data on that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise, we're trusting in the execbuf_add_bo which sets
EXEC_OBJECT_WRITE to always be the first one that gets called. This
is likely true for fences but it seems somewhat fragile.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us treat the implicit synchronization that we need for X11 and
Wayland like a semaphore. Instead of trusting the driver to somehow
figure out when that memory object needs to be signaled, we provide an
explicit point where the driver can set EXEC_OBJECT_WRITE and signal the
dma_fence on the BO. Without this, we have to somehow track inside the
driver when WSI buffers are actually used to avoid extra synchronization
dependencies.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's not a tiler specific initialization; it's a generic GPU-side write
primitive that may be used for tiler reset on midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Only Polaris10 is tested at the moment, and I disabled a TON of
tests to keep a CTS run within 5 minutes because my local runner
is a bit slow. A full CTS run takes more than 1h, which means it
will hit the timeout.
RADV CI can only be triggered manually on personal branches to
avoid breaking the world because one runner is definitely not
enough. This will allow us to test it until it's stable enough
to be enabled by default.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
This requires bumping LLVM to 8 because it's the minimum version
supported by RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Use new rules: instead of only:
For container stage jobs:
* In the main Mesa project, run them by default.
* In merge requests, run them by default if any files affecting pipeline
results are changed.
* In all other cases (in particular branches in personal projects),
don't run them by default but allow triggering them manually.
build & test stage jobs are left at the default (when: on_success), so
they will run automatically once all their dependencies are satisfied.
(Using the same rules as above would require these jobs to be manually
triggered as well, which is only possible once all dependency jobs have
passed.) Please be considerate of CI runner resources and cancel unneeded
jobs on personal branches with no corresponding merge requests (this can
be done before the jobs start running).
In summary: No more special branch names. Unnecessary job runs are
avoided by default, but jobs which don't run by default can be triggered
manually.
v2:
* Split out LAVA changes to separate commit
* Clarify commit log a little, in particular WRT build/test stage jobs
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> # v1
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> # v1
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> # v1
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Having different policies could have some weird results, e.g. changes
only touching documentation (where the intention is not to run the
pipeline by default) would still create a pipeline with the LAVA jobs
running by default.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes the deqp fails in:
dEQP-VK.pipeline.sampler.*border*
(minus 1d array/d24 cases which fail for other reasons)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Two things:
* Texture/sampler pointers aligned to the size of texture/sampler state
* Returning errors instead of crashing on OOM
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
To use with texture states that need alignment (texconst, sampler, border)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Iterate the system values list when adding varyings to the program
resource list in the NIR linker. This is needed to avoid CTS
regressions when using the NIR to build the GLSL resource list in
an upcoming series. Presumably it also fixes a bug with the current
ARB_gl_spirv support.
Fixes: ffdb44d3a0 ("nir/linker: Add inputs/outputs to the program resource list")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The main and Gallium implementations were recently merged, and the
align2 parameter in the Gallium one is in bits. execmem.c expected
bytes still. This led to every call here asserting.
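To make the unit mismatch concrete, here is a Python sketch of converting a byte alignment into the "bits" form. It assumes that "in bits" means the base-2 logarithm of the byte alignment (so 64 bytes becomes 6), and `bytes_to_align2` is a hypothetical helper name.

```python
def bytes_to_align2(alignment_bytes: int) -> int:
    """Convert a power-of-two byte alignment to its log2 ("bits") form."""
    if alignment_bytes <= 0 or alignment_bytes & (alignment_bytes - 1):
        raise ValueError("alignment must be a positive power of two")
    return alignment_bytes.bit_length() - 1
```

Under that assumption, passing a byte count such as 32 where a bit count is expected would request a 2**32-byte alignment, which is far outside anything a small heap can satisfy.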
Fixes: b6fd679a9e ("mesa/main/util: moving gallium u_mm to util, remove main/mm")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
If the runner has a HW device that would be supported, even without
/dev/dri forwarded into the container, it will be enumerated and the tests
on llvmpipe fail with (for example):
libEGL warning: Not allowed to force software rendering when API explicitly selects a hardware device.
libEGL warning: MESA-LOADER: failed to open i965 (search paths /builds/anholt/mesa/install/lib/dri)
Given that we can't necessarily control the DRI devices present on the
runners (particularly for developers bringing their own runners to reduce
the demands on fd.o's shared resources), just skip these tests in CI.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
The primary difference between the KHR and EXT versions of the extension
is that the KHR provides the address at AllocateMemory time for replay
so we can replay it safely without moving to a sparse address model.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This function has a lot of possible extensions and some of them we can
easily handle on-the-fly so it's easier to just have a loop than to find
each structure manually.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When a BO is flagged as having a client visible address, we put it in
its own heap. We also support the client explicitly specifying an
address in said heap. If an address collision happens, we return false
from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This new function lets you request to remove a specific address range
from the allocator. It returns true on success and leaves the allocator
unmodified and returns false on failure. It doesn't need to return an
offset because, if it succeeds, the offset passed in is the allocated
offset.
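The contract described above (true on success, false with the allocator untouched on failure) can be illustrated with a toy free-range allocator in Python. `RangeAllocator` is a hypothetical class that keeps a simple list of free ranges; it is not the util_vma data structure.

```python
class RangeAllocator:
    def __init__(self, start: int, size: int):
        # Free ranges as (offset, size) tuples, kept sorted by offset.
        self.holes = [(start, size)]

    def alloc_addr(self, offset: int, size: int) -> bool:
        """Carve [offset, offset + size) out of the free ranges."""
        if size <= 0:
            return False
        end = offset + size
        for i, (hole_off, hole_size) in enumerate(self.holes):
            hole_end = hole_off + hole_size
            if hole_off <= offset and end <= hole_end:
                remaining = []
                if hole_off < offset:
                    remaining.append((hole_off, offset - hole_off))
                if end < hole_end:
                    remaining.append((end, hole_end - end))
                self.holes[i:i + 1] = remaining
                return True
        # No single free range contains the request: change nothing.
        return False
```

Because the caller chooses the offset, a boolean result is enough; a second request that overlaps an already claimed range simply returns False.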
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already have a mechanism for specifying that we want a fixed address
provided by the driver internals. We're about to let the client start
specifying addresses in some very special scenarios as well so we want
to pass this through to the allocation function.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Our VMA allocations are really independent from the memory heaps we
expose via the API. The only thing that really matters is the GTT size
so we can make the high heap the right size.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
util_vma_heap_alloc will already return 0 if it doesn't have enough
space. The only thing the vma_*_available tracking was doing was
preventing us from allocating too much on any given heap. Now that
we're tracking that in the heap itself, we can drop these.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're already tracking the amount of memory used in each heap. This
commit just makes us start rejecting memory allocations if the heap
would grow too large.
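The check itself is simple; a Python sketch under the assumption that each heap tracks a byte budget and a running total (`Heap` and `try_alloc` are hypothetical names):

```python
class Heap:
    def __init__(self, size: int):
        self.size = size  # total bytes the heap may hand out
        self.used = 0     # bytes currently allocated

    def try_alloc(self, nbytes: int) -> bool:
        """Account for an allocation, rejecting it if the heap would overflow."""
        if nbytes < 0 or self.used + nbytes > self.size:
            return False
        self.used += nbytes
        return True

    def free(self, nbytes: int) -> None:
        self.used -= nbytes
```

A rejected request leaves the running total unchanged, so later, smaller allocations can still succeed.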
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I think the reason why we only do this for primaries is that we didn't
expect to have blorp calls in secondaries. However, you are allowed to
have a full render pass in a secondary command buffer so resolves and
clears can end up in there. We should just always invalidate.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In ee77938733, we started using the BO cache for anv_bo_pool and
stopped using the bo_flags parameter. However, we never dropped it from
the struct or the init function.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
strip() removes leading and trailing newlines, but leaves newlines
between multiple lines in the string. This could cause failures when
comparing the output of cross-compiled Windows binaries (producing
Windows-style newlines) to the expected output with Unix-style newlines.
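The behaviour is easy to reproduce in Python. The normalisation below, which splits on any line ending and rejoins with "\n", is one possible approach shown as a sketch; `normalize_newlines` is a hypothetical helper and not necessarily what the test script does.

```python
def normalize_newlines(text: str) -> str:
    """Drop line terminators and rejoin the lines with Unix newlines."""
    return "\n".join(text.splitlines())


windows_output = "first\r\nsecond\r\n"
unix_expected = "first\nsecond\n"

# strip() only trims the ends, so the inner "\r\n" survives.
stripped_windows = windows_output.strip()
stripped_unix = unix_expected.strip()
```

After `strip()` the two strings still differ in the middle, while the normalised forms compare equal.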
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The vl functions moved from radeonsi to gallium/auxiliary/vl have left
the android build of radeonsi in a broken state.
The libmesa_galliumvl static library is needed to build radeonsi,
the gallium_dri build rules are reworked to avoid multiple symbols,
and the libmesa_galliumvl static dependency is added to radeonsi.
Here is the changelog:
- android: gallium/auxiliary: add libmesa_galliumvl static
- android: gallium_dri: move libmesa_gallium to static to prevent multiple symbols
- android: radeonsi: fix build after vl refactoring
Fixes the following build error:
external/mesa/src/gallium/drivers/radeonsi/si_uvd.c:47:
error: undefined reference to 'vl_video_buffer_create_as_resource'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 86e60bc ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since compute shares the FS state with graphics, we have to re-upload the
pipeline state when switching between compute dispatch and graphics draws.
We could potentially expose graphics and compute as separate queues and
then we wouldn't need pipeline state management, but the closed driver
exposes a single queue and consistency with them is probably good.
So far I'm emitting texture/ibo state as IBs that we jump to. This is
kind of silly when we could just emit it directly in our CS, but that's a
refactor we can do later.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
I tripped over this during CS enabling when my program BO wasn't set up.
Easier to debug this way than the kernel telling us a 0 handle is invalid.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
The loop over the pipelines to create (and the failure handling) was
noisy, and the stub for compute setup looked nicer to me.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This is enough to pass
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_buffer.fragment.single_descriptor.*
with fragmentStoresAndAtomics set, and thus to be able to start working on
compute. I haven't enabled that flag yet, because it also implies image
load/store support, which I haven't filled in.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
The spec requires unused uniform blocks to be set as active in the
program resource list. To support this we tell opt dead code not to
remove them. However, we can mark them as unused internally and
avoid unnecessary state changes.
This change is also required for the following clean-up patch.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is where all the other uniform values are populated so it
makes much more sense here. Moving it will also allow us to better
share code between the NIR and GLSL IR resource list builders.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Without looking at the assembly or something, I'm not sure what the
compiler does here. The brw_reg_type enum is marked packed, so I'm
guessing that it gets represented as a uint8_t. That's the only reason I
could think that comparing with -1 would be always true.
This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.
src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1515 | return *type != -1;
| ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1518 | return *type != -1;
| ^~
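The effect behind the warnings above can be sketched as follows. This is
only an illustration under the assumption that the packed enum is stored
in one byte; packed_reg_type and both helpers are hypothetical names, not
the compiler's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a packed enum stored in a single byte. */
typedef uint8_t packed_reg_type;
#define PACKED_REG_TYPE_INVALID ((packed_reg_type)-1)   /* 0xff */

/* Mirrors the uncast comparison: the byte is promoted to an int in the
 * range 0..255, which can never equal -1, so this is always true. */
static bool is_valid_uncast(packed_reg_type type)
{
   int promoted = type;
   return promoted != -1;
}

/* With the cast, both sides are compared as the same 8-bit value. */
static bool is_valid_cast(packed_reg_type type)
{
   return type != PACKED_REG_TYPE_INVALID;
}

/* Usage: only the cast version rejects the invalid marker. */
static void packed_enum_demo(void)
{
   assert(is_valid_uncast(PACKED_REG_TYPE_INVALID));
   assert(!is_valid_cast(PACKED_REG_TYPE_INVALID));
}
```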
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
This is for state commands like CmdSetViewport that can be used outside of
a renderpass. Accumulating those into draw_cs outside of the renderpass
should have the desired effect.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Somehow adjusting maxloc based on existing outputs got lost, resulting
in the clipdist varying clobbering the position varying. This produced a
shader that had no position output in freedreno/ir3, which triggers GPU
hangs in neverball.
Fixes: d0f746b645 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The logic to ensure VS and BS inputs are aligned wasn't accounting for
unused inputs in VS. This *usually* doesn't happen, but it seems it
can in the case of ARB programs?
Fixes assert:
```
fd6_program_create: Assertion `bs->inputs[i].regid == vs->inputs[i].regid' failed.
```
Fixes: 882d53d8e3 ("freedreno/ir3+a6xx: same VBO state for draw/binning")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fixes crashes that were unnoticed in CI because debug_assert() was not
enabled (but become real crashes after the next patch):
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_highp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_lowp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_mediump_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_highp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_lowp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_mediump_geometry
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The following programming note shows up in all 3DSTATE_CONSTANT_*
packets:
"The sum of all four read length fields must be less than or equal to
the size of 64."
The backend compiler should guarantee this for us, so let's just add a
check here.
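A minimal sketch of such a check, assuming the four read lengths are
available as a plain array (the function and macro names are
illustrative, not the driver's):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TOTAL_READ_LENGTH 64

/* Sums the four read length fields of one constant packet. */
static uint32_t total_read_length(const uint32_t read_length[4])
{
   uint32_t total = 0;
   for (int i = 0; i < 4; i++)
      total += read_length[i];
   return total;
}

/* Debug-build check of the programming note quoted above. */
static void check_read_lengths(const uint32_t read_length[4])
{
   assert(total_read_length(read_length) <= MAX_TOTAL_READ_LENGTH);
}
```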
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).
There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fall back to the old 3DSTATE_CONSTANT_XS if that field is >= 32.
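The restriction can be sketched like this. It is an illustration only:
can_use_constant_all() and its arguments are hypothetical, and the only
fact taken from the text is that a 5-bit field holds values up to 31:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LENGTH_FIELD_BITS 5
#define LENGTH_FIELD_MAX  ((1u << LENGTH_FIELD_BITS) - 1)   /* 31 */

/* Returns true if every range length fits in the 5-bit field, so the
 * single combined packet could be used; otherwise the caller would fall
 * back to the older per-stage packet. */
static bool can_use_constant_all(const uint32_t *lengths, unsigned count)
{
   assert(lengths != NULL || count == 0);

   for (unsigned i = 0; i < count; i++) {
      if (lengths[i] > LENGTH_FIELD_MAX)
         return false;
   }
   return true;
}
```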
v2:
- Rebased on top of the latest changes from Jason.
- Added review suggestions by Caio.
- Removed struct push_bos and merged some code into
anv_nir_compute_push_layout().
v3:
- Remove code churn due to gen8+ workaround in
anv_nir_compute_push_layout(). This code has been removed in an earlier
commit, and implemented in cmd_buffer_emit_push_constant().
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Add a helper function to get the push range address. Once we have a
separate function for emitting gen12 push constants, we can use this
helper and avoid duplicating code.
v3: Do not add range->start to the address in gen7 (Caio).
v4: Do not drop range->start from gen7 (Caio, Jason).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Store push_ranges in ascending order, and only "shift" them to the end
of the array during state packet emission.
We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet.
So instead of applying the workaround here just for GEN < 12 (which
requires an extra loop through all the ranges to figure out if we
should shift them or not), we simply move the whole logic to the state
emission code. At that point, in a later commit, we are already looping
through all of the ranges anyway to check which packet we will be using,
so we might as well implement the workaround there, where it is going to
be used.
v3: Move gen8+ workaround to the state emission code (Caio).
v4: Add explanation of why we moved the workaround (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).
There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fall back to the old 3DSTATE_CONSTANT_XS if that field is >= 32.
v2 (Suggestions from Caio):
- use max_length instead of large_buffers.
- remove UNUSED and use #if GEN_GEN >= 12 instead.
- inline "buffers" and drop BITSET_RANGE() usage.
- add assert(n <= max_pointers)
- move emit to outside of the loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Split into a function the logic to gather the push constant buffers,
which now stores them in struct push_bos. Another function is added to
emit the packet, using data from the push_bos struct.
This will be useful when adding a new function for emitting push
constants for newer platforms.
v2 (Suggestions from Caio):
- rename 'n' -> 'buffer_count'
- remove large_buffers (for now)
- initialize push_bos
- remove assert
- change for() condition (i <= 3 -> i < 4)
v3:
- Add comment about size limit.
- Rework "shift" logic and 'for' loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
In blorp, all the push constants are disabled, so we only need to emit a
single 3DSTATE_CONSTANT_ALL with the bitmask for stage update
appropriately set.
v2: Update comment (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
These bits are ignored when clearing so don't bother setting them.
Note: MSAA samples when clearing comes from other registers (tu6_emit_msaa)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Passes these deqp tests: dEQP-VK.api.image_clearing.core.*attach*single*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easier to find the gmem_offset associated with an attachment.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that the tiled format modifier is merged into Linux we can enable tiling.
That should improve overall performance and also work around broken mipmapping
for linear textures, since we now prefer tiled textures.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Patch adds additional linker check for SSO programs to make sure they
are redeclaring built-in blocks as required by the desktop spec.
This fixes following Piglit tests:
arb_separate_shader_objects/linker/pervertex-*
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Commit also updates the Piglit quick_gl.txt; the list modifications are
due to the following Piglit commits: c248bf201, cacff58ca, 5603e2e60.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously, it would only work when the ballot size was set to the
lane mask. This patch makes it possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Several places in ACO we use SOP1 or SOP2 instructions to operate over the
exec mask or VCC, and these need to be adapted to the new size in wave32
mode.
This commit adds a way to deal with this problem in aco_builder: the caller
can specify a wave size specific opcode and the builder will translate that
to the correct opcode based on the current wave size.
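The translation can be pictured roughly as below. This is a sketch in C
rather than the builder's own code, and the w_opcode enum and
resolve_wave_opcode() are hypothetical names chosen for illustration:

```c
#include <assert.h>

/* Concrete scalar opcodes in their 32-bit and 64-bit forms. */
enum opcode {
   S_AND_B32, S_AND_B64,
   S_OR_B32,  S_OR_B64,
   S_MOV_B32, S_MOV_B64,
};

/* Hypothetical wave-size-agnostic opcodes a caller would request. */
enum w_opcode { W_S_AND, W_S_OR, W_S_MOV };

/* Picks the form matching the lane mask width: 32 bits in wave32 mode
 * and 64 bits in wave64 mode. */
static enum opcode resolve_wave_opcode(enum w_opcode op, unsigned wave_size)
{
   assert(wave_size == 32 || wave_size == 64);
   const int w64 = wave_size == 64;

   switch (op) {
   case W_S_AND: return w64 ? S_AND_B64 : S_AND_B32;
   case W_S_OR:  return w64 ? S_OR_B64  : S_OR_B32;
   case W_S_MOV: return w64 ? S_MOV_B64 : S_MOV_B32;
   }

   assert(!"unknown wave-size specific opcode");
   return S_MOV_B32;
}
```

Keeping the mapping in one place means callers do not have to branch on
the wave size at every use of exec or VCC.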
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32
instruction is superfluous.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
And only use --process-isolation false for the quick_gl tests.
This will hopefully avoid variance in the test results that we've been
seeing lately. But even if it doesn't, it should at least help narrow
down the cause of the variance.
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This is a more accurate description of what happens in processing the
OA reports.
Previously we only had a somewhat difficult to parse state machine
tracking the context ID.
What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :
* whether the r0 is tagged with the context ID relevant to us
* if r0 is not tagged with our context ID and r1 is: does r0 have an
invalid context ID? If not then we're in a case where i915 has
resubmitted the same context for execution through the execlist
submission port
v2: Update comment (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minutes). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.
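A minimal sketch of the half-period test, assuming a free-running 32-bit
timestamp counter (report_is_stale() is an illustrative name, not the
perf code's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned subtraction wraps modulo 2^32, so "forward" is the distance
 * from the query start to the report in the direction of time. A value
 * above half the wraparound period means the report is more plausibly
 * behind the query start than far ahead of it. */
static bool report_is_stale(uint32_t query_start_ts, uint32_t report_ts)
{
   uint32_t forward = report_ts - query_start_ts;
   return forward > UINT32_MAX / 2;
}

/* Usage: a report slightly before the start is stale; one slightly
 * after it is not, even when the counter wrapped in between. */
static void stale_report_demo(void)
{
   assert(report_is_stale(1000, 900));
   assert(!report_is_stale(1000, 1100));
   assert(!report_is_stale(0xfffffff0u, 0x10u));
}
```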
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.
We were using a 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Accumulation happens between 2 reports, and it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and we have a valid value
to put in there.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
My fix wasn't totally correct as pointed out by Marek.
Ported from RadeonSI.
Fixes: deafe4cc58 ("radv/gfx10: fix primitive indices orientation for NGG GS")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In d1c4e64a69, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push. Iris doesn't want to push anything so it gives a bogus number
of parameters and trusts the back-end compiler to dead-code all of them.
Now that we can tell the back-end compiler to stop re-arranging things,
delete the hack and enable the new simpler code path.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reject the new formats in swr to prevent crashes because it doesn't
know how to handle the new formats.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This wires up the front facing value as a sysval, I'd like to
remove the other facing code but I'd need to confirm VMware
don't use it first.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes dEQP-GLES3.functional.primitive_restart.*. Note the 0x18000 value
is somehow accidentally enabling primitive restart.
I'm not sure where this value came from but let's not.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
The algorithm is as described. Nothing fancy here, just need to add some
new code paths depending on which model we're running on.
Tomeu:
- Also disable tiling when !hierarchy and !vertex_count
- Avoid creating polygon lists smaller than the minimum when
vertex_count > 0 but tile size smaller than 16 bytes
- Take into account tile size when calculating polygon list size for
!hierarchy
- Allow 0-sized tiles in a single dimension
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We've figured out most of the big pieces, and though it looks faintly
like other Midgards, it's much simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Similarly to how it's already done in the compiler, add a way to express
differences between GPU models that need to be taken into account when
assembling the cmdstream.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq
should only return Inf when fsqrt returns 0.
The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.
An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max. That's more explicit, but we may also want to
have specific bit encodings of float values later. I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.
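For reference, the bit pattern 0x7f7fffff is the largest finite 32-bit
float. A small C sketch of the same reinterpretation, where
int_bits_to_float() is a hypothetical helper and not part of the
algebraic pass:

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit pattern as an IEEE 754 single-precision float.
 * memcpy avoids the aliasing problems of a pointer cast. */
static float int_bits_to_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* Usage: the pattern used in the patch is exactly FLT_MAX. */
static void float_bits_demo(void)
{
   assert(int_bits_to_float(0x7f7fffffu) == FLT_MAX);
   assert(int_bits_to_float(0x3f800000u) == 1.0f);
}
```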
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.
total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.
Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into
another instruction as either source or destination modifiers.
Counting them as instructions means that some if-statements won't get
converted to selects. For example,
vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
/* succs: block_1 block_2 */
if ssa_25 {
block block_1:
/* preds: block_0 */
vec1 32 ssa_26 = fabs ssa_24
vec1 32 ssa_27 = fneg ssa_26
vec1 32 ssa_28 = fabs ssa_20
vec1 32 ssa_29 = fneg ssa_28
vec1 32 ssa_30 = fmul ssa_27, ssa_29
vec1 32 ssa_31 = fsat ssa_30
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
block_1 isn't really 6 instructions, but it will be counted that way.
Most callers of the peephole_select pass use either 1 or 8. It's very
easy to blow way past either of these limits with things that are really
only one or two actual instructions.
I also tried some fancier things like making sure the fsat was of
another SSA def from the same block, but the simple test was actually
better.
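The simple test can be sketched as a counting loop that skips the
opcodes likely to become modifiers. This is an illustration with a
hypothetical opcode enum, not the pass itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode list, just enough for the example block. */
enum sketch_op { OP_FABS, OP_FNEG, OP_FSAT, OP_IABS, OP_INEG, OP_FMUL };

/* Opcodes that usually fold into another instruction as source or
 * destination modifiers. */
static bool is_likely_modifier(enum sketch_op op)
{
   switch (op) {
   case OP_FABS: case OP_FNEG: case OP_FSAT:
   case OP_IABS: case OP_INEG:
      return true;
   default:
      return false;
   }
}

/* Counts the instructions in a block, optionally ignoring modifiers. */
static unsigned count_instrs(const enum sketch_op *ops, size_t n,
                             bool skip_modifiers)
{
   assert(ops != NULL || n == 0);

   unsigned count = 0;
   for (size_t i = 0; i < n; i++) {
      if (skip_modifiers && is_likely_modifier(ops[i]))
         continue;
      count++;
   }
   return count;
}
```

For block_1 above (fabs, fneg, fabs, fneg, fmul, fsat) this counts 6
instructions without the skip and 1 with it, so the block fits under a
limit of 1.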
The i965 back-end SEL peephole pass still helps ~700 shaders in
shader-db with this change.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
instructions in affected programs: 156575 -> 151791 (-3.06%)
helped: 1204
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
95% mean confidence interval for instructions value: -4.12 -3.82
95% mean confidence interval for instructions %-change: -5.35% -4.95%
Instructions are helped.
total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
cycles in affected programs: 2818975 -> 2672750 (-5.19%)
helped: 876
HURT: 322
helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
HURT stats (abs) min: 1 max: 1188 x̄: 38.27 x̃: 20
HURT stats (rel) min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
95% mean confidence interval for cycles value: -130.47 -113.64
95% mean confidence interval for cycles %-change: -14.85% -12.72%
Cycles are helped.
total sends in shared programs: 730495 -> 730491 (<.01%)
sends in affected programs: 46 -> 42 (-8.70%)
helped: 2
HURT: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122757 -> 8122617 (<.01%)
instructions in affected programs: 14716 -> 14576 (-0.95%)
helped: 46
HURT: 1
helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
95% mean confidence interval for instructions value: -3.42 -2.54
95% mean confidence interval for instructions %-change: -3.28% -1.62%
Instructions are helped.
total cycles in shared programs: 188510100 -> 188509780 (<.01%)
cycles in affected programs: 58994 -> 58674 (-0.54%)
helped: 32
HURT: 1
helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
95% mean confidence interval for cycles value: -16.34 -3.06
95% mean confidence interval for cycles %-change: -2.46% -0.15%
Cycles are helped.
pthread_getcpuclockid() and clock_gettime() are also available on at least
OpenBSD, FreeBSD, NetBSD, DragonFly, Cygwin.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Enabling this option makes Intel Gen8-11 hardware load the 'iris'
driver by default instead of the older 'i965' driver.
Regardless of how this option is set, users can still override which
driver the loader selects via two methods. The first is to create a
~/.drirc or /etc/drirc file with the following snippet:
<driconf>
<device driver="loader" kernel_driver="i915">
<option name="dri_driver" value="i965" />
</device>
</driconf>
The other option is to set an environment variable:
export MESA_LOADER_DRIVER_OVERRIDE=i965
For now, "prefer_iris" defaults to i965 (the historical choice).
A separate future patch will change the default driver to iris.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1893
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Improves generated code of dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
because a loop exit phi can then be fixed to exec, removing copies and
improving jump threading.
No pipeline-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO considers discards jumps and creates edges in the CFG for them but NIR
does neither of these.
This can be fixed instead by keeping track of whether a side of an IF had
a break/discard, but this doesn't solve the issue with discards affecting
loop exit phis. So this reworks phi handling a bit.
Fixes these tests:
dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
dEQP-VK.graphicsfuzz.loop-call-discard
dEQP-VK.graphicsfuzz.complex-nested-loops-and-call
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Right now there are two copies of mm:
* mesa/main/mm.[ch]
* gallium/auxiliary/util/u_mm.[ch]
At some point they split, and from the commit message it was not
clear why it was not possible to have only one copy at a common place.
Taking into account that was several years ago, I'm assuming that it
was not possible then.
This change allows us to have one copy of the same code, and also
to use that code outside of mesa/main or gallium, if needed.
This commit moves u_mm and removes mm, as u_mm has slightly more
changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This exposes GL_TDFX_texture_compression_FXT1 support. It's ancient,
only Intel GPUs appear to support it, and I seriously doubt anybody
uses it. But i965 supports it, and it's trivial to do, so we may as
well support it in the new iris driver as well.
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric recently added PIPE_FORMAT_FXT1_RGB[A] as part of his format
unification work. This was really most of the work of implementing
the extension. We just need to handle it in a couple of places and
expose the extension.
v2: Reject the new formats in llvmpipe_is_format_supported to prevent
crashes because it doesn't know how to handle the new formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
This allows this pass to be run multiple times and the results are
just or'ed together.
It fixes one test on llvmpipe nir, and regresses none.
Suggested by Kenneth
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This shouldn't introduce any functional changes for RadeonSI
when NIR is enabled because these operations are already lowered.
pipeline-db (NAVI10/LLVM):
SGPRS: 9043 -> 9051 (0.09 %)
VGPRS: 7272 -> 7292 (0.28 %)
Code Size: 638892 -> 621628 (-2.70 %) bytes
LDS: 1333 -> 1331 (-0.15 %) blocks
Max Waves: 1614 -> 1608 (-0.37 %)
Found this while glancing at some F12019 shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The reference guide is incorrect and SADDR is actually used with FLAT on
GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
LLVM and the proprietary compiler seem to do this.
Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The blob driver does something like this for all vertex formats:
  if (normalize) {
    if (OPENGL_ES30)
      val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_SIGN_EXTEND;
    else
      val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_ON;
  } else {
    val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_OFF;
  }
As there is no way to get to that information in gallium we always
assume OPENGL_ES30.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
brw_performance_query_metrics.h was removed in
134e750e16 and
brw_performance_query.h was removed in
8ae6667992
remove reference to these files from Makefile.sources
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Fixes: 8ae6667992 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We must reset the damage info of our render targets here even though a
damage reset normally happens when the DRI layer swaps buffers. That's
because there can be implicit flushes the GL app is not aware of, and
those might impact the damage region: if part of the damaged portion
is drawn during those implicit flushes, you have to reload those areas
before next draws are pushed, and since the driver can't easily know
what's been modified by the draws it flushed, the easiest solution is
to reload everything.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 65ae86b854 ("panfrost: Add support for KHR_partial_update()")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
Let's delay the ->set_damage_region() call until the attachments are
updated when we're in that case.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Coverity doesn't know that we always have coordinates if we have lod. To
avoid annoying errors, let's just zero-initialize this.
CoverityID: 1455202
Reviewed-by: Dave Airlie <airlied@redhat.com>
Same story as the previous two commits; these functions dereference the
memory they are pointed at. We can't do that.
CoverityID: 1455180
Reviewed-by: Dave Airlie <airlied@redhat.com>
Similar to the previous commit, pipe_resource_reference also dereferences
the memory pointed at. Let's avoid it.
CoverityID: 1455198
Reviewed-by: Dave Airlie <airlied@redhat.com>
zink_render_pass_reference will dereference the memory 'dst' points at,
which can't really go well. All we want to do here is to increase the
reference-count, so let's use a different helper for that instead.
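The difference between the two kinds of helper can be sketched as
follows. The struct and function names are hypothetical and the destroy
path is omitted; the point is only that a "reference" style helper reads
the old value of *dst, so *dst must be initialized before the call:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct sketch_render_pass {
   int refcount;
};

/* "Reference" style helper: releases whatever *dst held, then stores
 * src. Reading *dst is only valid if it was initialized beforehand. */
static void sketch_reference(struct sketch_render_pass **dst,
                             struct sketch_render_pass *src)
{
   struct sketch_render_pass *old = *dst;

   if (src)
      src->refcount++;
   if (old)
      old->refcount--;   /* destruction at zero omitted in this sketch */
   *dst = src;
}

/* Helper that only takes a new reference and never looks at a
 * destination, so it is safe for filling fresh memory. */
static struct sketch_render_pass *
sketch_get(struct sketch_render_pass *rp)
{
   assert(rp->refcount > 0);
   rp->refcount++;
   return rp;
}
```

When filling a freshly allocated struct, assigning the result of the
second helper avoids touching the uninitialized destination.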
CoverityID: 1455200
Reviewed-by: Dave Airlie <airlied@redhat.com>
destroy_fence doesn't handle NULL-pointers gracefully. So let's avoid
hitting that code-path, by simply returning NULL early here instead.
CoverityID: 1455179
Reviewed-by: Dave Airlie <airlied@redhat.com>
It seems I had some fat fingers when writing this function, and I
accidentally ended up allocating a new query and immediately trying to
delete an uninitialized pool instead of just deleting the pool of the
query that was passed.
CoverityID: 1455196
Reviewed-by: Dave Airlie <airlied@redhat.com>
When I changed to heap-allocated sampler-objects, I missed the code-path
that restores sampler-states after the blitter; it needs an array of
pointers, not an array of VkSampler objects to behave.
This fixes spec@arb_texture_cube_map@copyteximage for me.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5ea787950f ("zink: heap-allocate samplers objects")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Vulkan only allows power-of-two sample counts. We already kinda checked
for this, but forgot to validate the result in the end. Let's check the
result and error properly.
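A minimal sketch of such a validation, assuming the sample count arrives
as a plain integer (sample_count_to_flag() is an illustrative name; it
relies on the Vulkan sample count flag bits having the same numeric
value as the sample count, from 1 up to 64):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for 1, 2, 4, 8, ... and false for 0 and non-powers-of-two. */
static bool is_power_of_two(uint32_t n)
{
   return n != 0 && (n & (n - 1)) == 0;
}

/* Returns the sample count flag value for a valid count, or 0 so the
 * caller can report an error instead of using a bogus value. */
static uint32_t sample_count_to_flag(uint32_t samples)
{
   if (!is_power_of_two(samples) || samples > 64)
      return 0;
   return samples;
}

/* Usage: 4 samples are accepted, 3 and 128 are rejected. */
static void sample_count_demo(void)
{
   assert(sample_count_to_flag(4) == 4);
   assert(sample_count_to_flag(3) == 0);
   assert(sample_count_to_flag(128) == 0);
}
```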
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This drops all the old documentation around applying for push access.
Also this removes the documentation stating that you can push
directly to mesa rather than using merge requests.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1969
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Was totally broken ...
Removed two if(point) {} because point is always non-NULL and we
were counting on that already for counting, since we NULL our
references to semaphores without active point earlier.
Fixes: 4aa75bb3bd "radv: Add wait-before-submit support for timelines."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2137
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Calling pthread_mutex_unlock() on an unlocked mutex is documented by
POSIX as undefined behaviour. On OpenBSD pthread_mutex_unlock() will
call abort(3) if this happens.
This occurs in amdgpu_winsys_create() after
cb446dc0fa ("winsys/amdgpu: Add amdgpu_screen_winsys").
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
They were out of sync. Besides syncing, let's ensure they never diverge
again.
Fixes: 8d2654a419 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Split out the logic for exclusive scans into a separate function
that makes clear what it does instead of having this opaque 60
line if.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
To avoid generating invalid LLVM IR when both operands don't have
the same type. This might happen when performing pointer comparisons
with SPIRV 1.4.
Fixes invalid LLVM IR for:
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.variable_pointers_ssbo_equal
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrnotequal.variable_pointers_ssbo_not_equal
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds the hooks between llvmpipe and the gallivm NIR
code, for compute and fragment shaders.
NIR support is hidden behind LP_DEBUG=nir for now until
all the integration issues are solved.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This transforms the NIR shaders like the TGSI transforms worked.
v2: fix some nir info requirements, use 32-bit bools
Acked-by: Roland Scheidegger <sroland@vmware.com>
This adds the initial implementation of the NIR->LLVM conversion
for llvmpipe NIR support.
v2: lower bool to int32 in nir not llvm
Acked-by: Roland Scheidegger <sroland@vmware.com>
This is a port of the old radeonsi code to be used for llvmpipe NIR support.
Once we remove TGSI support from llvmpipe (I can dream? :-), then
we should be able to refine most of this down and remove it.
v2: port to later radeonsi code for vertex inputs and sampler/io parsing.
Acked-by: Roland Scheidegger <sroland@vmware.com>
When drawing the main character in Shadow of Mordor, the game appears
to draw Talion with one vertex shader, and the Wraith with another.
If the compiler optimizes those in different ways which lead to slight
imprecisions, then the resulting positions may not line up, leading to
Z-fighting occurring as the game decides which of the two are in front.
brw_nir_opt_peephole_ffma looks at usages of multiply adds across the
entire shader, and may make different decisions between the two, leading
to such imprecisions and Z-fighting. This started happening recently
after a NIR change to eliminate unnecessary MOVs (7025dbe7), but that
change simply exposed the existing problem.
Improves performance on Skylake GT4e by 1.22945% +/- 0.398672% (n=3),
likely due to the fixed rendering.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1985
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Many applications use multi-pass rendering and require their vertex
shader position to be computed the same way each time. Optimizations
may consider, say, fusing a multiply-add based on global usage of an
expression in a shader. But a second shader with the same expression
may have different code, causing that optimization to make the other
choice the second time around.
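To illustrate why the fused and unfused forms can disagree, here is a small sketch. Both function names are hypothetical, and the fused path is only emulated by doing the arithmetic in double precision; it is not how any driver implements ffma.

```C
/* Multiply, round to float, then add: two roundings. The volatile
 * store keeps the compiler from contracting this into a fused op. */
static float
mul_then_add(float a, float b, float c)
{
   volatile float product = a * b;
   return product + c;
}

/* Emulates a fused multiply-add for illustration. The product of two
 * floats is exact in double, so rounding to float is deferred until
 * after the add (for the inputs in the note below the add is exact
 * as well). */
static float
fused_mul_add(float a, float b, float c)
{
   double exact = (double)a * (double)b + (double)c;
   return (float)exact;
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the first function returns 0 while the second returns 2^-24. Two shaders that make different fusing choices for the same position expression can therefore produce slightly different depths.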
The correct solution is for applications to mark their VS outputs
'invariant', indicating they need multiple shaders to compute that
output in the same manner. However, most applications fail to do so.
So, we add a new driconf option - vs_position_always_invariant - which
forces the gl_Position output in vertex shaders to be marked invariant.
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
They're not implemented, and not critical to bring up immediately. Avoids
failures in the CTS when nothing gets written to the query.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will make it easier to look at details of failed / skipped tests.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we're not reporting test results as JUnit anymore, we can use the
default JSON format.
This affects how test results are summarized, so update the reference
files accordingly.
Reviewed-by: Eric Anholt <eric@anholt.net>
It was basically useless in this form, and processing the JUnit data in
the GitLab backend was pretty expensive.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We were always ensuring a minimum size of 4 bytes for uniforms
for the case where we don't have any, to account for hardware pre-fetching
of the uniform stream. However, pre-fetching could also lead to
out-of-bounds reads when we have read the last uniform in the stream, so
we probably want to have the extra 4 bytes to prevent the kernel from
observing invalid memory accesses when the uniform stream sits right at
the end of a page.
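A rough sketch of the sizing rule, assuming 32-bit uniforms and a prefetch of one extra 32-bit word. The helper name is hypothetical and not taken from the driver.

```C
#include <stdint.h>

/* Hypothetical helper: size in bytes to allocate for a stream of
 * num_uniforms 32-bit uniforms, with one extra 32-bit slot so that a
 * hardware prefetch past the last uniform stays inside the buffer. */
static uint32_t
uniform_stream_alloc_size(uint32_t num_uniforms)
{
   const uint32_t uniform_size = 4; /* bytes per uniform */

   return (num_uniforms + 1) * uniform_size;
}
```

With no uniforms this still yields the old 4-byte minimum, and a non-empty stream now always carries 4 bytes of padding.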
This seems to fix MMU exceptions reported with a Linux 5.4 kernel.
Credit goes to Phil Elwell for identifying the problem and narrowing
it down to memory accesses in the uniform stream.
Reported-by: Phil Elwell <phil@raspberrypi.org>
Tested-by: Phil Elwell <phil@raspberrypi.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.
Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add missing required bits. Fixes at least:
dEQP-VK.pipeline.render_to_image.dedicated_allocation.1d.small.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.pipeline.render_to_image.dedicated_allocation.2d.mipmap.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.renderpass.dedicated_allocation.attachment.4.401
dEQP-VK.renderpass2.suballocation.formats.r16_uint.load.draw
dEQP-VK.synchronization.op.single_queue.barrier.write_draw_read_copy_image_to_buffer.image_128x128_r16_uint
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Only clang has this argument (at least as of clang 8 and gcc 9), which
errors when using the gcc empty initializer syntax in C:
```C
struct foo f = {};
```
GCC has a warning for this, but only when using -Wpedantic, which is a
lot of noise to lose useful warnings in.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Most of these will never actually be compiled by Windows, but in the
interest of being able to make using struct foo = {}; an error while
avoiding breaking Windows, removing a handful of safe uses seems like a
good trade-off.
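For reference, a sketch of the portable spelling, assuming the replacement is the standard C "{0}" initializer. The struct and function names are hypothetical.

```C
#include <stddef.h>

/* Hypothetical struct for illustration. */
struct foo {
   int a;
   float b;
   const void *c;
};

/* Returns a zero-initialised struct foo using the standard C form
 * "{0}" instead of the GNU empty initializer "{}". */
static struct foo
make_zeroed_foo(void)
{
   struct foo f = {0};

   return f;
}
```

The first member is set to 0 explicitly and the remaining members are zero-initialised implicitly, so the result matches what the empty initializer produced.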
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The vertex cache uses the full 48-bit address on Gen11+. See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.
Interestingly, the docs don't mention index buffers as needing a
workaround at all. So either we've been overzealous, or the docs
never got updated to record that. Which begs the question of whether
the issue there was fixed, if there was one...
Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; it
appears to improve performance by about 1-2% on Icelake 8x8 (not
frequency locked).
The slices table and most of the other layout fields in the
freedreno_resource move into fdl_layout.
v2: Changes by anholt to not have duplicate fields, which was introducing
a surprising behavior change in resource layout (using the
level_linear helper before the setup of the shadowed fields)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Clark <robdclark@chromium.org>
This gets the worst of the sed required for shared resource layout out of
the way. The texture layout comment is dropped now that we're referencing
the shared header, which has a more complete description.
Acked-by: Rob Clark <robdclark@chromium.org>
This will be used for sharing resource layout code between freedreno and
tu. Mostly copied from a commit by Rob, with a new location and the slice
struct renamed for consistency.
Acked-by: Rob Clark <robdclark@chromium.org>
Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out. This will make the shared resource helper transition
cleaner.
Acked-by: Rob Clark <robdclark@chromium.org>
This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.
Acked-by: Rob Clark <robdclark@chromium.org>
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to
transform:
result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);
->
temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);
nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.
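The rewrite itself is sound because the product of two b2f results is 1.0 only when both inputs are true. A small sketch that checks this exhaustively; the C function names are mine and only mirror the NIR opcode names.

```C
#include <stdbool.h>

/* b2f: boolean to float, 1.0 for true and 0.0 for false. */
static float
b2f(bool x)
{
   return x ? 1.0f : 0.0f;
}

/* Returns true when fmul(b2f(x), b2f(y)) == b2f(iand(x, y)) holds for
 * every combination of boolean inputs. */
static bool
b2f_fmul_matches_iand(void)
{
   for (int x = 0; x <= 1; x++) {
      for (int y = 0; y <= 1; y++) {
         if (b2f(x) * b2f(y) != b2f(x && y))
            return false;
      }
   }
   return true;
}
```

Each application folds one fmul into the chain, so a chain of n multiplies needs n matches. That is cheap within one pass and expensive when every match costs a full run of the optimization loop.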
Back in 2016 in 7be8d07732 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first. This helped his cases, but I
believe introduced this failure mode. Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things). There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.
Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
-65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
-68.8066% +/- 6.49523% (was 4.8s)
v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
tricky code. Runtime of uniform_api.random.3 down -0.790299% +/-
0.244213% compared to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
fixing infinite loops.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.
Reviewed-by: Eric Anholt <eric@anholt.net>
If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.
Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail
on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't
properly handle this. Disabling this format also for buffers fixes the
issue.
In addition, disabling also the related RGB8 integer formats for buffers
fixes some deqp-gles3 tests:
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d
Fixes: 0899bf55 ("st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM")
Closes: #2118
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4567 | return ac_build_intrinsic(ctx, intr, type, params, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4568 | AC_FUNC_ATTR_READNONE);
| ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
In the case of promoted extensions we can end up with an entrypoint that
we support being an alias of an entrypoint we do not support. For
instance, if an extension gets promoted from EXT to KHR, the EXT entry-
points may be aliases of the KHR ones. We want to leave everything as
EXT until we get around to advertising the KHR so that we don't break
things when we update the XML and headers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The current code can only handle enum aliases if the original enum is
declared first followed by the alias as we walk the XML in a linear
fashion. This commit allows us to handle aliases where the alias
declaration comes before the thing it's aliasing.
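A sketch of order-independent alias resolution using two passes. This only illustrates the idea in C and is not the generator code; the struct, the function and the VK_FOO_* names in the note below are all hypothetical.

```C
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an XML enum entry. alias_of is NULL for a
 * real definition and names the target for an alias. */
struct enum_entry {
   const char *name;
   const char *alias_of;
   int value;
   int resolved;
};

/* Resolves aliases regardless of declaration order: pass one marks
 * real definitions, pass two looks up each alias target by name.
 * Aliases of aliases are not handled. Returns the number of entries
 * left unresolved. */
static int
resolve_enum_aliases(struct enum_entry *entries, size_t count)
{
   int unresolved = 0;

   for (size_t i = 0; i < count; i++)
      entries[i].resolved = entries[i].alias_of == NULL;

   for (size_t i = 0; i < count; i++) {
      if (entries[i].resolved)
         continue;

      for (size_t j = 0; j < count; j++) {
         if (entries[j].alias_of == NULL &&
             strcmp(entries[j].name, entries[i].alias_of) == 0) {
            entries[i].value = entries[j].value;
            entries[i].resolved = 1;
            break;
         }
      }

      if (!entries[i].resolved)
         unresolved++;
   }

   return unresolved;
}
```

An alias such as VK_FOO_EXT declared before its target VK_FOO_KHR picks up the target's value in the second pass, which a single linear walk would miss.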
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We may have replaced the backing storage for a texture buffer while it
was unbound, at which point iris_rebind_buffer would not have caught it
and updated it. We need to ensure that the current resource's address
matches the one our SURFACE_STATE points at. If not, update addresses
and re-upload the SURFACE_STATE.
Shader images and buffers do not suffer from this problem because we
re-stream the surface state on every set call, since there isn't a
created CSO object for those with a saved SURFACE_STATE. Constant
buffers are also currently re-streamed (we pitch the SURFACE_STATE
on every set_constant_buffer call). Surfaces would need this
treatment (as they're created CSOs) except that we never swap out
their backing storage today (we only do it for buffers), so it's OK
for now.
Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene).
Huge thanks to Andrii Simiklit for tracking down the problem - it was
quite difficult to find! Also fixes Andrii's new Piglit test for the
bug, 'arb_texture_buffer_object-re-init'.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
When replacing the backing storage for texture buffers, image buffers,
and so on, we may need to update the "Surface Base Address" field in
any corresponding SURFACE_STATE. This is easier to accomplish if we
have a copy on the CPU - we can just compare the current field, update
it, and re-upload.
This patch adds a CPU-side copy to the new iris_surface_state wrapper
struct, and reworks allocation and upload to fill things out on the
CPU copy first, then upload that to the GPU when finished.
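The compare/update/re-upload flow can be sketched as follows. This is a simplified model with hypothetical names; the real SURFACE_STATE is a packed hardware structure and the upload goes through the driver's state uploader.

```C
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model: reduced to the one field of
 * interest plus a counter standing in for GPU uploads. */
struct surface_state_sketch {
   uint64_t cpu_base_address; /* CPU-side copy of Surface Base Address */
   unsigned uploads;          /* times the state was re-uploaded */
};

/* Compares the CPU copy against the resource's current address and,
 * if it is stale, updates it and counts a re-upload. Returns true
 * when an update was needed. */
static bool
update_surface_base_address(struct surface_state_sketch *state,
                            uint64_t current_address)
{
   if (state->cpu_base_address == current_address)
      return false;

   state->cpu_base_address = current_address;
   state->uploads++; /* stands in for copying the state to the GPU */
   return true;
}
```

Without the CPU copy, the same check would need to read back or rebuild the GPU-side state.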
This will be necessary to fix iris_invalidate_resource bugs shortly.
Technically, we never replace the backing storage for pipe_surfaces
(render targets), so we don't need to make this change there. However,
it's nice to have surfaces, sampler views, and image views handled
similarly. Plus, if we ever wanted to swap out backing storage for
busy textures, we'd need this infrastructure.
v2: Properly free memory (caught by Andrii Simiklit)
Today, we only have a state reference to the GPU buffer containing our
uploaded SURFACE_STATEs. However, we're going to want a CPU-side copy
soon. Making a wrapper struct means we can talk about both together,
and also put both in the field called "surface_state".
We can just compare the VERTEX_BUFFER_STATE address field to the
current BO's address. When calling rebind, we've already updated
the resource to the new buffer, but the state will have the old
address.
Mutating fields of global resources is generally not safe, and the only
reason we were doing it was to avoid passing an extra parameter to
the fill_surface_state helper.
This takes a noticeable amount of time in piglit and some tests don't
need it.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable. A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.
Results for v3d:
total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
patch, include SSA defs in live values so we can switch to bottom-up
if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
successfully register allocate on GLES3.1 dEQP. Make sure that
discards don't move after store_output. Comment spelling fix.
At the same time, update etna_clear_blit_pack_rgba to work with integer
formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Use the extended format if such a format was passed.
v1 -> v2:
- set FORMAT_MASK bit when using ext PE format as suggested
by Wladimir J. van der Laan
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
No functional changes, but it will be used to decompress
separate depth/stencil aspects.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
No functional changes because the aspect mask is still not used
during image transitions but it will be needed for the separate
depth/stencil aspects logic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v7: run nir_opt_algebraic
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v10: add tests for 64-bit offsets
v10: add tests for signed offsets
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
This pass combines intersecting, adjacent and identical loads/stores into
potentially larger ones and will be used by ACO to greatly reduce the
number of memory operations.
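A sketch of the core range test, assuming both accesses hit the same resource and are described by byte offset and size. The names are hypothetical, and the real pass also has to consider alignment, bit sizes, aliasing and the driver callback.

```C
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte range of one load from a buffer. */
struct access_range {
   uint32_t offset;
   uint32_t size;
};

/* Two loads can be combined when their ranges are identical,
 * intersect, or touch end-to-start (adjacent). */
static bool
ranges_combinable(struct access_range a, struct access_range b)
{
   return a.offset <= b.offset + b.size &&
          b.offset <= a.offset + a.size;
}

/* Returns the smallest range covering both inputs. */
static struct access_range
combine_ranges(struct access_range a, struct access_range b)
{
   uint32_t start = a.offset < b.offset ? a.offset : b.offset;
   uint32_t end_a = a.offset + a.size;
   uint32_t end_b = b.offset + b.size;
   uint32_t end = end_a > end_b ? end_a : end_b;
   struct access_range merged = { start, end - start };

   return merged;
}
```

Two 4-byte loads at offsets 0 and 4 merge into one 8-byte load, while loads at offsets 0 and 8 leave a gap and stay separate.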
v2: handle nir_deref_type_ptr_as_array
v3: assume explicitly laid out types for derefs
v4: create less deref casts
v4: fix shared boolean vectorization
v4: fix copy+paste error in resources_different
v4: fix extract_subvector() to pass
nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
v4: rebase
v5: subtract from deref/offset instead of scheduling offset calculations
v5: various non-functional changes/cleanups
v5: require less metadata and preserve more
v5: rebase
v6: cleanup and improve dependency handling
v6: emit less deref casts
v6: pass undef to components not set in the write_mask for new stores
v7: fix 8-bit extract_vector() with 64-bit input
v7: cleanup creation of store write data
v7: update align correctly for when the bit size of load/store increases
v7: rename extract_vector to extract_component and update comment
v8: prevent combining of row-major matrix column accesses
v9: rework process_block() to be able to vectorize more
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v9: remove entry::store_value, since it will not be updated if it was
from a vectorized load
v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
v9: handle nir_intrinsic_scoped_memory_barrier
v10: use nir_ssa_scalar
v10: handle non-32-bit offsets
v10: use signed offsets for comparison
v10: improve create_entry_key_from_offset()
v10: support load_shared/store_shared
v10: remove strip_deref_casts()
v10: don't ever pass NULL to memcmp
v10: remove recursion in gcd()
v10: fix outdated comment
v11: use the new nir_extract_bits()
v12: remove use of nir_src_as_const_value in resources_different
v13: make entry key hash function deterministic
v13: simplify mask_sign_extend()
v14: add comment in hash_entry_key() about hashing pointers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
It shouldn't matter, but the 1 was leftover from when it was handled
together with workgroup_size and num_work_groups.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The former was always true and hence dead code. We will want to
explicitly declare the ring offset register with ACO, but we also want
to declare the scratch offset too, and we can't try to disable it since
ACO also supports spilling and the determination of whether spilling has
to happen occurs well after setting up registers. So replace
supports_spill with something that will actually be used for ACO.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Due to how LLVM works we have to make some of the FS inputs become
vectors, and therefore have to split them early so that they don't take
up extra register pressure due to how RA currently works.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:
- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.
- Stop using radv-specific structures for things like determining the
chip generation in ACO.
In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We'll duplicate this in a header file in the next commit, and then
remove the original enum. Just rename it temporarily so that things
keep building.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The GiMark benchmark from GpuTest has the following code in its VS:
out vec4 lightDir0;
out vec4 lightDir1;
...
lightDir0.xyz = lp0 - vVertex.xyz;
lightDir1.xyz = lp1 - vVertex.xyz;
In FS:
float distSqr = dot(lightDir0, lightDir0);
So due to the usage of uninitialized .w channel in the dot product,
distSqr may become undefined which results in many black dots
in the test on Iris.
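A C sketch of the same computation, assuming the workaround amounts to zero-initialising the channel the shader never writes. The vec4 struct and helper names are hypothetical stand-ins for the GLSL types.

```C
/* Hypothetical stand-in for a GLSL vec4. */
struct vec4 {
   float x, y, z, w;
};

static float
dot4(struct vec4 a, struct vec4 b)
{
   return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Builds lightDir the way the shader does, writing only .xyz. The .w
 * channel is zero-initialised here, which is what a zero-init
 * workaround would provide; without it .w is undefined. */
static struct vec4
make_light_dir(float x, float y, float z)
{
   struct vec4 v = { 0.0f, 0.0f, 0.0f, 0.0f };

   v.x = x;
   v.y = y;
   v.z = z;
   return v;
}
```

With .w forced to zero the four-component dot product reduces to the three-component one the shader author presumably intended.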
In https://www.geeks3d.com/forums/index.php/topic,6242.0.html
the developer stated that this benchmark most likely won't be updated.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1919
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes dEQP-VK.compute.builtin_var.local_invocation_index with
RADV_PERFTEST=cswave32.
My initial fix was to lower it but Rhys suggested the shift-right
and it's much better like this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
They are broken like on GFX6-GFX7. It seems better to disable them
instead of enabling a broken feature.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fails a couple of piglits due to other bugs in llvmpipe,
but it adds support for the feature properly.
v2: don't reset pipestats, just recalc, fix CI expectation
In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.
Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.
Fixes: cff53da3 ("radv: enable secure compile support")
This will be used to create a communication pipe between the user
facing device and a freshly forked (per pipeline compile) slim copy
of that device.
We can't use pipe() here because the fork will not be a direct fork
of the user facing process. Instead we use a previously forked
copy of the process that was forked at device creation in order to
reduce the resources required for the fork and avoid performance
issues.
Fixes: cff53da374 ("radv: enable secure compile support")
In the following commits we want to be able to fork an existing lightweight
fork created at device creation time. In order for the user facing process
to communicate with this new fresh fork we create some members here to hold
FIFO file descriptors and a unique id.
Here we also add a new fork enum that we use to tell the lightweight
process to create a fresh fork.
For more information on why we create a fresh fork see the following
commits.
This fixes a build failure on MSVC.
BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes build with MinGW, with shared LLVM and lto
/tmp/opengl32.dll.BxiIYm.ltrans59.ltrans.o:<artificial>:(.text+0x1674): undefined reference to `LLVMAddInstructionCombiningPass'
See also scons/llvm.py
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Only NPOT vectors greater than vec4 use the extra uint32.
This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.
There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
It can be derived from src and var. This frees 10 bits in the header
that will be used later.
"mode" is moved in the structure, because those bits will be used for
something else later.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The majority of constants can be packed like this.
v2: - use enum for the packing encoding,
- trim packed_value to 20 bits and add 1 bit to last_component,
which simplifies a later commit
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
If the repo continues development, we don't want to accidentally pick
up potentially breaking changes on our next container rebuild.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
xfb + lines/points still flakes too frequently (and the problem isn't
even related to xfb), but we can add the rest back into this mix now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Extract .qpa for the individual unexpected results and flakes, and
translate to xml, preserved with the artifacts. This allows easy
browsing of the test logs for fails/flakes, for easier debugging.
The # of logs to preserve is capped at 50 to avoid saving 100s of
megabytes of logs in case someone pushes a change that breaks
everything.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
If there are a small number of fails, re-run to determine if they are
flakes, and optionally (if `$FLAKES_CHANNEL` configured) report the
flakes.
This way flakes don't interfere with developers working on other
drivers, but get logged so that the developers working on the flaking
driver can monitor the situation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Bump cts_runner to pick up the change to preserve .qpa and caselist .txt
files for blocks of tests that contain fails, and preserve the caselist
files. To reproduce fails that depend on order of running tests, these
are useful.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The log only shows the first 50, but preserve the full list for easier
browsing.
(Also move return of exit code to end which makes later patches in the
series easier)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Update the deqp build to preserve testlog-to-xml and stylesheets, so
deqp runner can extract .qpa for failed/flaked tests, and convert to
xml. With this, we will be able to browse output from failed tests
directly from the artifacts.
The main motivation is to give better visibility into what happens with
flaked tests, when it is difficult/impossible to reproduce the flake
locally (i.e. when it happens once out of N million tests). But this
should also make it easier to debug regressions that a MR triggers,
especially when it is on hw that you don't have.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Dolphin: 75 fps -> 88 fps - Super Mario Galaxy
Citra: 81 fps -> 91 fps - A Link Between Worlds
Yuzu: 21 fps -> 27 fps - Super Mario Odyssey
Dolphin still has many syncs because of glFenceSync and glClientWaitSync.
Moving them to the dispatcher thread might yield another speedup.
Yuzu uses a compatibility profile by default. This benchmark used the variable
MESA_GL_VERSION_OVERRIDE=4.5FC to override this behavior.
This profiling was done on a mobile i7-8550U CPU with i965.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For temporary lookups, just allocate out of the NULL ralloc context,
so we don't have to edit the linked list of ralloc children to add it
and then immediately remove it again.
When uploading a new shader, allocate the keybox off the shader, so
if we delete the shader the keybox also goes away. Less manual cleanup.
All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
Reviewed-by: Eric Anholt <eric@anholt.net>
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the lookups
of static const arrays with now constant indices, and then eliminate
all the actual assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
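A minimal sketch of the idea, not the code from this commit: a loop with
a constant trip count over a static const table, annotated with an unroll
pragma so the compiler is more likely to fold the assertions away. The
table op_sizes, the helper validate_op_sizes() and the choice of
"GCC unroll" are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table standing in for the static const data being validated. */
static const unsigned char op_sizes[4] = { 1, 2, 4, 8 };

/* Request unrolling only on compilers known to accept this pragma. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#define UNROLL_4 _Pragma("GCC unroll 4")
#else
#define UNROLL_4
#endif

/* Checks every table entry and returns the number of entries checked. */
static unsigned validate_op_sizes(void)
{
   unsigned checked = 0;

   UNROLL_4
   for (size_t i = 0; i < 4; i++) {
      /* Each size must be a non-zero power of two. */
      assert(op_sizes[i] != 0 && (op_sizes[i] & (op_sizes[i] - 1)) == 0);
      checked++;
   }
   return checked;
}
```

Once the loop is unrolled, every index is a compile-time constant, which
is what gives the optimizer a chance to evaluate the assertions statically.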
Reviewed-by: Eric Anholt <eric@anholt.net>
Varyings are similar to the cases already handled, and the name of the
workaround, "glsl_zero_init", already suggests it should include varyings.
The issue was observed in GiMark subtest from GpuTest.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)
Closes: #2133
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
It seems overkill to me to build scons 7x for every pipeline.
Scons is now built with the oldest llvm version in scons-old-llvm
and with the newest llvm version in scons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We can stuff this information in as parametrized system values, like we
currently do texture size and SSBO addresses.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Generating source code with a fixed size leads to issues with platform-dependent types.
We hard code either 4 or 8 bytes there, and either one is wrong on the other platform.
So this patch solves this issue by generating e.g. sizeof(GLsizeiptr), which is valid on
both 32-bit and 64-bit platforms.
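As a rough sketch of the approach (not the actual generator, which is not
shown here), a code generator can emit a sizeof expression for the
compiler to resolve instead of a literal byte count. The helper
emit_size_expr() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the size expression that would go into the
 * generated C source. The target compiler resolves it later, so the same
 * generated text is correct on 32-bit and 64-bit builds.
 * Returns the number of characters written, as snprintf does. */
static int emit_size_expr(char *out, size_t out_len, const char *type_name)
{
   assert(out != NULL && type_name != NULL);
   return snprintf(out, out_len, "sizeof(%s)", type_name);
}
```

Emitting a literal 4 or 8 here would bake the generator's assumption into
the output, whereas the emitted expression defers the decision to the
compiler that builds the generated file.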
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This enables UBWC for everything except 3D textures.
It breaks many image_to_image copies but those aren't important and it can
be worked around later (image_to_image copy needs to be done in two steps,
decode from the source format and then encode to the destination format).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix build error after llvm-10.0 commit 1dfede3122ee ("Move
CodeGenFileType enum to Support/CodeGen.h").
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp: In member function ‘void JitManager::DumpAsm(llvm::Function*, const char*)’:
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp:428:45: error: ‘CGFT_AssemblyFile’ is not a member of ‘llvm::TargetMachine’
*pMPasses, filestream, nullptr, TargetMachine::CGFT_AssemblyFile);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Do this by repeating processing of loops until no progress is made.
Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)
v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
of loops instead of at each continue
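The general pattern of repeating a pass until it stops making progress
can be sketched as below. This is a toy illustration rather than the ACO
pass itself: the nodes, the dep array and both helper functions are
assumptions. A single sweep cannot see information that flows along a
back edge, so the sweep is repeated until nothing changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One sweep: node i becomes marked if the node it depends on is marked.
 * Returns true if any mark changed during this sweep. */
static bool propagate_once(bool *marked, const size_t *dep, size_t n)
{
   bool progress = false;

   for (size_t i = 0; i < n; i++) {
      assert(dep[i] < n);
      if (!marked[i] && marked[dep[i]]) {
         marked[i] = true;
         progress = true;
      }
   }
   return progress;
}

/* Repeat sweeps until a fixed point is reached.
 * Returns the number of sweeps that were run. */
static unsigned propagate_until_stable(bool *marked, const size_t *dep, size_t n)
{
   unsigned sweeps = 0;

   do {
      sweeps++;
   } while (propagate_once(marked, dep, n));
   return sweeps;
}
```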
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
When GPU is idle and suspends, the currently selected countables
will all reset to the first one. So periodically restore the selected
countables.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Port from the envytools tree, but converted to use the .c tables for
describing the perfcounter groups/countables, rather than using rnndec
to get this at runtime from the register xml.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Currently these are getting blocked by the kernel. These counters don't
seem to be the most useful ones, and to use them we'd have to somehow
probe the kernel by submitting cmdstream to write the selector regs and
see if that triggers a GPU fault. So let's just skip them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This should eventually be useful for VK_KHR_performance_query as well.
And in the more near term, for fdperf.
Attempt to not break android build is best-effort and untested.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
When we had one gen supporting performance counters, it made sense to
have these builder macros in the .c file with the table. But time has
come to de-duplicate.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This passes the piglit CL builtin-ulong-clz-1.0.generated.cl
test.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This adds the option to lower 64-bit ufind_msb opcodes.
v2: use split_x/y removes component loops (Jason)
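A sketch of how a 64-bit ufind_msb can be built from the two 32-bit
halves, written in plain C rather than with the NIR builder. It assumes
the usual ufind_msb convention of returning -1 when no bit is set; the
function names are illustrative only.

```c
#include <stdint.h>

/* Index of the most significant set bit of a 32-bit value, or -1 for 0. */
static int ufind_msb32(uint32_t v)
{
   int msb = -1;

   while (v != 0) {
      v >>= 1;
      msb++;
   }
   return msb;
}

/* 64-bit version expressed only in terms of 32-bit operations:
 * if the high half has a set bit, the answer lies there (plus 32),
 * otherwise it is whatever the low half reports. */
static int ufind_msb64(uint64_t v)
{
   uint32_t lo = (uint32_t)v;
   uint32_t hi = (uint32_t)(v >> 32);
   int hi_msb = ufind_msb32(hi);

   return hi_msb != -1 ? hi_msb + 32 : ufind_msb32(lo);
}
```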
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I set the runners to concurrency=1, so they serve only one gitlab-ci job
at a time. Swap over to using the parallel runner now to keep the
runners busy, more efficiently than spawning many docker containers and
downloading artifacts multiple times, and producing easier-to-understand
results for browsing on the web.
This bumps the a306 runners to 4x parallel instead of 2x like before, but
cheza gles3 drops from 6 to 4. Current rough timings of the jobs (if no
container download):
db410c-gles2: 5:00
a630-gles2: 1:30
a630-gles3: 6:00
a630-gles31: 5:30
a630-gles3 is a bit longer than I like, but it should come back down once
I can sort out the NIR algebraic rewinding.
This was apparently missed in 67b32190f3, which added support
for ARB_shading_language_include to #line, including the 'path'
field for the location.
Fixes crashes in CTS with all drivers as they attempt to access
an uninitialized path string during parsing.
Fixes: 67b32190f3 ("glsl: add ARB_shading_language_include support to #line")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2132
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
Use hardcoded /cache/mesa/ccache for the cache, so it will be shared by
all jobs of all Mesa projects running on the same runner host. This
should increase the hit rate and decrease the worst case storage used.
Further benefits of directly using a host-mapped directory:
* Saves up to ~1 minute per job for restoring and saving the cache
contents via the GitLab CI cache mechanism
* Cache contents generated by failed jobs are no longer lost
* Jobs running in parallel on the same runner host can get hits from
each other
Also enable compression, so the default maximum cache size of 5G might
be sufficient.
v2:
* Move CCACHE_DIR variable to the .build-linux template
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Building GLVND in meson-main doesn't work because this disables
libEGL and it's needed for running shader-db.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Now that debugoptimized isn't set and that all test jobs depend on
meson-testing, enabling swr shouldn't slow down the CI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
For turnip and RADV testing, we will need a debugoptimized build
without UBSAN. This introduces meson-testing which builds only the
things that are needed by the test stage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
This happens when mesa is built with only swrast. The default
driver being kmsro and the default driconf file being v3d,
it's NULL and then strdup crashes.
This fixes a crash with piglit spec/egl_mesa_query_driver/conformance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Enough trial and error ... just think even *more* Midgard about where
this field might be!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
also make 8 and 16 components invalid. We will enable that later again
when we actually support it.
v2: fix validation of nir_intrinsic_instr::num_components
correct validation of instr->num_components
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 52c7df1643. The pass,
while clearly useful for some shaders, has at least three bugs that I
was able to find fairly quickly:
1. It doesn't work for type-converting MOVs because f > 0 is not the
same as f2i(f) > 0
2. CSEL is a 3src instruction and only supports one source type; it
doesn't take this into account and tries to create instructions
which do a F compare and a D select. This is especially nasty to
debug because you don't see that in the dumped assembly because we
don't properly assert that types are the same in codegen.
3. While you can handle 2, in theory, by reinterpreting types, you
can't do that in the presence of source modifiers. This pass
doesn't even attempt to detect that.
Those are just the ones I found with the one almost trivial shader I was
debugging. There very likely may be more. Best thing to do for now
is just shut it off until someone has the time to figure out how to do
this properly and write tests to ensure it's correct.
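To illustrate the first bug in isolation, the small C sketch below
compares a float against zero before and after a float-to-int conversion.
It only demonstrates the arithmetic; the function names are made up and
it is not the compiler pass.

```c
#include <stdbool.h>

/* Compare performed on the original float value. */
static bool cmp_on_float(float f)
{
   return f > 0.0f;
}

/* Compare performed on the result of a float-to-int conversion,
 * which truncates toward zero. */
static bool cmp_after_f2i(float f)
{
   return (int)f > 0;
}
```

For f = 0.5 the first compare is true while the second is false, because
the conversion truncates 0.5 to 0. Folding a compare across a
type-converting MOV therefore changes the result.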
Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
Reviewed-by: Brian Paul <brianp@vmware.com>
This will be useful as a deterministic identifier/index for the variable.
v2: fix comment style
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Doesn't shrink it (at least, on x86-64) and leaves space for more members.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2. [Hyunjun Ko (zzoon@igalia.com)]
Avoid using too much open code like "instr->regs[n]->flags |= FOO"
v3. [Hyunjun Ko (zzoon@igalia.com)]
Remove redundant code for both 16b and 32b operations.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit
versions of the reduce operation. This reduces the code duplication a
bit and will make it easier to later add 16-bit versions as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_compare_all_sizes which generates both 1-bit and 32-bit
versions of the comparison operation. This reduces the code
duplication a bit and will make it easier to later add 16-bit versions
as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Most of DEQP-VK.subgroups are skipped because 16-bit floats aren't
supported but others pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Specifically when we are in non-uniform control flow, as we would need
to set the condition for the last instruction. If (for example) an
image atomic load stores its value directly in a NIR register,
last_inst would be a nop, and setting the condition would fail.
Fixes piglit test:
spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test
Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
v2: (Changes suggested by Eric Anholt)
* Cover all sig.ld* signals, not just ldunif and ldtmu, as all of
them have the same restriction.
* Update comment explaining why we add a MOV in that case
* Tweak commit message.
v3:
* Drop extra set of parens (Eric)
* Add missing ld signal to is_ld_signal to fix shader-db regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
Support cases such as depth-only renders and only set stencil buffers
when needed, to match the blob's behaviour.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The tiler unit in these GPUs is quite different and we haven't reverse
engineered enough of it yet to validate and pretty print it.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than open-coding checks on gpu_id in the compiler, let's track
quirks applying to whatever we're compiling for, to allow us to manage
the complexity of many heterogenous GPUs in the compiler.
It was discovered that a workaround used on T720 is also required on
T820 (and presumably T830), so let's fix this. This will also decrease
friction as we continue improving T720 support.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This will allow us to continue searching the current path for
relative shader includes.
From the ARB_shading_language_include spec:
"If it is quoted with double quotes in a previously included
string, then the first search point will be the tree location
where the previously included string had been found."
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
If the shader contains an include we need to first run the
preprocessor before deciding if we can skip compilation based
on the shader cache.
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
From the ARB_shading_language_include spec:
"#line must have, after macro substitution, one of the following
forms:
#line <line>
#line <line> <source-string-number>
#line <line> "<path>"
where <line> and <source-string-number> are constant integer
expressions and <path> is a valid string for a path supplied in the
#include directive. After processing this directive (including its
new-line), the implementation will behave as if it is compiling at
line number <line> and source string number <source-string-number>
or <path> path. Subsequent source strings will be numbered
sequentially, until another #line directive overrides that
numbering."
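The three forms quoted above can be told apart with a small classifier
such as the sketch below. It is a simplification, not the glcpp parser:
it accepts only decimal literals where the spec allows constant integer
expressions, it performs no macro substitution, and the type and function
names are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum line_form { LINE_INVALID, LINE_ONLY, LINE_SOURCE, LINE_PATH };

struct line_directive {
   enum line_form form;
   unsigned line;
   unsigned source;   /* meaningful for LINE_SOURCE */
   char path[256];    /* meaningful for LINE_PATH */
};

/* Classify a single "#line" directive given as a NUL-terminated string. */
static struct line_directive parse_line_directive(const char *text)
{
   struct line_directive d = { LINE_INVALID, 0, 0, "" };
   int end = 0;

   assert(text != NULL);

   /* #line <line> "<path>" ; %n is only stored if the closing quote matched. */
   if (sscanf(text, " #line %u \"%255[^\"]\" %n", &d.line, d.path, &end) == 2 &&
       end > 0 && text[end] == '\0') {
      d.form = LINE_PATH;
      return d;
   }

   /* #line <line> <source-string-number> */
   end = 0;
   if (sscanf(text, " #line %u %u %n", &d.line, &d.source, &end) == 2 &&
       end > 0 && text[end] == '\0') {
      d.form = LINE_SOURCE;
      return d;
   }

   /* #line <line> */
   end = 0;
   if (sscanf(text, " #line %u %n", &d.line, &end) == 1 &&
       end > 0 && text[end] == '\0') {
      d.form = LINE_ONLY;
      return d;
   }

   d.path[0] = '\0';
   return d;
}
```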
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
The new local function lookup_shader_include() will be used by
glDeleteNamedStringARB() in the following patch.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
This will be useful when implementing glIsNamedStringARB() which
doesn't do error checking, it just returns false for invalid
lookups instead.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
When the scratch ringbuffer settings are changed, the shader unit has
to be idle or we will have shaders using old and new settings.
That combination is not supported on the HW (likely the offset is
ringbuffer idx * WAVESIZE * 1024).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Now that we can (mostly) generate a pipe format for a VkFormat, use that
to answer queries about formats. This will let us refactor the freedreno
format table surface layout code to be shared between gallium and vulkan.
This causes us to expose fewer formats for now (on a 1/100 CTS run I'm
doing, skips go from 3671 to 3835 out of 5145 tests). Fails stay about
the same (478 -> 434, but the run is pretty flaky and we're doing fewer
tests now).
v2: Rebase on master, throw a finishme on missing vk-to-pipe formats that
tu used to support.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> (v1)
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
I'm planning on using this from radv and tu for queries about formats.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This decreases memory usage, because serialized NIR is more compact.
If shader_has_one_variant is true and the shader is uncached, the first
variant is created from nir_shader, otherwise the first variant and
all other variants are created from serialized NIR.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
a later commit will add back st_vertex_program as a subclass of
st_common_program
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This matches the uncached codepath.
affected_states was used before initialization, which was technically
a bug, but probably not reproducible due to _NEW_PROGRAM rebinding
everything.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Allows eglmesaext.h to be used in C++ code.
This aligns this file with the rest of EGL.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Two files exist in that directory:
- vulkan_xlib_randr.h
- vulkan_xlib_xrandr.h
Both were imported in 205c271562 ("vulkan: Update the XML and
headers to 1.1.70") with identical contents (ie. the
VK_EXT_acquire_xlib_display extension), but the former was never
included anywhere and can't be found upstream [1], while the latter is
included in vulkan.h and found upstream.
[1] https://github.com/KhronosGroup/Vulkan-Headers/tree/master/include/vulkan
Fixes: 205c271562 ("vulkan: Update the XML and headers to 1.1.70")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We could enable it on GFX10 if LLVM wasn't used as a fallback for
unsupported stages. Note that the CTS only tests it if
VK_KHR_shader_float16_int8 is enabled, even though it's not a
requirement.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.
No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.
v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
Should make 64-bit integer reductions easier to implement.
v4: use num_opcodes instead of last_opcode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
This extension allows using subgroup operations with 8-bit and 16-bit
types. Untested on GFX6-GFX7, and most subgroup operations are broken
on GFX10, so don't enable it for now. Not enabled on ACO because
it still doesn't support 8-bit/16-bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It should rely on the source type, not on the return type which
is always a boolean anyways, so vote_feq was never selected. For
OpSubgroupAllEqualKHR it's always an integer comparison.
This fixes some VK_KHR_shader_subgroup_extended_types tests with RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We don't seem to fault any more when running dEQP GLES2, and we don't
scrape serial output any more anyway so no problems should be caused by
that.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Always enabled; this doesn't require any driver work, it's just
core mesa bits.
quick_gl.txt is also updated because previously piglit ext_dsa
tests were skipped.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The spec is unclear on how to handle the buffer argument so we reuse
the logic from the EXT_direct_state_access spec.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can't simply alias ARB_direct_state_access functions because
those fail if the vao has never been bound before.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The wording in ARB_framebuffer_no_attachments and EXT_direct_state_access
is different.
In the former framebuffer names must have been generated using glGenFramebuffers
before using the named functions.
In the latter framebuffer names have no such constraints, so we can't use
the _mesa_lookup_framebuffer_dsa function.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All features from the EXT_dsa spec are implemented.
Interactions with other specs:
- GL_AMD_gpu_shader_int64: not needed, since it's not enabled in
compatibility profile.
- GL_ARB_bindless_texture is DONE
"INVALID_OPERATION is generated when calling various functions
to modify the state of a texture object from which handles have
been extracted"
- GL_ARB_buffer_storage/GL_EXT_buffer_storage is DONE (NamedBufferStorageEXT function)
- GL_ARB_texture_storage is DONE (3 TextureStorage*DEXT functions)
- GL_ARB_vertex_attrib_binding is DONE (6 VertexArray* functions)
- GL_EXT_external_buffer is not supported by Mesa
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It doesn't make sense to have nonlinear layouts for a buffer that can be
accessed as direct memory for a compute kernel. Turn that off so things
work as expected.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We can take the OpenCL kernel inputs and interpret them as uniforms by
simply reusing the Gallium callback.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Chrome OS would like to import and render to any supported format that has
a corresponding display plane format, and this prevents throwing
framebuffer incomplete for FBOs using these textures.
See: crbug.com/949260
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Store a flag stating if there was an implementation, and use
fxn->impl as a temporary flag between deserialization stages.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In d1c4e64a69, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push. I enabled it for iris because this is really what iris wants
but it seems to have caused a number of regressions. Revert to the old
behavior for now.
Fixes: d1c4e64a69 "intel/compiler: Add a flag to avoid compacting..."
Alignment requirements may have changed the horizontal stride already,
so don't set it if not required to avoid breaking said requirements.
Fixes several tests such as
dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
RS engine does this already, it is missing for BLT engine. This fixes
cases where a clear isn't immediately at the start of the frame.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
There are PE formats not supported by RS, so we can't have a single
function to translate both.
Use RS only for same formats until we have a translate_rs_format and test
the possible different format blits.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
* Removes the incorrect usage of translate_rs_format
* Disables use of BLT engine for different src/dst format
We only really need the BLT engine for tiling/detiling right now, but it
would be nice to support as many blit cases as possible to avoid using PE
for that.
To deal with different formats we need to:
* Have a translate_blt_format which has all supported formats
* Fix the swizzle translation from gallium (current version was wrong)
* Set the src/dst sRGB bits as needed
* Find which type conversions the BLT engine can actually do
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
A security advisory (TALOS-2019-0857/CVE-2019-5068) found that
creating shared memory regions with permission mode 0777 could allow
any user to access that memory. Several Mesa drivers use shared-
memory XImages to implement back buffers for improved performance.
This patch changes the shmget() calls to use 0600 (user r/w).
Tested with legacy Xlib driver and llvmpipe.
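For reference, a minimal sketch of creating a System V shared memory
segment with owner-only permissions and reading the mode back. It uses
only the standard shmget()/shmctl() calls; the helper names are
illustrative and this is not the driver code touched by the patch.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private segment that only the owning user may read or write.
 * Returns the segment id, or -1 on failure (as shmget does). */
static int create_owner_only_shm(size_t size)
{
   return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Returns the permission bits of a segment, or -1 on failure. */
static int shm_permission_bits(int shmid)
{
   struct shmid_ds ds;

   if (shmctl(shmid, IPC_STAT, &ds) != 0)
      return -1;
   return ds.shm_perm.mode & 0777;
}

/* Mark the segment for removal once the last user detaches. */
static void destroy_shm(int shmid)
{
   shmctl(shmid, IPC_RMID, NULL);
}
```

With 0777 in place of 0600, any local user who learns the segment id
could attach to it, which is the exposure the advisory describes.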
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If both are zero (the common case), we can emit a null vertex buffer
rather than emitting a vertex buffer with zeros in it. The packing of
the VERTEX_BUFFER_STATE is faster because no relocation is emitted and
we can avoid creating the vertex buffer which means one less
anv_state_stream_alloc.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is a bit more natural because we're already getting an anv_state
most places in the pipeline. The important part here, however, is that
we're no longer calling anv_block_pool_map on every alloc_binding_table
call. While it's probably pretty cheap, it is potentially a linear walk
over the list of BOs and it was showing up in profiles.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of blindly dirtying descriptors and push constants the moment we
see a pipeline change, check to see if it actually changes the bind
layout or push constant layout. This doubles the runtime performance of
one CPU-limited example running with the Dawn WebGPU implementation when
running on my laptop.
NOTE: This effectively reverts beca63c6c0. While it was a nice
optimization, it was based on prog_data and we can't do that anymore
once we start allowing the same binding table to be used with multiple
different pipelines.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of dirtying all graphics or all compute based on binding point,
we're now much more careful. We first check to see if the actual
descriptor set changed and then only dirty the stages used by that
descriptor set. For dynamic offsets, we keep a bitfield per-stage of
which offsets are actually used in that stage and we only dirty push
constants and descriptors if that stage has dynamic offsets AND those
offsets actually change.
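The finer-grained dirtying described above might look roughly like the
sketch below. The struct, the constants and the function names are all
assumptions and do not reflect the real anv data structures. A set
binding dirties only the stages that set uses, and a dynamic offset
dirties only the stages whose bitfield says they read that offset.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STAGES 6
#define MAX_SETS 8
#define MAX_DYNAMIC_OFFSETS 8

struct bind_state {
   const void *sets[MAX_SETS];                /* currently bound sets */
   uint32_t dyn_offsets[MAX_DYNAMIC_OFFSETS]; /* current dynamic offsets */
   uint32_t dyn_used[NUM_STAGES];             /* per-stage mask of offsets used */
   uint32_t dirty_stages;                     /* bit i set: stage i needs re-emit */
};

/* Bind a descriptor set; nothing is dirtied if the same set is rebound. */
static void bind_set(struct bind_state *s, unsigned idx,
                     const void *set, uint32_t set_stages)
{
   assert(idx < MAX_SETS);
   if (s->sets[idx] == set)
      return;
   s->sets[idx] = set;
   s->dirty_stages |= set_stages;
}

/* Update one dynamic offset; dirty only stages that use it, and only
 * if the value actually changed. */
static void set_dynamic_offset(struct bind_state *s, unsigned idx, uint32_t value)
{
   assert(idx < MAX_DYNAMIC_OFFSETS);
   if (s->dyn_offsets[idx] == value)
      return;
   s->dyn_offsets[idx] = value;
   for (unsigned stage = 0; stage < NUM_STAGES; stage++) {
      if (s->dyn_used[stage] & (1u << idx))
         s->dirty_stages |= 1u << stage;
   }
}
```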
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It theoretically could be more efficient but the real point here is that
it's no longer really a matter of dealing with special cases and then
the "real" thing. The way we're handling binding tables, it's more of a
multi-step process and a switch is more natural.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This substantially reworks both the state setup side of push constant
handling and the pipeline compile side. The fundamental change here is
that we're no longer respecting the prog_data::param array and instead
are just instructing the back-end compiler to leave the array alone.
This makes the state setup side substantially simpler because we can now
just memcpy the whole block of push constants and don't have to
upload one DWORD at a time.
This also means that we can compute the full push constant layout
up-front and just trust the back-end compiler to not mess with it.
Maybe one day we'll decide that the back-end compiler can do useful
things there again but for now, this is functionally no different from
what we had before this commit and makes the NIR handling cleaner.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This moves the compute stuff into an anv_push_constants::cs sub-struct.
It also moves dynamic offsets into the push constants. This means we
have to duplicate the data per-stage but that doesn't seem like the end
of the world and one day we may wish to make dynamic offsets per-stage
anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It turns out that emitting push constants is one of the hottest paths in
the driver and ANY work we do there costs us. By pre-computing things a
bit ahead of time, we shave 5% off the runtime of a CPU-limited example
running with the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The bounds checking is actually less safe than just pushing the data.
If the bounds checking actually ever kicks in and it's not on the last
UBO push range, then the shrinking will cause all subsequent ranges to
be pushed to the wrong place in the GRF. One of the behaviors we
definitely don't want is for OOB UBO access to result in completely
unrelated UBOs returning garbage values. It's safer to just push the
UBOs as-requested. If we're really concerned about robustness, we can
emit shader code to do bounds checking which should be stupid cheap (a
CMP followed by SEL).
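Both points can be illustrated with a small sketch. It assumes that push
ranges are packed back to back in registers and that an out-of-bounds
read should yield 0; the function names are made up and this is not the
driver or compiler code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start register of range `index` when ranges are packed back to back.
 * Shrinking any earlier range moves every later range. */
static unsigned range_start(const unsigned *lengths, unsigned index)
{
   unsigned start = 0;

   for (unsigned r = 0; r < index; r++)
      start += lengths[r];
   return start;
}

/* Shader-side bounds check: one compare followed by one select.
 * Returning 0 for out-of-bounds reads is an assumption of this sketch. */
static uint32_t load_checked(const uint32_t *ubo, uint32_t size_dw, uint32_t index)
{
   bool in_bounds = index < size_dw;    /* CMP */

   return in_bounds ? ubo[index] : 0;   /* SEL */
}
```

If the shader was compiled expecting the second range at register 4 and
the first range is shrunk from 4 to 3 registers at upload time, the
second range lands at register 3 and the shader reads unrelated data.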
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
As of 2d78e55a8c, nir_intrinsic_load_constant with a constant offset
is constant-folded so we should never end up with any that trigger
brw_nir_analyze_ubo_ranges.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us stop tracking the pipeline layout. It also means less
indirection on a very hot path. As an extra bonus, we can make some of
our data structures smaller. No measurable CPU overhead improvement.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In the early days of the driver we allowed layout to be VK_NULL_HANDLE
and used that for some internal pipelines when we wanted to be lazy.
Vulkan doesn't actually allow NULL layouts, however, so there's no
reason to have this check.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
A 'normal' texture op may be emitted in a vertex shader on T720 but it
still doesn't take any derivatives.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting
3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in
SuperTuxKart and Tropico 6 which was seen only on Haswell.
The reason for this is unknown and the fix was found empirically.
The closest mention in PRM is that it should improve performance.
From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
"When the BLEND_STATE pointer changes but not the CC_STATE pointer,
driver needs to force a CC_STATE pointer change to improve
blend performance in pixel backend."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension adds the device coherent and device uncached memory
types. It's known to be slower than non-device coherent memory but
it might be useful for debugging.
This is only exposed for chips that support L2 uncached.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This simplifies manipulation of the offsets dramatically, fixing some
UBO access related bugs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Prefetch only supports the basic 2D texture case, checking is_array is
needed because 1d array textures pass the coord num_components==2 test.
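The eligibility test described above can be sketched as follows. The struct and function names are hypothetical stand-ins for illustration only; the real pass inspects NIR texture instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the texture instruction fields that matter. */
struct tex_desc {
   unsigned coord_components; /* number of coordinate components */
   bool is_array;             /* true for 1D-array, 2D-array, ... */
};

static bool
tex_can_prefetch(const struct tex_desc *tex)
{
   assert(tex != NULL);

   /* A 1D array texture also has two coordinate components (x, layer),
    * so the component count alone cannot identify a plain 2D texture. */
   return tex->coord_components == 2 && !tex->is_array;
}
```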
Fixes: 2a0d45ae ("freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This makes the streams more readable and comparable with the blob's parser
as it parses the VS and PLBU stream and shows the currently known values.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
CodeGenFileType moved from ::llvm::TargetMachine in
llvm/Target/TargetMachine.h to ::llvm:: in llvm/Support/CodeGen.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
To avoid the following build error:
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_util_intermediates/format/u_format_table.c:30:10:
fatal error: 'u_format.h' file not found
^~~~~~~~~~~~
1 error generated.
Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
GEN10_FORMAT_TABLE_INPUTS requires correction of the u_format.csv file path
in order to avoid the following build error:
ninja: error: 'external/mesa/util/format/u_format.csv',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/gfx10_format_table.h',
missing and no known rule to make it
Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
We had this ad-hoc exact size matching for unsized internalformats,
but st_choose_matching_format() can do exactly what we want. This
means that, for example, we'll now prefer the matching ordering for
565/565_REV if the driver supports both orders. We also pass
Unpack.SwapBytes through from ChooseTextureFormat so that we can hit
the memcpy path for 8888 formats when that flag is set.
Some interesting format choice changes from this (on softpipe):
intf/form/type          before             after
-----------------------------------------------------------
RGBA/RGBA/USHORT:       R8G8B8A8_UNORM  -> RGBA_UNORM16
RGB/RGBA/8888:          X8B8G8R8_UNORM  -> R8G8B8X8_UNORM
RGB/ABGR/8888_REV:      X8B8G8R8_UNORM  -> R8G8B8X8_UNORM
RGBA/RGBA/5551:         B5G5R5A1_UNORM  -> A1B5G5R5_UNORM
RGBA/RGBA/4444:         R8G8B8A8_UNORM  -> A4B4G4R4_UNORM
RGBA/GL_RGBA/1010102:   R8G8B8A8_UNORM  -> A2B10G10R10_UNORM
DEPTH/DEPTH/UINT:       Z24X8           -> Z_UNORM32
DEPTH/DEPTH/USHORT:     Z24X8           -> Z_UNORM16
v2: Make sure that the baseformat still matches. v1 would pick
MESA_FORMAT_L16_UNORM for RED/LUMINANCE/SHORT, when we clearly
want a red format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
sRGB vs unorm was the only conflict case being guarded against in this
function. Before the PIPE_FORMAT conversion, we always listed the
unorm before the sRGB in the enums, but PIPE_FORMAT_A8B8G8R8_SRGB
happens to be before _UNORM. We always want the unorm result here.
Fixes: 807a800d8c ("mesa: Redefine MESA_FORMAT_* in terms of PIPE_FORMAT_*.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now have a nice helper function for finding those memcpy formats,
without needing to go through each entry of the mesa format table to
see if it happens to match.
While looking at sysprof of a softpipe GLES2 CTS run, we were spending
~8% of the CPU on ChooseTextureFormat. With this, roughly the same
region of the testsuite was .4%.
v2: Add Ken's fix for canonicalizing array formats.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just return MESA_FORMAT_NONE to avoid triggering unreachable; there's
really no sensible thing to return for this case anyway.
This prevents regressions in the next commit, which makes st/mesa
start using this function to find a reasonable format from GL format
and type enums.
Reviewed-by: Eric Anholt <eric@anholt.net>
Eventually, we will want to combine constants across types, but for now
let's not break the world.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
64-bit ops have their own funky swizzles. Let's pack them, both for
native 64-bit sources as well as extended 32-bit sources.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
State was leaking from previous frames as we weren't updating the
descriptor in all cases.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
On newer GPUs, this is a no-op. On older GPUs, this prevents needless
spilling since texture registers are shared with a subset of work
registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
This actually supports more of the extension than the LLVM backend but we
can't enable it because ACO doesn't work with all stages yet.
With more of it enabled, some CTS tests fail because our 64-bit sqrt
is very imprecise. I can't find any precision requirements for it
anywhere, so I'm thinking it might be a CTS issue.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO sets this itself and will have to set it differently in the future to
support shaderDenormFlushToZeroFloat64.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Two benefits:
Most docker image related environment variables can now be defined in
the jobs where they're used instead of globally. The DEBIAN_TAG values
are propagated to other jobs via YAML anchors.
Images on https://gitlab.freedesktop.org/mesa/mesa/container_registry
are now organized in separate repositories with a suffix matching the
name of the job which makes sure the image is there.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Cleans up .gitlab-ci/ a little, and allows using a single DEBIAN_EXEC
line for all container jobs.
v2:
* Use lava_arm.sh instead of arm_lava.sh for consistency with v2 of the
previous change
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This makes it easier to tell which job is which in a pipeline.
v2:
* Use lava_arm{64,hf} instead of arm{64,hf}_lava to keep these jobs
together in pipeline overviews
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Per the spec, the units passed to glPolygonOffset are to be multiplied
by an implementation-defined constant.
On Midgard, this constant seems to be 2.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This avoids overwriting the resolve if there are pending clear aspects,
same as color resolves.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In the case of glibc, pthread_t is internally a pointer. If
lp_rast_destroy() passes a 0-value pthread_t to pthread_join(), the
latter will SEGV dereferencing it.
pthread_create() can fail if either the user's ulimit -u or Linux
kernel's /proc/sys/kernel/threads-max is reached.
Choosing to continue, rather than fail, on the theory that it is better to
run with the one main thread, than not run at all.
Keeping as many threads as we got, since lack of threads severely
degrades llvmpipe performance.
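The approach above can be sketched as follows: stop at the first pthread_create() failure, remember how many threads really exist, and only join those. The struct, the function names, and MAX_WORKERS are hypothetical and are not the llvmpipe rasterizer's own.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

/* Hypothetical rasterizer state: only the first num_threads entries of
 * `threads` hold valid handles. */
struct worker_pool {
   pthread_t threads[MAX_WORKERS];
   unsigned num_threads;
};

static void *
worker_main(void *arg)
{
   (void)arg;
   return NULL;
}

/* Try to start `wanted` threads and keep however many we actually got. */
static void
worker_pool_create(struct worker_pool *pool, unsigned wanted)
{
   assert(wanted <= MAX_WORKERS);

   pool->num_threads = 0;
   for (unsigned i = 0; i < wanted; i++) {
      if (pthread_create(&pool->threads[i], NULL, worker_main, pool) != 0)
         break; /* e.g. ulimit -u or threads-max reached */
      pool->num_threads++;
   }
}

/* Join only the threads that were created, never a zero-valued handle. */
static void
worker_pool_destroy(struct worker_pool *pool)
{
   for (unsigned i = 0; i < pool->num_threads; i++)
      pthread_join(pool->threads[i], NULL);
   pool->num_threads = 0;
}
```

A pool that ends up with zero workers is still usable in this sketch; the caller would simply do the work on the main thread.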
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
So nir_validate happens properly. Unfortunately this means we have
to play the metadata song and dance, so walk over all impls and say
that we didn't hurt anything.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When demoting it from an output to a global, we need to actually move
it to the correct list. While here, we also refactor so it's clear
we aren't mutating the list while iterating.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2106
Fixes: f9fd04aca1 ("nir: Fix non-determinism in lower_global_vars_to_local")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were compiling them twice, costing extra build time. Reduces my
ccache-hot clean build time by a second (24.3s to 23.3s, 3 runs each).
The windows args are a little strange -- it's not clear to me that
they're actually used for building these files, but keep them in place
just in case, since we don't have a good windows CI story yet. We
should want them on both gallium and classic regardless: Only osmesa
could be built for windows in classic, and classic OSMesa's scons
build defines these flags too.
Closes: #2052
Acked-by: Dylan Baker <dylan@pnwbakers.com>
As this required use of Python 3.8, the mako module also had to be updated.
v2 - Unbind mako module version when using Meson.
Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
On gen7 and earlier the scratch space size is limited to 12kB.
By enabling this optimization we may easily exceed this limit
without having any fallback.
arb_compute_shader/linker/bug-93840.shader_test crashes with
this lowering on IVB due to exceeding scratch size limit.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092
Fixes: 69244fc7
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Previously, instruction selection had two kinds of booleans:
1. divergent which was per-lane and stored in s2 (VCC size)
2. uniform which was stored in s1
Additionally, uniform booleans were made per-lane when they resulted
from operations which were supported only by the VALU.
To decide which type was used, we relied on the destination size,
which was not reliable due to the per-lane uniform bools, but it
mostly works on wave64.
However, in wave32 mode (where VCC is also s1) this approach
makes it impossible to keep track of which boolean is uniform and
which is divergent.
This commit makes all booleans per-lane.
The resulting excess code size will be taken care of by the optimizer.
v2 (by Daniel Schürmann):
- Better names for some functions
- Use s_andn2_b64 with exec for nir_op_inot
- Simplify code due to using s_and_b64 in bool_to_scalar_condition
v3 (by Timur Kristóf):
- Fix several subgroups regressions
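To make the per-lane representation concrete, here is a small C model that treats a boolean as a 64-bit lane mask. It is a sketch under assumptions: the helper names other than bool_to_scalar_condition are invented, and the bodies only mirror the s_and_b64 / s_andn2_b64 operations named in the changelog rather than the actual instruction selection code.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per lane; bit i set means the boolean is true in lane i. */
typedef uint64_t lane_mask;

/* Turn a uniform 0/1 condition into a per-lane boolean: true in every
 * active lane or in none. */
static lane_mask
bool_from_uniform(int cond, lane_mask exec)
{
   assert(cond == 0 || cond == 1);
   return cond ? exec : 0;
}

/* Logical not, modelled on s_andn2_b64 with exec: exec & ~b.
 * Inactive lanes stay zero. */
static lane_mask
bool_not(lane_mask b, lane_mask exec)
{
   return exec & ~b;
}

/* Collapse to a scalar condition, modelled on s_and_b64 with exec:
 * true if any active lane is true. */
static int
bool_to_scalar_condition(lane_mask b, lane_mask exec)
{
   return (b & exec) != 0;
}
```

With a single representation there is no need to guess from the register size whether a value is uniform or divergent, which is the ambiguity wave32 exposed.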
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO's optimizer would try to propagate 64-bit constants, but
does so in such a way that wouldn't work due to how the 64-bit
constants are handled in the IR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This patch tries to give instructions with the same execution
mask also the same pass_flags and enables VN for SALU instructions
using exec as Operand.
This patch also adds back VN for VOPC instructions and removes VN for phis.
v2 (by Timur Kristóf):
- Fix some regressions.
v3 (by Daniel Schürmann):
- Fix additional issues
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Using a hash-table walk means that variables will get inserted in
different orders on different runs. Just walk the list of globals
instead, even if some of them can't be turned into locals.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit "1c2bf82d24a glsl: disable lower_fragdata_array() for NIR drivers"
disabled the GLSL IR lowering that turned gl_FragData from an array into a
collection of scalar outputs under the assumption that this was already being
handled properly elsewhere, however there are some corner cases where NIR
would fail to do this, leaving gl_FragData[] as an array variable. This can
break backends that assume that all their outputs will be scalar and use the
variable definitions from the shader to do their output setup, such as the
case of V3D.
At least one corner case was found in some Portal shaders from shader-db, where
NIR would optimize out the full body of a fragment shader. In this scenario,
the empty shader would keep the original array definition of gl_FragData[],
causing the backend to assert.
We need to do this late enough for it to be effective, since doing it in
st_nir_preprocess does not fix the original problem.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2091
Fixes: 1c2bf82d ("glsl: disable lower_fragdata_array() for NIR drivers")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit 7520478461.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit 1d1b457821.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit 1d122c104a.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit 34b1aa957a.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit c1c574fdf1.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Not much of a difference but slightly better and slightly less
arbitrary.
total instructions in shared programs: 3560 -> 3559 (-0.03%)
instructions in affected programs: 44 -> 43 (-2.27%)
helped: 1
HURT: 0
total bundles in shared programs: 1844 -> 1843 (-0.05%)
bundles in affected programs: 23 -> 22 (-4.35%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.
This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
This can also be verified by simply changing fix_byte_src() to apply
on all platforms.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
This was causing an uninitialized value to end up propagated to the
3DSTATE_DEPTH_BOUNDS packet, leading to asserts on packet
building due to the value being greater than 1.
Fixes: 939ddccb7a ("anv: Add support for depth bounds testing.")
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Pretty routine, though we do have a hack to force swizzle alignment for
!32-bit until we implement !32-bit the right way.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is less complicated than previously thought. Note we have no way of
specifying the work register count for blend shaders; it must be
strictly less than the work register count of the corresponding fragment
shader (which is fine since we force the fragment shader to report a
count of 16 with a blend shader as a major hack until we get register
pressure down for blend shaders).
TODO: pandecode the flags.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Close the input end of the pipe after the data was written. Without this
fix I have seen a hang in sysfs_uevent_get(.., "OF_FULLNAME")
when the key was not found.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can clear the "needs" flags once we emit a flag. And also, don't
open-code the opcode name.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
For pre-fs-dispatch texture fetch, we need to assign bary_ij to r0.x,
even if it is not used in the shader (ie. only varying use is for tex
coords). But if, for example, gl_FragCoord is used, it could get
assigned on top of bary_ij, resulting in a GPU hang.
The solution to this is two-fold: (1) the inputs/outputs rework has the
benefit of making RA realize bary_ij is a vec2, even if there are no
split/collect instructions (due to no varying fetches in the shader
itself). And (2) extend the live ranges of meta:input instructions to
the first non-input, to prevent RA from assigning the same register to
multiple inputs.
Backport note: because of (1) above, a better solution for 19.3 would be
to revert f30c256ec0.
Fixes: f30c256ec0 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
At the ir3 level, we would assume that we could use wrmask to mask
off other components of an instruction returning a vecN when they are
not used. Which would let RA use components not written for other live
values. But this is only true for tex instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Allow inputs/outputs to be vecN (ie. whatever their actual size is), and
use split to get scalar components of inputs, and collect to gather up
scalar components of outputs.
The main motivation is to simplify RA, by only having to consider split/
collect to figure out where values need to land in consecutive scalar
registers, rather than having to also deal with left/right neighbors.
Because of varying packing, and the resulting fractional location
(location_frac), to implement load_input/store_output, it is still
convenient to have a table of scalar inputs/outputs. We move this to
the compile ctx (since it is only needed for nir->ir3).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In almost all places, the add_sysval_input() is paired directly with a
create_input(). (The one exception is frag shader ij bary coord, and
this exception will go away in a later patch.)
So go ahead and clean this up before reworking input/output handling.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a driver-param (loaded from uniform), not a sysval (populated by
hw into a register). So there is no value in having a sysval slot.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We keep kills alive w/ keeps these days, rather than a fake output.
This condition was left over from prior to that change.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If I'm going to refactor a bit to use these meta instructions to also
handle input/output, then might as well cleanup the names first.
Nouveau also uses collect/split for names of these meta instructions,
and I like those names better.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This doesn't really work, we can't necessarily just change the outputs
to half-precision like this in anything but simple cases.
Keep the shader key entry around though, eventually with proper mediump
support we could use this with a nir pass to use lower precision frag
shader outputs when the render target format has <= 16b/component.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction has 3 src regs, so `instr->regs[0..3]` are valid, but
`instr->regs[4]` is not.
```
Test case 'dEQP-GLES31.functional.shaders.linkage.es31.tessellation.varying.rules.output_superfluous_declaration'..
==29239== Invalid read of size 8
==29239== at 0x5BE9CDC: emit_cat6 (ir3.c:841)
==29239== by 0x5BEA1BF: ir3_assemble (ir3.c:921)
==29239== by 0x5BDF0A7: ir3_shader_assemble (ir3_shader.c:133)
==29239== by 0x5BDF193: assemble_variant (ir3_shader.c:162)
==29239== by 0x5BDF407: create_variant (ir3_shader.c:215)
==29239== by 0x5BDF4DB: shader_variant (ir3_shader.c:241)
==29239== by 0x5BDF553: ir3_shader_get_variant (ir3_shader.c:257)
==29239== by 0x5BA85F7: ir3_shader_variant (ir3_gallium.c:80)
==29239== by 0x5BA7703: ir3_cache_lookup (ir3_cache.c:96)
==29239== by 0x5B8B8B3: fd6_emit_get_prog (fd6_emit.h:119)
==29239== by 0x5B8C137: fd6_draw_vbo (fd6_draw.c:186)
==29239== by 0x5BB1FBB: fd_draw_vbo (freedreno_draw.c:290)
==29239== Address 0xb97f2d0 is 0 bytes after a block of size 240 alloc'd
==29239== at 0x4848D54: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==29239== by 0x61BD35B: ralloc_size (ralloc.c:119)
==29239== by 0x61BD41B: rzalloc_size (ralloc.c:151)
==29239== by 0x5BE599B: ir3_alloc (ir3.c:45)
==29239== by 0x5BEA583: instr_create (ir3.c:984)
==29239== by 0x5BEA5DF: ir3_instr_create2 (ir3.c:1000)
==29239== by 0x5BEE317: ir3_STLW (ir3.h:1431)
==29239== by 0x5BF12D3: emit_intrinsic_store_shared_ir3 (ir3_compiler_nir.c:903)
==29239== by 0x5BF418B: emit_intrinsic (ir3_compiler_nir.c:1802)
==29239== by 0x5BF5D07: emit_instr (ir3_compiler_nir.c:2339)
==29239== by 0x5BF603F: emit_block (ir3_compiler_nir.c:2426)
==29239== by 0x5BF624B: emit_cf_list (ir3_compiler_nir.c:2474)
==29239==
```
Probably this only triggers in non-optimized builds?
Fixes: 1f3b52ce50 ("freedreno/a6xx: Add register offset for STG/LDG")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we're not using so many job slots, it's easy to get these
jobs run in a reasonable amount of time (gles3 took 10 minutes for 4
cores, and gles31 was 15 minutes for 4 cores).
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
This runner is a little project by Bas, written in C++, that spawns
threads that then loop grabbing chunks of the (randomly shuffled but
consistently so) test list and hand it to a dEQP instance. As the
remaining list gets shorter, so do the chunks, so hopefully the
threads all complete effectively at once. It also handles restarting
after crashes automatically. I've extended the runner a bit to do
what I was doing in the bash scripts before, like the skip list and
expected failures handling. This project should also be a good
baseline for extending to handle retesting of intermittent failures.
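The shrinking-chunk idea can be illustrated with a small C helper. The policy below (a fraction of the remaining work per thread, clamped to a minimum of one test and a maximum chunk size) is an assumption chosen for the example; the runner's real heuristic is not described here and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: hand out roughly 1/(2 * threads) of what is left,
 * at least one test and at most max_chunk tests per grab. */
static size_t
next_chunk_size(size_t remaining, unsigned threads, size_t max_chunk)
{
   assert(threads > 0 && max_chunk > 0);

   if (remaining == 0)
      return 0;

   size_t chunk = remaining / (2 * (size_t)threads);
   if (chunk < 1)
      chunk = 1;
   if (chunk > max_chunk)
      chunk = max_chunk;
   return chunk;
}
```

Early grabs are large, which keeps per-chunk overhead low, and late grabs are small, so no thread is left holding a big chunk while the others sit idle.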
By switching to it, we can have the swrast tests just take up one job
slot on the shared runners and keep their allotment of CPUs busy,
instead of taking up job slots with single-threaded dEQP jobs. It
will also let us (eventually, once I reprovision) switch the freedreno
runners over to threading within the job instead of running concurrent
jobs, so that memory scribbles in one pipeline don't affect unrelated
pipelines, and I can experiment with their parallelism (particularly
on a306 where we are frequently backed up) without trashing other
people's jobs.
What we lose in this process is per-test output in the log (not a big
loss, I think, since we summarize fails at the end and reducing log
length keeps chrome from choking on our logs so badly). We also drop
the renderer sanity checking, since it's not saving qpa files for us
to go poke through. Given that all the drivers involved have fail
lists, if we got the wrong renderer somehow, we'd get a job failure
anyway.
v2: Rebase on dropping of the autoscale cluster and the arm64
build/test split. Use a script to deduplicate the cts-runner
build.
v3: Rebase on the amd64 build/test container split.
Acked-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (v2)
The bash scripts were using grep in the manner that matches any subset
of the line, but the new CTS runner matches the whole line and I think
that's a pretty good behavior. Given that some of the skip lists
already were written to match the full test name, just make them
consistently do so.
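The difference between the two matching behaviors can be shown with plain string functions. This sketch uses literal strings instead of regular expressions for brevity, and the test names in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Old behavior: an entry matches if it appears anywhere in the test name. */
static bool
matches_substring(const char *test_name, const char *entry)
{
   assert(test_name != NULL && entry != NULL);
   return strstr(test_name, entry) != NULL;
}

/* New behavior: an entry matches only if it equals the whole test name. */
static bool
matches_whole_line(const char *test_name, const char *entry)
{
   assert(test_name != NULL && entry != NULL);
   return strcmp(test_name, entry) == 0;
}
```

With substring matching, an entry such as "suite.group.case" would also skip "suite.group.case_extra"; with whole-line matching it skips only the exact test.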
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
This helps cut down our container build time. I've left a few that
we're likely to rev more frequently or I was less confident in
dropping.
v2: Rebase on the build/test container split, now bumps the build
container tag in this commit.
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Daniel Stone <daniels@collabora.com> (v1)
We can end up with scenarios where last_fence is associated with a batch
that is flushed through some other path before needs_out_fence_fd gets
set. Resulting in returning a fence that has no backing fd.
The simplest thing is to just skip the optimization to try and avoid
no-op batches when a fence-fd is requested. This should normally be
just once a frame anyways.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If the data is uniform, then it's really a uniform copy. If the index is
uniform, then it's really a read_invocation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Seems we can use DPP's row_mask field to get an effect similar to
modifying exec.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Autotools was deprecated for a while and has now been removed, so let's
start using meson here so that we won't have any issues next time we
update libdrm.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
It seems whatever was causing this is no longer an issue. So let's get
rid of the hack here.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
The automatically generated padding in structs contains
undefined values, force pack the structs to eliminate the
padding. Otherwise structs with the same values may generate
different hashes.
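A minimal sketch of the problem and the fix, assuming GCC/Clang attribute syntax. The key layouts, field names, and the FNV-1a hash are placeholders for illustration, not the blorp structs or the hash Mesa uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: compilers typically insert 3 padding bytes after `op`. */
struct key_padded {
   uint8_t op;
   uint32_t samples;
};

/* Same fields with the padding removed. */
struct key_packed {
   uint8_t op;
   uint32_t samples;
} __attribute__((packed));

/* FNV-1a over the raw bytes, so any padding bytes are hashed as well. */
static uint32_t
hash_bytes(const void *data, size_t size)
{
   const unsigned char *bytes = data;
   uint32_t hash = 2166136261u;

   assert(data != NULL);
   for (size_t i = 0; i < size; i++) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}

/* Build the key over memory pre-filled with `junk`, mimicking stale stack
 * contents. The result may depend on `junk` because padding is hashed. */
static uint32_t
hash_padded_key(unsigned char junk)
{
   struct key_padded key;
   memset(&key, junk, sizeof(key));
   key.op = 1;
   key.samples = 4;
   return hash_bytes(&key, sizeof(key));
}

/* Every byte of the packed key belongs to a field, so the junk is fully
 * overwritten and the hash depends only on the field values. */
static uint32_t
hash_packed_key(unsigned char junk)
{
   struct key_packed key;
   memset(&key, junk, sizeof(key));
   key.op = 1;
   key.samples = 4;
   return hash_bytes(&key, sizeof(key));
}
```

An alternative that keeps natural alignment is to zero the whole struct before filling it in; packing has the advantage that a forgotten memset cannot reintroduce the bug.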
Valgrind output:
Conditional jump or move depends on uninitialised value(s)
util_fast_urem32 (fast_urem_by_const.h:71)
hash_table_search (hash_table.c:262)
_mesa_hash_table_search (hash_table.c:296)
anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
anv_pipeline_cache_search (anv_pipeline_cache.c:335)
lookup_blorp_shader (anv_blorp.c:38)
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
blorp_mcs_partial_resolve (blorp_clear.c:1205)
anv_image_mcs_op (anv_blorp.c:1742)
anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
transition_color_buffer (genX_cmd_buffer.c:1159)
cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
Uninitialised value was created by a stack allocation
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Same as was done for the ARM images before.
This should make it less painful to update to newer dEQP / piglit as
well as to make changes to the build/test environment.
Reviewed-by: Eric Anholt <eric@anholt.net>
One job for the quick_gl profile, one for the glslparser & quick_shader
profiles (doing these together takes hardly any more time than
quick_shader alone).
v2:
* Don't break lava tests
v3:
* Remove piglit test artifacts paths:
* Exclude some quick_shader tests again:
- Test whose result flips between pass/fail/skip
- *@vs_in tests, as the same ones do not get picked every time
v4:
* Do not list passing tests in .gitlab-ci/piglit/*.txt (Eric Anholt)
* Include the test number summary in .gitlab-ci/piglit/*.txt
* Completely disable generating any vs_in tests in the piglit build.
* Remove some more unneeded files from the piglit build tree.
* Exclude quick_gl arb_gpu_shader5 tests; they were all skipped anyway,
as llvmpipe doesn't support this extension yet, but occasionally they
would spuriously fail instead.
v5:
* Set LD_LIBRARY_PATH, so we actually test the Mesa build from the
pipeline...
* Verify that wflinfo reports the expected Mesa version
* Pass -noreset to Xvfb
v6:
* Don't use autoscale runners, run piglit with -j4 (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
It's currently only needed for the meson-main and meson-arm64 jobs, not
the other meson build jobs.
Also remove MESON_SHADERDB, just run .gitlab-ci/run-shader-db.sh
directly from the meson-main job.
v2:
* Also run prepare-artifacts.sh in meson-arm64 script
v3:
* Move tarball creation into the new script as well, as it prevented
ccache --show-stats from running in after_script
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1
Reviewed-by: Eric Anholt <eric@anholt.net>
By default, ninja tries to saturate all cores of the runner host
machine, which could overload it due to other jobs running in parallel.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, we fail validation and potentially generate invalid code.
Let's fix up the mode of the accesses to the variable.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This fixes crashes with some
dEQP-VK.spirv_assembly.instruction.spirv1p4.* tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
A640 seems to work without any other changes (glmark and vkcube).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Timeline semaphores introduce support for wait before signal behavior,
which means that it is now allowed to call vkQueueSubmit() with wait
semaphores not yet submitted for execution. Our kernel driver requires
all of the wait primitives to be created before calling the execbuf
ioctl. As a result, we must delay submissions in the userspace driver.
This change stores the necessary information to be able to delay a
VkSubmitInfo submission to the kernel driver.
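The delaying idea can be modelled roughly as below. This is only a sketch: struct timeline, struct submit, struct queue and the helper names are hypothetical, and "handing a submit to the kernel" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical timeline: highest value whose signal reached the kernel. */
struct timeline {
   uint64_t submitted;
};

struct submit {
   struct timeline *wait;   uint64_t wait_value;    /* NULL: no wait */
   struct timeline *signal; uint64_t signal_value;  /* NULL: no signal */
};

struct queue {
   struct submit pending[MAX_PENDING];  /* FIFO ring of delayed submits */
   unsigned head, count;
   unsigned sent;                       /* submits handed to the kernel */
};

/* A submit may go to the kernel once its wait has a signal in flight. */
bool submit_ready(const struct submit *s)
{
   return s->wait == NULL || s->wait->submitted >= s->wait_value;
}

/* Hand over every leading submit that is ready, in submission order. */
void queue_flush(struct queue *q)
{
   while (q->count > 0 && submit_ready(&q->pending[q->head])) {
      struct submit *s = &q->pending[q->head];
      if (s->signal && s->signal->submitted < s->signal_value)
         s->signal->submitted = s->signal_value;
      q->head = (q->head + 1) % MAX_PENDING;
      q->count--;
      q->sent++;
   }
}

/* Store the submit first, then try to flush; it stays queued if not ready. */
void queue_submit(struct queue *q, struct submit s)
{
   assert(q->count < MAX_PENDING);
   q->pending[(q->head + q->count) % MAX_PENDING] = s;
   q->count++;
   queue_flush(q);
}
```

In this model a waiter submitted before its signaler simply stays in the ring until another flush finds the timeline value in flight.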
v2: Fold count++ into array access (Jason)
Move queue list to another patch (Jason)
v3: Document cleanup of temporary semaphores (Jason)
v4: Track semaphores of SYNC_FD type that need updating after delayed
submission
v5: Don't forget to update sync_fd in signaled semaphores after
submission (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Delayed submissions required by timeline semaphores mean we need to be
able to update the sync fd backed semaphores in a delayed fashion.
This could mean a race between the application destroying the
semaphore and the submission code trying to update it with the new
sync fd.
This change prepares semaphores to be refcounted, we'll most likely
only take a reference for cases where we signal a sync fd semaphore.
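A minimal sketch of the refcounting pattern, assuming C11 atomics. The names sem_impl, sem_create, sem_ref and sem_unref are hypothetical and are not the driver's actual API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted semaphore; the real driver struct differs. */
struct sem_impl {
   atomic_uint refcount;
   int sync_fd;              /* -1 when no sync fd is attached */
};

struct sem_impl *sem_create(void)
{
   struct sem_impl *sem = malloc(sizeof(*sem));
   if (!sem)
      return NULL;
   atomic_init(&sem->refcount, 1);   /* reference held by the application */
   sem->sync_fd = -1;
   return sem;
}

/* Taken by the submission code before it updates the sync fd later. */
struct sem_impl *sem_ref(struct sem_impl *sem)
{
   unsigned old = atomic_fetch_add(&sem->refcount, 1);
   assert(old > 0);   /* referencing an already-dead object is a bug */
   (void)old;
   return sem;
}

/* Returns true if this call dropped the last reference and freed sem. */
bool sem_unref(struct sem_impl *sem)
{
   if (atomic_fetch_sub(&sem->refcount, 1) != 1)
      return false;
   free(sem);
   return true;
}
```

With this shape, the application destroying the semaphore only drops its own reference; the memory goes away when the delayed submission drops the last one.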
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When we submit to i915 from a submission thread, we won't be able
to directly report the error to the user (in particular through the
debug report callbacks). So prepare 2 paths to report errors: device ->
notifying the user immediately, queue -> notifying the user the next
time an entry point is called.
In this change we still report directly for both paths, this will
change in the next commit.
v2: Split NULL batch parameter handling in
anv_queue_submit_simple_batch() in a different commit
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Prepare the queue initialization to take on more responsibilities and
possibly fail.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the future we'll have 2 different allocations depending on whether
we're using threaded submission or not.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.
v2: Found a second occurrence
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
We always close the in_fence at the end of anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.
Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!
v2: drop redundant fd = -1 (Jason)
v3: Update commit message (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The change made in 88d665830f ("mesa: check draw buffer completeness
on glClearBufferfi/glClearBufferiv") correctly updated the state prior
to checking the framebuffer completeness on glClearBufferiv but not in
glClearBufferfi.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 88d665830f ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/2072
Currently the linker does all the work then checks for the limits, which
means num_textures and num_images in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
Add necessary plumbing to make sure we bail once those errors are
found.
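The reordering can be illustrated with a small sketch. The struct, the field width and the MAX_TEXTURES limit below are hypothetical stand-ins, not the real shader_info layout.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEXTURES 32u   /* hypothetical limit for this sketch */

/* Hypothetical packed info struct: the field can hold 0..63 only. */
struct packed_info {
   unsigned num_textures : 6;
};

/* Validate first, store second. Returns false and leaves info untouched
 * when the count exceeds the limit, so an oversized value never has to
 * fit in the narrow field. */
bool set_num_textures(struct packed_info *info, unsigned count)
{
   if (count > MAX_TEXTURES)
      return false;   /* caller reports a link error and bails */
   info->num_textures = count;
   assert(info->num_textures == count);   /* nothing was truncated */
   return true;
}
```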
Fixes: 84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Currently the linker does all the work then checks for the limits, which
means num_ssbos and num_ubos in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
One drawback of this approach is that for some cases we might not see
the collected errors from various stages, but bail as soon as a stage
breaks the limits.
Fixes: 84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This allows ZSTD instead of ZLIB to be used for compressing the shader
cache.
On a 72 core system emulating skl with a full shader-db (with i965):
ZSTD:
1915.10s user 229.27s system 5150% cpu 41.632 total (cold cache)
225.40s user 10.87s system 3810% cpu 6.201 total (warm cache)
154M (235M on disk)
ZLIB:
2231.33s user 194.24s system 1899% cpu 2:07.72 total (cold cache)
229.15s user 10.63s system 3906% cpu 6.139 total (warm cache)
163M (244M on disk)
Tim Arceri sees (8 core ryzen and a full shader-db):
ZSTD:
2505.22 user 40.50 system 3:18.73 elapsed 1280% CPU (cold cache)
418.71 user 14.93 system 0:46.53 elapsed 931% CPU (warm cache)
454.3 MB (681.7 MB on disk)
ZLIB:
3069.83 user 40.02 system 4:20.13 elapsed 1195% CPU (cold cache)
425.50 user 15.17 system 0:46.80 elapsed 941% CPU (warm cache)
470.3 MB (701.4 MB on disk)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Move the differences into the eglextchromium.h header file, then provide the same header as libglvnd-1.2.
So programs that omit to include eglextchromium.h will fail to build with both mesa and libglvnd headers.
Fixes: a0a8109f "include: add the definition of EGL_EXT_image_flush_external"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
instructions in affected programs: 72428 -> 23244 (-67.91%)
helped: 6
HURT: 2
helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
95% mean confidence interval for instructions value: -11947.03 -348.97
95% mean confidence interval for instructions %-change: -125.72% 202.37%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
cycles in affected programs: 28144921 -> 2116965 (-92.48%)
helped: 6
HURT: 2
helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
95% mean confidence interval for cycles value: -5900438.73 -606550.27
95% mean confidence interval for cycles %-change: -140.79% 146.16%
Inconclusive result (%-change mean confidence interval includes 0).
total spills in shared programs: 9243 -> 8901 (-3.70%)
spills in affected programs: 2718 -> 2376 (-12.58%)
helped: 4
HURT: 4
total fills in shared programs: 21831 -> 10141 (-53.55%)
fills in affected programs: 11804 -> 114 (-99.03%)
helped: 6
HURT: 2
total sends in shared programs: 815912 -> 815912 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 1
GAINED: 3
The helped shaders are all compute shaders in Aztec Ruins. There is
also a compute shader in synmark2 OglCSDof that's helped but it doesn't
show up in above shader-db results because it went from SIMD8 to SIMD16.
That shader improves enough to yield a 15-20% performance boost to the
benchmark as a whole on my KBL laptop. The hurt shaders are a couple
shaders in Kerbal Space Program and a couple in Aztec Ruins.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit fills in a number of different pieces:
1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
new intrinsics. This involves simple plumbing work as well as a
tiny bit of extra logic to always scalarize scratch intrinsics.
2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
into byte/dword scattered read/write messages which use the A32
stateless model.
3. Add code to lower_surface_logical_send to handle dword scattered
messages and the A32 stateless model.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This new helper is better than nir_bitcast_vector because it's able to
take a (mostly) arbitrary range from the source vector. The only
requirement is that first_bit has to be aligned to the smaller of the
two bit sizes. It wouldn't be hard to lift that requirement but it's
reasonable for now.
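As a CPU-side model of what such a helper computes (not the NIR builder code itself), a bit-range extraction could look like the sketch below. The function name and the use of uint64_t to hold every component are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extract dst_count values of dst_bits each, starting at bit first_bit of
 * a source "vector" whose components are src_bits wide. Component 0 holds
 * the lowest-numbered bits. The caller must make sure the requested range
 * lies inside src. */
void extract_bits(const uint64_t *src, unsigned src_bits,
                  unsigned first_bit,
                  uint64_t *dst, unsigned dst_bits, unsigned dst_count)
{
   assert(src_bits >= 1 && src_bits <= 64);
   assert(dst_bits >= 1 && dst_bits <= 64);

   /* first_bit must be aligned to the smaller of the two bit sizes. */
   unsigned align = src_bits < dst_bits ? src_bits : dst_bits;
   assert(first_bit % align == 0);

   for (unsigned i = 0; i < dst_count; i++) {
      uint64_t value = 0;
      for (unsigned b = 0; b < dst_bits; b++) {
         unsigned bit = first_bit + i * dst_bits + b;
         uint64_t src_bit = (src[bit / src_bits] >> (bit % src_bits)) & 1;
         value |= src_bit << b;
      }
      dst[i] = value;
   }
}
```

For example, pulling two 16-bit values starting at bit 16 of a pair of 32-bit components takes the high half of the first component and the low half of the second.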
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Avoid duplicating some checks and code by making anv_GetDeviceQueue a
subcase of anv_GetDeviceQueue2, like radv does.
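The shape of that refactoring, sketched with hypothetical stand-in types rather than the real Vulkan structs or anv internals:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the types involved. */
struct queue { uint32_t flags; };
struct device { struct queue queue; };
struct queue_info2 {
   uint32_t flags;
   uint32_t family_index;
   uint32_t queue_index;
};

/* The general entry point holds all of the checks. */
void get_device_queue2(struct device *dev, const struct queue_info2 *info,
                       struct queue **out)
{
   assert(info->queue_index == 0);   /* this sketch exposes one queue */
   *out = (info->flags == dev->queue.flags) ? &dev->queue : NULL;
}

/* The older entry point becomes the flags == 0 special case. */
void get_device_queue(struct device *dev, uint32_t family_index,
                      uint32_t queue_index, struct queue **out)
{
   const struct queue_info2 info = {
      .flags = 0,
      .family_index = family_index,
      .queue_index = queue_index,
   };
   get_device_queue2(dev, &info, out);
}
```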
Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we have an accelerated path for a particular framebuffer format,
let's use it to save a bunch of instructions in a blend shader.
[Tomeu: Only use the faster intrinsic on >T760]
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
While most load/store operations work on 32-bit/vec4 intrinsically, some do
not and have special type-size-dependent semantics for the mask. We need
to convert into this native format.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
There are two versions of this opcode, depending what version of the ISA
you're using. I'm not sure if there's a semantic difference; I think
there might be some slight subtleties but it's too early to know at this
stage.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This is a single opcode, at least on newer Midgard chips. It's easier to
have this represented in NIR rather than trying to optimize out the
conversions, so let's add the intrinsic.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
When using packed vulkan-formats on little-endian systems, we need to
swap the components for the gallium formats. And since Zink isn't
big-endian safe yet, little-endian is the only endianness we care about
right now.
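Why the component order flips can be seen in a small sketch. The helper names are hypothetical; the only assumption is a packed 32-bit word whose name lists components from the most significant bits down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four 8-bit channels with `a` in bits 24..31 down to `r` in
 * bits 0..7, i.e. an "A8B8G8R8"-style packed word. */
uint32_t pack_a8b8g8r8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
          ((uint32_t)g << 8) | (uint32_t)r;
}

/* Store a packed word the way a little-endian CPU lays it out in memory:
 * least significant byte first. */
void store_le32(uint8_t out[4], uint32_t word)
{
   assert(out != NULL);
   for (unsigned i = 0; i < 4; i++)
      out[i] = (uint8_t)(word >> (8 * i));
}
```

Read byte by byte, the stored word is R, G, B, A, which is the reverse of the order in the packed name.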
This fixes a bunch of piglit tests, among others:
- spec@arb_depth_texture@depth-level-clamp
- spec@arb_depth_texture@depthstencil-render-miplevels * d=z24
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
The system can be disabling HW acceleration unbeknown to the user,
leading to a long debug session trying to work out which component is
failing. A quick mention that it is the environment override would be
very useful.
v2: Use more generic "CPU renderer" and so try to avoid jargon.
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Martin Peres <martin.peres@linux.intel.com>
This commit makes two major changes. First, we add a second case to
OpLoad for sampled images which constructs a vtn_sampled_image and
stashes that rather than stashing a pointer to the combined image
sampler like we do for bare samplers and images. This should be more in
line with how SPIR-V is intended to work and hopefully doesn't cause any
weird problems. The second is a rework of vtn_handle_texture to assume
that everything has an image but not everything has a sampler. We also
add a vtn_fail_if for the case where a texture instruction requires a
sampler but none is provided.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This helper makes a duplicate copy of the pointer if any new access
flags are set at this stage. This way we don't end up propagating
access flags further than the actual SPIR-V decorations. In several
instances where we create new pointers, we still call the decoration
helper directly because no copy is needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We have types on all vtn_values at this point so there's no reason to
carry the redundant type information.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The instruction count is (mostly) a measure of what optimization passes
can do, while # of nops is more an indication of how effectively the
scheduler is balancing register pressure vs instruction count. So track
these independently.
(There could be opportunities to rematerialize values to reduce register
pressure, swapping some nop's with other alu instructions, so nothing is
truly independent.. but it is still useful to break these stats out.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
The meta PHI instruction was removed long ago. And fanin/fanout
themselves do not contribute actual instructions (at least not by the
time you get to sched, they may prevent copy-propagating away a mov)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fold it in to writes_gpr() (since an instruction that does not reference any
registers by definition does not write a register). This lets us avoid
having to handle this case in a few other places.
Signed-off-by: Rob Clark <robdclark@chromium.org>
We did this in some places before, but not consistently. But it will be
useful for two-pass RA, to identify which registers have already been
assigned.
While we are cleaning this up, use __ssa_src() and new __ssa_dst()
helper more consistently. (If nothing else, this reduces the # of
callers of ir3_reg_create() to audit that we didn't miss something)
Signed-off-by: Rob Clark <robdclark@chromium.org>
The stage specific fields of shader_info are in a union. We've
likely been lucky that this value was either overwritten or ignored by
other stages. The recent change in shader_info layout in commit
84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
made this issue visible.
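A reduced sketch of the hazard and the guard, using a hypothetical two-stage struct rather than the real shader_info:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum stage { STAGE_FRAGMENT, STAGE_GEOMETRY };

/* Hypothetical, much-reduced shader info: per-stage fields overlap. */
struct info {
   enum stage stage;
   union {
      struct { uint8_t invocations; } gs;
      struct { uint8_t uses_discard; } fs;
   };
};

/* Only touch the gs member when the shader really is a geometry shader;
 * for any other stage the same bytes belong to that stage's fields. */
void set_default_invocations(struct info *info)
{
   assert(info != NULL);
   if (info->stage == STAGE_GEOMETRY)
      info->gs.invocations = 1;
}
```

Without the stage check, the write to gs.invocations would land on fs.uses_discard for a fragment shader.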
Fixes: cf2257069c ("nir/spirv: Set a default number of invocations for geometry shaders")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Found once I started using the generated unpack code from the Mesa side.
Fixes: 4bbaac3782 ("gallium: Add some more channel orderings of packed formats.")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Mesa emulates planar format sampling with per-plane samplers. Virgl now
supports this by allowing the plane index to be passed when creating a
sampler view from a planar image. With this change, mesa now passes that
information to virgl.
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Lepton Wu <lepton@chromium.org>
It happens that some games try to access a vertex buffer without
a valid format. This case was incorrectly handled by
ac_get_tbuffer_format which made ACO emit an invalid instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The panfrost BO cache can only grow since all newly allocated BOs are
returned to the cache (unless they've been exported).
With the MADVISE ioctl that's not a big issue because the kernel can
come and reclaim this memory, but MADVISE will only be available on 5.4
kernels. This means an app can currently allocate a lot of memory without
ever releasing it, leading to some situations where the OOM-killer kicks
in and kills the app (or even worse, kills another process consuming
more memory than the GL app) to get some of this memory back.
Let's try to limit the amount of BOs we keep in the cache by evicting
entries that have not been used for more than one second (if the app
stopped allocating BOs of this size, it's likely to not allocate
similar BOs in the near future).
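The eviction policy can be sketched as follows. The struct and function names are hypothetical, time is passed in as plain seconds, and "freeing" a BO is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached buffer, linked oldest-first. */
struct cached_bo {
   struct cached_bo *next;
   long last_used;   /* seconds, when the BO was returned to the cache */
   int freed;        /* stands in for actually releasing the memory */
};

struct bo_cache {
   struct cached_bo *oldest;   /* head of the LRU list */
   struct cached_bo *newest;   /* tail of the LRU list */
};

/* Return a BO to the cache; it becomes the most recently used entry. */
void cache_put(struct bo_cache *cache, struct cached_bo *bo, long now)
{
   assert(!cache->newest || cache->newest->last_used <= now);
   bo->next = NULL;
   bo->last_used = now;
   bo->freed = 0;
   if (cache->newest)
      cache->newest->next = bo;
   else
      cache->oldest = bo;
   cache->newest = bo;
}

/* Drop every entry that has sat unused for more than one second. The list
 * is ordered by last_used, so we can stop at the first fresh entry. */
unsigned cache_evict_stale(struct bo_cache *cache, long now)
{
   unsigned evicted = 0;
   while (cache->oldest && now - cache->oldest->last_used > 1) {
      struct cached_bo *bo = cache->oldest;
      cache->oldest = bo->next;
      if (!cache->oldest)
         cache->newest = NULL;
      bo->next = NULL;
      bo->freed = 1;
      evicted++;
   }
   return evicted;
}
```

Calling cache_evict_stale() whenever a BO is returned to the cache is one plausible place to run it; where the driver actually hooks it in is not shown here.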
This solution is based on the VC4/V3D implementation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We will soon introduce an LRU list to evict BOs that have been unused
for more than 1 second. Let's first move all BO cache fields to a
sub-struct to clarify which fields are used by the BO caching logic.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There aren't texture pipeline registers anymore; instead, space is
shared with work and ldst registers for output and input respectively.
We need to shift the base registers to represent this correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The meaning of some bits shifts; we need to account for this to print
swizzles sanely.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We were using old style half-registers; let's update that to be
consistent, preparing us for more disassembler changes in this area.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
When other geometry stages are present, we chose two quads and no
merged regs.
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
At least the gallium blitter helper will call us to draw with
tessellation shaders set but a non-patch primitive.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
It seems like tiling could work in the Adreno architecture, but we've
only ever seen bypass rendering with tessellation. For now, let's do
that too.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tessellation needs a couple of buffers that should hold the entire
output from a full VS+TCS draw call.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We need to select the right primitive type, set a bit to turn on
tessellation and OR in the TES output primitive type.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The tessellation stages need size and stride of the patch layout as
well as locations of attributes in the patch. The tessellation stages
also use two system memory BOs and need the iovas of those.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Similar to GS, the registers are shared and not reinitialized between
VS and TCS, so we need to make sure to allocate the same registers for
the system values between stages.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Similar to GS, some inputs are reused when we chsh from VS to TCS or
TES to GS, so we need to make sure we setup the right inputs and make
the shared system values outputs so they don't get clobbered.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We add two new IR3 specific nir intrinsics that map to the new condend
and endpatch instructions.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Our lowering pass made the z component unused by replacing its uses
by 1 - x - y. The intrinsic implementation then just needs to return
the x and y components.
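The identity the lowering relies on, as a tiny sketch. It assumes barycentric coordinates (the triangle domain), and the struct and function names are hypothetical.

```c
#include <assert.h>

struct vec3 { float x, y, z; };

/* Rebuild the full barycentric coordinate from the two components the
 * intrinsic returns; the three components always sum to one. */
struct vec3 tess_coord_from_xy(float x, float y)
{
   assert(x >= 0.0f && y >= 0.0f && x + y <= 1.0f);
   struct vec3 c = { x, y, 1.0f - x - y };
   return c;
}
```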
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
When we have both TES and GS, the TES needs to chain to the GS with
chmask and chsh just like the VS does to either TCS or GS.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
There are two new opcodes in use in tessellation control shaders:
category 0, opcodes 13 and 15. unk13 is a kill type of instruction
that terminates threads where !p0.x and is used to narrow down a patch
wavefront to just thread 0. Then, once thread 0 has written the tess
levels, it issues unk15, which might signal the TE that another patch
has been fully written.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
VS and TCS pass varyings the same way as VS and GS do. TCS then
writes the entire patch to a system memory BO and TES eventually reads
back from the BO once the TE starts generating vertices. TES outputs
vertices the same way as VS and GS, except when there's a GS as well,
in which case TES passes varyings to GS same way the VS would.
In addition, the TCS needs a little bit of control flow massaging so
that it only runs for valid invocations and needs a couple of unknown
instructions to synchronize with the TE.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Whether we're tessellating and which primitives the TES outputs
affects the entire pipeline so let's add a field to the key to track
that.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
With the imul24 opcode in place, we can now use it for computing local
offsets (ie for ldlw/stlw).
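A rough software model of a 24-bit multiply used for offset math is sketched below. It assumes that only the low 24 bits of each operand take part, sign-extended; the function names are hypothetical and the hardware instruction may differ in details.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 24 bits of v. */
int32_t sext24(uint32_t v)
{
   v &= 0xffffffu;
   return (v & 0x800000u) ? (int32_t)v - 0x1000000 : (int32_t)v;
}

/* Model of a 24-bit multiply: the upper 8 bits of each operand are
 * ignored and the low 32 bits of the product are returned. */
int32_t imul24(int32_t a, int32_t b)
{
   int64_t wide = (int64_t)sext24((uint32_t)a) * sext24((uint32_t)b);
   return (int32_t)(uint32_t)wide;
}

/* Local memory offset of element `index` with `stride` bytes per element.
 * Both must fit in 24 signed bits for the cheap multiply to be exact. */
uint32_t local_offset(uint32_t index, uint32_t stride)
{
   assert(index < (1u << 23) && stride < (1u << 23));
   return (uint32_t)imul24((int32_t)index, (int32_t)stride);
}
```

Local offsets are small, so the result matches a full 32-bit multiply for the values involved.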
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
These provide the iovas for system memory buffers used for
tessellation as well as a new HW specific system value.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The gallium helper doesn't like patches and we can't determine how
many primitives it gets tessellated into anyway. On gens where we
have tessellation, we get the prim count from a HW counter so just
skip counting on the CPU.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
These intrinsics take an ivec2 for the 64 bit base address and an
integer offset.
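How the operands combine into one address, as a sketch. The function name is hypothetical, and both the low-word-first ordering of the ivec2 and the offset being in the same units as the address are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two 32-bit halves of a base address (low word first, as
 * assumed for the ivec2 here) and add an offset. */
uint64_t iova_from_ivec2(uint32_t lo, uint32_t hi, uint32_t offset)
{
   uint64_t base = ((uint64_t)hi << 32) | lo;
   assert(base <= UINT64_MAX - offset);   /* no wrap-around */
   return base + offset;
}
```

The addition is done in 64 bits so a carry out of the low word is not lost.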
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Stages that load inputs with ldlw (TCS, GS) need byte offsets, stages
that load with ldg (TES) need dword offsets.
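The unit difference amounts to the following, sketched with a hypothetical enum and helper and assuming 32-bit (4-byte) components:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag for how a stage loads its inputs. */
enum load_kind {
   LOAD_LDLW,   /* TCS, GS: offsets counted in bytes */
   LOAD_LDG,    /* TES: offsets counted in dwords */
};

/* Offset of the 32-bit component `component_index` in the units the
 * load instruction expects. */
uint32_t input_offset(enum load_kind kind, uint32_t component_index)
{
   switch (kind) {
   case LOAD_LDLW:
      return component_index * 4;
   case LOAD_LDG:
      return component_index;
   }
   assert(!"unknown load kind");
   return 0;
}
```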
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
These instructions take a 64 bit iova as two consecutive registers and
an immediate offset. This patch adds support for the offset to be a
single register, which is added to the 64 bit iova.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
What we call RB6_Z24_UNORM_S8_UINT now is actually
RB6_Z24_UNORM_S8_UINT_AS_R8G8B8A8 and RB6_X8Z24_UNORM is actually
RB6_Z24_UNORM_S8_UINT.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2D array textures and 3D textures are different enum values after all.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We use one mechanism (REG_A6XX_RBBM_PRIMCTR_8_LO) for
PIPE_QUERY_PRIMITIVES_GENERATED, which counts all primitives that exit
the geometry pipeline, whether or not xfb is on. Then for
PIPE_QUERY_PRIMITIVES_EMITTED, we use the CP_EVENT_WRITE subfunction
that writes out per-stream counts for generated and emitted, but only
when xfb is enabled.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
In particular, increase the cost of 64-bit integer division.
Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom.
With ACO used for GS, this creates shaders requiring a branch with a
>32767 dword offset.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fix memory leak on allocation for lima submit, reported by valgrind.
128 bytes in 1 blocks are definitely lost in loss record 38 of 84
at 0x484A6E8: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
by 0x58689C7: util_dynarray_ensure_cap (u_dynarray.h:91)
by 0x5868BBB: util_dynarray_grow_bytes (u_dynarray.h:139)
by 0x5868BBB: lima_submit_add_bo (lima_submit.c:113)
by 0x585D7D3: lima_ctx_buff_va (lima_context.c:57)
by 0x586378F: lima_pack_plbu_cmd (lima_draw.c:802)
by 0x586378F: lima_draw_vbo (lima_draw.c:1351)
by 0x5406A2F: u_vbuf_draw_vbo (u_vbuf.c:1184)
by 0x55D0A57: st_draw_vbo (st_draw.c:268)
by 0x55576CB: _mesa_draw_arrays (draw.c:374)
by 0x55576CB: _mesa_draw_arrays (draw.c:351)
by 0x43610B: Mesh::render_vbo() (mesh.cpp:583)
by 0x415DBB: SceneBuild::draw() (scene-build.cpp:242)
by 0x41131B: MainLoop::draw() (main-loop.cpp:133)
by 0x411947: MainLoop::step() (main-loop.cpp:108)
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Fix memory leak on allocation for nir shader, reported by valgrind.
3,502 (480 direct, 3,022 indirect) bytes in 1 blocks are definitely lost in loss record 77 of 84
at 0x48483F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
by 0x5750817: ralloc_size (ralloc.c:119)
by 0x5750977: rzalloc_size (ralloc.c:151)
by 0x575C173: nir_shader_create (nir.c:45)
by 0x5763ACB: nir_shader_clone (nir_clone.c:728)
by 0x55D5003: st_create_fp_variant (st_program.c:1242)
by 0x55D789F: st_get_fp_variant (st_program.c:1522)
by 0x55D789F: st_get_fp_variant (st_program.c:1507)
by 0x56400C3: st_update_fp (st_atom_shader.c:163)
by 0x563D333: st_validate_state (st_atom.c:261)
by 0x55D07CB: prepare_draw (st_draw.c:132)
by 0x55D08DF: st_draw_vbo (st_draw.c:184)
by 0x55576CB: _mesa_draw_arrays (draw.c:374)
by 0x55576CB: _mesa_draw_arrays (draw.c:351)
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
We would like to have GL 4.6 Compatibility too.
The extensions don't support compatibility features, so no other changes
are needed.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
All callers other than the unit test just wanted to convert back from
a known-mesa-equivalent format, which is now a no-op.
v2: Fix assertion failure in iris GL startup with BGR565 by continuing
to return MESA_FORMAT_NONE for non-Mesa formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Now that MESA_FORMAT_x is just a PIPE_FORMAT_x define, we can strip
this function down to just the compression fallbacks.
v2: Restore the SRGB format for ASTC SRGB fallback case.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are various places in Mesa where we would like to be able to
have a shared format enum between Mesa and gallium (NIR compiler's
image formats, for example, or mapping from gallium's formats to
mesa's and vice versa in st_format.c). Rewriting all MESA_FORMAT to
PIPE_FORMAT would be disruptive and possibly more work than it's worth
(And I actually prefer MESA_FORMAT's name scheme), so for now just
make it so that there's one shared set of enum values.
The #defines here were generated by printing out from the
tests/st_format.c round-tripping loop, with the exception of 8888
formats where I hand-edited the #defines to point at the corresponding
gallium packed format define.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To redefine MESA_FORMAT in terms of PIPE_FORMAT enums, we need to fix
places where we iterated up to MESA_FORMAT_COUNT. I use
_mesa_get_format_name(f) == NULL as the signal that it's not an enum
value with a MESA_FORMAT.
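The iteration pattern looks roughly like this. The enum values and format_name() are hypothetical stand-ins for the real format enum and _mesa_get_format_name().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sparse enum: values 1 and 3 have no name in this sketch. */
enum { FMT_NONE = 0, FMT_A = 2, FMT_B = 4, FMT_COUNT = 5 };

const char *format_name(unsigned f)
{
   switch (f) {
   case FMT_NONE: return "FMT_NONE";
   case FMT_A:    return "FMT_A";
   case FMT_B:    return "FMT_B";
   default:       return NULL;   /* a hole in the enum */
   }
}

/* Walk the whole value range but skip values that are not real formats. */
unsigned count_named_formats(void)
{
   unsigned n = 0;
   for (unsigned f = 0; f < FMT_COUNT; f++) {
      if (format_name(f) == NULL)
         continue;
      n++;
   }
   assert(n <= FMT_COUNT);
   return n;
}
```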
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We checked round-tripping of formats without fallbacks, but weren't
setting the compression support flags in the mock context and thus
needed to skip testing those. Just set all the flags and assert that
no fallbacks are triggered, so we get full test coverage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We have packed formats for RGBA and ABGR already, so we can just
generate pack/unpack code.
v2: Rebase on endianness macro rename
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
These are the last formats that MESA_FORMAT had and PIPE_FORMAT
didn't. The .csv entries' channel sizes and swizzles all came from the
corresponding UNORM format.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the last unorm format that MESA_FORMAT had and PIPE_FORMAT
didn't. Note that it's an array format on gallium's side as well,
since it's a NPOT pixel size.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This texture compression is exposed by 830 and 915, and to make
MESA_FORMAT match PIPE_FORMAT defines I need a corresponding
PIPE_FORMAT.
v2: Set is_hand_written so we don't try to generate pack/unpack code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The primitive indices have to be swapped to follow the drawing
order.
This fixes corruption with Overwatch when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reverts commit 4432a2d14d.
Pretty much every SKQP test dies with this assertion:
skqp: ../src/mesa/drivers/dri/i965/brw_program_cache.c:102: hash_key: Assertion `item->key_size % 4 == 0' failed.
The automatically generated padding in structs contains
undefined values, force pack the structs to eliminate the
padding. Otherwise structs with the same values may generate
different hashes.
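A small sketch of the packing approach. The struct is hypothetical, FNV-1a is used as a stand-in hash rather than the driver's own, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Without packing, the compiler would typically insert 3 padding bytes
 * after `op`, and those bytes would be hashed with whatever they hold. */
struct __attribute__((packed)) shader_key {
   uint8_t op;
   uint32_t format;
};

static_assert(sizeof(struct shader_key) == 5, "key must have no padding");

/* FNV-1a over the raw bytes, so every byte of the key must be defined. */
uint32_t hash_key(const struct shader_key *key)
{
   const uint8_t *bytes = (const uint8_t *)key;
   uint32_t hash = 2166136261u;
   for (size_t i = 0; i < sizeof(*key); i++) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}
```

Note that packing can leave a key whose size is not a multiple of four, which matters for any hash that consumes the key in 32-bit words.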
Valgrind output:
Conditional jump or move depends on uninitialised value(s)
util_fast_urem32 (fast_urem_by_const.h:71)
hash_table_search (hash_table.c:262)
_mesa_hash_table_search (hash_table.c:296)
anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
anv_pipeline_cache_search (anv_pipeline_cache.c:335)
lookup_blorp_shader (anv_blorp.c:38)
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
blorp_mcs_partial_resolve (blorp_clear.c:1205)
anv_image_mcs_op (anv_blorp.c:1742)
anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
transition_color_buffer (genX_cmd_buffer.c:1159)
cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
Uninitialised value was created by a stack allocation
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)
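The padding problem described above can be sketched in isolation. This is
only an illustration: the key structs, the FNV-1a hash and the helper names
are hypothetical stand-ins, not the actual blorp or anv code, and the packed
attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key with natural alignment: most ABIs insert 3 padding
 * bytes after 'op', and field stores never write those bytes. */
struct padded_key {
   uint8_t op;
   uint32_t size;
};

/* Same fields without padding (GCC/Clang extension). */
struct __attribute__((packed)) packed_key {
   uint8_t op;
   uint32_t size;
};

/* FNV-1a over the raw bytes, standing in for the real cache hash. */
static uint32_t
hash_bytes(const void *data, size_t len)
{
   assert(data != NULL);
   const uint8_t *bytes = data;
   uint32_t hash = 2166136261u;
   for (size_t i = 0; i < len; i++) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}

/* Fill the key's storage with 'garbage' first to mimic stale stack
 * contents, then set every field and hash the whole struct. */
static uint32_t
hash_packed_key(uint8_t garbage, uint8_t op, uint32_t size)
{
   struct packed_key key;
   memset(&key, garbage, sizeof(key));
   key.op = op;
   key.size = size;
   return hash_bytes(&key, sizeof(key));
}
```

Because the packed struct has no bytes that the field stores leave untouched,
the hash depends only on the field values. Doing the same with padded_key
would feed the stale padding bytes into the hash.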
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Only the GL_UNSIGNED_BYTE cases actually work, the rest all fail, but we
should test the working cases to ensure that they continue to work.
Reviewed-by: Brian Paul <brianp@vmware.com>
The workaround got accidentally moved to the wrong place.
Fixes: 08d510010b aco: increase accuracy of SGPR limits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
ctx->pipe_framebuffer contains the last bound FB state, let's release
resources pointed by this FB state when the context is destroyed.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
pipe->stream_uploader has been allocated with u_upload_create_default()
in panfrost_create_context(), let's destroy it in the context destroy
path.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This commit fixes the following warning:
../src/intel/common/gen_decoder.c: In function ‘gen_spec_load_from_path’:
../src/intel/common/gen_decoder.c:741:11: warning: variable ‘len’ set but not used [-Wunused-but-set-variable]
741 | size_t len, filename_len = strlen(path) + 20;
| ^~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit fixes the following warning:
../src/compiler/nir/nir.c:1827:1: warning: ‘dest_is_ssa’ defined but not used [-Wunused-function]
1827 | dest_is_ssa(nir_dest *dest, void *_state)
| ^~~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit fixes the following warning:
../src/compiler/glsl/gl_nir_link_uniforms.c: In function ‘find_and_update_previous_uniform_storage’:
../src/compiler/glsl/gl_nir_link_uniforms.c:166:16: warning: unused variable ‘num_blks’ [-Wunused-variable]
166 | unsigned num_blks = nir_variable_is_in_ubo(var) ?
| ^~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes wrong colors when playing video under an Android + virgl
configuration.
Fixes: 2decad495f ("gallium/dri2: Support images with multiple planes for modifiers")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Lepton Wu <lepton@chromium.org>
v2: Use ternary to simplify code (Jason)
v3: Reorder switch cases to follow existing section ordering (Nanley)
Add missing comment in cmd_buffer_end_subpass() about new layout (Nanley)
v4: Fix layout comparison for stencil case (Nanley)
Update a few more comments (Nanley)
Move VK_IMAGE_LAYOUT_STENCIL_ATTACHMENT_OPTIMAL_KHR in color
attachment case for future stencil-CCS support (Nanley)
v5: Missed comments update (Nanley)
Updated relnotes.txt (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This reverts commit c9df92bf79.
It turns out that gitlab-runner uses kubernetes all wrong, spawning Pods
and sshing into them to run the script instead of Jobs containing the
script to run. This means that when anything goes wrong with the pod
(autoscale, preemption, VM maintenance, cluster reconfiguration), the job
fails and only sometimes gets handled as a runner system failure. Even
worse, due to bugs in either the runner or k8s itself, some classes of
timeout-related failure end up not being reported as failures, and the job
will incorrectly report success!
Disable using the "autoscale" cluster until we can do something else
(docker-machine instead of k8s, or the custom third-party k8s-native
runner).
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Acked-by: Daniel Stone <daniels@collabora.com>
The image used for test jobs is only about 1/6 as big as before, which
may help avoid some issues with some of the test boards.
Inspired by https://gitlab.freedesktop.org/mesa/mesa/issues/2046 .
v2:
* Leave LIBDRM_VERSION at 2.4.99 (Daniel Stone)
* Delete more build artifacts from dEQP tree (Daniel Stone)
v3:
* Set LD_LIBRARY_PATH for ldd
Acked-by: Daniel Stone <daniels@collabora.com> # v2
Reviewed-by: Eric Anholt <eric@anholt.net> # Except for the ldd line
We don't support nir_texop_txd, which is required by this cap. So let's
disable it for now.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
We do not support them yet, so let's not pretend.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
There's no good way to know if a texture-view will be created, so we
just have to accept it for all resources.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
We should use the format derived from the image-view here, not from the
image itself. Otherwise, we'll end up with incompatible render-passes.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
Use unsigned values; otherwise sign extension will produce a 64-bit value
in which the 32 most significant bits are all 1.
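A minimal sketch of the sign-extension effect, using hypothetical helper
names rather than the radeonsi code itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Widening a signed 32-bit value copies bit 31 into bits 32..63. */
static uint64_t
widen_signed(int32_t flags)
{
   return (uint64_t)flags;
}

/* Widening an unsigned 32-bit value zero-fills bits 32..63. */
static uint64_t
widen_unsigned(uint32_t flags)
{
   return (uint64_t)flags;
}

/* Test a single bit of the widened value. */
static bool
bit_is_set(uint64_t value, unsigned bit)
{
   assert(bit < 64);
   return (value >> bit) & 1u;
}
```

With bit 31 set, the signed path yields 0xFFFFFFFF80000000 while the unsigned
path yields 0x0000000080000000, so any flag stored above bit 31 would appear
set after the signed conversion.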
Fixes: 2afeed3010 ("radeonsi: tell the shader disk cache what IR is used")
This extension allows controlling the subgroup size by allowing a
varying subgroup size and also specifying a required subgroup size.
This implementation only allows specifying a required subgroup
size for compute shaders because there are some caveats with
other shader stages (e.g. NGG with geometry shader). This
basically allows apps to use Wave32 for compute shaders.
This extension is enabled for all chips but only GFX10 supports
Wave32. ACO doesn't support it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The maximum number of descriptor sets is indeed 32 but without
the sign bit.
The maximum number of bindings for RADV is way larger, keep it
as 32-bit.
Fixes: 96e6ef80d9 ("nir: pack the rest of nir_variable::data")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Now that all environment variables are documented, it would be
appreciated if we can keep this up-to-date.
[skip ci]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
That means we have only 30 bits for object IDs, because 2 bits are
sometimes used for something else.
This decreases the uncompressed shader size for the biggest Borderlands 2
shader from 33.6 KB to 23.2 KB. (31% decrease)
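As an illustration of why only 30 bits remain for IDs, here is one way a
32-bit word could carry an object ID plus 2 extra bits. The layout (tag in
the top 2 bits) and all names are assumptions made for this sketch, not the
actual serialization format.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_OBJECT_ID_BITS 30u
#define EXAMPLE_MAX_OBJECT_ID  ((1u << EXAMPLE_OBJECT_ID_BITS) - 1u)

/* Pack a 30-bit object ID and a 2-bit tag into one 32-bit word. */
static uint32_t
example_pack_ref(uint32_t id, uint32_t tag)
{
   assert(id <= EXAMPLE_MAX_OBJECT_ID);
   assert(tag <= 3u);
   return (tag << EXAMPLE_OBJECT_ID_BITS) | id;
}

/* Recover the object ID from the low 30 bits. */
static uint32_t
example_unpack_id(uint32_t word)
{
   return word & EXAMPLE_MAX_OBJECT_ID;
}

/* Recover the tag from the top 2 bits. */
static uint32_t
example_unpack_tag(uint32_t word)
{
   return word >> EXAMPLE_OBJECT_ID_BITS;
}
```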
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This decreases memory usage, because serialized NIR is more compact.
The main shader part is compiled from nir_shader.
Monolithic shader variants are compiled from nir_binary.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
not needed. We also need to free TGSI in the destroy function for the case
when an app is terminated and si_create_compute_state_async is never
executed because of util_queue_drop_job.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
GP handles gl_PointSize similarly to gl_Position, i.e. it needs a
separate buffer and it has a special type in varying descriptors. Also,
for indexed draws we need to emit a special PLBU command to pass the
address of the gl_PointSize buffer.
Blob also clamps gl_PointSize to 1 .. 100 (as well as line width),
so let's do the same.
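The clamp itself is simple. A sketch with hypothetical helper names, not the
lima code:

```c
#include <assert.h>

/* Generic clamp; NaN inputs pass through unchanged because both
 * comparisons are false for NaN. */
static float
example_clampf(float value, float min, float max)
{
   assert(min <= max);
   if (value < min)
      return min;
   if (value > max)
      return max;
   return value;
}

/* Clamp a point size to the 1 .. 100 range mentioned above. */
static float
example_clamp_point_size(float size)
{
   return example_clampf(size, 1.0f, 100.0f);
}
```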
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This was made optional in ff9bf223c2 ("meson: make nm binary optional")
for Windows, but proper Windows support has been added and `nm` is now
only used on Unix systems.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers>
Otherwise, if glvnd is not installed systemwide but only in a prefix,
its headers won't be found. This happens because if its headers are in
/usr/include/ then another dependency will provide the necessary -I
arguments and compilation will work.
Fixes: 035ec7a2bb ("meson: Add support for EGL glvnd")
Acked-by: Eric Engestrom <eric@engestrom.ch>
As requested by Tim.
This was generated with:
grep 'PIPE_ARCH_.*_ENDIAN' -rIl | xargs sed -ie 's@PIPE_ARCH_\(.*\)_ENDIAN@UTIL_ARCH_\1_ENDIAN@'g
v2: - add this patch
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This will allow it to be used as a drop in replacement for
_mesa_little_endian in a number of cases.
v2: - Always define PIPE_ARCH_LITTLE_ENDIAN and PIPE_ARCH_BIG_ENDIAN,
define the one that reflects the host system to 1 and the other to 0
- replace all uses of #ifdef, #ifndef, and #if defined() with #if
and #if ! with PIPE_ARCH_*_ENDIAN
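A sketch of the "always define both macros to 0 or 1" convention. The
EXAMPLE_ARCH_* names are placeholders, and detection through __BYTE_ORDER__
assumes a GCC- or Clang-compatible compiler; anything else is treated as
little endian here purely for the sake of the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define EXAMPLE_ARCH_LITTLE_ENDIAN 0
#define EXAMPLE_ARCH_BIG_ENDIAN 1
#else
#define EXAMPLE_ARCH_LITTLE_ENDIAN 1
#define EXAMPLE_ARCH_BIG_ENDIAN 0
#endif

/* Both macros always exist, so plain #if works and a misspelled name
 * cannot silently select the wrong branch the way #ifdef could. */
static uint32_t
example_to_le32(uint32_t v)
{
#if EXAMPLE_ARCH_LITTLE_ENDIAN
   return v;
#else
   return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
          ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
#endif
}

/* Return the byte stored at the lowest address of 'v'. */
static uint8_t
example_first_byte(uint32_t v)
{
   uint8_t first;
   memcpy(&first, &v, 1);
   return first;
}

/* Run-time check that the macros agree with the real memory layout. */
static void
example_check_endianness(void)
{
   assert((example_first_byte(1u) == 1) == (EXAMPLE_ARCH_LITTLE_ENDIAN == 1));
}
```

Since the macros are ordinary integer constants, they can also be used in
regular C expressions such as if (EXAMPLE_ARCH_BIG_ENDIAN), which is what
makes them usable as a replacement for a run-time endianness helper.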
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
_WIN32 is defined by basically all windows compilers (MSVC, ICL, MinGW),
whereas _MSC_VER is not defined by MinGW. Without this change MinGW falls
through and doesn't define PIPE_ARCH at all, and is caught by some extra
code in gallium.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Run only jobs needed for testing on LAVA devices if a branch starts with
lava-ci-.
This allows developers to have faster test cycles as these pipelines
take only a bit above 8 minutes. Also has the advantage of conserving
resources.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The implementation doesn't share much with get.c because:
* the refactoring needed for get.c to not depend on ctx->Array.VAO would
be quite large
* glGetVertexArray* would still need to filter pname to only accept the ones
specified by the spec
* these functions are getters, the implementation is trivial (the complexity
is in the correct filtering of pname input)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add a single helper dealing with the lookup of both the vao
and the vbo to avoid duplicating this code in all the
glVertexArray* functions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ARB_dsa and EXT_dsa differ slightly when an uninitialized VAO
is requested.
In this case ARB_dsa fails while EXT_dsa requires the object to be
initialized.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
$PWD doesn't work for variables:, it ended up as "/ccache", always
starting with an empty cache.
v2:
* Use relative path and realpath
v3:
* Use $CI_PROJECT_DIR (Eric Anholt)
* Clear ccache stats in before_script if the cache is in $CI_PROJECT_DIR
Fixes: c9df92bf79 "ci: Switch over to an autoscaling GKE cluster for builds."
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes a ton of regressions in image load store tests.
Fixes: 4319cc8c0f ("nir: pack nir_variable::data::xfb_*")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Commit 5847de6e9a implemented a restriction that applies to ICL, but
wrongly marked it as also applying to GLK. Reviewers of MR !1125
pointed this out, and the commit history shows removal of GLK from parts
of the patch, but it turns out there was still a left-over GLK check in
the code.
This code was breaking some of the i8vec2 tests on GLK, for example:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
Removing the GLK check solves the issue for GLK. I don't see a reason
why implementing this restriction would actually break GLK, so
there's still more to investigate here since this bug may be affecting
ICL+, but let's apply the real GLK fix while we analyze and discuss
the other possible issues.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
BSpec: 3017
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
This prevents some additional optimizations that would change the
original result. This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => a >= b. Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.
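The reason such rewrites change results is NaN: every ordered comparison
with a NaN operand is false, so negating a comparison is not the same as
using the opposite comparison. A small sketch (helper names are made up for
illustration, and the code assumes it is not built with -ffast-math):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The original expression: true whenever a < b is false, which
 * includes the case where either operand is NaN. */
static bool
not_less(float a, float b)
{
   return !(a < b);
}

/* The "optimized" expression: false if either operand is NaN. */
static bool
greater_equal(float a, float b)
{
   return a >= b;
}

/* For ordered (non-NaN) inputs the two forms always agree. */
static void
check_ordered_inputs(float a, float b)
{
   assert(not_less(a, b) == greater_equal(a, b));
}
```

For a = NaN the first form returns true and the second returns false, which
is exactly the kind of difference an isnan()-style check depends on.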
This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958. However, the
problem in that issue was that Chrome or ANGLE is replacing calls to isnan()
with some stuff that we (correctly) optimize to false. If they had left
the calls to isnan() alone, everything would have just worked.
No shader-db changes on any Intel platform.
I also tried marking the comparison generated by the isnan() function
precise. The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().
I also considered adding a new ir_unop_isnan opcode that would implement
the functionality. During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the same thing would need
to happen in SPIR-V translation).
The approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.
Fixes: d55835b8bd ("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We would like to pack not just xyzw swizzles but also efgh swizzles.
This should work for vec4/16-bit. More work will be needed to pack
swizzles for vec8/16-bit and even more work for 8-bit, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Midgard prefetches instructions based on tag (ALU, LD/ST, texture *
size). To do so, the shader descriptor specifies the tag of the first
instruction, all instructions specify the tag of the next linear
instruction, and all branches explicitly specify the tag of the
branch target.
If you mess this up, you get an INSTR_TYPE_MISMATCH, which unambiguously
refers to this problem, but it's still annoying to try to work out all
the branch targets in your head to debug.
Instead, let's track the tags of various blocks over time, so we can
automatically validate tags of branch targets, to make
INSTR_TYPE_MISMATCH issues immediately obvious in a disassembly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The host query reset entry point didn't use the availability offset
for performance queries.
To fix this, reorder the availability of performance queries to match
other queries.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2b5f30b1d9 ("anv: implement VK_INTEL_performance_query")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When importing a dmabuf with a specified tiling, the dmabuf user
should always try to set the tiling mode because: 1) the exporter
can set tiling AFTER exporting/importing. 2) a dmabuf could be
exported from a kernel driver other than i915, in this case the
dmabuf user and exporter need to set tiling separately.
This patch fixes a problem when running vkmark under weston with
iris on ICL: it crashed to console with the following assert. i965
doesn't have this problem as it always tries to set the specified
tiling mode.
weston: ../src/gallium/drivers/iris/iris_resource.c:990: iris_resource_from_handle: Assertion `res->bo->tiling_mode == isl_tiling_to_i915_tiling(res->surf.tiling)' failed.
Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
In 2ca0d913ea, we began updating cso_fb->layers to the actual layer
count, rather than 0. This fixed cases where we were setting "Force
Zero RTA Index Enable" even when doing layered rendering. Sadly, it
also broke the check entirely: cso_fb->layers is now 1 for non-layered
cases, but the Force Zero RTA Index check was still comparing for 0.
Fixes: 2ca0d913ea ("iris: Fix framebuffer layer count")
Python has the identity operator `is`, and the equality operator `==`.
Using `is` with strings sometimes works in CPython due to optimizations
(they have some kind of cache), but it may not always work.
Fixes: 96c4b135e3 ("nir/algebraic: Don't put quotes around floating point literals")
Reviewed-by: Matt Turner <mattst88@gmail.com>
MALI_DEPTH_TEST should only be set when depth->writemask is true,
not when the depth test is enabled. Let's rename the flag and patch
panfrost_bind_depth_stencil_state() to do the right thing.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If an app first creates a compute pipeline with
VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT set, then re-compiles it
without that flag, the driver should re-compile the compute shader.
Otherwise, it will return the unoptimized one.
Fixes: ce188813bf ("radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Any BO would work, we don't have any BO types yet anyway. Moreover
lima_submit_add_bo() changes BO flags so they won't match allocation
flags.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
LIMA_DEBUG=bocache now activates debug prints for BO allocation,
destruction and BO cache.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Clearly we do want to have fp16 at some point ... but I kind of give up
debugging and it turns out the issues with fp16 support in 'frost are so
deeply rooted that I might as well disable this non-opt and land
LCRA now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The seccomp filter allows read/write, let us make sure nobody can
do anything with this.
Fixes: cff53da374 "radv: enable secure compile support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is incorrect, because polygonMode only applies if the final
primitive type is a polygon; polygonMode doesn't apply to
line-primitives as the comment suggests.
The Vulkan 1.1 spec, section 26.11, "Polygons" defines that polygons are
separate from points and line segments:
" A polygon results from the decomposition of a triangle strip, triangle
fan or a series of independent triangles. Like points and line segments,
polygon rasterization is controlled by several variables in the
VkPipelineRasterizationStateCreateInfo structure. "
Further, section 26.11.2, "Polygon Mode", only defines polygonMode to
apply to polygons:
" Possible values of the VkPipelineRasterizationStateCreateInfo::polygonMode
property of the currently active pipeline, specifying the method of
rasterization for polygons, are: "
This seems to clearly define that polygonMode doesn't apply to points
and lines, so let's make sure that we don't early out with the wrong
value.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than having hw-specific swizzles encoded directly in the
instructions, have a unified swizzle array so we can manipulate swizzles
generically.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We want symmetry between loads and stores, so we add a dummy source. So
we get, e.g.
st_int4 _, val, arg_1, arg_2
ld_int4 dest, _, arg_1, arg_2
Semantically, this dummy source represents the data itself, as if the
load is simply a move. That means it has a swizzle that acts as a
source.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This function was added in 7e414b5864 to work around a defect in
lower_output_reads(). As of the previous commit no NIR driver calls
lower_output_reads().
This change means we don't need the special GLSL IR style
gl_FragData handling for building the resource list in a NIR based
linker.
No shader-db change on SKL i965.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will allow us to stop lowering gl_FragData in GLSL IR for NIR
drivers which means we won't need the special GLSL IR type
handling for building the resource list in a NIR based linker.
i965 has been doing this since b828f7a27b.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When doing an indexed draw with index_bias set to a non-zero value (e.g.
by glDrawElementsBaseVertex), the vertex buffer should be offset by
index_bias vertices.
Add this offset when setting the vertex buffer address.
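The address computation can be sketched as follows. The function and
parameter names are hypothetical, and the real driver code may compute or
store the address differently.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of a vertex buffer for an indexed draw: the bound offset
 * plus index_bias whole vertices of 'stride' bytes each. */
static uint64_t
example_vertex_buffer_address(uint64_t bo_va, uint32_t buffer_offset,
                              uint32_t stride, int32_t index_bias)
{
   /* Sanity check for this sketch: a negative bias must not move the
    * start before the beginning of the buffer object. */
   assert(index_bias >= 0 ||
          (uint64_t)(-(int64_t)index_bias) * stride <= buffer_offset);

   return bo_va + buffer_offset + (uint64_t)((int64_t)index_bias * stride);
}
```

With index_bias == 0 this reduces to bo_va + buffer_offset, i.e. the address
that would be programmed for a draw without a base vertex.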
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Now that we're no longer compacting binding table entries, the only time
they can possibly change is when we actually switch subpasses.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Instead, always emit one entry for every color attachment in the subpass
or one NULL if there are no color attachments. This will let us adjust
an Ice Lake workaround so we don't get a stall on every draw call.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Also, use color_outputs_valid rather than nr_color_outputs since it
should be a bit more accurate.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The GKE pool we're using is 1-3 32-core VMs, preemptible (to keep
costs down), with 8 jobs concurrent per system. We have plenty of
memory (4G/core), so we run make -j8 to try to keep the cores busy even
when one job is in a single-threaded step (docker image download, git
clone, artifacts processing, etc.) When all jobs are generating work
for all the cores, they'll be scheduled fairly.
The nodes in the pool have 300GB boot disks (over-provisioned in space
to provide enough iops and throughput) mounted to /ccache, and
CACHE_DIR set pointing to them. This means that once a new
autoscaled-up node has run some jobs, it should have a hot ccache from
then on (instead of having to rely on the docker container cache
having our ccache laying around and not getting wiped out by some
other fd.o job). Local SSDs would provide higher performance, but
unfortunately are not supported with the cluster autoscaler.
For now, the softpipe/llvmpipe test runs are still on the shared
runners, until I can get them ported onto Bas's runner so they can be
parallelized in a single job.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
That's where `xmlpool_options_h` is defined, and this way we can make sure
nobody starts making use of it in the future :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
When switching this to dynamic state, I forgot that this also needs to
be emitted when we use a polygon-mode set to lines.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 6d30abb4f1 ("zink: use dynamic state for line-width")
Build failure reported by i965 CI, triggered by building dynamic
pipeloaders with kmsro drivers (besides 'frost). At this point, there's
no reason to actually do that -- mesa CI didn't mind -- but let's not
break the build.
v2: Simplify script. Add extra dependencies for v3d.
Fixes: afb0d08cb0 ("pipe-loader: Default to kmsro if probe fails")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Now that we can conveniently map between GEM handles and struct anv_bo
pointers, we can use a simple bitset for residency tracking instead of
the complex hash set. This shaves about 3% off of a CPU-limited example
running with the Dawn WebGPU implementation.
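A bitset indexed by handle can be sketched like this. It is an illustration
with made-up names and a fixed capacity, not the anv data structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct example_handle_set {
   uint32_t *words;     /* one bit per possible handle */
   uint32_t num_words;
};

static bool
example_handle_set_init(struct example_handle_set *set, uint32_t max_handles)
{
   set->num_words = (max_handles + 31) / 32;
   set->words = calloc(set->num_words, sizeof(uint32_t));
   return set->words != NULL;
}

/* Mark 'handle' as resident; returns true if it already was. */
static bool
example_handle_set_test_and_set(struct example_handle_set *set,
                                uint32_t handle)
{
   assert(handle / 32 < set->num_words);
   const uint32_t mask = 1u << (handle % 32);
   const bool was_set = (set->words[handle / 32] & mask) != 0;
   set->words[handle / 32] |= mask;
   return was_set;
}

/* Count distinct handles in a list, as a residency pass might do. */
static size_t
example_count_unique(const uint32_t *handles, size_t count,
                     uint32_t max_handles)
{
   struct example_handle_set set;
   if (!example_handle_set_init(&set, max_handles))
      return 0;

   size_t unique = 0;
   for (size_t i = 0; i < count; i++) {
      if (!example_handle_set_test_and_set(&set, handles[i]))
         unique++;
   }
   free(set.words);
   return unique;
}
```

Membership tests become a shift and a mask instead of a hash computation and
a probe, which is where a CPU-bound workload would be expected to gain.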
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise relocations just up and crash.
Fixes: a3153162a9 "anv: Delay allocation of relocation lists"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to start needing to lookup BO pointers by GEM handle so we
need access to the device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
BOs are now only ever allocated through the BO cache so there's no need
to have these exposed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
While we're here, we get rid of the locking and use a lock-free
algorithm. The chances of spilling contention are low and this is
actually a bit simpler in some ways.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The ASYNC flag, in particular, has the potential to help performance
because it means less sync tracking in the kernel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit switches block pools over to being allocated from the BO
cache rather than being allocated manually by the block pool.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to start depending on the BO cache in the state and block
pools so we need them properly initialized for the tests to work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All block pools are allocated with the same flags. There's no good
reason why it needs to be configurable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes a number of changes to the current API:
1. Everything is renamed to anv_device_* instead of anv_bo_cache_*
because the BO cache is soon going to be the sole BO allocation path
and not some special case to make import/export work.
2. Drop the cache parameter. It's totally redundant with the device
and just annoying to keep typing.
3. Rework flags so that they go the convenient direction for usage in
ANV rather than whichever awkward way the i915 specified it to
maintain backwards compatibility. This also gives us the
opportunity to set some defaults.
4. Add flags for mapping and coherency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The growing algorithms for the softpin case and the userptr version are
almost entirely different. Having this weird join doesn't make the code
more comprehensible. This rework does a few things:
1. Move the comment about 48-bit addresses to anv_device_init where we
actually unset the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.
2. Separate the paths in anv_block_pool_expand_range so it's easier to
see what happens in the two different cases.
3. Use the anv_block_pool::bos array for storing all allocated BOs in
both paths rather than using the cleanup list in both paths. This
lets us make the cleanups array only used for mmaps of the memfd for
the userptr case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of depending on a mutable BO in the state pool for handling
growing state pools, add a concept of "wrapper" BOs which just wrap an
actual BO. This way, the wrapper can exist once for all of time and we
can put it in relocation lists even if the actual BO it references gets
swapped out.
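The wrapper idea can be sketched as below. The struct layout and names are
assumptions made for illustration; the real anv_bo has many more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct example_bo {
   uint32_t gem_handle;
   bool is_wrapper;            /* true: this BO only forwards to 'wrapped' */
   struct example_bo *wrapped; /* only meaningful when is_wrapper is set */
};

/* Follow wrappers until an actual BO is reached. */
static struct example_bo *
example_bo_unwrap(struct example_bo *bo)
{
   while (bo->is_wrapper) {
      assert(bo->wrapped != NULL);
      bo = bo->wrapped;
   }
   return bo;
}

/* A relocation records the wrapper once; the pool later "grows" by
 * pointing the wrapper at a different BO. Unwrapping at submit time
 * resolves to whichever BO is current. */
static uint32_t
example_handle_after_grow(void)
{
   struct example_bo small = { .gem_handle = 1 };
   struct example_bo large = { .gem_handle = 2 };
   struct example_bo wrapper = { .is_wrapper = true, .wrapped = &small };

   struct example_bo *reloc_target = &wrapper; /* stored in a reloc list */
   wrapper.wrapped = &large;                   /* pool grew */

   return example_bo_unwrap(reloc_target)->gem_handle;
}
```

The pointer stored in the relocation list never changes, so nothing that was
recorded earlier has to be patched when the backing BO is swapped.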
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're not THAT strapped for space that we can't burn one extra bit for
a boolean. If we're really worried about it, we can always shrink the
flags field to 16 bits because the kernel only uses 7 currently.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It has exactly one caller and we're about to change some of the dynamics
which would make this confusing as a separate function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We have to go through and rewrite them all anyway so it doesn't do us
any good to put them in the list in anv_reloc_list_add. Also, for state
pools the handles are likely wrong by the time vkQueueSubmit is called.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, we would read the offset from the BO in anv_reloc_list_add
to generate the presumed offset and then again in the caller to compute
the 64-bit address to write into the buffer. However, if the offset
somehow changed between these two points, the presumed offset would no
longer match the written offset. This is unlikely to actually ever be a
problem in practice because the presumed offset gets recorded first and
so if the written address is wrong then the presumed offset is almost
certainly wrong and the relocation will trigger. However, it's much
safer to simply have anv_reloc_list_add return the 64-bit address.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us do less allocation because the anv_bo's are now embedded in
the sparse array and it also allows lock-free translation from GEM
handle to BO which will be useful in future commits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The runner that submits jobs there is down and will take some time to
get fixed. Disable them for now to keep the CI green.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Got some int->pointer warnings and 20 is not a valid pointer ....
Fixes: 2e3a635ee6 "radv: Add an early exit in the secure compile if we already have the cache entries."
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Until 8bef4df196 the IR (TGSI or NIR) was used in disk_cache driver_flags.
This commit restores this feature to avoid crashing when switching from
one IR to the other.
As radeonsi's default is TGSI, I used "driver_flags & 0x8000000 = 0" for TGSI
to keep the same driver_flags.
Fixes: 8bef4df196 ("radeonsi: add si_debug_options for convenient adding/removing of options")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Currently the Android build system doesn't expose the panfrost
driver.
This patch enables the panfrost driver to be built for the
Android platform.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-By: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
nir_lower_point_size.c was not built into the libmesa_nir library for non-meson
builds. However it was included in the meson build.
This patch fixes that.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.
v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the following building error:
external/mesa/src/amd/compiler/aco_spill.cpp:1768:
error: undefined reference to 'aco::lower_to_cssa(aco::Program*, aco::live&, radv_nir_compiler_options const*)'
Fixes: 0b8216b ("aco: Lower to CSSA")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
When starting a BLORP operation, we do the BTI-change flush. However,
when ending it and transitioning back to regular drawing, we change the
render target again - without a set_framebuffer_state() call. We need
to do the BTI flush there too. BLORP flags IRIS_DIRTY_RENDER_BUFFER
now, which will cause the next draw to get the BTI flush again.
(explanation of fix by Ken)
Fixes: 2b956a093a ("iris: totally untested icelake support")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect it in vec4 shaders too. When it didn't find
it, the script would fail with:
Traceback (most recent call last):
  File "./report.py", line 351, in <module>
    main()
  File "./report.py", line 182, in main
    before_count = before[p][m]
KeyError: 'sends'
Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
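The report.py failure quoted above comes down to a strict dictionary lookup
on a metric that only some shaders report. The following is a minimal sketch
of that failure mode, not the shader-db code itself: the `before` table and
both helper functions are hypothetical stand-ins. The commit addresses the
problem on the Mesa side; the tolerant lookup is shown only for contrast.

```python
# Hypothetical per-program statistics, as a report script might parse them.
# The scalar shader reports "sends"; the vec4 shader (before the fix) does not.
before = {
    "scalar.shader_test": {"instructions": 120, "sends": 4},
    "vec4.shader_test": {"instructions": 80},
}


def lookup_strict(stats, program, metric):
    # Same pattern as `before[p][m]` in the traceback: raises KeyError
    # when a program has no entry for the metric.
    return stats[program][metric]


def lookup_tolerant(stats, program, metric, default=0):
    # Defensive variant: a missing metric is reported as `default`.
    return stats[program].get(metric, default)
```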
libdrm returns -errno instead of directly the ioctl ret of -1.
Fixes: 1c3cda7d27 "radv: Add syncobj signal/reset/wait to winsys."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Observed an issue when looking at the code generated by the
image-vertex-attrib-input-output piglit test. Even though the test
itself worked fine (due to TIC 0 being used for the image), this needs
to be fixed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Unfortunately we don't know if a particular load is a real 2d image (as
would be a cube face or 2d array element), or a layer of a 3d image.
Since we pass in the TIC reference, the instruction's type has to match
what's in the TIC (experimentally). In order to properly support
bindless images, this also can't be done by looking at the current
bindings and generating appropriate code.
As a result all plain 2d loads are converted into a pair of 2d/3d loads,
with appropriate predicates to ensure only one of those actually
executes, and the values are all merged in.
This goes somewhat against the current flow, so for GM107 we do the OOB
handling directly in the surface processing logic. Perhaps the other
gens should do something similar, but that is left to another change.
This fixes dEQP tests like image_load_store.3d.*_single_layer and GL-CTS
tests like shader_image_load_store.non-layered_binding without breaking
anything else.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "20.0" <mesa-stable@lists.freedesktop.org>
These reworks were combined into this patch:
* Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+
* Francisco Jerez: intel/eu/validate/gen12: Disable
qword_low_power_no_depctrl eu_validate test.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Calculated the number for allocation and did not
reserve space ....
Fixes: 2117c53b72 "radv: Add temporary datastructure for submissions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Variables spilled on both branch legs need to be assigned to the same spilling slot.
These affinities can be transitive through multiple merge blocks.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
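One way to picture the transitive affinities described above is a union-find
structure over spill ids, where every id in a set ends up in the same slot.
This is only an illustrative sketch: the commit message does not say which
data structure ACO uses, and the class and method names here are made up.

```python
class SpillAffinities:
    """Union-find over spill ids; ids in one set share a spill slot."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps later lookups cheap.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        # Record that a and b must share a slot (e.g. at a merge block).
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def assign_slots(self):
        # Give every set one slot index, in order of first appearance.
        slots, result = {}, {}
        for x in sorted(self.parent):
            root = self.find(x)
            slots.setdefault(root, len(slots))
            result[x] = slots[root]
        return result
```

If one merge block ties ids 1 and 2 together and a later one ties 2 and 3,
all three land in the same slot even though 1 and 3 never met directly.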
This patch makes the live variable analysis more precise
w.r.t. killed phi operands and the block's register pressure.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Converting to 'Conventional SSA Form' ensures correctness w.r.t. spilling of phi nodes.
Previously, it was possible that phi operands have intersecting live-ranges, and thus,
couldn't get spilled to the same spilling slot. For this reason, ACO tried to avoid
spilling phis, even if it was beneficial.
This patch implements a conversion pass which is currently only called if spilling is necessary.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes these deqp tests (and more):
dEQP-GLES2.functional.draw.draw_arrays.points.single_attribute
dEQP-GLES2.functional.draw.draw_arrays.points.multiple_attributes
dEQP-GLES2.functional.draw.draw_arrays.points.default_attribute
dEQP-GLES2.functional.draw.draw_elements.points.single_attribute
dEQP-GLES2.functional.draw.draw_elements.points.multiple_attributes
dEQP-GLES2.functional.draw.draw_elements.points.default_attribute
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The final version of previous stencil fix patch ended up breaking one-sided
stencil.
Fixes remaining failures in these deqp tests (tested on GC3000/GC7000L):
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
Note: deqp tests require --deqp-gl-config-name=rgba8888d24s8ms0
Fixes: 05da025f ("etnaviv: fix two-sided stencil")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes remaining failures in these deqp tests (tested on GC3000/GC7000L):
dEQP-GLES2.functional.polygon_offset.*
Fixes: 6c3c05dc ("etnaviv: fix polygon offset")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as stated in the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.
v2: Fix typo case in the comment (Nanley)
v3: Rebase and fix conflicts.
v4: Fix rebase mistake (Nanley).
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as stated in the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.
v2: Assert that image->planes[plane].offset is 4K aligned (Nanley)
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
While we're at it, make sure we error out if it's not supported when
required.
This brings us a bit closer to being able to test on SwiftShader, which
doesn't currently support KHR_external_memory_fd.
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.
enum brw_reg_type is a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.
Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.
Mostly for vertex formats, but they are supported as texture formats too
(untested however).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
If src/dst addresses are dw aligned and size is > 4 then we align
byte count to dw as well.
PAL implementation works like this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, the scheduler tried to move up instructions from below depending
VMEM instructions only to move them down again when scheduling the VMEM
instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
These got lost due to some refactoring.
Due to the way our scheduler works currently, for now
we add back the reorder flag for divergent loads only.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This patch changes VMEM scheduling in a way that they can only
be moved upwards by previous VMEM instructions but not downwards.
This way, it improves the order of VMEM instructions in relation
to their users.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Previously, we allowed all shaders to reduce the number of max_waves to as low as 5.
Restricting this on shaders with low register demand increases the total number of waves
while the VMEM def-use distances hardly change.
This patch also changes the max number of move operations per MEM instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This shaves around 4-5% off of a CPU-limited example running with the
Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur. Unfortunately, he neglected to set
the bit in the vec4 back-end. This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable. We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.
Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
RADV_PERFTEST=outooforder was removed a while ago. This fixes
dumping the options into hang reports.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is actually a non-threaded implementation. I'd summarize this
as event-based submission.
When submit happens we walk a tree of submissions that depend on
the syncobj signal operations to be submitted and if those submissions
have no other dependencies we start to execute them immediately.
Or, well I still use a list to avoid issues with long chains and
the stacksize when using recursion.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
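The list-instead-of-recursion idea from the message above can be sketched as
a plain worklist. This is a simplified model under assumed names
(`Submission`, `wait_on`, `execute_ready` are hypothetical, and "executing"
just records the name), not the radv implementation.

```python
from collections import deque


class Submission:
    def __init__(self, name):
        self.name = name
        self.pending = 0       # dependencies that have not been submitted yet
        self.dependents = []   # submissions waiting on this one

    def wait_on(self, other):
        self.pending += 1
        other.dependents.append(self)


def execute_ready(first):
    """Execute `first`, then every dependent whose last dependency cleared.

    An explicit queue replaces recursion, so a long dependency chain
    cannot exhaust the stack.
    """
    order = []
    queue = deque([first])
    while queue:
        sub = queue.popleft()
        order.append(sub.name)  # stand-in for the real submission
        for dep in sub.dependents:
            dep.pending -= 1
            if dep.pending == 0:
                queue.append(dep)
    return order
```

A submission that still waits on something else stays parked until that
other dependency is executed as well.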
This does not fully do wait-before-submit, to be done in a follow
up patch.
For kernels without support for timeline syncobjs, this adds an
implementation of non-shareable timelines using legacy syncobjs.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This will lead to fewer pipelines in the cache, which is assumed to
become our most unavoidable performance bottle-neck down the line.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is a function with timeout support for reading from the pipe
between processes used for secure compile.
Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
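A read with a timeout on a pipe is typically built from a readiness wait
followed by the actual read. The sketch below shows that pattern on a POSIX
system; it is not the radv function, the name `read_with_timeout` is made
up, and only the 5 second default comes from the message above. Note that
the timeout here applies to each wait, not to the whole read.

```python
import os
import select


def read_with_timeout(fd, size, timeout_s=5.0):
    """Read exactly `size` bytes from `fd`; return None on timeout or EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Wait until the pipe is readable or the timeout expires.
        ready, _, _ = select.select([fd], [], [], timeout_s)
        if not ready:
            return None  # timed out
        chunk = os.read(fd, remaining)
        if not chunk:
            return None  # the writer closed its end
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```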
This will be used in the following patch to support timeouts for
reading the pipe between processes.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This skips touching %ebx most times and it shows that glGetString performance
increased from 114M/s to 120M/s on my desktop.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Remove hard coded 16 and use entry_generate_or_patch to patch
public stubs. The generated code actually is slightly tighter
than before since the "nop" instructions before the final "jmp"
get removed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
The code works exactly the same as before. Just split this function
out so we can reuse it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
The x86 assembly language stub in src/mapi/entry_x86_tsd.h does not
generate PIC (position-independent code). This causes text relocations
which cause trouble on recent versions of FreeBSD, OpenBSD and Android.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108541
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Vulkan requires that only one bit for the ordering is set, but old
versions of GLSLang just set all the bits. This was fixed as part of
c51287d744, but we can still find older versions (or shaders compiled
with them) around.
So instead of failing, emit a warning and fall back to the effective
result of any combination of multiple bits: AcquireRelease.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2018
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
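The fallback described above can be sketched as a small check on the
ordering bits of the SPIR-V Memory Semantics mask. The bit values are the
ones defined by SPIR-V; the function name and the warning text are
hypothetical and do not mirror the actual spirv_to_nir code.

```python
import warnings

# Ordering bits of the SPIR-V Memory Semantics mask.
ACQUIRE = 0x2
RELEASE = 0x4
ACQUIRE_RELEASE = 0x8
SEQUENTIALLY_CONSISTENT = 0x10
ORDER_BITS = ACQUIRE | RELEASE | ACQUIRE_RELEASE | SEQUENTIALLY_CONSISTENT


def effective_ordering(semantics):
    """Return the single ordering bit to use for a semantics mask."""
    order = semantics & ORDER_BITS
    if bin(order).count("1") > 1:
        # More than one ordering bit: warn instead of failing and use
        # AcquireRelease, as the message above describes.
        warnings.warn("multiple memory ordering bits set, using AcquireRelease")
        return ACQUIRE_RELEASE
    return order
```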
We have to resolve destination surfaces if we are blitting to and from
the same surface.
v2: Revert unrelated change (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Let aux surface state tracker track the stencil buffer's aux state while
clearing depth stencil buffer.
v2: Fix condition check (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Even though stencil buffer compression looks like regular lossless color
compression w/o fast clear support, we have to resolve stencil buffer
with WM_HZ_OP packet.
v2: Check if resource is stencil with helper function (Nanley Chery)
v3: Remove unnecessary included file (Nanley Chery)
v4: (Nanley Chery)
- Avoid stencil buffer aux state transition by improving condition check
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
On Gen12+, Stencil buffer's lossless compression should be resolved
with WM_HZ_OP packet.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We never saw any failures regarding this typo but it's good to assign
correct stencil view while constructing blorp_params.
Fixes: 0cabf93b80 "intel/blorp: Add an entrypoint for clearing depth and stencil"
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
On Gen12, the CCS buffer address doesn't have to be referenced in state
packets. In the case of a stencil buffer with CCS, the kernel won't know
the location of the CCS unless an extra call is made to pin its address.
To avoid this extra call, make the CCS part of the main surface.
v2. Update comment above bo_size. (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The functions used during aux buffer configuration and creation only
return false for exceptional errors. Don't proceed with surface creation
in those cases.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The original value of 256 was under the assumption that you're a batch
buffer which is likely going to have a large number of relocations.
However, pipeline objects on Gen7 will have at most 6 relocations (one
per shader stage and one for the workaround BO) so this is a lot of
per-pipeline wasted space.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The old relocation list code always allocated 256 relocations and a hash
set up-front without knowing whether or not we really need them. In
particular, in the softpin case, this is two fairly large allocations
that we don't need to be making. Also, for pipeline objects on haswell
where we don't have softpin, we don't need relocations unless scratch is
used so this is extra data per-pipeline. Instead, we should do it
on-demand. This shaves 3.5% off of a cpu-limited example running with
the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Extract out values for the handful of unknown registers which have
different values across different a6xx models, to simplify adding
support for new a6xx's.
Signed-off-by: Rob Clark <robdclark@chromium.org>
E.g. documentation-only changes cannot affect the outcome of the
pipeline, so don't waste resources on running it.
The thing we need to be careful about here is that the container stage
jobs must always run if any later stage jobs using the corresponding
docker images run. We're currently using the same .ci-run-policy
template for all jobs, so this is trivially true.
v2:
* Add bin/ and common.py (Eric Engestrom)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> # v1
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
A few equations/programming changes for ICL.
v2: Fix a couple of issues in naming and floating/integer operations (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9edcce2a32 bumped the required libdrm-amdgpu version to
2.4.100. Update the version we use in our CI scripts to avoid CI
build failures.
Also bump the debian image name for this change to take effect.
Note that amdgpu is only built with the debian-buster image,
so only this image requires an update.
Fixes: 9edcce2a ("ac: get tcc_harvested from the kernel")
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
The anv_batch_bo contents are linked one to another, and when printing
we have to start with the first of those. Since in `u_vector` new
elements are added to the head, to get the first element we need the
vector's tail.
Fixes: 32ffd90002 ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously subgroup shuffle was implemented using the bpermute
instruction, which only works across half-waves, so by itself it's
not suitable for implementing subgroup shuffle when the shader is
running in wave64 mode.
This commit adds a trick using shared VGPRs that still allows subgroup
shuffle to be implemented relatively efficiently in this mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan
with all reduction operations.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The existing "fallback" code didn't actually do anything, so this
removes it, and instead we just always fall back to `iris` for future
PCI IDs.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When this code was merged, this wasn't necessary because the
state-tracker would do it later anyway. But this recently got changed,
without changing the code that depended on this.
Arguably, this was a mistake in the lowering pass to begin with. Either
way, let's fix it by not assuming that the lowering code gets called
later when it's not needed.
This fixed user-defined clip-planes in Zink.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: eaffdad108 ("st/mesa: don't lower_global_vars_to_local for VS if there are no dead inputs")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add a case for MCS_CCS so that we get the correct aux usage during copy
operations.
v2: Fix commit subject (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Depending on MCS_CCS or MCS we can emit blorp blit shaders.
As we support MCS_CCS and MCS, it makes sense to use the
isl_aux_usage_has_mcs function.
v2: Fix commit message (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
If aux for MCS is already configured, don't configure again.
v2: Fix missing period in commit message (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The Vulkan spec says that an implementation has to support one of
VK_FORMAT_X8_D24_UNORM_PACK32 and VK_FORMAT_D32_SFLOAT, as well as
one of VK_FORMAT_D24_UNORM_S8_UINT and VK_FORMAT_D32_SFLOAT_S8_UINT.
So let's keep track of which one of each pair is supported, and emulate
one on top of the other one.
This won't give the exact result for comparisons, or when mapping and
unmapping the resources. But it's better than flat out failing to create
the resource, and we can fix the map/unmap issue later if needed.
Tested-by: Duncan Hopkins <duncan@thefoundry.co.uk>
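The pairing described above amounts to a small lookup: use the requested
format if it is supported, otherwise use its partner from the same pair.
This is a hedged sketch only. The format names are the Vulkan ones quoted
in the message, while `pick_format` and the `supported` set are
hypothetical; the real driver would query the physical device instead.

```python
# For each pair, the Vulkan spec guarantees at least one supported format.
DEPTH_PAIR = ("VK_FORMAT_X8_D24_UNORM_PACK32", "VK_FORMAT_D32_SFLOAT")
DEPTH_STENCIL_PAIR = (
    "VK_FORMAT_D24_UNORM_S8_UINT",
    "VK_FORMAT_D32_SFLOAT_S8_UINT",
)


def pick_format(requested, supported):
    """Return `requested` if supported, else the other format of its pair."""
    if requested in supported:
        return requested
    for pair in (DEPTH_PAIR, DEPTH_STENCIL_PAIR):
        if requested in pair:
            other = pair[1] if requested == pair[0] else pair[0]
            if other in supported:
                return other  # emulate the requested format on top of this
    return None
```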
If a modifier specifies an aux, it must be created.
Fixes: 75a3947af4 ("iris/resource: Fall back to no aux if creation fails")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Make sure the res struct is free'd before returning.
Fixes: 2dce0e94a3 ("iris: Initial commit of a new 'iris' driver for Intel Gen8+ GPUs.")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Store the converted depth value into two dwords. Avoids regressing the
piglit test "fbo-depth-array depth-clear", when HIZ_CCS sampling is
enabled in a later commit.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Write through to the CCS if the surface is used as a texture and can be
sampled by the HW with CCS.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Check that the alignment requirements for HIZ_CCS are satisfied by using
this function.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevent the piglit test,
amd_vertex_shader_layer-layered-depth-texture-render, from regressing in
a future commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prepare this function to be used in iris and to handle new Gen12 behavior.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add a helper to determine if an ISL surface supports the write-through
mode of HIZ_CCS.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The HIZ_CCS and MCS_CCS auxiliary surface modes require that drivers
store information about two aux buffers. We choose to represent this as
HiZ/MCS being the primary aux surface and the CCS as a secondary/extra
aux surface. This representation has the effect of placing most of the
code that will have to choose between the two aux surfaces around the
aux-map entry points.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of guessing an aux_usage, then confirming it if the
isl_surf_get_*_surf functions are successful, just call the ISL
functions up-front. This will help us to more easily determine if a
depth buffer supports HIZ_CCS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add an extra aux parameter which will be filled out with CCS if the
first two isl_surf parameters fit the requirements for HiZ_CCS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While this format isn't listed in BSpec: 53911, other documentation and
empirical evidence suggest that it's fine to remap it to R32_FLOAT. I've
filed a bug for the BSpec page.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll start doing slow depth clears more often on HIZ_CCS buffers in a
future commit. Reduce the performance impact by making them use less
bandwidth.
From the Depth Test section of the BSpec:
This function is enabled by the Depth Test Enable state variable. If
enabled, the pixel's ("source") depth value is first computed. After
computation the pixel's depth value is clamped to the range defined
by Minimum Depth and Maximum Depth in the selected CC_VIEWPORT state.
Then the current ("destination") depth buffer value for this pixel is
read.
and from the Depth Buffer Updates section of the BSpec:
If depth testing is disabled or the depth test passed, the incoming
pixel's depth value is written to the Depth Buffer.
Taken together, it's clear that depth testing isn't necessary to perform
a depth buffer clear. Mark Janes and I analyzed this patch with
frameretrace and a depthrange piglit test. I disabled HiZ to ensure we'd
get slow depth clears. We've observed the bandwidth consumption by the
depth buffer access to be cut ~50% on BDW and SKL during depth clears.
On a more graphically intensive workload, the Shadowmapping Sascha
benchmark, I took the average of 3 runs on a BDW with a display
resolution of about 1920x1200 (minus some desktop environment
decorations). I measured a 22.61% FPS improvement when HiZ is disabled.
v2. The BSpec doesn't mandate this behavior, update comment accordingly.
(Ken)
Fixes: bc4bb5a7e3 ("intel/blorp: Emit more complete DEPTH_STENCIL state")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In ISL:
Update the format table to add CCS_E support for some 8BPP formats,
some 16BPP formats, and R10G10B10A2_UNORM_SRGB.
In the helper for determining CCS_E support, we return false for some
16BPP formats because they aren't properly handled in blorp_copy().
In BLORP:
Allow the new and non-problematic formats for CCS_E-enabled copies.
v2. Update other fields for A1B5G5R5_UNORM and A4B4G4R4_UNORM in table.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
The CCS could be described in a number of ways, but this format was
chosen to minimize churn in the drivers. We may decide on a different
direction in the future.
v2. Increase alignment for display surfaces. (Nanley)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Use a helper that will automatically handle Gen12's CCS tiling when
creating a CCS isl_surf.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In the function which translates ISL tilings to i915 tilings, map ISL's
HiZ and CCS tilings to Y instead of NONE (linear). The HW docs describe
HiZ and pre-Gen12 CCS surfaces as being Y-tiled in memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Avoid the compiler warnings for the new enums that will be introduced in
a future commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The isl_surf structs for Gen12's CCS won't describe how many slices in
the main surface can be compressed. All slices will be compressible if
CCS is enabled, so look up the main surface's logical dimension.
v2. Add a space before a `?`. (Jordan)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This isn't accurate enough for HiZ which can have a discontiguous range
of supported aux slices. This also won't work with the plan to represent
Gen12 CCS as a single slice surface.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From "Render Target Fast Clear" description for Gen12:
"SW must store clear color using MI_STORE_DATA_IMM with
ForceWriteCompletionCheck bit set."
From Instruction_MI_STORE_DATA_IMM, bitfield 10 (when set to 1):
"Following the last write from this command, Command Streamer
will wait for all previous writes are completed and in global
observable domain before moving to next command."
We use 4 SDIs to store the clear color (one per channel). From the
description, it looks to me that setting that flag only on the last SDI
should be enough.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen12's CCS requires that the main surface have a pitch aligned to 512B.
v2. Provide a BSpec citation. (Ken)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
add_aux_state_tracking_buffer() actually checks the aux usage when
determining how many dwords to allocate for state tracking. Move the
function call to the point after the CCS_E aux usage is assigned.
Fixes: de3be61801 ("anv/cmd_buffer: Rework aux tracking")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Avoid failing the `info->use_clear_address` assertion in ISL on Gen12+.
Fixes: 6c9f9a82d7 ("intel/genxml,isl: Add gen12 render surface state changes")
Reported-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In gen12 we use the 3DSTATE_DEPTH_BOUNDS instruction
to enable depth bounds testing.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In gen12 we use the 3DSTATE_DEPTH_BOUNDS instruction
to enable depth bounds testing.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In gen12 we add the 3DSTATE_DEPTH_BOUNDS instruction
which enables support for depth bounds testing.
Signed-off-by: Plamena Manolova <plamena.manolova@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When resolving a merge-conflict, I accidentally only updated the
ARM64-tag tag. Let's correct this.
Fixes: 3d529c1739 ("gitlab-ci: also build Zink on CI")
Some implementations don't support the lineWidth-feature, so let's
avoid setting invalid state to them. But since we don't have a fallback
for this, inform the user.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
The driver can report a minimum alignment for UBOs, and that can be
larger than 64, which we've currently been using. Let's play ball, and
use the reported value instead.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
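To illustrate the UBO alignment change above: offsets are rounded up to
whatever the driver reports instead of a fixed 64. This is a sketch under
assumptions. The message does not name the limit; it is presumably
minUniformBufferOffsetAlignment from VkPhysicalDeviceLimits, which is a
power of two. Both function names are made up.

```python
def align_up(value, alignment):
    """Round `value` up to a multiple of `alignment` (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (value + alignment - 1) & ~(alignment - 1)


def ubo_offset(offset, reported_min_alignment):
    # Use the alignment reported by the driver, not a hard-coded 64.
    return align_up(offset, reported_min_alignment)
```

With a reported alignment of 256, an offset of 100 moves to 256, where the
old fixed value of 64 would have produced 128.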
There are two things that go wrong in this code on some drivers:
1. Rounding off the line-width to granularity can push it outside the
legal range.
2. A granularity of 0.0 results in NaN, because we divide by zero.
So let's make this code a bit more robust.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
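Both problems listed above can be handled by skipping the rounding when the
granularity is zero and clamping after the rounding. The sketch below only
shows that order of operations; the function and parameter names are
hypothetical, and the zink code may differ. In Python a division by zero
raises an exception rather than producing NaN as described for the driver,
but the guard is the same.

```python
def quantize_line_width(width, granularity, min_width, max_width):
    """Round `width` to the granularity, then clamp it to the legal range."""
    # A granularity of 0.0 would mean dividing by zero, so skip the rounding.
    if granularity > 0.0:
        width = round(width / granularity) * granularity
    # Rounding can push the value past either bound, so clamp afterwards.
    return min(max(width, min_width), max_width)
```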
We're now adding interface-types during code-emitting, so we need to
defer emitting the entry-point. No biggie, spirv_builder is prepared for
this.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
This is the only call-site that wants to specify unique values per
component for any of the get_*_constant functions. So let's give this
its own implementation instead, so we can ease the burden for the rest.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
While we're at it, let's move emit_float_const to the same location as
this needs to be defined at.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
This is going to make it easier to verify that 1-bit float sizes don't
leak into the rest of the code.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
We don't implement the get_timestamp context-method, so this is just
going to crash if anyone tries to use it. Let's implement it later.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
This isn't as inaccurate as the comment says, the Vulkan documentation
even seems to suggest this is the same. Let's drop the comment.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Because si.waitSemaphoreCount is 0, this won't even be looked at by the
driver, so let's just drop it.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
This inlines submit_cmdbuf into zink_end_batch, the only place it's
used. This makes the code a bit more straight-forward to read.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
These aren't guaranteed to be vectors, they can also be scalars. The
var-part is the significant part here, not the vector-ness. So let's
rename these.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
These track nir-registers, so it's clearer if we refer to them by that
name instead. There's potentially more vars than these.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
We don't need to track the resources for the samplers any longer, as
the sampler view holds a reference instead.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
The primitive topology is a bit of an odd-ball, as it's the only
truly draw-call specific state that needs to be passed to the program to
get a pipeline.
So let's make this a bit more explicit, by passing it separately. This
makes the flow of data a bit easier to wrap your head around.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
This adds bitcasting to uint everywhere for now,
and stores all spir-v ssa values as uints.
It also casts bool to 0/0xffffffff for now
(nir 1-bit bools may be coming in the future).
This makes a lot of piglit tests pass now.
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
In Vulkan, the Z-range of clip-space goes from 0..W instead of -W..+W
as is the case in OpenGL. So we need to transform the Z-range to
account for this.
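A minimal sketch of such a remapping, assuming the usual z' = (z + w) / 2
transform. The commit text does not spell out the formula, and the function
name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: remap an OpenGL clip-space Z (range -W..+W)
 * to the Vulkan convention (range 0..W). Assumes w > 0. */
static inline float
gl_clip_z_to_vk(float z, float w)
{
   assert(w > 0.0f);
   return (z + w) * 0.5f;
}
```

With w = 2, the near plane z = -2 maps to 0 and the far plane z = 2 stays
at 2.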
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Here's zink, a so far pretty simple vulkan-gallium driver that is able
to translate some applications from OpenGL to Vulkan.
The compiler is quite limited for now, this will be improved on later.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Do not flush NaN to 0.
Fixes
dEQP-VK.spirv_assembly.instruction.compute.opquantize.propagated_nans
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's similar to GFX9+. Shadow of Mordor (Vulkan beta) hits that
path and it works fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This interface allows the aux-map code in the intel/common library to
allocate and free buffers.
Reworks:
* free gen_buffer in gen_aux_map_buffer_free. (Rafael)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Commit d2b60e433e introduced restrictions (as per GLES spec) on the
internal format. We need to set up a sized format for the texture image
so framebuffers created with that are considered complete.
This change fixes the following Android CTS test in AHardwareBufferNativeTests
category:
SingleLayer_ColorTest_GpuColorOutputAndSampledImage_R10G10B10A2_UNORM
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: d2b60e433e ("mesa/main: R10G10B10_(A2) formats are not color renderable in ES")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I thought there was hardware support for this, but it seems to be broken,
or at least more complex than I believed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This memory needs to still be available after all the drawing is done
and forgotten about, so cannot be transient.
Also clear the result so that no rendering returns a zero.
Signed-off-by: Urja Rannikko <urjaman@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Serialized NIR is required for clover with the SPIR-V pipeline. With
this change and PAN_MESA_DEBUG=deqp, clinfo is able to successfully
probe panfrost.
Code from Nouveau (commit 7955fabcf8 by Karol Herbst).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A device supported by kmsro will not automatically probe kmsro since the
driver name will be panfrost/lima/v3d/..., not "kmsro". Since kmsro is a
bit of a catch-all for generic (mostly embedded) GPUs, add a fallback on
kmsro for the dynamic loader.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
kmsro is used by numerous embedded GPUs for a common winsys abstraction.
Let's add support for it for the dynamic pipe loader, so clover can
probe on these drivers.
We build the target with Panfrost. When other drivers need kmsro+clover,
we can revisit the build system part; my mesonfu is wanting.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Can be enabled via the environment variable which tells the
driver how many compilation threads are expected to be called,
and therefore how many forked processes the driver should
create.
For example we would expect to call fossilize replay with
something like this:
RADV_SECURE_COMPILE_THREADS=8 ./fossilize-replay --num-threads 8 \
--shader-cache-size 0 --ignore-derived-pipelines pipeline_cache.foz
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds support for the fork, the installation of the seccomp
filter, and the main loop that the actual compilation is called
from, i.e. run_secure_compile_device().
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This function will be called by the parent process when doing a
secure compile. It first selects a free process to work with then
passes it all the information it needs to compile the pipeline.
Once the pipeline information has been passed to the secure
process, it then waits around to read/write any disk cache entries
required before exiting.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows the secure process to read and write to the disk cache
via the parent process. This commit just adds the functionality
needed for the secure process, the following commit will add the
functionality for the parent process.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
These will be used by the following commits to hold information about
the forked secure compile processes.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will be used to identify information being passed between the
parent and secure process during a secure compile.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This can be useful for debugging the on disk cache, but is also
useful in the following patch for secure compiles which will be
used to compile huge pipeline collections. These pipeline
collections can be multiple GBs and the in memory cache grows to
multiple GBs very quickly when they are compiled so we want to
be able to turn off the in memory cache.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is cleaner and avoids having to read/write an additional copy of
topology for use with secure compile.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds a new CI job that runs on windows with MSVC. It currently
builds softpipe and osmesa, and runs the related unit tests. It does
rely on meson's wraps for zlib, but I've set up caching of the wrap
dependencies so hopefully that won't be a problem.
I really wanted to use PowerShell for this, but there just isn't an
easy way to do that; it's much easier to use batch scripts, so that's
what I used.
The leading `/` for .gitlab-ci/lava... must be removed because windows
doesn't understand it, and when it reads the file the job ends in error.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
All of these (bug titles, patch titles, features, and people's names)
can contain characters that are not valid html. Just escape everything
for safety.
Fixes: 86079447da
("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
I made a bad assumption; I assumed this would be run in the release
branch. But we don't do that, we run in the master branch. As a result
we need to pass the version as an argument.
Fixes: 3226b12a09
("release: Add an update_release_calendar.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Which is very likely .Z > 0 releases.
Fixes: 86079447da
("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
If they use the `Fixes: #1` form.
Fixes: 86079447da
("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Previously this would result in the .0 warning being generated for .z > 0
and the .z == 0 would get the other message.
Fixes: 86079447da
("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
After the discussion in
https://github.com/KhronosGroup/OpenGL-API/issues/45
the section 8.17 (texture completeness) of the OpenGL 4.6 core profile
was changed to explicitly say that multisample texture completeness
ignores filter state of the texture.
"Using the preceding definitions, a texture is complete unless any of the
following conditions hold true:
...
- The minification filter requires a mipmap (is neither NEAREST nor LINEAR),
the texture is not multisample, and the texture is not mipmap complete.
- The texture is not multisample; either the magnification filter is not
NEAREST, or the minification filter is neither NEAREST nor NEAREST_-
MIPMAP_NEAREST; and any of
– The internal format of the texture is integer (see table 8.12).
– The internal format is STENCIL_INDEX.
– The internal format is DEPTH_STENCIL, and the value of DEPTH_-
STENCIL_TEXTURE_MODE for the texture is STENCIL_INDEX."
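The two quoted conditions can be sketched as a predicate. This is an
illustration only: the struct, enum and function names are hypothetical, the
conditions elided by "..." in the quote are not modeled, and the three
format cases are folded into a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum filter {
   NEAREST, LINEAR,
   NEAREST_MIPMAP_NEAREST, LINEAR_MIPMAP_NEAREST,
   NEAREST_MIPMAP_LINEAR, LINEAR_MIPMAP_LINEAR,
};

/* Hypothetical, simplified texture state. */
struct tex_state {
   enum filter min_filter;
   enum filter mag_filter;
   bool multisample;
   bool mipmap_complete;
   /* Integer format, STENCIL_INDEX, or DEPTH_STENCIL in stencil mode. */
   bool unfilterable_format;
};

/* Returns true if either of the two quoted conditions holds. */
static inline bool
quoted_rules_make_incomplete(const struct tex_state *t)
{
   assert(t != NULL);

   /* Both quoted conditions only apply to non-multisample textures. */
   if (t->multisample)
      return false;

   bool needs_mipmap = t->min_filter != NEAREST && t->min_filter != LINEAR;
   if (needs_mipmap && !t->mipmap_complete)
      return true;

   bool filtering = t->mag_filter != NEAREST ||
                    (t->min_filter != NEAREST &&
                     t->min_filter != NEAREST_MIPMAP_NEAREST);
   return filtering && t->unfilterable_format;
}
```

A multisample texture therefore passes both checks regardless of its filter
state, which is the behavior the spec change makes explicit.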
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
MAX_VARYINGS_INCL_PATCH subtracts VARYING_SLOT_VAR0 giving us a size
that's too small, so BITSET_SET writes words out of bounds, corrupting
the stack and causing all kinds of chaos. VARYING_SLOT_TESS_MAX is
the right value to use here, as it's the largest location.
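A minimal sketch of the sizing problem, using made-up slot numbers and a
hand-rolled bitset instead of Mesa's BITSET macros. SLOT_VAR0, SLOT_MAX and
the helper names are assumptions for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define WORD_BITS (sizeof(uint32_t) * CHAR_BIT)
#define BITSET_WORDS_FOR(bits) (((bits) + WORD_BITS - 1) / WORD_BITS)

/* Illustrative values only, not Mesa's real enum values. */
enum { SLOT_VAR0 = 31, SLOT_MAX = 95 };

/* Set one bit, asserting that the word index is inside the array. */
static inline void
bitset_set(uint32_t *set, unsigned nwords, unsigned bit)
{
   assert(bit / WORD_BITS < nwords);
   set[bit / WORD_BITS] |= UINT32_C(1) << (bit % WORD_BITS);
}
```

With these numbers, sizing by SLOT_MAX - SLOT_VAR0 gives 2 words, so setting
the bit for location 94 would touch word 2 and write out of bounds. Sizing
by SLOT_MAX gives 3 words and the same write is in range.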
Closes: 2002
Fixes: ee2050b111 ("nir: Use BITSET for tracking varyings in lower_io_arrays")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
[12/60] Compiling C object 'src/gallium/auxiliary/eb820e8@@gallium@sta/rbug_rbug_texture.c.o'.
FAILED: src/gallium/auxiliary/eb820e8@@gallium@sta/rbug_rbug_texture.c.o
[...]
../src/gallium/auxiliary/rbug/rbug_texture.c: In function 'rbug_send_texture_info_reply':
../src/gallium/auxiliary/rbug/rbug_texture.c:302:21: error: implicit declaration of function 'alloca'; did you mean 'malloc'? [-Werror=implicit-function-declaration]
uint32_t *height = alloca(sizeof(uint32_t) * height_len);
^~~~~~
malloc
../src/gallium/auxiliary/rbug/rbug_texture.c:302:21: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
../src/gallium/auxiliary/rbug/rbug_texture.c:303:20: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
uint32_t *depth = alloca(sizeof(uint32_t) * height_len);
^~~~~~
cc1: some warnings being treated as errors
Include c99_alloca.h to portably make the alloca() prototype available.
See also: 498d9d0f, adfb9c5c, fc8139b1
Fixes: 6174cba7 ("rbug: fix transmitted texture sizes")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Rather than supplying a mask/swizzle to compose with the original, just
supply the offset of the allocated register so we can directly offset
the mask/swizzle, without resorting to composition.
This is simpler, cleaner, and will generalize to non-32-bit.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
GFX10 hazards require a different approach compared to previous
generations, for example it doesn't need s_nop, and most hazards
can't be solved by adding NOPs at all. Also, they are not
resolved by branch instructions.
This commit reorganizes aco_insert_NOPs so that there is now a
separate pass for GFX10. The new GFX10 pass also respects the
control flow of the shader.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This commit refines the VMEMtoScalarWriteHazard mitigation, based
upon a closer look at what LLVM does. Also changes the code to
match the structure of the other hazard mitigations.
* The hazard is not only triggered by VMEM, FLAT and GLOBAL
but also SCRATCH and DS instructions.
* The SMEM/SALU instructions only cause a hazard when they
write a register that the VMEM/etc. are reading.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
There is a hazard when there is a branch between a
VMEM/GLOBAL/SCRATCH instruction and a DS instruction.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
There is a hazard that happens when an SMEM instruction
reads an SGPR and then a VALU instruction writes that same SGPR.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
There is a hazard when a non-VALU instruction reads the EXEC mask
and then a VALU instruction writes the EXEC mask.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Any permlane instruction that follows any VOPC instruction can cause a
hazard. This commit implements a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO currently mitigates VMEMtoScalarWriteHazard and Offset3fBug
(names from LLVM). There are some bugs that ACO needn't care about.
Just to be on the safe side, add an assertion that makes sure
that we aren't hit by FlatSegmentOffsetBug.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
From the Vulkan spec 1.1.126:
"VK_SHADER_FLOAT_CONTROLS_INDEPENDENCE_32_BIT_ONLY_KHR specifies
that shader float controls for 32-bit floating point can be set
independently; other bit widths must be set identically to each
other."
Forgot to update this when I enabled that extension recently.
Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.independence_settings.independence_setting
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This saves one return and a simple benchmark which calls glGetString
repeatedly on my desktop shows it improves calls per second from 118M
to 128M.
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Most GPUs require the sample count to be a power of 2. Just remove those
formats with unusual sample count. This decreases dEQP EGL tests run
time a lot.
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
MAX_VARYINGS_INCL_PATCH is greater than 64, so we'll need more than 64
bits (per component) to track which vars have indirects. This pass was
trying to track patch varyings (which start at bit 63) in a separate
64 bit word, but failed to subtract VARYING_SLOT_PATCH0 and accessed
out of bounds.
Do away with the ad-hoc bit mask tracking and just use a BITSET.
Fixes: dEQP-GLES31.functional.tessellation.user_defined_io.per_patch_block.vertex_io_array_size_implicit.triangles
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
In some cases, in particular when you have things that can be src
modifiers ((abs)/(neg)), once eliminating one mov, there is a
possibility to remove another. Handle this by re-visiting an
instruction after eliminating a copy on one of its srcs.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
These date back to relatively early days of ir3, when a lot was still
not well understood. But according to CI (and what I've seen blob
driver do), these are not actually real restrictions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Now that we fixed the sharp edges that this was papering over, we can
relax the restriction about eliminating a mov coming out of a fanout
(for example from result of texture fetch).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This avoids copy-propagating a high register into an instruction which
cannot consume it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We did this properly already for split/fanout. But collect was missed.
Extract out a helper to share.
This way we avoid copy propagating a mov from high or half reg into an
instruction which cannot consume a high/half reg.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
1) deduplicate IR3_SHADER_DEBUG=disasm versus fs/vs/etc handling
2) standardize shader stage name prints, in particular VERT vs BVERT
3) don't mix stderr and stdout
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Avoid keeping track of the idx and all possible image operands for
each operation. Note for convenience we split up the handling of
ImageOperandsOffsetMask and ImageOperandsConstOffsetMask.
Suggested by Jason Ekstrand.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Change the information to also include the category, so that the
particulars of BitEnum enumeration can be handled in the template.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Emit barriers with semantics matching the access operand and the
storage class of the pointer.
v2: Fix order of visible / available emission relative to the
operations. (Bas)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Set the memory semantics and scope for later emitting the barrier.
Note the barrier emission code already exists in vtn_handle_image for
the Image atomics.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add a helper to split the memory semantics into before and after the
operation, and use that result to emit memory barriers.
v2: Be more explicit about which bits we are keeping around when
splitting memory semantics into a before and after. For now
we are ignoring Volatile. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Include the right storage memory semantic based on the storage class
of the operation. These will be used later to emit memory barriers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Three groups of tests, effectively defining in which cases the
optimization is allowed or prevented:
- Redundant loads (a load generated the value)
- Propagate SSA values (a store generated the value)
- Propagate a var (a copy generated the value)
Change the shader type of the tests to be COMPUTE so
nir_var_mem_shared can also be used. Doesn't affect the semantic of
the copy propagation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add a NIR intrinsic that represents a memory barrier in SPIR-V /
Vulkan Memory Model, with extra attributes that describe the barrier:
- Ordering: whether it is an Acquire or Release;
- "Cache control": availability ("ensure this gets written in the memory")
and visibility ("ensure my cache is up to date when I'm reading");
- Variable modes: which memory types this barrier applies to;
- Scope: how far this barrier applies.
Note that unlike in SPIR-V, the "Storage Semantics" and the "Memory
Semantics" are split into two different attributes so we can use
variable modes for the former.
NIR passes that take barriers into consideration were also changed:
- nir_opt_copy_prop_vars: clean up the values for the mode of an
ACQUIRE barrier. Copy propagation effect is to "pull up a load" (by
not performing it), which is what ACQUIRE restricts.
- nir_opt_dead_write_vars and nir_opt_combine_writes: clean up the
pending writes for the modes of a RELEASE barrier. Dead writes
effect is to "push down a store", which is what RELEASE restricts.
- nir_opt_access: treat the ACQUIRE and RELEASE as a full barrier for
the modes. This is conservative, but since this is a GL-specific
pass, doesn't make a difference for now.
v2: Fix the scoped barrier handling in copy propagation. (Jason)
Add scoped barrier handling to nir_opt_access and
nir_opt_combine_writes. (Rhys)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This warning is different. Meson support for windows is less mature than
for other platforms, and the goal here is to alert people that
eventually we plan to drop scons and move to meson, and that they should
try out meson and report issues.
Reviewed-by: Eric Anholt <eric@anholt.net>
At this point meson should be able to handle all of the non-windows
platforms just fine; we'd like to be able to stop maintaining scons for
those platforms sooner rather than later.
Reviewed-by: Eric Anholt <eric@anholt.net>
This ensures that we get python3's print() function behavior even in
python2, instead of python2's print statement behavior. We'll be using
this in the next patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
On GFX8 the number of records is in bytes while on other chips
it's in units of "stride".
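A simplified sketch of the unit difference. The function name and the
records_in_bytes flag are hypothetical, and the real descriptor setup may
take more inputs into account than the buffer size and stride.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the record count for a vertex buffer.
 * records_in_bytes selects the GFX8-style unit described in the text. */
static inline uint32_t
vertex_buffer_num_records(uint32_t size_bytes, uint32_t stride,
                          bool records_in_bytes)
{
   if (records_in_bytes)
      return size_bytes;

   /* Assumption of this sketch: stride is non-zero in the stride case. */
   assert(stride != 0);
   return size_bytes / stride;
}
```

For a 64-byte buffer with a 16-byte stride this gives 64 records in the
byte-based case and 4 in the stride-based case.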
Fixes dEQP-VK.robustness.vertex_access.*.draw.vertex_* on RAVEN.
Tested on GFX6, GFX8, GFX10 and RAVEN.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Flagged by UBSan:
../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:233:14: runtime error: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself
#0 0x55b4c1a2a428 in rand_sint ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:233
#1 0x55b4c1a2ad3a in random_sdiv_test ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:308
#2 0x55b4c1a2b837 in fast_idiv_by_const_int32_Test::TestBody() ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:410
#3 0x55b4c1abc13f in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#4 0x55b4c1aa7a4d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#5 0x55b4c1a4ce57 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#6 0x55b4c1a4f530 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#7 0x55b4c1a51cbe in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#8 0x55b4c1a6d698 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#9 0x55b4c1abfd58 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#10 0x55b4c1aab425 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#11 0x55b4c1a64cba in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#12 0x55b4c1ae4b73 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#13 0x55b4c1ae4a33 in main ../src/gtest/src/gtest_main.cc:37
#14 0x7ff172d1dbba in __libc_start_main ../csu/libc-start.c:308
#15 0x55b4c1a28dc9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test+0x96dc9)
../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:309:52: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself
#0 0x563b24dafd2d in random_sdiv_test ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:309
#1 0x563b24db0f0f in fast_idiv_by_const_int64_Test::TestBody() ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:473
#2 0x563b24e41111 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#3 0x563b24e2ca1f in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#4 0x563b24dd1e29 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#5 0x563b24dd4502 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#6 0x563b24dd6c90 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#7 0x563b24df266a in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#8 0x563b24e44d2a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#9 0x563b24e303f7 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#10 0x563b24de9c8c in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#11 0x563b24e69b45 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#12 0x563b24e69a05 in main ../src/gtest/src/gtest_main.cc:37
#13 0x7f9a90330bba in __libc_start_main ../csu/libc-start.c:308
#14 0x563b24daddc9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test+0x96dc9)
v2:
* Use INT64_MIN instead of LLONG_MIN (Jason Ekstrand)
* Simpler test for INT64_MIN result from rand_sint (Jason Ekstrand)
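A minimal sketch of the underlying issue, not the test's actual code: -v is
undefined for the minimum value of a signed type, so either that value is
excluded up front or the negation is done on the unsigned type. Both helper
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Negate only after ruling out the one value that cannot be negated. */
static inline int64_t
checked_negate_i64(int64_t v)
{
   assert(v != INT64_MIN);
   return -v;
}

/* Magnitude of v without ever evaluating -INT64_MIN: unsigned
 * arithmetic wraps modulo 2^64 and is well defined. */
static inline uint64_t
abs_magnitude_i64(int64_t v)
{
   return v < 0 ? UINT64_C(0) - (uint64_t)v : (uint64_t)v;
}
```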
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Shifting int64_t values left into the sign bit has undefined behaviour:
../src/util/fast_idiv_by_const.c:175:14: runtime error: left shift of 131 by 56 places cannot be represented in type 'long int'
#0 0x561337ed10c1 in sign_extend ../src/util/fast_idiv_by_const.c:175
#1 0x561337ed1335 in util_compute_fast_sdiv_info ../src/util/fast_idiv_by_const.c:239
#2 0x561337e17519 in fast_idiv_by_const_int8_Test::TestBody() ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:357
#3 0x561337ea815d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#4 0x561337e93a6b in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#5 0x561337e38e75 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#6 0x561337e3b54e in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#7 0x561337e3dcdc in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#8 0x561337e596b6 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#9 0x561337eabd76 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#10 0x561337e97443 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#11 0x561337e50cd8 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#12 0x561337ed0b91 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#13 0x561337ed0a51 in main ../src/gtest/src/gtest_main.cc:37
#14 0x7f85ba483bba in __libc_start_main ../csu/libc-start.c:308
#15 0x561337e14dc9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test+0x96dc9)
../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:51:14: runtime error: left shift of negative value -63
#0 0x55fc3c0e67cc in strunc ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:51
#1 0x55fc3c0e6d93 in smul_high ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:140
#2 0x55fc3c0e7067 in fast_sdiv ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:181
#3 0x55fc3c0e858b in fast_idiv_by_const_int8_Test::TestBody() ../src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp:358
#4 0x55fc3c17915d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#5 0x55fc3c164a6b in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#6 0x55fc3c109e75 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#7 0x55fc3c10c54e in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#8 0x55fc3c10ecdc in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#9 0x55fc3c12a6b6 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#10 0x55fc3c17cd76 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#11 0x55fc3c168443 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#12 0x55fc3c121cd8 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#13 0x55fc3c1a1b91 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#14 0x55fc3c1a1a51 in main ../src/gtest/src/gtest_main.cc:37
#15 0x7fd224759bba in __libc_start_main ../csu/libc-start.c:308
#16 0x55fc3c0e5dc9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test+0x96dc9)
v2:
* Use two casts instead of changing the argument type (Jason Ekstrand)
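One way to apply two casts is sketched here; this is an illustration and
not necessarily the exact code of the commit. The left shift is done on
uint64_t, where it is well defined, and the result is cast back before the
right shift. The sketch assumes the conversion back to int64_t wraps and
that right-shifting a negative value is an arithmetic shift, as on common
compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits. */
static inline int64_t
sign_extend(int64_t x, unsigned bits)
{
   assert(bits >= 1 && bits <= 64);

   unsigned shift = 64 - bits;
   return (int64_t)((uint64_t)x << shift) >> shift;
}
```

With the values from the UBSan report, sign_extend(131, 8) shifts by 56
without overflow and yields -125.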
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Otherwise a smaller type may be promoted to int, which can hit undefined
behaviour:
../src/gallium/auxiliary/util/u_half.h:126:29: runtime error: left shift of 32768 by 16 places cannot be represented in type 'int'
#0 0x5646ff63d488 in util_half_to_float ../src/gallium/auxiliary/util/u_half.h:126
#1 0x5646ff63d749 in _mesa_half_to_float ../src/util/half_float.c:145
#2 0x5646ff54d557 in nir_const_value_negative_equal ../src/compiler/nir/nir_instr_set.c:372
#3 0x5646ff44d29a in const_value_negative_equal_test_nir_type_float16_trivially_true_Test::TestBody() ../src/compiler/nir/tests/negative_equal_tests.cpp:121
#4 0x5646ff505c05 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#5 0x5646ff4f1513 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#6 0x5646ff4979b5 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#7 0x5646ff49a08e in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#8 0x5646ff49c81c in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#9 0x5646ff4b81f6 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#10 0x5646ff50981e in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#11 0x5646ff4f4eeb in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#12 0x5646ff4af818 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#13 0x5646ff52e639 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#14 0x5646ff52e4f9 in main ../src/gtest/src/gtest_main.cc:37
#15 0x7f6bacb78bba in __libc_start_main ../csu/libc-start.c:308
#16 0x5646ff448019 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/compiler/nir/negative_equal+0x17c019)
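A minimal sketch of the promotion problem, not the actual u_half.h code: a
uint16_t operand is promoted to int before the shift, so the shift has to be
done on a value that was cast to uint32_t first. The helper name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: move the sign bit of a 16-bit half float into
 * bit 31 of a 32-bit word. */
static inline uint32_t
half_sign_to_float_bits(uint16_t h)
{
   /* Without the cast, (h & 0x8000) is promoted to int and 32768 << 16
    * cannot be represented in a 32-bit int. Casting first makes the
    * shift happen in uint32_t, where it is well defined. */
   return (uint32_t)(h & 0x8000) << 16;
}
```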
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Otherwise a smaller type may be promoted to int, which can hit undefined
behaviour:
../src/intel/compiler/brw_packed_float.c:66:17: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
#0 0x5604a03969aa in brw_vf_to_float ../src/intel/compiler/brw_packed_float.c:66
#1 0x5604a0391305 in vf_float_conversion_test_test_vf_to_float_Test::TestBody() ../src/intel/compiler/test_vf_float_conversions.cpp:70
#2 0x5604a041a323 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#3 0x5604a0405c31 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#4 0x5604a03ab03b in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#5 0x5604a03ad714 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#6 0x5604a03afea2 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#7 0x5604a03cb87c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#8 0x5604a041df3c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#9 0x5604a0409609 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#10 0x5604a03c2e9e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#11 0x5604a0442d57 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#12 0x5604a0442c17 in main ../src/gtest/src/gtest_main.cc:37
#13 0x7f9a1983dbba in __libc_start_main ../csu/libc-start.c:308
#14 0x5604a0390d89 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/vf_float_conversions+0x8dd89)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
To avoid it, use the modulo of the number of bits in the value being
shifted, which is presumably what ended up happening on x86.
Flagged by UBSan:
../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
#0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974
#1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851
#2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106
#3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486
#4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37
#15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308
#16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9)
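The modulo approach could be sketched as follows. This is an illustration
with a hypothetical helper name, not the code in brw_eu_validate.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a 64-bit value left, wrapping the shift
 * count to the width of the type so it always stays in 0..63. */
static inline uint64_t
shl64_wrapped(uint64_t value, unsigned count)
{
   /* value << 64 is undefined behaviour in C; count % 64 is not. */
   return value << (count % 64);
}
```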
Reviewed-by: Adam Jackson <ajax@redhat.com>
`strerror()` takes an `errno`, not the negative value returned by the
`ioctl()`.
Instead of fixing this as `"%s", strerror(errno)`, let's just use the
`"%m"` shortcut for it.
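A small sketch of the `"%m"` form, assuming a libc that supports it (it is
a glibc extension, not ISO C). The helper name and message text are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an error message for a failed call.
 * "%m" expands to strerror(errno) and consumes no argument. */
static int
format_errno_message(char *buf, size_t size, const char *what)
{
   return snprintf(buf, size, "%s failed: %m", what);
}
```

Note that errno has to still hold the value set by the failing call when
the message is formatted.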
Fixes: 2b5f30b1d9 ("anv: implement VK_INTEL_performance_query")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The total GMR/DMA size in the kernel is limited. The kernel may still allow
a larger buffer allocation to succeed, but command submission using that
buffer as a GMR would then fail, typically causing an application crash.
So have the winsys limit the size of GMR/DMA buffers. The pipe driver will
then resort to allocating smaller buffers and perform the DMA transfer in
multiple bands, also allowing for the pre-flush mechanism to kick in.
This avoids the related application crashes.
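Banding against a size limit can be sketched roughly as below. This is not
the svga code; the function, the callback type and the row-based split are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: upload `rows` rows starting at `first_row`. */
typedef void (*upload_band_fn)(size_t first_row, size_t rows, void *data);

/* Split an upload of `total_rows` rows of `row_bytes` bytes each into
 * bands of at most `max_bytes` bytes. Returns the number of bands, or 0
 * if not even a single row fits. `upload` may be NULL for a dry run. */
static size_t
upload_in_bands(size_t total_rows, size_t row_bytes, size_t max_bytes,
                upload_band_fn upload, void *data)
{
   assert(row_bytes > 0);

   size_t rows_per_band = max_bytes / row_bytes;
   if (rows_per_band == 0)
      return 0;

   size_t bands = 0;
   for (size_t row = 0; row < total_rows; row += rows_per_band) {
      size_t rows = total_rows - row;
      if (rows > rows_per_band)
         rows = rows_per_band;
      if (upload)
         upload(row, rows, data);
      bands++;
   }
   return bands;
}
```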
Fixes: e7843273fa ("winsys/svga: Update to vmwgfx kernel module 2.1")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Even with banded DMA uploads, st->hwbuf is always non-NULL, but when we've
allocated a software buffer to hold the full upload, unmapping of the
hardware buffer has already been done before
svga_texture_transfer_unmap_dma(), and the code was performing an unmap of
an already unmapped buffer.
Fix this by testing that no software buffer is present.
Fixes: a9c4a861d5 ("svga: refactor svga_texture_transfer_map/unmap functions")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Update to 5.4-rc4 so we can test Panfrost on devices with Mali T720 and
T820.
A bug was found that prevented things working at all on RK3288 devices,
so we carry a patch for now in my personal fork.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
If there are queued shaders to be written to disk, wait for that.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were relying on specific pass ordering in st to avoid setting
inputs_read/outputs_written for edge flags. Instead, just assume
that it happens and throw out the results we don't want.
We should probably revisit this and try and add a vertex element
property like I originally wanted so we can avoid having it be
associated with the VS altogether.
I recently changed the slow depth/stencil clear path to make sure
depth values are explicitly exported by the fragment shader. This
is actually only useful when VK_EXT_depth_range_unrestricted is
enabled.
While this path is correct, it introduced a performance regression
with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and
probably more titles. This is because it prevents the hardware
from doing some optimizations like discarding fragments.
This commit re-introduces the previous (a bit faster) slow
depth/stencil clear path and it selects the unrestricted path
only if VK_EXT_depth_range_unrestricted is enabled.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For inline uniform blocks, descriptorCount is the number of bytes to
copy, so it shouldn't be used as an index. srcArrayElement/dstArrayElement
specify the starting byte offset within the binding to copy from/to.
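As a sketch of the byte-based semantics (not the radv implementation; the
helper and its parameter names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy part of one inline uniform block into
 * another. src_offset/dst_offset play the role of srcArrayElement and
 * dstArrayElement, byte_count the role of descriptorCount; all three
 * are in bytes for this descriptor type. */
static void
copy_inline_uniform_block(uint8_t *dst_block, uint32_t dst_offset,
                          const uint8_t *src_block, uint32_t src_offset,
                          uint32_t byte_count)
{
   memcpy(dst_block + dst_offset, src_block + src_offset, byte_count);
}
```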
This fixes new CTS tests:
dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_*
dEQP-VK.binding_model.descriptor_copy.*.mix_3
dEQP-VK.binding_model.descriptor_copy.*.mix_array1
Fixes: 8d2654a419 ("radv: Support VK_EXT_inline_uniform_block.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It used to cause weird issues on GFX10 in the past with vkmark and
Wreckfest, and they can't be reproduced now. Shadow Of Mordor
(Vulkan beta) hits that path and it works fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer
on Raven and other chips that allow rbplus.
This just prevents a crash and rbplus probably needs more work.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The driver only supports up to 8 samples, so it's useless to
create more pipelines than needed.
This fixes a conditional jump reported by Valgrind on GFX10:
==194282== Conditional jump or move depends on uninitialised value(s)
==194282== at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
==194282== by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
==194282== by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
==194282== by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
==194282== by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
==194282== by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
==194282== by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
==194282== by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
==194282== by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
==194282== by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080)
This is caused by an out-of-bounds access of 'fmask_array' (i.e. the index
is 4 for 16 samples).
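The indexing problem can be illustrated as follows. The table size and
helper names are assumptions for the sketch, not the radv definitions.

```c
#include <assert.h>

/* Assumed table size: one entry each for 1, 2, 4 and 8 samples. */
#define MAX_SAMPLES_LOG2 4

/* Map a power-of-two sample count to a table index, i.e. log2(samples). */
static unsigned
samples_to_index(unsigned samples)
{
   assert(samples != 0 && (samples & (samples - 1)) == 0);

   unsigned index = 0;
   while (samples > 1) {
      samples >>= 1;
      index++;
   }
   return index;
}

/* Returns non-zero if the index for `samples` fits in a table with
 * MAX_SAMPLES_LOG2 entries. */
static int
sample_count_supported(unsigned samples)
{
   return samples_to_index(samples) < MAX_SAMPLES_LOG2;
}
```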
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: Introduce the appropriate pipe controls
Properly deal with changes in metric sets (using execbuf parameter)
Record marker at query end
v3: Fill out PerfCntr1&2
v4: Introduce vkUninitializePerformanceApiINTEL
v5: Use new execbuf extension mechanism
v6: Fix comments in genX_query.c (Rafael)
Use PIPE_CONTROL workarounds (Rafael)
Refactor on the last kernel series update (Lionel)
v7: Only I915_PERF_IOCTL_CONFIG when perf stream is already opened (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We have 2 of those we can configure to source programmable events.
Those are not part of the OA reports. Configuration happens in i915
through the metric set selected by the application. On the Mesa side
we'll just sample those and do a diff.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Pull new updates from drm-next as of the following commit:
commit f1b4a9217efd61d0b84c6dc404596c8519ff6f59
Merge: 400e91347e1d f3a36d469621
Author: Dave Airlie <airlied@redhat.com>
Date: Tue Oct 22 15:04:00 2019 +1000
Merge tag 'du-next-20191016' of git://linuxtv.org/pinchartl/media into drm-next
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to query the content of register configurations from the
kernel. Let's pull this out of the query.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The Vulkan performance query extension is a bit lower level than the
GL one. Expose some of the functions to do the result accumulation
directly in the Anv driver.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This is useful for PBO texture upload with GL_RGB and GL_UNSIGNED_BYTE.
v2: Vasily Khoruzhick provided an update for the Lima CI expectations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
p_extract_vector's second operand is in units of the definition size, not
dwords.
v2: move extract_subvector() to right before ds_write_helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
A small typo resulted in not converting the footprint to vec4, meaning that
we could potentially ask for quite a few more registers than required.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If the load_interpolated_input is scalarized, we would be too
conservative about deciding the tex instruction wasn't a candidate to
pre-fetch:
vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (0) /* interp_mode=0 */
vec1 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* packed:v_uv,v_uv1 */
vec1 32 ssa_3 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 1) /* base=0 */ /* component=1 */ /* packed:v_uv,v_uv1 */
vec2 32 ssa_8 = vec2 ssa_2, ssa_3
vec4 32 ssa_9 = tex ssa_8 (coord), 0 (texture), 0 (sampler)
Really we don't care that the texcoord components come from different
load_interpolated_input instructions, just that they have consecutive
varying offsets.
Reported-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Previously, we used one hashset per BB, so that we could
always initialize the current hashset from the immediate
dominator. This patch changes the behavior to a single
hashmap using the block index per instruction to resolve
dominance.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Some of these lowerings aren't supported for drivers that support
tessellation and geometry shaders. Let's add a couple of asserts to make
it obvious if these have been enabled when it's not possible.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
* Use LLVM 8 from buster-backports
v3:
* Use LLVM 7 again for armhf, llvmpipe is still broken there with LLVM 8
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
This allows running the regression tests.
One downside is that we can't easily build the Vulkan overlay layer,
because only x86 binaries of the glslang validator are available. If
that's important, we could either use those binaries via qemu, or build
it from source.
v2:
* Add :amd64 suffix to existing debian-9/10 job names (Eric Engestrom)
Acked-by: Eric Engestrom <eric.engestrom@intel.com> # v1
Apparently needs: in a definition overwrites inherited ones. So
.deqp-test effectively didn't declare needs: for debian-10, which means
any jobs based on .deqp-test could spuriously run after the debian-10
job failed or was cancelled.
Use https:// URLs in the APT configuration.
Drop --no-install-recommends, the image generation template disables
installation of recommended packages in /etc/apt/apt.conf.
Run apt-get autoremove at the end, cleaning up packages which were
installed to satisfy dependencies but are no longer needed.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.
Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On Gen12, we support mixed-mode HF/F operands, and 3-source instructions
also support immediate values, so keep an immediate as it is if it fits
in a 16-bit field.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen >= 12, if src0 or src2 holds an immediate value, we need to set the
src[0/2]_is_imm bits instead of the register file.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen >= 10, either src0 or src2 can use a 16-bit immediate value, but
not both.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's been throwing the following error today:
"<Fault -32603: 'Internal Server Error (contact server administrator
for details): could not extend file "base/17952/18226": No space left
on device\nHINT: Check free disk space.\n'>"
Reviewed-by: Daniel Stone <daniels@collabora.com>
If you set LP_NUM_THREADS=0, compute shaders would hang. Just execute
the workloads in sequence if we have no threads in the pool.
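A rough sketch of the sequential fallback, assuming a hypothetical
dispatcher rather than the llvmpipe threadpool code. The threaded path is
left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*work_fn)(void *data, unsigned iter);

/* Example work function for the sketch: counts how often it was called. */
static void
count_iteration(void *data, unsigned iter)
{
   (void)iter;
   *(unsigned *)data += 1;
}

/* Hypothetical dispatcher: with no worker threads, run every iteration
 * in the calling thread. Returns the number of iterations run inline. */
static unsigned
run_tasks(unsigned num_threads, unsigned num_iters, work_fn work, void *data)
{
   assert(work != NULL);

   if (num_threads == 0) {
      for (unsigned i = 0; i < num_iters; i++)
         work(data, i);
      return num_iters;
   }

   /* A real implementation would queue the work on the pool and wait
    * for it here; that part is omitted from this sketch. */
   return 0;
}
```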
Fixes: 1b24e3ba75 ("llvmpipe: add compute threadpool + mutex")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This file was created in 2a0d45ae6c, but its addition to the Android
makefiles was omitted. That breaks the build with missing references to
symbols which are defined in this file.
List the file in ir3_SOURCES to make the build succeed.
Signed-off-by: Marijn Suijten <marijns95@gmail.com>
This fixes some crashes with dEQP-VK.descriptor_indexing.* when
read_first_invocation has its source from a descriptor.
Most of these tests still fail because of an LLVM bug (they work
with ACO).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.
v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)
Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.
v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math instruction for more readability
- Bail out early if no gl_SampleMask
v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler
v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
0.49.0 can compile most of mesa with ICC or ICL, but not SWR without
additional workarounds in our meson.build files. Bumping patch version
is easier and shouldn't be a big burden anyway, especially to cover a
niche compiler. The check originally only covered ICC, but now covers
ICL as well.
Fixes: 3740ffb59c ("meson: add switches for SWR with MSVC")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1937
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Due to a bug in GFX10 hardware, s_nop instructions must be added
if a branch is at 0x3f. We already do this, but forgot to also update
the constant addresses that come after this instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.
Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.
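The workaround can be pictured with the following simplified sketch. It is
not the ACO pass: the opcode enum and function are hypothetical, and it
only tracks whether a store is still pending since the last wait.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_OTHER, OP_SMEM_STORE, OP_SMEM_LOAD, OP_WAIT_LGKM0 };

/* Copy `in` to `out`, inserting OP_WAIT_LGKM0 before every load that
 * follows a store which has not been waited on yet. `out` must have
 * room for 2 * n entries. Returns the number of entries written. */
static size_t
insert_store_load_waits(const enum op *in, size_t n, enum op *out)
{
   int pending_store = 0;
   size_t count = 0;

   for (size_t i = 0; i < n; i++) {
      if (in[i] == OP_SMEM_LOAD && pending_store) {
         out[count++] = OP_WAIT_LGKM0;
         pending_store = 0;
      } else if (in[i] == OP_SMEM_STORE) {
         pending_store = 1;
      } else if (in[i] == OP_WAIT_LGKM0) {
         pending_store = 0;
      }
      out[count++] = in[i];
   }
   return count;
}
```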
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The current implementation does not synchronize on BO readiness when
DISCARD_WHOLE_RES flag is set, which can lead to misbehaviours when the
resource being updated is being used by one of the pending or already
flushed batches.
Adding unconditional BO synchronization would do the trick, but we can
sometimes optimize this path by re-allocating a new BO instead of
waiting for the existing one to be ready.
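The decision described above could be sketched like this. The enum and
function are hypothetical and only model the choice, not the Panfrost
transfer code.

```c
#include <assert.h>
#include <stdbool.h>

enum map_action {
   MAP_DIRECT,     /* BO is idle: write to it in place */
   MAP_REALLOCATE, /* BO is busy, contents are discarded: use a new BO */
   MAP_WAIT,       /* BO is busy and must be kept: wait for it */
};

/* Pick how to map a resource for writing. */
static enum map_action
choose_map_action(bool discard_whole_resource, bool bo_busy,
                  bool can_reallocate)
{
   if (!bo_busy)
      return MAP_DIRECT;
   if (discard_whole_resource && can_reallocate)
      return MAP_REALLOCATE;
   return MAP_WAIT;
}
```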
Reported-by: Daniel Stone <daniels@collabora.com>
Reported-by: Heinrich Fink <heinrich.fink@daqri.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
According to the OES_geometry_shader spec, section Dependencies:
"OpenGL ES 3.1 and OpenGL ES Shading Language 3.10
are required."
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We currently don't maintain it correctly, and the buffer gets leaked if
the surface is destroyed before swapping buffers.
From Android frameworks/native/libs/nativewindow/include/system/window.h:
The window holds a reference to the buffer between dequeueBuffer and
either queueBuffer or cancelBuffer, so clients only need their own
reference if they might use the buffer after queueing or canceling it.
v2: Remove our own reference.
Fixes: 0212db3504 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
If a pipeline has both graphics and compute, the descriptors are the same.
While we are at it, use queue->device for simplicity.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
To make sure a trace file is generated in case the driver crashes
during the hang report generation (which happens sometimes).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This information has never been useful. All descriptors are
already dumped with colors etc, and it's more useful.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A bunch of blend tests fixed on T760. A single blend test regressed on
both T760/T860 but I am unable to reproduce locally so am just
documenting the regression and moving on.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like to eliminate not just entire dead instructions, but also
dead components, which increases scheduler flexibility (since some
vector instructions can become scalar after eliminating dead
components). This also will allow better RA in the future.
Results are meh.
total instructions in shared programs: 3453 -> 3451 (-0.06%)
instructions in affected programs: 60 -> 58 (-3.33%)
helped: 2
HURT: 0
total bundles in shared programs: 1826 -> 1824 (-0.11%)
bundles in affected programs: 33 -> 31 (-6.06%)
helped: 2
HURT: 0
total quadwords in shared programs: 3144 -> 3144 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 321 -> 321 (0.00%)
registers in affected programs: 45 -> 45 (0.00%)
helped: 11
HURT: 11
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 50.00% x̄: 39.70% x̃: 50.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for registers value: -0.45 0.45
95% mean confidence interval for registers %-change: -1.87% 62.18%
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 445 -> 447 (0.45%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows for vec16 dependencies in the scheduler, not that we have
any yet (thankfully).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have a notion of byte masks, liveness tracking can be updated
to reflect this extra granularity without loss of correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Read component masks don't have a particular type associated, since the
type of the ALU operation may not match the type of the operands in
question. So let's generate byte masks instead, and update the rest of
the compiler to use byte masks when analyzing reads.
Preparation for mixed types.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There are essentially two formats of masks in play beginning with this
commit: masks per-channel and masks per-byte. The former make sense
within a given fixed-size instruction; the latter are
typesize-independent. It turns out you need the latter to meaningfully
manipulate instructions containing multiple sizes (which is quite
possible with ALU operations).
Similarly, we have mir_srcsize. We calculate the size of the source by
analyzing the size of the instruction itself and stepping down if there
is a half-modifier.
Finally, we have mir_round_bytemask_down, for when we want to take a
byte mask and "round it down" to a given component size, so that we can
use it as a component mask.
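To illustrate the two mask formats, here is a sketch assuming a 16-byte
register. The helpers are hypothetical stand-ins, not the mir_* functions
themselves.

```c
#include <assert.h>
#include <stdint.h>

#define REG_BYTES 16 /* assumed register width in bytes */

/* Expand a per-component mask into a per-byte mask. */
static uint16_t
component_mask_to_bytemask(unsigned comp_mask, unsigned bytes_per_comp)
{
   assert(bytes_per_comp == 1 || bytes_per_comp == 2 ||
          bytes_per_comp == 4 || bytes_per_comp == 8);

   unsigned full = (1u << bytes_per_comp) - 1;
   unsigned bytemask = 0;

   for (unsigned c = 0; c < REG_BYTES / bytes_per_comp; c++) {
      if (comp_mask & (1u << c))
         bytemask |= full << (c * bytes_per_comp);
   }
   return (uint16_t)bytemask;
}

/* Clear the bytes of every component that is only partially set, so
 * the result is a whole number of components of the given size. */
static uint16_t
round_bytemask_down(uint16_t bytemask, unsigned bytes_per_comp)
{
   assert(bytes_per_comp == 1 || bytes_per_comp == 2 ||
          bytes_per_comp == 4 || bytes_per_comp == 8);

   unsigned full = (1u << bytes_per_comp) - 1;
   unsigned result = bytemask;

   for (unsigned c = 0; c < REG_BYTES / bytes_per_comp; c++) {
      unsigned shift = c * bytes_per_comp;
      if (((result >> shift) & full) != full)
         result &= ~(full << shift);
   }
   return (uint16_t)result;
}
```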
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to encode properties about the load/store ops like we
do for ALU ops. We include now properties about whether we have a store,
and if there are special cases on the load/store op. We also tag each
instruction by its natural size... this is probably not totally right,
but it's a start.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The trick is realizing that even with a destination override, the masks are
encoded in the same mode as the instruction itself, rather than stepping
down. The override means that the smaller type is used, but the mask is
parsed as if it were the higher type. Overriding down is done by blindly
doing this. Overriding up can be thought of as printing in the upper size,
but shifting the alphabet to use the upper half, i.e. shifting xyzw to
become abcd.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 2d914ebe81 ("pan/midgard: Fix memory corruption in register spilling")
This allows a write to proceed to an uninitialized part of a buffer
even when the GPU is using the previously-initialized portions.
Such a situation can be triggered with the following API usage example:
glBufferSubData(..., offset, size, data1);
glDrawArrays(...);
// append new vertex data
glBufferSubData(..., offset+size, size, data2);
glDrawArrays(...);
The same is done for freedreno, nouveau and radeon.
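One way to picture the check is a single tracked range of initialized
bytes. The sketch below uses hypothetical names and is not the driver
code: a write that does not intersect the valid range touches only
uninitialized memory and can go ahead without waiting for the GPU.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Half-open range [start, end) of bytes that have been initialized. */
struct valid_range {
   unsigned start;
   unsigned end;
};

static void
valid_range_set_empty(struct valid_range *r)
{
   r->start = UINT_MAX;
   r->end = 0;
}

/* Grow the range so it also covers [start, end). */
static void
valid_range_add(struct valid_range *r, unsigned start, unsigned end)
{
   assert(start < end);
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

/* True if [start, end) overlaps the initialized range. */
static bool
valid_range_intersects(const struct valid_range *r,
                       unsigned start, unsigned end)
{
   unsigned lo = start > r->start ? start : r->start;
   unsigned hi = end < r->end ? end : r->end;
   return lo < hi;
}
```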
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
This is the layout used in the GL API, and maps directly to PIPE
formats with no endianness trickery. As with the LA change, this
fixes big-endian fetching from texbos. Also cleans up some endian
shenanigans in shader images.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that Mesa is also using an array format for LA, nothing was using
these. (And, clearly, no HW driver had exposed them).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The array format is what the GL API wants (fixing texbos on
big-endian), and matches directly to gallium's corresponding array
format. The only driver exposing A8L8 was radeon/r200 in big-endian,
where the HW's underlying format was trying to read as array and we
needed to flip things around to make our packed format come out right
(note that while the radeon format tables had both AL and LA,
ChooseTextureFormat would only pick one of them based on endianness).
v2: Don't make r200/radeon use endian swaps.
v3: Rebase on dropping the r200 _be/_le format table removal patch
v4: reword commit message to explain why we can drop both formats
from radeon.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
The array format is what the GL API wants (and we made a mistake in
the format returned for texbos on big-endian!), and it's exactly what
the gallium-side PIPE_FORMAT_L16A16 is. The only downside is that
dri_util tries to fall back to sampling RG16 using LA16, which doesn't
have a match for big-endian any more. No HW drivers supported A16L16
anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The first arg to OUT_BATCH_RELOC is ignored, we actually wanted these
in the third arg. They're always 0 so far, so it didn't matter.
v2: Reword commit message to note that I don't end up using the tile
bits, but keep the commit as a cleanup anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
No matter what, we deref the texFormat from the table, except for a
mistake in cpp=4 where we pulled a 0 out of the table either way.
v2: Rebase on dropping r200 table deduplication patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
PP stack size should be set to maximum PP stack size, not to stack size of
last shader.
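In other words, the value has to be a running maximum over all shaders. A
minimal sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest stack size among `count` shaders, or 0 if there
 * are none. */
static unsigned
max_stack_size(const unsigned *shader_stack_sizes, size_t count)
{
   unsigned max = 0;

   for (size_t i = 0; i < count; i++) {
      if (shader_stack_sizes[i] > max)
         max = shader_stack_sizes[i];
   }
   return max;
}
```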
Fixes: 27e7603c34 ("lima: fix ppir spill stack allocation")
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Similar to iadd, we can fold an added constant value from an imad24_ir3
into the load_uniform's constant offset. This avoids some cases where
the addition of imad24_ir3 could otherwise be a regression in instr
count.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
We can't encode immed sources for cat3 (mad) instructions, but we can
use const in first or third src. We handled this case already, but we
weren't considering that we could lower immed to const.
For manhattan:
total instructions in shared programs: 35202 -> 34718 (-1.37%)
instructions in affected programs: 14931 -> 14447 (-3.24%)
helped: 90
HURT: 0
total full in shared programs: 2451 -> 2359 (-3.75%)
full in affected programs: 653 -> 561 (-14.09%)
helped: 69
HURT: 2
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.
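The choice can be sketched as below. This is a simplification under the
assumption that offsets are treated as unsigned 24-bit values; the names
are hypothetical and the real pass may use a different bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mul_op { MUL_IMUL, MUL_IMUL24 };

/* Pick the multiply to lower amul to, given the largest offset that
 * can occur within the thing being dereferenced. */
static enum mul_op
lower_amul(uint64_t max_offset, bool has_imul24)
{
   /* Assumed bound: the offset must fit in 24 bits. */
   const uint64_t limit = UINT64_C(1) << 24;

   if (has_imul24 && max_offset < limit)
      return MUL_IMUL24;
   return MUL_IMUL;
}
```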
Signed-off-by: Rob Clark <robdclark@chromium.org>
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size. For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Some hardware can do 24b multiply in a single instruction, but not 32b.
However in most cases 24b is sufficient for address/offset calculation.
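For illustration, a software model of a 24b multiply, assuming the operation multiplies the sign-extended low 24 bits of each operand and keeps the low 32 bits of the product (this is an assumption about the semantics, and the helper names are made up for the sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 24 bits of x, ignoring the upper 8 bits. */
static int32_t
sext24(uint32_t x)
{
   int32_t r = (int32_t)((x & 0xffffffu) ^ 0x800000u) - 0x800000;
   assert(r >= -0x800000 && r <= 0x7fffff);
   return r;
}

/* Model of a 24b multiply: the product is computed in 64 bits to avoid
 * signed overflow, then truncated to 32 bits. */
static uint32_t
imul24(uint32_t a, uint32_t b)
{
   return (uint32_t)((int64_t)sext24(a) * (int64_t)sext24(b));
}
```

For small array indices and element sizes, such as `imul24(1000, 16)`, the result matches a full 32b multiply, which is why it is usually sufficient for offset calculation.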
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
ir3 compiler has a signed integer multiply-add instruction (MAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can directly represent it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The pass should run once at the end of shader compilation, for a4xx
onwards. It iterates over texture sampling instructions and marks those
eligible for pre-dispatch by changing the tex op from 'tex' to
'tex_prefetch'. An instruction is eligible if:
* The coordinate is a vector where all its components come from a
shader input.
* The order of the components match exactly that of the input (no
swizzles).
* The instruction is in the 'main' function, and in the outermost
block.
The first two restrictions were arrived at empirically, so more
testing could tighten or loosen them.
The 3rd restriction is there to allow moving the instructions
eligible for pre-dispatch to the beginning of the shader, so
that we don't block the registers holding the result for too
long.
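The three conditions above can be sketched as a single predicate. This is a simplified model with a hypothetical instruction description, not the NIR data structures the pass actually walks:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COORD_COMPONENTS 4

/* Hypothetical, flattened description of a texture sampling instruction. */
struct tex_instr {
   unsigned num_coord_components;
   bool comp_from_input[MAX_COORD_COMPONENTS];     /* read from a shader input? */
   unsigned input_component[MAX_COORD_COMPONENTS]; /* which input component */
   bool in_main_function;
   bool in_outermost_block;
};

static bool
tex_is_prefetch_eligible(const struct tex_instr *tex)
{
   assert(tex->num_coord_components <= MAX_COORD_COMPONENTS);

   for (unsigned i = 0; i < tex->num_coord_components; i++) {
      /* Every coordinate component must come from a shader input... */
      if (!tex->comp_from_input[i])
         return false;
      /* ...in the same order as the input, i.e. no swizzle. */
      if (tex->input_component[i] != i)
         return false;
   }

   /* Only instructions that can be moved to the start of the shader. */
   return tex->in_main_function && tex->in_outermost_block;
}
```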
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
It seems that pre-fs texture fetch only works if ij_pix ends up in r0.x.
I've tried unknown zero bits, to no avail, and blob also seems to force
r0.x when this feature is used.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Useful to see in disassembly listing texture fetches that were moved to
pre-dispatch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If the only use of varyings is a pre-shader texture-fetch, we still need
to issue a bary.f with the end-input flag, otherwise we'll block further
VS invocations, as the hw will think varying storage is still busy.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
It is possible that the result of a pre-fs texture fetch is an output
(or partially an output) of the FS. Since the meta:tex_prefetch
instructions are dropped before the assembler, we need to account for
this when we fixup the register footprint.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Add a placeholder instruction to track texture fetches made prior to FS
shader dispatch. These, like meta:input instructions are scheduled
before any real instructions, so that RA realizes their result values
are live before the first real instruction. It also gives legalize a way
to track uses of the fetched sample that require (sy) sync flags.
There is some related special handling for varying texcoord inputs used
for pre-fs-fetch, so that they are not DCE'd and remain in linkage
between FS and previous stage. Note that we could almost avoid this
special handling by giving meta:tex_prefetch real src arguments, except
that in the FS stage, inputs are actual bary.f/ldlv instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
When we enable pre-dispatch texture fetch, we could have a scenario
where the barycentric i/j coord sysval is not used in the shader, but
only used for the varying fetch for the pre-dispatch texture fetch.
In this case we need to take care not to DCE this sysval.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Will be needed for special handling of SYSTEM_VALUE_BARYCENTRIC_PIXEL
(ij_pix) when pre-fs texture fetch is enabled.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Not sure I remember how long this has been unused for. But it's unused
now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is like nir_texop_tex, but signals that the sampling coordinates
are immutable during the shader stage, in a way that allows the HW
that supports pre-dispatching sampling operations to pre-fetch
the result prior to scheduling the shader stage.
This is introduced to support the feature in Freedreno. Adreno HW
from a4xx supports it.
A NIR pass introduced later in this series will detect sampling
operations that are eligible for pre-dispatch, and replace
nir_texop_tex by this new op, to tell the backend to enable
pre-fetch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We don't use cmake normally because it always results in static linking.
This is very problematic for *nix OSes which expect shared linking by
default, but for windows this isn't a problem as LLVM doesn't support
shared linking on windows anyway.
Reviewed-by: Adam Jackson <ajax@redhat.com>
For building on Windows (when not using cygwin), users may want to use a
binary wrap of LLVM. This provides a fallback to the LLVM dependency
which may be used in this case.
Reviewed-by: Adam Jackson <ajax@redhat.com>
MIN filter is only used when LOD MAX is at least 4 (I guess the 2 LSB don't
actually exist).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
The etnaviv kernel driver will only ever flush write caches. As both
the TX descriptor and instruction cache are read caches they must be
flushed from the user cmdstream at an appropriate time.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
It's just a matter of writing the addressing mode into the
texture descriptor.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Create a separate implementation file with texture-descriptor-based
sampler views and sampler states. Initialize the one or the other
based on the GPU. There is so little in common that this seemed more
appropriate; keeping them as one type of state object would
only be confusing.
This commit is actually a combination of the original commit by
Wladimir, fixes and a TS implementation from Jonathan, and changes to
use softpin by Lucas.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Halti5 uses texture descriptors to control the samplers, and thus needs to
know the GPU virtual address for the texture buffers to fill into the
descriptor buffer. Without softpin userspace has no control over the GPU
VM and also no way to fix up the texture descriptor buffer, so there is
no point in creating a screen on a Halti5 device without softpin being
available.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
If softpin is available on the kernel side, we transparently replace the
relocs with self-managed GPU virtual addresses. This allows skipping some
work at the kernel side, as it doesn't need to touch the command stream
anymore before submitting it to the hardware.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Replace the per-screen locking of flushing with per-context one and
add per-context lock around command stream buffer accesses, to prevent
cross-context flushing from corrupting these command stream buffers.
Signed-off-by: Marek Vasut <marex@denx.de>
Reallocate the command stream buffer in case it is too small.
Older kernel versions are limited to a 64 KiB buffer, so
limit the size to avoid oversized buffers.
Signed-off-by: Marek Vasut <marex@denx.de>
Have each context track which resources it marked as pending read and
pending write. Have each resource track in which context it is pending.
This way, it is possible to identify when a resource is both pending
read and pending write at the same time. Moreover, the status field
can be correctly calculated and updated when necessary.
Signed-off-by: Marek Vasut <marex@denx.de>
This way we can ensure that the pipe driver tracking of pending resources
stays in sync with the actual command buffer state, even if a space
reservation triggers a forced flush.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
As long as a resource is pending in any context we must not destroy
it, otherwise we'll hit a classical use-after-free with fireworks.
To avoid this take a reference when the resource is first added to
the pending set and put the reference when no longer pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Currently, the screen tracks all resources for all contexts, but this
is not correct. Each context should track the resources it uses. This
also allows a context to detect whether a resource is used by another
context and to notify another context using a resource that the current
context is done using the resource.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Guido Günther <guido.gunther@puri.sm>
Cc: Lucas Stach <l.stach@pengutronix.de>
This exposes what's required for DX and this is what we already
configure. The driver flushes denorms for FP32 and preserves them
for FP16/FP64. Note that we can't allow both preserving and
flushing denorms because this won't work for merged shaders. This
will require LLVM to update the float mode register to make it work.
Only enabled on GFX8+ with the LLVM path because it's untested on
previous chips and ACO doesn't support it.
This extension is required for SPIRV 1.4.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Because some instructions will be optimized by the backend compiler,
the driver has to manually flush to zero to keep the result exact.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The new Mac OS X images apparently already have python2 and python3,
and `brew` considers asking to install something already installed
as a fatal error...
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
This will expose GL_EXT_primitive_bounding_box and
GL_OES_primitive_bounding_box after previous commits
expose OpenGL ES 3.1 once Compute Shaders are available.
Reviewed-by: Eric Anholt <eric@anholt.net>
This adapts the v3d driver to the new CL submit ioctl interface that
allows the driver to request a flush of the caches after the render
job has completed. This seems to eliminate the kernel write violation
errors reported during CTS and Piglit executions, fixing some CTS tests
and GPU resets along the way.
v2:
- Adapt to changes in the kernel side.
- Disable shader storage and shader images if the kernel doesn't
implement cache flushing.
Fixes CTS tests:
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-float
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-int
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-uint
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-float
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-int
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-uint
KHR-GLES31.core.shader_atomic_counters.advanced-usage-many-draw-calls2
KHR-GLES31.core.shader_atomic_counters.advanced-usage-draw-update-draw
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-int
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-matR
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-struct
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-matC-pad
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-vec
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that the UAPI has landed, add the pipe_context function for
dispatching compute shaders. This is the last major feature for GLES 3.1,
though it's not enabled quite yet.
That we set for any TMU write on spills and general tmu. It is then
used as part of v3d_emit_gl_shader_state later.
v2: add a new flag at v3d_compiler instead of dirtying the flag
at v3dx if there is any spill (change suggested by Eric, added by
Alejandro)
v3: set this for anything that is not a load and do it also in
v3d40_vir_emit_image_load_store (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
The SCR_INIT macro used to install the rbug resource_changed method
will only do so when the driver below rbug exposes this method, so
the check will always evaluate to true.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
All the other context method initializations follow the order of the pipe_context
structure definition making it easy to find unimplemented methods in rbug.
Move the flush_resource init to follow the same order.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
All resources passed to the drivers below rbug need to be unwrapped before
being passed down. We missed doing this for the index buffer resource when
this was made part of the draw_info structure.
Fixes: 330d0607ed (gallium: remove pipe_index_buffer and set_index_buffer)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The rbug wire format defines the texture size parameters to be uint32_t sized
and uses memcpy to move the function parameters to the message structure.
This caused totally wrong transmitted texture sizes since the height and depth
parameters have been changed to uint16_t in the gallium API. Fix this by doing
an explicit conversion to the correct representation before packing into the
wire message.
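A small sketch of the idea, assuming a hypothetical wire layout of three consecutive 32-bit fields (the function name and layout are illustrative, not the rbug protocol code): each value is widened to uint32_t first, and only then copied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packer: the wire format wants three uint32_t fields, while
 * the API hands us uint16_t height and depth. Copying 4 bytes straight
 * from a 2-byte variable would read unrelated memory, so convert
 * explicitly before the memcpy. */
static void
pack_tex_size(uint8_t *out, uint32_t width, uint16_t height, uint16_t depth)
{
   assert(out != NULL);

   uint32_t fields[3] = { width, (uint32_t)height, (uint32_t)depth };
   memcpy(out, fields, sizeof(fields));
}
```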
Fixes: e6428092f5 (gallium: decrease the size of pipe_resource - 64 -> 48 bytes)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Using 0 as the backlog argument to listen() is exploiting implementation
defined behavior and will lead to no connections being accepted on some
libc implementations.
Quote of the listen manpage: "A backlog argument of 0 may allow the socket to
accept connections, in which case the length of the listen queue may be set to
an implementation-defined minimum value."
Fix this by using a more sensible backlog value.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
It seems that for desktop GL this was included with ARB_gpu_shader5, but
for OpenGL ES this is already included with the base extension and there is
a CTS test that checks this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Implement the 3 functions using the texturestorage_error() helper.
_mesa_lookup_or_create_texture is always called to make sure that 'texture'
is initialized (even if the texturestorage_error() generates an error afterwards).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When we import a resource through Gallium, we need to take account of
the offset parameter passed.
Fixes a failure seen with the VIVID V4L2 driver, which would create NV12
resources within the same BO, with an offset. Sample pipeline to
reproduce (replace videoN with your actual VIVID device node):
gst-launch-1.0 v4l2src device=/dev/videoN ! video/x-raw,format=NV12 ! glimagesink
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reworks:
* Change subject from "iris: Align main surface allocation to 64k on gen12+"
* Make use of isl surf alignment. (Nanley)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reworks:
* Fill out the format's entry in the ISL format table. (Nanley)
* Support CCS_E-enabled BLORP copies with the format. (Nanley)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can be useful to measure whether memory access optimizations are
having the desired effect. For example, we might see a reduction in
image loads/stores, or constant buffer loads. We can already see this
in cycle estimates to some extent, but this is a more direct approach,
minus a lot of the noise of random scheduler shuffling.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The removed st_nir_opts calls are mostly redundant.
There is an improvement with shader-db on radeonsi:
Before:
real 1m54.047s
user 28m37.857s
sys 0m7.573s
After:
real 1m52.012s
user 28m3.412s
sys 0m7.808s
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
DPH isn't actually commutative, so this doesn't work. If the immediate
in src0 would be a VF candidate, we could do better. *shrug*
No shader-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b04beaf41d ("intel/vec4: Try both sources as candidates for being immediates")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
This function was difficult to implement for new formats due to the
combination of endianness and swapbytes support. Since it's mostly
used for fast paths, bugs in it were often missed during testing.
Just reimplement it on top of the recent
_mesa_format_from_format_and_type() which can give us a canonical
MESA_FORMAT for a format and type enum (while respecting endianness).
Fixes:
- R4G4B4A4_UNORM, B4G4R4_UINT, R4G4B4A4_UINT incorrectly matched with
swapBytes (you can't just reverse the channels if the channels
aren't bytes)
- A4R4G4B4_UNORM and A4R4G4B4_UINT missing BGRA/4444_REV matches
- failing to match RGB/BGR unorm8 array formats on BE
- 2101010 formats incorrectly matching with swapBytes set.
- UINT/SINT byte formats failed to match with swapBytes set.
This deletes the part of tests/mesa_formats.cpp that called
_mesa_format_matches_format_and_type() to make sure it didn't
assertion fail, as it now would assertion fail due to the fact that we
were passing an invalid format (GL_RG) for most types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In desktop GL, you can specify things like GL_DEPTH_COMPONENT/GL_BYTE as a
ReadPixels format, and we need to be able to represent that to see if we
have proper MESA_FORMATs for them. That's exactly what the
mesa_array_format enum is for.
v2: Drop _mesa from static fn.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We had missed this case where GLES3 allows glReadPixels(DEPTH, UINT_24_8),
and just got lucky by the readpixels path never asking for the matching
format from this function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL spec says the 24-bit component is in the high bits, and
format_unpack.c looks at the high 24 bits in the S8Z24 case, not
Z24SS8.
Avoids a regression in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The unreachable() that follows isn't very useful for debug, and by adding
this here we get a nice description of the failure in debug builds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we don't have streamout enabled, we have to read this register to
get the number of primitives emitted.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
When used in a GS pipeline, the VS doesn't end with the END
instruction. Instead it chains to the GS, which continues running with
the same register allocation. The intended use case seems to be that
you can compile a regular VS (ie outputs in registers and ending with
END) but then tack on link-time generated code past the END to write
the outputs using STLW, in case the VS is used with GS.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
We don't know what kind of loads we might have to wait on when coming
in from chsh in the VS so set both sync flags.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
These sysvals have to be unclobbered by VS and in the same registers
in both VS and GS, since the chsh from VS to GS doesn't reload the
values. We use the pre-color argument to ir3_ra() to always place
these values in r0.x and r0.y.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Inputs are the GS header, which contains vertex ID, local primitive ID
and thread ID as well as primitive ID. The setup is a little different
from other sysvals, since we always have to receive them in the VS so
that it can pass them on into the GS.
The vertex flag output from GS is set up as a proper nir output in
the lowering pass and doesn't need special handling here.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
This implements the load_vs_primitive_stride_ir3,
load_vs_vertex_stride_ir3 and load_primitive_location_ir3 intrinsics,
used for getting the primitive layout strides and locations.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Since the presence of GS changes how the VS operates we need to track
that in the shader key.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
These intrinsics will let us do all the offset calculations in nir,
which is nicer to work with and lets nir_opt_algebraic eat it all up.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Before, offset held the offset, which can be either immediate or a
register. Use a third register to hold the offset so that we can use
a register.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Just add the constructors for now and special case similar to END so
we don't remove them.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
We know what these do and either write them in the program stateobj or
don't need to write them.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Tests the combinations of cases of RAW, WAW and WAR hazards involving
both inorder and outoforder instructions. Also tests that
dependencies combine and propagate correctly through control
flow (loops and conditionals).
v2: Add an extra test illustrating that the non-logical CFG edge
between then-block and else-block is being taken into
account. (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
LLVM 8 did remove both the signed and unsigned sse2/avx intrinsics in
the end, and provide arch-independent llvm intrinsics instead.
Fixes a crash when using snorm framebuffers (tested with piglit
arb_color_buffer_float-render GL_RGBA8_SNORM -auto).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
If two jobs use the same GEM object at the same time, the job that
finishes first will (previous to this commit) close the GEM object, even
if there's a job still referencing it.
To prevent this, have all jobs use the same panfrost_bo for a given GEM
object, so it's only closed once the last job is done with it.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A new throttle fence was initialized to 1, and increased by 1
again when it was put in drawable->throttle_fence; the ref was
decreased by 1 when it was removed from drawable->throttle_fence,
so it never reached 0, causing a leak.
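The imbalance can be modelled with a toy reference counter. This is only a sketch with hypothetical types and helpers, showing one way to keep the count balanced (the creator drops its own reference once the drawable holds one); it does not claim to be the change made here:

```c
#include <assert.h>
#include <stdlib.h>

struct fence { int refcount; };
struct drawable { struct fence *throttle_fence; };

static int fences_destroyed; /* counts frees, for illustration only */

static struct fence *
fence_create(void)
{
   struct fence *f = malloc(sizeof(*f));
   assert(f != NULL);
   f->refcount = 1; /* the creator's reference */
   return f;
}

/* Make *dst point at src, taking a reference on src and dropping the
 * reference previously held through *dst. */
static void
fence_reference(struct fence **dst, struct fence *src)
{
   if (src)
      src->refcount++;
   if (*dst && --(*dst)->refcount == 0) {
      free(*dst);
      fences_destroyed++;
   }
   *dst = src;
}
```

Without the creator dropping its initial reference, the count stays at 1 after the drawable releases the fence, which is the leak described above.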
Fixes: ff77bf5cbf7 ("gallium: simplify throttle implementation")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1949
Signed-off-by: James Xiong <james.xiong@intel.com>
Reported-by: Florian Wesch <fw@info-beamer.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This actually corresponds to legal GL depth-ranges, because depth-clear
values are always in the 0..1 range in OpenGL.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will prevent us from accidentally falling back to the wrap-db
instead of using locally installed versions.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The one debian provides is broken in buster+, so I've just written my
own. This allows meson to find the installed zlib and prevents it from
falling back to wraps.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
It's not really needed, and there's no debian package for it so we're
forced to fall back to wraps in mesa's CI. This can be problematic in
itself.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
nvc0 and I assume radeonsi as well hit an assert inside glsl_to_tgsi as atan
instructions get inserted into the shader.
Fixes: cece947a8d ("glsl/builtin: Add alternate versions of atan using new ops")
Cc: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We're trying to cast the return type to the type of the var, but instead
we were casting `sizeof(*v)`.
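To illustrate the intended shape of such a cast, here is a sketch of a macro that forwards to a fixed-width helper and casts the result back to the type of `*v`. The helper and macro names are hypothetical, the helper is a non-atomic stand-in, and `__typeof__` is a GCC/Clang extension:

```c
#include <assert.h>
#include <stdint.h>

/* Non-atomic stand-in for a fixed-width helper; for illustration only. */
static uint32_t
add_return_32(uint32_t *p, uint32_t n)
{
   *p += n;
   return *p;
}

/* The result is cast to the type of the variable (__typeof__(*(v))),
 * so a signed variable yields a signed result. */
#define add_return(v, n) \
   (assert(sizeof(*(v)) == sizeof(uint32_t)), \
    (__typeof__(*(v)))add_return_32((uint32_t *)(v), (uint32_t)(n)))
```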
Fixes: 6df72e970c ("util: Make u_atomic.h typeless.")
Fixes: 0a7f17cf5b ("util/u_atomic: add p_atomic_xchg")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
It's already defined in `m_debug_util.h`, along with an explanation of
what it is and how to use it.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
'struct lima_context' has to be declared before usage in lima_program.h
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Fixes build errors of:
In file included from ../src/intel/vulkan/anv_private.h:48,
from ../src/intel/vulkan/genX_blorp_exec.c:26:
../src/intel/common/gen_gem.h: In function ‘gen_ioctl’:
../src/intel/common/gen_gem.h:68:15: error: implicit declaration of function ‘ioctl’ [-Werror=implicit-function-declaration]
68 | ret = ioctl(fd, request, arg);
| ^~~~~
In file included from ../include/c11/threads_posix.h:35,
from ../include/c11/threads.h:66,
from ../src/mesa/main/mtypes.h:39,
from ../src/intel/compiler/brw_compiler.h:30,
from ../src/intel/vulkan/anv_private.h:51,
from ../src/intel/vulkan/genX_blorp_exec.c:26:
/usr/include/unistd.h: At top level:
/usr/include/unistd.h:471:12: error: conflicting types for ‘ioctl’
471 | extern int ioctl(int, int, ...);
| ^~~~~
/usr/include/unistd.h:471:1: note: a parameter list with an ellipsis can’t match an empty parameter name list declaration
471 | extern int ioctl(int, int, ...);
| ^~~~~~
In file included from ../src/intel/vulkan/anv_private.h:48,
from ../src/intel/vulkan/genX_blorp_exec.c:26:
../src/intel/common/gen_gem.h:68:15: note: previous implicit declaration of ‘ioctl’ was here
68 | ret = ioctl(fd, request, arg);
| ^~~~~
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
gcc is very particular about where you place the (void) cast
The previous placement made it error out with:
In file included from disk_cache.c:40:0:
../../src/util/u_atomic.h:203:29: error: void value not ignored as it ought to be
#define p_atomic_add(v, i) ((void) \
^
disk_cache.c:658:4: note: in expansion of macro ‘p_atomic_add’
p_atomic_add(cache->size, size);
^
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes build failures on Solaris in C++ files using gcc:
../src/util/u_math.h:628:41: error: expected ‘,’ or ‘...’ before ‘dest’
628 | util_memcpy_cpu_to_le32(void * restrict dest, const void * restrict src, size_t n)
| ^~~~
../src/util/u_math.h: In function ‘void* util_memcpy_cpu_to_le32(void*)’:
../src/util/u_math.h:641:18: error: ‘dest’ was not declared in this scope
641 | return memcpy(dest, src, n);
| ^~~~
../src/util/u_math.h:641:24: error: ‘src’ was not declared in this scope
641 | return memcpy(dest, src, n);
| ^~~
../src/util/u_math.h:641:29: error: ‘n’ was not declared in this scope; did you mean ‘yn’?
641 | return memcpy(dest, src, n);
| ^
| yn
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
It doesn't make sense. You already spilled it once, and it didn't help.
Don't try again, or you'll end up in a loop.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Essentially an off-by-one error ... bit of an edge case, but seems to
occur in some glamor shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The new frame throttling implementation interacts unfortunately with
pipelining, leading to fence fds leaking like crazy and ultimately apps
crashing quickly.
With this patch, apps still crash but not as quickly. We need to either
figure out the real cause or revert the core changes.
Nevertheless, we don't want frame throttling in the first place, so.
Fixes: a65e29ccb2 ("gallium: simplify throttle implementation")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This commit moves the target check before using _mesa_get_current_tex_object
to fix a "Mesa implementation error: bad target in _mesa_get_current_tex_object()"
error.
Fixes: 9dd1f7cec0 ("mesa: pass gl_texture_object as arg to not depend on state")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We don't really need to impose this condition, but we do need to cope
with the slightly more general case.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A buffer and its aux are imported separately, if the aux import is
not completed yet when resource_get_param is called, merge the
separate aux a.k.a the 2nd image into the main image.
Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Once again, we were handling back-to-front in the GLES3 case, but not
the desktop GL case.
Fixes GTF-GL46.gtf30.GL3Tests.framebuffer_srgb.framebuffer_srgb_default_encoding when run with --deqp-surface-type=pbuffer --deqp-gl-context-type=egl.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We were looking at ctx->DrawBuffer when asking about the read buffer,
which was good enough for CTS purposes, but definitely not right.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The first time you call glXMakeCurrent, current != ctx. As a result we
would never look up whether the drawable already had an XMesaDrawable,
and would instead always create one. Then XMesaBufferList would have two
different buffers for the same XID, and you'd be reading and drawing to
different places, and that's not what you want at all.
Instead just always look up the drawable.
Fixes: db8be355 (gallium/xlib: Remove drawable caching from the MakeCurrent path)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1196
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
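As a rough illustration of the "always look up the drawable" fix above, here is a minimal lookup-or-create sketch. The struct and function names (buffer, find_buffer, get_buffer) are hypothetical stand-ins, not the actual xlib state tracker code; the point is only that creation happens after a failed lookup, so one XID never gets two buffers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-drawable buffer, keyed by X drawable ID. */
struct buffer {
    unsigned long xid;
    struct buffer *next;
};

/* Hypothetical global list, playing the role of a buffer list. */
static struct buffer *buffer_list = NULL;

/* Return the existing buffer for xid, or NULL if there is none. */
static struct buffer *find_buffer(unsigned long xid)
{
    for (struct buffer *b = buffer_list; b != NULL; b = b->next) {
        if (b->xid == xid)
            return b;
    }
    return NULL;
}

/* Always look up first; create a new entry only when the lookup fails. */
static struct buffer *get_buffer(unsigned long xid)
{
    struct buffer *b = find_buffer(xid);
    if (b != NULL)
        return b;

    b = calloc(1, sizeof(*b));
    if (b == NULL)
        return NULL;
    b->xid = xid;
    b->next = buffer_list;
    buffer_list = b;
    return b;
}
```

Calling get_buffer() twice with the same XID returns the same pointer, which is the property the broken code path lost on the first MakeCurrent.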
This reverts commit 2ca8629fa9.
This was initially ported from RadeonSI, but in the meantime it has
been reverted because it might hang. Be conservative and re-introduce
this packet emission.
Unfortunately this doesn't fix anything known.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Variables with same location should use the same driver_location.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Taken from nir_lower_samplers. Sampler arrays don't work though, this is
just to avoid an assert fail in ir3.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Notably includes centroid varying bits that were missing.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Noticed while debugging a tiling-looking issue by comparing our gmem
blit setup to freedreno's.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fixes ir3 compiler failure in
dEQP-VK.renderpass.dedicated_allocation.formats.r8g8b8a8_unorm.clear.clear_draw
(now just a rendering failure where the subpass clear isn't happening)
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fixes assertion failures in
dEQP-VK.api.image_clearing.core.clear_color_image.2d.* for these
formats, though the test set as a whole is still failing.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Deal with tiled r8g8 having different alignment and other updates taken
from fd6_resource. Additionally track image samples/cpp.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Avoids hangs and some texture tests are happy with just this.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Changes to make compressed, tiled, 3d, etc textures work
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
* Fix R16G16 SCALED and R16G16B16A16 SCALED having texture format
* Fix B5G6R5 swap value
* Use R8_UINT instead of R8_UNORM for S8_UINT rb format
* Disable 96-bit texture formats instead of having a check for NPOT formats
* Don't fail assert on D24X8 format
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Not supported, so always set pointer to NULL
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Note: for output type U32, negative LOD is not sign extended from 16 bits
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robclark@gmail.com>
GPUs with a single supported vertex stream must use the single state
address to program the stream.
Fixes: 3d09bb390a (etnaviv: GC7000: State changes for HALTI3..5)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This gs_iface doesn't seem to require a dependence on the tgsi
context, except for the swr end prim code.
This refactors the API to include all the info that the swr
code needs in the interface rather than having to dig it out of
the struct inheritance.
This is a precursor to adding NIR support to llvmpipe.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Fixes the following building error:
external/mesa/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:42:10:
fatal error: 'ac_llvm_util.h' file not found
^~~~~~~~~~~~~~~~
1 error generated.
Fixes: 3a08110 ("amd: Move all amd/common code that depends on LLVM to amd/llvm.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
All gallium drivers currently set MAX_FRAME_IN_FLIGHT to either 1
or 0, which means that the drivers either throttle on the previous
render or don't throttle. The current implementation is more
complicated than necessary and can be simplified.
Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Adds alternate versions of the atan builtin functions that use
ir_unop_atan and ir_binop_atan2 instead of inlining to the IR
implementation of the function. These alternatives are selected if the
IR is going to be consumed by NIR. In that case the IR ops will be
translated to the appropriate NIR op.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Adds ir_binop_atan2 and ir_unop_atan. When converting to NIR these are
expanded out using the appropriate builtin generator. If they are used
with anything else then it will just hit an assert.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Moves build_atan and build_atan2 into nir_builtin_builder. The goal is
to be able to use this from the GLSL translator too.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
When users pass a config to `eglCreateWindowSurface` it requests double
buffering, but if the config doesn't have the appropriate `__DRIconfig`,
`eglCreateWindowSurface` fails with a `EGL_BAD_MATCH`.
Given that such behaviour is completely unacceptable, we drop the
`EGL_WINDOW_BIT` if we don't have at least one `__DRIconfig` supporting double
buffering, otherwise dropping the `EGL_PIXMAP_BIT`.
Fixes: 049f343e8a "egl: Allow 24-bit visuals for 32-bit RGBA8888 configs"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
This commit does this by allowing both RGB and RGBA visuals to match with
EGL configs. We also expose the `EGL_MESA_config_select_group` egl
extension, which is similar to GLX's visual select group extension, to
allow the RGBA visuals to get less priority.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
Fixes: 049f343e8a "egl: Allow 24-bit visuals for 32-bit RGBA8888 configs"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
The bit moved on gen12 in order to prepare for dual-SIMD8 dispatch.
This implementation isn't entirely complete as it only works on SIMD8
and SIMD16 and not dual-SIMD8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently the ts_request_type and ts_resource_select thread spawner
message descriptor bits were removed from the hardware at least since
ICL. Drop them in order to avoid assertion failures on Gen12+
platforms which don't have any encoding for this. On Gen9+ these are
probably just ignored by the hardware, so this is unlikely to have had
any functional implications prior to Gen12.
v2: Mark TS message fields as non-existing in brw_inst.h on ICL. (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The WAIT instruction has been removed, but SYNC.bar can be used
instead to wait for a notification on n0.0.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently this field was removed on SKL, and according to the
hardware docs for previous platforms "This field is only valid for a
ForwardMsg message. It is ignored for other messages. The BarrierMsg
message always increments the N0 notification counter".
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Confirmed no regressions after a full Piglit run on TGL with the
brw_fs_test_dispatch_packing() test enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They look like a NULL source if you don't look at the address mode.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following fix-up by Jordan Justen is squashed in:
intel/eu/validate: gen12 send instruction doesn't have a dst type field
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to hardware bug filed as HSDES#1604601757.
v2: Only return if result of fs_inst::can_do_source_mods() is known to
be false, in case new orthogonal restrictions are implemented
below in the future. (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Kept as a separate commit in order to avoid distracting reviewers of
the software scoreboard pass with memory management boilerplate.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Gen12+ hardware lacks the register scoreboard logic that used to
guarantee data coherency between register reads and writes in previous
generations. This lowering pass runs after register allocation in
order to make up for it.
It works by performing global dataflow analysis in order to determine
the set of potential dependencies of every instruction in the shader,
and then inserts any required SWSB annotations and additional SYNC
instructions in order to guarantee data coherency.
v2: Drop unnecessary _safe list iteration (Caio).
v3: Temporarily workaround potential WaR hazard between FPU
instruction and subsequent out-of-order write, pending
clarification from the hardware team. Drop redundant tracking of
implicit access of acc0-1, since the hardware guarantees coherency
of these (but not the other accumulators...).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewers are encouraged to audit the code generation pass
independently in case I missed some potential data hazard or new
code has been added in the meantime.
v2: Add SYNC instruction to cr0 workaround in brw_float_controls_mode().
v3: Drop likely redundant (and potentially harmful) RegDist SWSB
annotation from ce0 read in brw_find_live_channel() (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
An effect similar to the one formerly provided by setting thread
control to "switch" can be achieved now by setting a RegDist of 1 on
the SWSB field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A future lowering pass will simulate the same behavior originally
provided by NoDDChk/NoDDClr at the IR level by using appropriate SWSB
annotations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new SEND instruction behaves like the former SENDS instruction.
The original single-payload SEND instruction is gone.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SEND instruction is now four-source. The descriptor is no longer
part of source 1, so avoid touching it to avoid corruption while
initializing the descriptor.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Quite a lot of churn because the encoding of most hardware opcodes has
changed unfortunately.
v2: Split dot-product description fixes to separate patch (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The encoding of almost every instruction field has changed in Gen12,
so this involves adding a Gen12+ bitfield spec to every brw_inst
macro. In addition some new macros are required to handle certain
discontiguous and variable-length fields.
This commit doesn't actually include the Gen12 updated bitfield specs,
only the macros are extended here for reviewability.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Rename FDC() to FFDC() and FDC1() to FDC() for consistency with
the existing F() and FF() macros.
This edge doesn't exist in the original scalar program, but it
represents a potential control flow path the EU will take in cases
where control flow isn't uniform across channels of the same SIMD
thread.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This edge doesn't exist in the original scalar program, but it
represents a potential control flow path the EU will take in cases
where the condition isn't uniform across channels of the same SIMD
thread.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Currently only the physical back-edge is represented, which
incidentally also leads to the exit block of the loop, but we need the
direct logical edge in addition for our logical CFG representation to
be complete.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This represents two control flow graphs in the same cfg_t data
structure: The physical CFG that will include all possible control
flow paths the EU can physically take, and the logical CFG restricted
to the control flow paths that exist in the original scalar program.
The latter is a subset of the former because in case of divergence the
SIMD vectorized program will take control flow paths that aren't part
of the original scalar program.
The bblock_link constructor and bblock_t::add_successor() now take a
"kind" parameter that specifies whether the edge is purely physical or
whether it's part of both the logical and physical CFGs (a logical
edge is of course always guaranteed to be in the physical CFG as
well). bblock_t::is_predecessor_of() and ::is_successor_of() also
take a kind parameter specifying which CFG is being queried. The '~>'
notation will be used now in order to represent purely physical edges
in IR dumps.
This commit doesn't actually add nor remove any edges from the CFG
(the only edges marked as purely physical here are the two WHILE loop
ones that already existed). Optimization passes should continue using
the same (incomplete) physical CFG they were using before until
they're fixed to do something smarter in a later commit, so this
shouldn't lead to any functional changes.
v2: Remove tabs from lines changed in this file (Caio).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
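A minimal sketch of the edge "kind" idea described above, assuming a simplified block with a fixed-size successor array. All names here (link_kind, add_successor, has_successor) are hypothetical and written in C for illustration; they are not the cfg_t/bblock_t API. A logical edge counts as part of both CFGs, a purely physical edge only as part of the physical one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge kinds. Ordering matters: logical < physical. */
enum link_kind {
    LINK_LOGICAL = 0,    /* in both the logical and the physical CFG */
    LINK_PHYSICAL = 1    /* purely physical, shown as '~>' in IR dumps */
};

struct link {
    int target;              /* index of the successor block */
    enum link_kind kind;
};

struct block {
    struct link succ[4];
    size_t num_succ;
};

/* Record an edge together with its kind (silently ignored when full). */
static void add_successor(struct block *b, int target, enum link_kind kind)
{
    if (b->num_succ < 4) {
        b->succ[b->num_succ].target = target;
        b->succ[b->num_succ].kind = kind;
        b->num_succ++;
    }
}

/* Query one of the two CFGs: asking for LINK_PHYSICAL matches every edge,
 * asking for LINK_LOGICAL matches only logical edges. */
static bool has_successor(const struct block *b, int target,
                          enum link_kind kind)
{
    for (size_t i = 0; i < b->num_succ; i++) {
        if (b->succ[i].target == target && b->succ[i].kind <= kind)
            return true;
    }
    return false;
}
```

The single comparison `kind <= kind` encodes the subset relation: every logical edge is visible in a physical query, but not the other way around.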
Having the IR opcodes locked to their hardware representation is risky
because it causes opcodes as different as BRC and IFF to compare equal
at the IR level (luckily the back-end only ever uses one opcode from
each group, right now), and it prevents us from supporting
instructions that change their hardware representation across
generations, which will become a problem on Gen12+ platforms.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change brw_inst_set_opcode() and brw_inst_opcode() to call
brw_opcode_encode/decode() transparently in order to translate between
hardware and IR opcodes, and update the EU compaction code in order to
do the same as needed, so we can eventually drop the one-to-one
correspondence between hardware and IR opcodes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This rewrites the current opcode description tables as a more compact
flat data structure. The purpose is to allow efficient constant-time
look-up by either HW or IR opcode, which will allow us to drop the
hard-coded correspondence between HW and IR opcodes -- See the next
commits for the rationale.
brw_eu.c is now built as C++ source so we can take advantage of
pointers to member in order to make the look-up function work
regardless of the opcode_desc member used as look-up key.
v2: Optimize devinfo struct comparison (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
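To illustrate constant-time look-up by either HW or IR opcode from one flat table, here is a small sketch. It is an assumption-laden stand-in: the commit uses C++ pointers to member, while this C version simply builds two index arrays; the opcode values and names in the table are made up and do not correspond to any real encoding.

```c
#include <stddef.h>

/* Hypothetical opcode description; field values below are invented. */
struct opcode_desc {
    unsigned ir;         /* IR opcode */
    unsigned hw;         /* hardware encoding */
    const char *name;
};

enum { NUM_OPCODES = 128 };

/* Flat description table (illustrative entries only). */
static const struct opcode_desc descs[] = {
    { 1, 0x61, "mov" },
    { 2, 0x62, "sel" },
    { 3, 0x40, "add" },
};

/* Index tables, zero-initialized so unknown opcodes map to NULL. */
static const struct opcode_desc *by_ir[NUM_OPCODES];
static const struct opcode_desc *by_hw[NUM_OPCODES];

/* Build both index tables once from the flat description table. */
static void build_index(void)
{
    for (size_t i = 0; i < sizeof(descs) / sizeof(descs[0]); i++) {
        by_ir[descs[i].ir] = &descs[i];
        by_hw[descs[i].hw] = &descs[i];
    }
}

/* O(1) look-up by IR opcode; NULL if out of range or unknown. */
static const struct opcode_desc *lookup_by_ir(unsigned ir)
{
    return ir < NUM_OPCODES ? by_ir[ir] : NULL;
}

/* O(1) look-up by hardware opcode; NULL if out of range or unknown. */
static const struct opcode_desc *lookup_by_hw(unsigned hw)
{
    return hw < NUM_OPCODES ? by_hw[hw] : NULL;
}
```

Because both indices point into the same table, the IR and HW numbering no longer need to agree, which is the property the following commits rely on.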
The brw_inst opcode accessors are going away in one of the following
commits. We could potentially replace them with the new helpers that
do opcode remapping, but that would lead to a circular dependency
between brw_inst.h and brw_eu.h. This way we also avoid ordering
issues that can cause the semantics of the ex_desc accessors to change
depending on whether the ex_desc field is set after or before the
opcode instruction field.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is required because SEND message payload sources are fetched
asynchronously by the hardware, which can lead to WaR data corruption
on Gen12+ platforms if not handled specially by the compiler to
guarantee proper synchronization.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And after discard-only loops. Otherwise we end up with dead code
which confuses nir_repair_ssa into adding a whole bunch of uses
of undefined. However, for derefs, we sometimes always expect to
get a variable instead of undefined.
Fixes dEQP-VK.graphicsfuzz.write-red-in-loop-nest on radv.
Fixes: c832820ce9 "nir/dead_cf: Repair SSA if the pass makes progress"
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1928
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
p_as_uniform can get CSE'd, which can be incorrect and break some
dEQP-VK.descriptor_indexing.* tests.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes the UBO/SSBO dEQP-VK.descriptor_indexing.* tests
v2: remove bld.copy() usage
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This can happen when bcsel is used between the results of two
vulkan_resource_index. It's also probably needed for non-uniform
descriptor indexing
Fixes dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.reads_opselect_two_buffers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
v2: always assert on the texture/sampler handle's num_components
v3: replicate the deref inside the loop
v4: remove a case of useless line wrapping
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that the base resource is allowed to be incompatible with PE, we can
make a smarter choice of tiling mode to avoid allocating a PE compatible
base that is never used for regular textures. This affects GPUs like GC2000
where there is no tiling compatible with both PE and TE.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
For PE-incompatible layouts, use a mechanism similar to what texture does
to create a compatible base resource.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Remove the "addressing_mode" state, which is currently set incorrectly, and
instead deduce the addressing mode from the tiling layout.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
It simplifies the definitions of jobs using the Debian 10 image.
The needs: was previously missing from the llvmpipe/softpipe test jobs,
so they could spuriously run if the debian-10 job failed or was
cancelled.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The coroutine split pass is missing a dependency before LLVM 9.0,
and fails to initialise properly if the CallGraphWrapperPass hasn't
been initialised earlier (x86 does it due to some of its passes
requiring it).
This is a workaround for llvm 8 (coroutines are only supported in 8
and higher). It adds another pass that has a dependency on the pass
the coroutines split requires. This pass shouldn't have any real
effects.
Fixes: d32690b43c (gallivm: add coroutine pass manager support)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This is needed as part of GLES3.1 and helps for ARB_gpu_shader5.
Fixes: KHR-GLES31.core.texture_gather.* cases
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This will encode the component selection value (0, 1, 2, 3) into
the X swizzle of the sampler, if the driver requests it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Accessing the TG4 component via immediates in the llvmpipe backend is quite
messy (like really messy). Roland suggested we change the instruction encoding,
so introduce a cap to allow the component to be selected to be stored in the
sampler swizzle, which should be otherwise unused.
I could probably switch all drivers over, but virgl would need some work that
I'd prefer not to rush.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This job uses the vs2017 backend of meson (msbuild) as opposed to the
ninja backend used on MacOS and Linux.
v7: - rebase on master
- remove llvm (we'll add that back later)
- remove cygwin (we'll add that back later too)
v6: - rebase on master, including the addition of cygwin
- consolidate 3 appveyor patches into this one patch
v5: - use the new b_vscrt option instead of manually specifying the crt
v4: - rebase on python3 generators
- cache meson wraps
- Build x86 instead of x86_64, since that's what the pre-built LLVM
is
- update to vs2017 from vs2015
- set the default-library to static
- use the new vscrt override
- add the /m switch to msbuild to make the build somewhat faster
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Currently meson doesn't correctly handle passing compiled binaries to
scripts in tests. This patch looks to the future (0.53) when meson will
have this functionality, but also immediately it fixes these tests in
cross compiles by causing them to return 77, which meson interprets as
skip.
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
MSVC is generally happy, but MinGW errors. I've spent several days
trying to squash all of these warnings and I'm done with it, so just
leave them as warnings with MinGW.
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This has always been present in the scons build, so it should be in
the meson build as well.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Mesa uses the lib prefix, and doesn't use a version for its dynamic
libraries, which meson defaults to.
v2: - this patch
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
It crashes hard (pop-up window and all).
v2: - Change comment to FIXME
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
I can't figure out why symbols are being exposed that shouldn't.
v2: - change comment to FIXME
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
They require the pipe-loaders, which require xmlconfig, which doesn't
build with msvc.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
There are quite a few tests that require getopt, when using MSVC we need
to use the bundled version of getopt since there isn't a system version.
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Because the macros for exporting dll symbols and using TLS are mutually
exclusive.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This makes two changes for SWR:
The first is that it reorders the arguments to try to put the ICL ones
first. This is required to support older versions of meson that don't
add enough "error in this case" switches to ICL, which causes it to
happily accept -mavx (for example) even though it doesn't support them,
resulting in compilation failures.
The second is to fix the names of the libraries, setting the soversion
to '' will result in <lib>.dll, instead of <lib>-0.dll. Since these are
not versioned dll's, but implement an internal API we should communicate
that. It's also what scons does.
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
There isn't an obvious command line switch here, /arch:AVX *might* be
the right thing, but meson doesn't know what to do here either and
leaves the -msse4.1 and -mstackrealign.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
v2: - Add missing D to pound define
- Simply define the variable rather than set it to 1 (mirrors
android.mk not scons)
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
There's a mingw bug for this, it exports __builtin_posix_memalign but
not posix_memalign, so the check will succeed, but compiling will fail.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
v2: - set so_version to '' (only affects windows)
- always set lib prefix to 'lib', even on msvc
v5: - key NO_EXPORTS on shared glapi instead of gles.
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
v4: - Fix check for broken mingw (should be for x86 not x86_64)
- Add comment about why check is needed
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
v4: - Handle enable gles properly
- Add comments about what various #defines do
v5: - key NO_EXPORTS on shared glapi instead of gles.
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
v4: - Retain scons comments for windows specific defines
v5: - key GLAPI_NO_EXPORTS off of shared-glapi instead of gles
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
These are needed to control the export of symbols due to differences
between the way windows and *nix handle symbol exports.
Reviewed-by: Eric Anholt <eric@anholt.net> (v2)
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
v5: - key NO_EXPORT off of shared-glapi instead of gles
v4: - Fix typo in warning code (4246 -> 4267)
- Copy comments from scons for what MSVC warnings codes do
- Merge linker argument changes into this commit
v5: - Add /GR- on windows if LLVM is built without rtti (equivalent to
GCC's -fno-rtti)
- Add /wd4291, which is catching the same thing that
-Wno-non-virtual-dtor is on GCC/Clang
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
MinGW defines only _WIN32, but doesn't have fcntl, so we need to use the
windows path.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Not sure if this is a bug in the user or not, but some CTS
tests fail due to using an 8 byte constant buffer.
Fixes: KHR-GLES31.core.layout_binding.block_layout_binding_block_VertexShader
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Since images are a single level, minify before passing the w/h
to draw.
Fixes: KHR-GLES31.core.shader_image_size.basic-nonMS-vs-*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
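For reference, "minify" here means computing a dimension at a given mip level, clamped so it never drops below 1. A minimal sketch, with a hypothetical helper name and assuming the level is smaller than the bit width of unsigned:

```c
/* Hypothetical helper: size of one dimension at mip level `level`.
 * Halves the size per level and clamps the result to at least 1.
 * Assumes level is less than the bit width of unsigned. */
static unsigned minify(unsigned size, unsigned level)
{
    unsigned v = size >> level;
    return v > 0 ? v : 1;
}
```

For an image view of a single level, applying this to the base width and height gives the size the shader should see.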
Due to the use of vmovdqa instructions in the asm, which require 16-byte
aligned buffers.
This fixes a crash in
KHR-GLES31.core.texture_buffer.texture_buffer_texture_buffer_range
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
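The alignment requirement above can be checked with a one-line predicate. This is only a sketch with a hypothetical helper name; it tests whether an address is usable with instructions that need 16-byte aligned memory operands, it does not show how the driver allocates its buffers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if ptr is aligned to a 16-byte boundary,
 * i.e. the low four address bits are all zero. */
static bool is_aligned_16(const void *ptr)
{
    return ((uintptr_t)ptr & 15u) == 0;
}
```

A buffer offset that is a multiple of 4 or 8 but not of 16 fails this check, which is the kind of address that faults with an aligned move.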
Preparation for a later commit.
Fixes: 93df862b6a ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This reflects better what is provided by glvnd or not.
Fixes: 93df862b6a ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
SCons and Meson have never supported that feature, and Autotools was
deleted over 6 months ago and no-one complained yet, so it's pretty
obvious nobody cares about it.
Fixes: 95aefc94a9 ("Delete autotools")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
This is a security feature to disallow malicious apps from passing
a buffer that is too small.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
To abstract things a bit, this adds a helper function in radv_android.c.
However, this means we have to link in radv_android.c on non-android as
well, which means some scaffolding changes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since we really cannot share them ever.
Also remove an unused switch.
Fixes: b70829708a "radv: Implement VK_KHR_external_memory"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Derived from the Intel code.
For the internal format we just use the internal Vulkan format,
as we have Vulkan formats for all android formats we care about.
For the ycbcr properties we just do something. I do not have a real
clue what would be recommended.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The minigbm comment really says it all. We should
fix minigbm as well, but for now this is the more
robust solution.
Note that this only changes width and height for
the surface creation, not for the image and hence
also not for the sampler, where it would wreak
havoc due to the normalized coords.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We want this flexibility because in GFX10 we lose any stride fields,
so we have to make sure our width/height are in alignment with
the external image we import.
Furthermore, we need the ability to inject tiling modifiers on import
time which is strictly after create time for Android. So, with the
layout & patch functions being fully independent of pCreateInfo, we
can delay it until import/bind time.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Run dEQP on boards with Mali 400 and 450 in Baylibre's lab.
There's lots of skipped tests because of crashes and undetermined
behavior. May be a good idea to run the tests with valgrind and fix any
issues found.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
As the non-LAVA runner script does, have per-GPU version files listing
the tests that are to be skipped, due to being very slow, unstable, etc.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Create basic aub_context on GEM_CONTEXT_CREATE.
Set it up and submit a context + ring + pphwsp during execbuf
submission, if it has not been initialized yet.
v2: Write the HWSP only once per engine (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2:
- Only dump context if there were no errors (Lionel).
- Store counter for context handles in aub_file (Lionel).
v3:
- Add a comment about aub_context -> GEM context (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to be able to create contexts on demand, and increase the GGTT
as needed for that. Use the aub_map_ggtt() function for that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to reuse it in execlists_setup().
v2: Rename it to write_ggtt_ptes() (Lionel).
v3: Rename it to aub_map_ggtt() (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When the timestamp is not ready (i.e. UINT64_MAX), the availability bit
should be zero. The previous code used to copy the timestamp value
as the availability bit and that's completely wrong.
Because it's not that simple to emit a conditional with the CP, the
driver now uses a compute shader for copying timestamp query results.
Fixes dEQP-VK.pipeline.timestamp.misc_tests.reset_query_before_copy.
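A minimal Python sketch of the rule described above, not the actual compute
shader. The function names are hypothetical; the only behavior taken from
this message is that the availability word must be zero while the timestamp
still holds UINT64_MAX, and that the old code copied the timestamp itself.

```python
UINT64_MAX = (1 << 64) - 1  # value of a timestamp slot that is not ready yet


def availability_bit(timestamp):
    """Corrected rule: 0 while the slot is not ready, 1 otherwise."""
    return 0 if timestamp == UINT64_MAX else 1


def old_availability_bit(timestamp):
    """Previous (wrong) behavior: the timestamp was copied as the bit."""
    return timestamp


not_ready = UINT64_MAX
ready = 123456789

print(availability_bit(not_ready), availability_bit(ready))
print(old_availability_bit(not_ready) == 0)
```

With the old rule a slot that was not ready reported a non-zero availability
word, which is the case the reset_query_before_copy test exercises.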
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise, the GPU might write timestamp queries after the reset
operation. This is similar to other query operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
These are not needed anymore, since PhyReg has an implicit
conversion operator that can convert it to unsigned int,
which is equivalent to accessing this field.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
According to LLVM, branches with an offset of 0x3f are buggy.
v2: (by Timur Kristóf)
- extract the GFX10 specific part to its own function
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The DLC bit is now set to 1 for all loads when GLC is also set,
but cleared to 0 for all stores (otherwise it causes issues),
and also cleared to 0 for atomics.
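The rule above can be sketched as a small Python truth table. This is an
illustration only: the function name and the string labels are hypothetical,
and the assumption that DLC stays 0 for loads without GLC is ours, since the
message only covers the GLC-set case.

```python
def dlc_bit(kind, glc):
    """Return the DLC bit for a memory instruction of the given kind."""
    if kind == "load":
        # Set for loads when GLC is set; assumed 0 otherwise.
        return 1 if glc else 0
    if kind in ("store", "atomic"):
        # Always cleared for stores and atomics.
        return 0
    raise ValueError("unknown instruction kind: %r" % (kind,))


for kind in ("load", "store", "atomic"):
    for glc in (False, True):
        print(kind, glc, dlc_bit(kind, glc))
```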
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Also remove img_format from aco_ir, since it can be calculated
from dfmt and nfmt. So only the assembler needs to deal with it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
We'd like to use some functions, for example some
ac_shader_util functions in ACO, so we need to link
ACO to AC.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
We'd like to include some of these in C++ code later.
Specifically, ACO is written in C++ and we would like to use
some of this code in ACO in order to avoid code duplication.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Specifically when reading the primitive counters.
This fixed ~700 CTS tests using this pattern:
dEQP-GLES3.functional.transform_feedback.*
when run after tests like
dEQP-GLES3.functional.prerequisite.read_pixels on the same
caselist. When run individually those tests were passing because
prim_counts_offset was zero.
Fixes: 0f2d1dfe65 ("v3d: use the GPU to record primitives written to transform feedback")
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The initial patch only fixed up the NIR path, but forgot
the TGSI path needed fixing as well.
Fixes: f92226931b ("st/mesa: Prefer R8 for bitmap textures")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With libc++ (LLVM's STL implementation), the original code does not compile because an
appropriate vector constructor cannot be found (for the _ForwardIterator one, requirement
is_constructible is not satisfied).
<sys/param.h> is required for NetBSD version detection,
and __NetBSD__ must be used to detect even on older releases.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
eglGetDisplay is awful because you have to inspect the pointer you're
given and guess what type of native display it corresponds to. We make
it worse by caching the type of the first such display we detect, so if
the second call to eglGetDisplay is to a different display type, kaboom.
Fortunately this is a problem that can be solved with the delete key.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/156
Previously, this could have made the resource divergent in code like
that which is generated by nir_lower_non_uniform_access.
Fixes: da8ed68a ('nir: replace nir_move_load_const() with nir_opt_sink()')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Previously, for code like:
   loop {
      loop {
         a = load_ubo()
      }
      use(a)
   }
adjust_block_for_loops() would return the block before the first loop.
Now we compute the range of allowed blocks and then walk the dominance
tree directly, guaranteeing directly that we always choose a block that
dominates all the uses and is dominated by the definition.
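A rough Python sketch of the idea, not the NIR implementation. It assumes an
immediate-dominator map and a per-block loop depth are already available, and
it uses one plausible selection policy (the candidate closest to the uses
that is not nested deeper in loops than the definition). All names here are
hypothetical.

```python
def dominance_depth(idom, block):
    """Number of steps from block up to the root of the dominance tree."""
    depth = 0
    while idom[block] is not None:
        block = idom[block]
        depth += 1
    return depth


def common_dominator(idom, a, b):
    """Lowest common ancestor of a and b in the dominance tree."""
    da, db = dominance_depth(idom, a), dominance_depth(idom, b)
    while da > db:
        a, da = idom[a], da - 1
    while db > da:
        b, db = idom[b], db - 1
    while a != b:
        a, b = idom[a], idom[b]
    return a


def choose_sink_block(idom, loop_depth, def_block, use_blocks):
    """Pick a block that dominates all uses and is dominated by def_block.

    Assumes valid SSA, i.e. def_block dominates every block in use_blocks.
    """
    target = use_blocks[0]
    for use in use_blocks[1:]:
        target = common_dominator(idom, target, use)
    # Walk up the dominance tree; every block visited lies between the
    # uses and the definition, so both constraints hold by construction.
    cur = target
    while cur != def_block:
        if loop_depth[cur] <= loop_depth[def_block]:
            return cur
        cur = idom[cur]
    return def_block


# Shape of the example above: the load sits in the inner loop, the use
# follows the inner loop inside the outer loop.
idom = {"entry": None, "outer": "entry", "inner": "outer",
        "after_inner": "inner", "tail": "outer"}
loop_depth = {"entry": 0, "outer": 1, "inner": 2, "after_inner": 1, "tail": 0}

print(choose_sink_block(idom, loop_depth, "inner", ["after_inner"]))
print(choose_sink_block(idom, loop_depth, "entry", ["inner"]))
```

Because the walk never leaves the path between the uses and the definition,
it cannot return a block in front of the outer loop for a value defined
inside it, which is the failure described above.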
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
And use a new p_discard_early_exit instruction. This fixes some cases
where a definition having the same register as an operand causes issues.
v2: rename instruction to p_exit_early_if
v2: modify the existing instruction instead of creating a new one
v3: merge the "i == num - 1" IFs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cloning texture loads isn't a good idea since we may move it into
a block that is not shared between all the invocations of the shader.
We'd like to avoid that since it may result in undefined behavior.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Without this, the test jobs could spuriously run after the container
job failed or was cancelled, even if the build job didn't run at all.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
The spec has probably been misinterpreted during RADV bringup.
This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If it's not available, we fall back to A8. This should work on all drivers,
because we depend on it in the display-list code already.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This should help avoid stalls in the pixel mask array in certain
non-promoted depth cases. It especially helps for Z16, as each bit
in the PMA corresponds to two pixels when using Z16, as opposed to
the usual one pixel.
Improves performance in GFXBench5 TRex by 22% (n=1).
Fixes Piglit's gl-2.1-polygon-stipple-fs on iris.
Fixes: 63f24c3c01 ("gallium: Enable MESA_framebuffer_flip_y")
Reviewed-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Implement glFramebufferParameteriMESA on GLES 3 so
that the extension is not dependent on GLES 3.1
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
bound_vertex_buffers doesn't include extra draw parameters buffers.
Tracking this correctly is kind of complicated, and iris_destroy_state
isn't exactly in a hot path, so just loop over all VBO bindings.
Fixes: 4122665dd9 (iris: Enable ARB_shader_draw_parameters support)
Reported-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
On Solaris, sys/sysmacros.h has long-deprecated copies of major() & minor()
but not makedev().
sys/mkdev.h has all three and is the preferred choice.
Let's make sure we check for all 3 major(), minor() and makedev().
Reported-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Tested-by: Alan Coopersmith <alan.coopersmith@oracle.com>
The list of AMD/ATI devices supported by radeon/r200/r300/r600 is
complete, so anything else must use radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When only the depth/stencil bufs are cleared, we should make sure the
color content is reloaded into the tile buffers if we want to preserve
their content.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
glClear()s are expected to be the first thing GL apps do before drawing
new things. If there's already an existing batch targeting the same
FBO that has draws attached to it, we should make sure the new clear
gets a new batch assigned to guarantee that the FB content is actually
cleared with the requested color/depth/stencil values.
We create a panfrost_get_fresh_batch_for_fbo() helper for that and
call it from panfrost_clear().
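A simplified Python model of that helper, for illustration only. The Batch
and Context classes are hypothetical stand-ins for the driver structures, and
"submitted" is just a flag here rather than a real job submission.

```python
class Batch:
    def __init__(self, fbo):
        self.fbo = fbo
        self.draws = 0
        self.submitted = False


class Context:
    def __init__(self):
        self.batches = {}  # fbo -> batch currently recording for that FBO

    def get_batch_for_fbo(self, fbo):
        """Return the batch bound to fbo, creating one if needed."""
        batch = self.batches.get(fbo)
        if batch is None:
            batch = Batch(fbo)
            self.batches[fbo] = batch
        return batch

    def get_fresh_batch_for_fbo(self, fbo):
        """Like get_batch_for_fbo(), but never return a batch with draws."""
        batch = self.get_batch_for_fbo(fbo)
        if batch.draws > 0:
            batch.submitted = True  # stand-in for submitting the old batch
            del self.batches[fbo]
            batch = self.get_batch_for_fbo(fbo)
        return batch


ctx = Context()
first = ctx.get_batch_for_fbo("fbo0")
first.draws += 1                      # a draw was queued on this FBO
fresh = ctx.get_fresh_batch_for_fbo("fbo0")
print(fresh is first, first.submitted, fresh.draws)
```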
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Hitting any fallback path on Broxton is costly, as we require clflushing the whole
buffer even for an upload of a subtexture. However, since gallium
provides a pbo upload path, allow it to sample packed RGB if supported.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It gets used by the gallium auxiliary draw module, which gets used
pretty much always when LLVM is used as JIT.
At the same time most builds don't hit the issue here because the
shared library of LLVM contains all modules.
Fixes: d32690b43c ("gallivm: add coroutine pass manager support")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/951
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
This commit is a step towards the goal of being able to build RADV
without LLVM. In the future we would like to offer the option to
use RADV solely with ACO. There is still a need for the common AMD
code located in amd/common but the LLVM specific parts need to be
separated.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This mirrors the intrinsics in the GLSL IR. One could imagine an
alternate definition where reading the semantic would account for the
READ_HELPER functionality, but that feels potentially dodgy and could be
subject to CSE unpleasantness.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
u_upload_mgr sets it, so that util_range_add can skip the lock.
The time spent in tc_transfer_flush_region decreases from 0.8% to 0.2%
in torcs on radeonsi.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Meson automatically tracks any file included by a file it already tracks,
and `pci_id_driver_map.h` & `loader.h` are included by `loader.c`, while
`loader_dri3_helper.h` is included by `loader_dri3_helper.c`.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These can appear after loop unrolling.
v2: stylistic changes
v2: replace state->mem_ctx with state->shader
v2: add bounds checking
v3: use nir_intrinsic_range() for bounds checking
v3: fix issue where partially out-of-bounds reads are replaced with undefs
v4: fix merge conflicts during rebase
v5: split into two commits
v6: set constant_data to NULL after freeing (fixes nir_sweep()/Iris)
v7: don't remove the constant data if there are no constant loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v6)
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
We only have the subgroup variant in NIR (equivalent to clockARB), so
only support that for now.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In the test stage, we can use any of the two container images as we
aren't going to do anything architecture-dependent when submitting the
jobs to LAVA.
But if we are in a pipeline in which the images need to be rebuilt and
one finishes much earlier than the other, it could happen that the test
job that executes first fails to find the container image.
To avoid that, have each job in the test stage use the image that has
been already implicitly built by depending on the build job for the
given arch.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
This reverts commit 19546108d3.
This commit breaks the build because lima implements
->set_damage_region(). I guess we'll need more discussion before
removing the ->set_damage_region() hook.
This reverts commit 492ffbed63.
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update(), leading to a damage region update on the
wrong pipe_resource object.
Let's not expose the ->set_damage_region() method until the core is
fixed to handle that properly.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
In preparation for testing drivers other than Panfrost in LAVA labs.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Include Panfrost's gitlab.ci.yml file from Mesa's main .gitlab-ci.yml so
we test on devices with Panfrost.
This uses LAVA to schedule jobs in the devices and will be the base for
testing Etnaviv, Lima, etc.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a port of Nanley's 904c2a617d
from i965 to iris.
One concern is that iris uses larger batches, and also emits far fewer
commands, so we may come closer to the 500 limit within a batch, and
could need to supplement this with actual counting. Manhattan 3.0 had
239 3DSTATE_CONSTANT_PS packets in a batch, Unigine Valley had 155.
So it seems like we're still in the realm of safety.
We're missing the offset of the slice in the subslice mask...
This worked for most platforms that don't have first slice fused off
because we would reread the same mask from slice0 again and again...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c1900f5b0f ("intel: devinfo: add helper functions to fill fusing masks values")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1869
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We were supplying __DRI2_THROTTLE_SWAPBUFFER, rather than the obvious
choice of __DRI2_THROTTLE_COPYSUBBUFFER. This meant that we hit the
swap-based frame throttling. glXCopySubBuffer doesn't seem like it's
intended to be a frame boundary, so we'd like to avoid this throttling.
Tested-by: Michel Dänzer <mdaenzer@redhat.com> # DRI3 only
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
glXCopySubBufferMESA copies data from the back buffer to the front,
so it needs to perform a MSAA downsampling operation just like
glXSwapBuffers would.
Currently, the CopySubBuffer implementations supply a throttle reason
of __DRI2_THROTTLE_SWAPBUFFERS, so they hit this path and work today.
But we'd like to avoid swapbuffer throttling in this case, so the next
patch will change that reason.
Tested-by: Michel Dänzer <mdaenzer@redhat.com> # DRI3 only
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
I thought I fixed this, but I guess I must have broken it again.
Fixes various dEQP-VK.draw.* tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Working on the algebraic implementation, I was being driven nuts by my
editor not highlighting and handling indentation for the C code. It turns
out that it's basically not pass-specific code, and we can move it over to
the relevant .c file. Replaces 30KB of code with 34KB of data on my i965
build. No perf diff on shader-db (n=3)
Reviewed-by: Ian Romanick <ian.d.romainck@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Having passes generate these is just making more work for copy
propagation (and thus probably calling more optimization passes)
later. Noticed while trying to debug nir_opt_algebraic()
top-to-bottom having O(n^2) behavior due to not finding new matches in
replacement code.
Reviewed-by: Ian Romanick <ian.d.romainck@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This matches what we do for uses_sample_qualifier, and what we
do in ir_set_program_inouts.cpp as well.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies ACO and allows the lowered code to be optimized (in
particular, constant folded).
Totals from affected shaders:
SGPRS: 1776 -> 1776 (0.00 %)
VGPRS: 1436 -> 1436 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 203452 -> 203564 (0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 103 -> 103 (0.00 %)
At least some of the code size increase seems to be from literals being
applied to instructions as a result of constant folding.
v2: remove fmod/frem handling in init_context()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
If the instruction interpolateAtCentroid is used the extra interpolator
must also be enabled in the state.
Fixes: fs-interpolateatcentroid-block
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Now that we have live_out calculated per block as metadata, calculating
liveness of an instruction at a given point in the program becomes O(n)
in the size of the block in the worst case, rather than O(n) in the size
of the program.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Callers should have liveness info ready. Ideally we'd have a nice
metadata tracking framework like NIR to handle this automatically, but
for now this will allow us to make forward progress... when we're about
to do something with liveness, invalidate everything ahead to force a
clean calculation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to explicitly invalidate liveness analysis results so
we can cache liveness results.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
By definition, once liveness analysis has occurred:
live_out = OR {succ} succ->live_in
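As a small Python illustration of that identity (not the compiler's data
structures): block names, the successor map and the live_in sets below are
made up, and sets of value names stand in for the bitsets a real backend
would use.

```python
def compute_live_out(successors, live_in):
    """live_out[b] is the union of live_in over all successors of b."""
    return {
        block: set().union(*(live_in[succ] for succ in succs))
        for block, succs in successors.items()
    }


successors = {"b0": ["b1", "b2"], "b1": ["b3"], "b2": ["b3"], "b3": []}
live_in = {"b0": set(), "b1": {"x"}, "b2": {"x", "y"}, "b3": {"z"}}

live_out = compute_live_out(successors, live_in)
print(live_out)
```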
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There are unfortunately two distinct liveness analysis passes in the
compiler right now -- one good (but complex) pass used by RA based on
solving data flow equations, and one awful (but simple) pass used for
dead code elimination and bundling based on an abstract walk of the AST.
Let's move RA's pass into shared code so we can work on unifying.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to fill in ctx->temp_count explicitly, even if we haven't
squished down the MIR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We already enforce this with the SSA/register distinction in the
backend. There is no need to duplicate this logic merely for an assert.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we track inter-batch dependencies, the flush done in
panfrost_set_framebuffer_state() is no longer needed. Let's get rid of
it.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have all the pieces in place to support pipelining batches
we can get rid of the drmSyncobjWait() at the end of
panfrost_batch_submit().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't have to flush all batches when we're only interested in
reading/writing a specific BO. Thanks to the
panfrost_flush_batches_accessing_bo() and panfrost_bo_wait() helpers
we can now flush only the batches touching the BO we want to access
from the CPU.
This fixes the dEQP-GLES2.functional.fbo.render.texsubimage.* tests.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is needed if we want to free the panfrost_batch object at submit
time in order to not have to GC the batch on the next job submission.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Will be useful to make the ioctl(WAIT_BO) call conditional on BOs that
are not exported/imported (meaning that all GPU accesses are known
by the context).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to only flush batches touching a specific resource,
which is particularly useful when the CPU needs to access a BO.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
And use it in panfrost_flush() to flush all batches, and not only the
one currently bound to the context.
We also replace all internal calls to panfrost_flush() by
panfrost_flush_all_batches() ones.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The panfrost_fence logic currently waits on the last submitted batch,
but the batch serialization that was enforced in
panfrost_batch_submit() is about to go away, allowing for several
batches to be pipelined, and the last submitted one is not necessarily
the one that will finish last.
We need to make sure the fence logic waits on all flushed batches, not
only the last one.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The idea is to track which BOs are being accessed and the type of access
to determine when a dependency exists. Thanks to that we can build a
dependency graph that will allow us to flush batches in the correct
order.
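A hedged Python sketch of such tracking, not the driver code. The class and
method names are hypothetical. The assumed rules are: a reader depends on the
last writer of a BO, and a writer depends on the last writer and on every
reader since then. Cycles between batches are not handled here.

```python
class DependencyTracker:
    def __init__(self):
        self.writer = {}   # bo -> batch that last wrote it
        self.readers = {}  # bo -> batches that read it since the last write
        self.deps = {}     # batch -> set of batches it depends on

    def add_access(self, batch, bo, write):
        deps = self.deps.setdefault(batch, set())
        last_writer = self.writer.get(bo)
        if last_writer is not None and last_writer != batch:
            deps.add(last_writer)
        if write:
            # A write must also wait for all earlier readers.
            deps.update(r for r in self.readers.get(bo, set()) if r != batch)
            self.writer[bo] = batch
            self.readers[bo] = set()
        else:
            self.readers.setdefault(bo, set()).add(batch)

    def flush_order(self, batch):
        """Return batch and its dependencies, dependencies first."""
        order, seen = [], set()

        def visit(b):
            if b in seen:
                return
            seen.add(b)
            for dep in sorted(self.deps.get(b, ())):
                visit(dep)
            order.append(b)

        visit(batch)
        return order


tracker = DependencyTracker()
tracker.add_access("A", "bo0", write=True)
tracker.add_access("B", "bo0", write=False)
tracker.add_access("C", "bo0", write=True)
print(tracker.flush_order("C"))
```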
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We'll soon need to freeze a batch not only when it's flushed, but also
when another batch depends on us, so let's add a helper to avoid
duplicating the logic.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We just replace the per-context out_sync object by a pointer to the
fence of the last submitted batch. Pipelining of batches will
come later.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
So we can store the flags as data and keep the BO as a key. This way
we keep track of the type of access done on BOs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The type of access being done on a BO has impacts on job scheduling
(shared resources being written enforce serialization while those
being read only allow for job parallelization) and BO lifetime (the
fragment job might last longer than the vertex/tiler ones, if we can,
it's good to release BOs earlier so that others can re-use them
through the BO re-use cache).
Let's pass extra access flags to panfrost_batch_add_bo() and
panfrost_batch_create_bo() so the batch submission logic can take the
appropriate action when submitting batches. Note that this information is not
used yet, we're just patching callers to pass the correct flags here.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The CTS finally has agreed to drop the requirement for a
565-no-depth-no-stencil config for ES 3.0. Hence we can now remove the
code to satisfy this requirement using a pbuffer-only visual with
whatever other buffers the driver happens to have given us.
This reverts commit 82607f8a90,
commit 6ad31c4ff3 and
commit dacb11a585.
v2:
- Reference the VK-GL-CTS issue (Eric E.).
v3:
- Don't revert
fc21394bc4 ("egl: Quiet warning about front buffer rendering for pixmaps/pbuffers")
(Kenneth).
References: VK-GL-CTS issue 1601.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This script is responsible for generating an entire page in the
docs/relnotes/ directory. It includes a template for the page, and uses
mako to fill in the necessary bits. It is designed to be purely fire and
forget, calculating previous versions, shortlogs, bug fixes, and dates.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
The next patch is going to introduce a tool that creates the entire
release html page for us, without any user intervention. As such we
can't be editing it. To that end the script will read the
new_features.txt file to get a list of new features.
This is a flat text file, one entry per line.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
On 64-bit platforms, some atomic operations like __sync_fetch_and_add()
have constant time, but on 32-bit platforms they are implemented with a
loop and might take much longer.
Additionally, it seems like if their operands are not aligned to 64
bits, they also require extra memory accesses. From the Intel
Architecture's Developer Manual Vol. 1, 4.1.1:
"A word or doubleword operand that crosses a 4-byte boundary or a
quadword operand that crosses an 8-byte boundary is considered
unaligned and requires two separate memory bus cycles for access."
Forcing the u64 field to be aligned to 64 bits seems to make the unit
tests that are stressing this finish much faster.
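The quoted rule can be checked with a few lines of Python. This only models
the boundary-crossing arithmetic from the quote; the function name is
hypothetical and nothing here measures actual bus cycles.

```python
def crosses_boundary(address, size, boundary):
    """True if the operand [address, address + size) spans a boundary."""
    first = address // boundary
    last = (address + size - 1) // boundary
    return first != last


# A quadword (8 bytes) at a 4-byte-aligned address that is not 8-byte
# aligned crosses an 8-byte boundary; an 8-byte-aligned one does not.
print(crosses_boundary(4, 8, 8))
print(crosses_boundary(8, 8, 8))
```

A 64-bit field placed at offset 4 inside a structure falls into the first
case, which is why forcing 8-byte alignment of the field matters.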
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This replaces the old Bugzilla: tag, which no longer makes sense because
we don't use bugzilla anymore.
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
v1 by Topi Pohjolainen
v2,v3 by Anuj Phogat:
- Apply for gen >= 11
- Remove wa_bug_xxx function
- Use helper functions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lowers fmod and frem at NIR level like RadeonSI. fmod is
already lowered directly in NIR->LLVM, and frem will be lowered by
LLVM anyways.
This fixes a LLVM crash with:
dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
... because it's wrong to do so. The error path out of
dri2_initialize_drm ends with dri2_display_destroy, which calls
functions in the vtable we're trying to set up, so if we dlclose the
driver then those function pointers will point off into space and things
crash.
Noticed this because after !1923 eglinfo would crash when setting up the
GBM platform. This was something of a cascade failure, because my kernel
is too old for DRM_IOCTL_I915_GETPARAM to work without DRM_AUTH, so i965
wouldn't load. platform_drm.c then got very confused when it tries to
load swrast as a dri2 driver.
Reviewed-by: Eric Anholt <eric@anholt.net>
uintptr_t is 32 bits in a 32-bit build, resulting in shifting out
of bounds.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We need the continue CS for referencing the tess/GDS/sample position BOs.
Fixes: 46e52df34d "radv: add tessellation ring allocation support. (v2)"
Fixes: e1dc3ab753 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
Fixes: 1171b304f3 "radv: overhaul fragment shader sample positions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Instead of a single cache shared between all jobs, but reduce the
maximum cache size to 1.5G (from 5G).
Rationale for smaller cache:
Pulling & pushing a 5G cache could take a long time. Consider
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/684010 (click the "Show
complete raw" button to see timestamps): Pulling the cache took
1569927241-1569927194 = 47 seconds, pushing it 1569927671-1569927519
= 152, for a total of 199 seconds. The actual build took comparable
1569927518-1569927243 = 275 seconds, despite no cache hits from ccache.
In other words, the cache transfers almost doubled the job duration,
and they would have negated any build time benefits from ccache even
with a high cache hit rate.
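The arithmetic above, spelled out in Python using only the timestamps quoted
in this message:

```python
# Unix timestamps taken from the job log referenced above.
pull_seconds = 1569927241 - 1569927194
push_seconds = 1569927671 - 1569927519
build_seconds = 1569927518 - 1569927243

transfer_seconds = pull_seconds + push_seconds
# Cache transfer time relative to the time spent building.
overhead_ratio = transfer_seconds / build_seconds

print(pull_seconds, push_seconds, transfer_seconds, build_seconds)
print(round(overhead_ratio, 2))
```

The ratio comes out at roughly 0.72, i.e. the transfers added about 72% on
top of the build time.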
Also, the smaller caches avoid blowing up storage requirements for them
too much.
Rationale for per-job caches:
Making a single cache significantly smaller might result in cached
build products from one job getting evicted by another job, reducing
the likelihood of cache hits from previous pipelines.
v2:
* Move up "ccache --max-size=1500M" call (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
To truly do this correctly, we'll have to fix the discrepancy between
drm_virtgpu_3d_transfer_to_host and virtio_gpu_transfer_host_3d. However,
this is a good starting point.
Since virtio-gpu only supports self-import and export, this should be fine.
Let's only do WINSYS_HANDLE_TYPE_FD for this currently.
Reviewed-by: Robert Tarasov <tutankhamen@chromium.org>
The winsys might supply dimensions that are different than
those we calculate. In addition, it may supply virtualized
modifiers.
In practice, a stride != bpp * width and virtualized modifiers don't
happen yet, but the plan is to move in that direction.
Also make virgl_resource_layout static.
Reviewed-by: Robert Tarasov <tutankhamen@chromium.org>
i915 will report ENODEV on generations prior to Haswell because there
is no point in reporting values on those. Those generations predate any
fusing that could happen on parts with identical PCI ids.
This query call was previously only triggered on generations that
support performance queries, which happen to match the generations for
which i915 reports topology, but the commit referenced below started
using it on all generations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1860
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common code")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Random hangs no longer happen, I'm actually not sure if they were
related to this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Make sure to export the expected clear values to the depth
stencil attachment.
This fixes dEQP-VK.pipeline.depth_range_unrestricted.* on GFX10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The number of vertices has to be adjusted with the output primitive
type.
This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The GS outputs are stored differently in the LDS storage, they
are indexed by out_idx which is incremented for each stored DWORD.
Thus, we need a different path for exporting the stream outputs.
This fixes a bunch of CTS failures when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The LDS storage allocated for stream outputs is 4 * N, where N
is the number of outputs. So, we have to store/load with N as index
and not with the output location as index.
This doesn't fix anything known but it should fix out-of-bounds
access and it also reduces the number of outputs written to the
LDS storage.
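A small Python illustration of the indexing problem, under stated
assumptions: sizes are counted in dwords, each output takes 4 dwords, and the
function names and example locations are made up for this sketch.

```python
def slot_by_location(location, component):
    """Old scheme: index the storage directly by output location."""
    return 4 * location + component


def build_compact_indices(locations):
    """Map each written output location to a dense index 0..N-1."""
    return {loc: n for n, loc in enumerate(locations)}


def slot_compact(indices, location, component):
    """New scheme: index the storage by the dense output index."""
    return 4 * indices[location] + component


written_locations = [0, 3, 7]               # N = 3 outputs
storage_size = 4 * len(written_locations)   # 4 * N dwords allocated
indices = build_compact_indices(written_locations)

print(slot_by_location(7, 0) < storage_size)
print(slot_compact(indices, 7, 0) < storage_size)
```

Indexing by location places output 7 at dword 28, past the 12 dwords that
were allocated, while the dense index keeps it at dword 8.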
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Some hardware has a bug with triangle strips and it is signalled by the
flag BUG_FIXED8 whether this bug has been fixed. So only enable triangle
strips when this flag is set.
Thanks: Jonathan Marek and Christian Gmeiner for the pointers
v2: Add TODO to indicate that the handling should be refined
(Jonathan & Christian)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
It's unused here, and undefined in scons. It is used in targets/osmesa,
but it's properly defined there already.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
On the Android Antutu benchmark we ran into an assert in ISL where the
(base layer + num layers) > total layers. It turns out the core of
mesa forgot to clear the _Layer variable, potentially leaving an
inconsistent value.
v2: Pull setting u->_Layer out of the conditional blocks (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In converting to shift/size-based validation, we lost a condition from
the ARGB/XRGB equivalence check, which left it working one way round
but not the other, and broke applications like glmark2-es2-drm on some
platforms. Restore the equivalent check that *both* configs actually
have an alpha channel before considering a mismatch.
Fixes: 7b4ed2b513 ("egl: Convert configs to use shifts and sizes instead of masks")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
1. The hgl.c file is a read-only file versus read-write.
Ref: src/gallium/state_trackers/hgl/hgl.c
2. I've included the Haiku-specific patches I used to get a successful
build of Mesa 19.1.7 on Haiku using the meson/ninja build procedure.
Shows "[764/764] linking target ... libswpipe.so" at build completion.
v2:
Remove autotools files (Eric)
v3:
Update the patch
Reported-by: Ken Mays <kmays2000@gmail.com>
Tested-by: Ken Mays <kmays2000@gmail.com>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Alexander von Gluck IV <kallisti5@unixzen.com>
After 41549a18e6 ("i965: Enable OpenGL 4.6 for Gen8+"), i965
implements GL_ARB_gl_spirv, GL_ARB_spirv_extensions and OpenGL 4.6.
After 15e439071d ("iris: Enable ARB_gl_spirv and ARB_spirv_extensions"),
iris implements GL_ARB_gl_spirv, GL_ARB_spirv_extensions and OpenGL
4.6.
v2:
- Make explicit that the support is for i965 and iris.
v3:
- Add also GL_ARB_spirv_extensions to the release notes (Alejandro).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Found when building for Android in C99 mode. Include bitscan.h to ensure ffs is
available.
Fixes: 7b4ed2b5 ("egl: Convert configs to use shifts and sizes instead of masks")
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The order of comparison has changed, so we need to invert the logic of
"insert_left" when using rb_tree_insert_at().
Fixes: dae33052db (util/rb_tree: Reverse the order of comparison
functions).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To enable EXT_demote_to_helper_invocation:
This extension adds a "demote" keyword that is similar to "discard" but
only suppresses subsequent writes and outputs to the framebuffer, and
does not terminate the execution of the invocation. For the remainder
of the execution, the invocation is "demoted" to act like a helper
invocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From EXT_demote_to_helper_invocation, implemented with the existing
nir_intrinsic_is_helper_invocation.
Such builtin is necessary when using `demote` because we can't
redefine the value of gl_HelperInvocation (since it is an input
variable).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the EXT_demote_to_helper_invocation extension is enabled,
`demote` is treated as a keyword, and produces an ir_demote.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To represent the new `demote` keyword when using
EXT_demote_to_helper_invocation extension. Most of the changes are to
include it in the visitors.
Demote is not considered a control flow, so also include an empty
visit member function in ir_control_flow_visitor.
Only NIR actually supports `demote`, so assert the translations for
TGSI and Mesa's gl_program -- since the demote is not expected to
appear for those.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can't just check for the BO base address, we need to check for the
full address including any offset we may have applied. When updating
the address, we need to include the offset again.
Fixes: 5ad0c88dbe ("iris: Replace buffer backing storage and rebind to update addresses.")
A while back, Michael Larabel noticed that Paraview's Wavelet Volume
case runs significantly slower on iris than i965. It turns out this
is because we enable CCS_E for 32-bit floating point formats, while
i965 disables it, with an oblique comment saying that we benchmarked
it (on what exactly?) and determined that it was a loss.
Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed
large framerate drops when enabling CCS_E for either format. However,
several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit
floating point formats, with no apparent ill effects.
So, disable compression for 32-bit float formats for now, but leave it
enabled for 16-bit float formats as they seem to be working fine.
Improves performance in Paraview's Wavelet Volume test by 62% on a
Skylake GT4e.
Fixes: 3cfc6a207b ("iris: Fill out res->aux.possible_usages")
Tests done with llvm-config indicate that there are only 2 libraries in
irreader and not in engine, LLVMAsmParser and LLVMIRReader and both of them
are part of coroutines so I replaced irreader with coroutines and added
libraries unique to coroutines.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Now that we have constant adjustment logic abstracted, we can do this
safely. Along with the csel inversion patch, this allows many more
common csel ops to inline their condition in the bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we can reuse constant slots from other instructions, we would like to
do so to include more instructions per bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If an instruction could be scheduled to vmul to satisfy the writeout
conditions, let's do that and save an instruction+cycle per fragment
shader.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We still emit in-order but we switch to using the bundles created from
the new scheduler, which will allow greater flexibility and room for
out-of-order optimization.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We require chosen instructions to be "close", to avoid ballooning
register pressure. This is a kludge that will go away once we have
proper liveness tracking in the scheduler, but for now it prevents a lot
of needless spilling.
v2: Lower threshold to 6 (from 8). Schedule is hurt, but a few shaders
that spilled excessively are fixed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Derp
We can bundle two load/store together. This eliminates the need for
explicit load/store pairing in a prepass, as well.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Conditions for branches don't have a swizzle explicitly in the emitted
binary, but they do implicitly get swizzled in whatever instruction
wrote r31, so we need to handle that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Conditional instructions (csel and conditional branches) require their
condition to be written to a special condition pipeline register (r31.w
for scalar, r31.xyzw for vector). However, pipeline registers are live
only for the duration of a single bundle. As such, the logic to schedule
conditionals correctly is surprisingly complex. Essentially, we see if we
could stuff the conditional within the same bundle as the csel/branch
without breaking anything; if we can, we do that. If we can't, we add a
dummy move to make room.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A bit of a kludge but allows setting an implicit dependency of synthetic
conditional moves on the actual condition, fixing code generated like:
vmul.feq r0, ..
sadd.imov r31, .., r0
vadd.fcsel [...]
The imov runs simultaneously with feq so it gets garbage results, but it's
too late to add an actual dependency practically speaking, since the new
synthetic imov doesn't have a node associated.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In the future, we will want to keep track of which components of
constants of various sizes correspond to which parts of the bundle
constants, like in the old scheduler. For now, let's just stub it out
for a simple rule of one instruction with embedded constants per bundle.
We can eventually do better, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't actually do any scheduling here yet, but add per-tag helpers to
consume an instruction, print it, pop it off the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's not always obvious what the optimal bundle type should be. Let's
break out the logic to decide.
Currently set for purely in-order operation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
After we've chosen an instruction, popped it off, and processed it, it's
time to update the worklist, removing that instruction from the
dependency graph to allow its dependents to be put onto the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In the future, this routine will implement the core scheduling logic to
decide which instruction out of the worklist will be scheduled next, in
a way that minimizes cycle count and register pressure.
In the present, we are more interested in replicating in-order
scheduling with the much-more-powerful out-of-order model. So rather
than discriminating by a register pressure estimate, we simply choose
the latest possible instruction in the worklist.
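A minimal sketch of the "choose the latest entry" rule, assuming the worklist is a plain array of ready flags indexed in program order. The function name and the array representation are hypothetical and only illustrate the selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the highest index i < count with worklist[i] set, i.e. the
 * latest ready instruction in program order, or -1 when none is ready. */
long
choose_latest(const bool *worklist, size_t count)
{
   assert(worklist != NULL || count == 0);

   for (size_t i = count; i > 0; i--) {
      if (worklist[i - 1])
         return (long)(i - 1);
   }

   return -1;
}
```

A register pressure estimate could later replace the plain "highest index" test without changing the interface.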
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like to flatten a linked list of midgard_instructions into an
array of midgard_instruction pointers on the heap.
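The flattening step can be sketched as below. `struct instr` is a hypothetical stand-in for midgard_instruction with an explicit `next` pointer; the real list type may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for midgard_instruction. */
struct instr {
   int id;
   struct instr *next;
};

/* Returns a heap-allocated array of pointers to each list element, in
 * list order, and stores the element count in *count. Returns NULL for
 * an empty list or on allocation failure; the caller frees the array. */
struct instr **
flatten_instrs(struct instr *head, size_t *count)
{
   assert(count != NULL);

   /* First pass: count the elements. */
   size_t n = 0;
   for (struct instr *i = head; i != NULL; i = i->next)
      n++;

   *count = n;
   if (n == 0)
      return NULL;

   struct instr **out = malloc(n * sizeof(*out));
   if (out == NULL)
      return NULL;

   /* Second pass: record a pointer to each element. */
   size_t idx = 0;
   for (struct instr *i = head; i != NULL; i = i->next)
      out[idx++] = i;

   return out;
}
```

The array stores pointers rather than copies, so the instructions keep their identity while the scheduler gets constant-time indexed access.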
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's not based on the writemask and it can't be inferred; it's just
intrinsic to the op itself.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This worked better than my original v3d-local pass for just subs, and is a
huge win over not producing subs.
total instructions in shared programs: 6408469 -> 6167932 (-3.75%)
total threads in shared programs: 153784 -> 154104 (0.21%)
total uniforms in shared programs: 2157078 -> 1905823 (-11.65%)
total max-temps in shared programs: 904546 -> 895796 (-0.97%)
total spills in shared programs: 4959 -> 4993 (0.69%)
total fills in shared programs: 6558 -> 6670 (1.71%)
total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%)
total inst-and-stalls in shared programs: 6434314 -> 6193107 (-3.75%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombining for backends
which prefer this option, we don't have to implement them twice.
This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Upcoming changes to sub optimization will make this pass required. Over
the course of that series, we see uniforms +.46%, instructions -.24%
(seems like a fine tradeoff -- uniforms are 1/2 the size of instructions
as far as cache occupancy)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Without this, it was theoretically possible for the jobs to run before
the docker image was ready.
v2:
* Use - list syntax instead of [] (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows most build jobs to run before the stretch or arm64 docker
images are ready.
v2:
* Use - list syntax instead of [] (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows the *-old-llvm jobs to run before the buster docker images
are ready.
v2:
* Use - list syntax instead of [] (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch is based on 28e3f85e09/mingw-w64-mesa/link-ole32.patch but with tweaks to avoid MSVC build break when applied.
v2: Create Mingw platform alias pointing to windows host platform define to avoid spurious crosscompilation;
v3: Fix obviously wrong compiler flags for swr driver;
v4: Update original patch URL because it has been relocated;
v5: Don't bother patching autotools stuff as it's not used by MSYS2 Mingw-w64 build and its days are numbered anyway;
v6: After Mingw posix flag fix in 295851eb things are far simpler as we don't need more linking of uuid, ole32, version and shell32 than what is already in place.
As X86AsmPrinter component is gone, LLVMX86AsmPrinter got replaced
with LLVMRemarks, LLVMBitstreamReader and LLVMDebugInfoDWARF.
Tests done with llvm-config on both LLVM 8 and 9 indicate that
mcjit, bitwriter and x86asmprinter fully fit inside engine component.
On other platforms and with meson build mcdisassembler was used to replace
X86AsmPrinter but mcdisassembler also fully fits inside engine component
for LLVM>=8 according to same tests.
v2: Avoid duplicating code related to Mingw pthreads.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
On 19.1 this patch does not apply cleanly without 88eb2a1f
Looks like blob uses following values for uniforms buffer:
0 for 8 bytes
1 for 16 bytes
2 for 24 bytes
2 for 32 bytes
3 for 40 bytes
3 for 48 bytes
3 for 56 bytes
3 for 64 bytes
4 for 72 bytes
It all looks like log2(size / 8) rounded up, so let's do the same.
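The observed mapping can be reproduced with a small helper. This is a sketch under the assumption that the size is a non-zero multiple of 8 bytes; the function name is hypothetical and not taken from the lima sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns log2(size_bytes / 8) rounded up. */
unsigned
uniform_size_field(uint32_t size_bytes)
{
   assert(size_bytes >= 8 && size_bytes % 8 == 0);

   uint32_t units = size_bytes / 8;   /* size in 8-byte units */
   unsigned field = 0;

   /* Smallest field such that (1 << field) >= units. */
   while ((UINT32_C(1) << field) < units)
      field++;

   return field;
}
```

For the sizes listed above this yields 0, 1, 2, 2, 3, 3, 3, 3 and 4, matching the values seen from the blob.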
Fixes: 931fc2a7b3f9("lima: do not set the PP uniforms address lowest bits")
Reviewed-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Android building rules are added in src/amd/Android.compiler.mk
libmesa_aco static library is built conditionally to radeonsi
as done for vulkan.radv module
This will prevent Android build errors for non x86 systems
filter-out compiler/aco_instruction_selection_setup.cpp source,
as already included by compiler/aco_instruction_selection.cpp
and would cause several multiple definition linker errors
NOTE: libLLVM requires AMDGPU Disassembler to build radv with aco
Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Fixes: a70a998 ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Fixes a few building errors similar to the following:
In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26:
In file included from external/libcxx/include/algorithm:639:
external/libcxx/include/utility:321:9:
error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>'
_T2 second;
^
Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Fixes the following piglit test: fragdepth_gles2 (for ETNA_MESA_DEBUG=nir)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Note it can still be improved a bit:
* Use alu swizzle to determine if src is scalar
* Take into account new immediates in the multiple uniform src lowering
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This can improve performance by allowing the LAST_VARYING_2X bit to be
set when possible (and possibly more benefits on HALTI5 where the
number of components is set for each varying).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
LOAD starts reading into the first enabled destination component, and
doesn't skip disabled components, so we need to allocate a destination with
contiguous components.
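One way to picture the allocation constraint is a search for a run of free components inside a 4-component register. The helper below is a hypothetical sketch; the name and the 4-bit "used" mask are assumptions, not the etnaviv register allocator.

```c
#include <assert.h>

/* Returns the first component index at which `n` contiguous components
 * are free in a 4-component register, or -1 if no such run exists.
 * Bit i of used_mask is set when component i is already occupied. */
int
find_contiguous_components(unsigned used_mask, unsigned n)
{
   assert(n >= 1 && n <= 4);

   for (unsigned start = 0; start + n <= 4; start++) {
      unsigned want = ((1u << n) - 1u) << start;   /* n bits at start */
      if ((used_mask & want) == 0)
         return (int)start;
   }

   return -1;
}
```

With components x and z occupied (mask 0x5), two components are free but not adjacent, so a two-component LOAD destination cannot be placed there.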
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Only invert front facing when glFrontFace is GL_CW.
Fixes following deqp test:
dEQP-GLES2.functional.shaders.builtin_variable.frontfacing
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The PP uniforms address register in render state is not a direct pointer
to the uniforms storage -- instead, it points to a one-item array, and
the array item is the real pointer to the uniforms storage.
This register reuses some of its LSBs as a size field. Currently the
size is set according to the length of the real uniforms storage.
However, as the register itself contains only a pointer to the one-item
array, the size field should be set to the length of the one-item array
minus 1, which means a fixed value of 0. That means we can just omit it
now.
Testing shows this should be the correct approach to set this register.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
glsl 4.4 spec section '5.9 expressions':
"The operator is multiply (*), where both operands are matrices or one operand is a vector and the
other a matrix. A right vector operand is treated as a column vector and a left vector operand as a
row vector. In all these cases, it is required that the number of columns of the left operand is equal
to the number of rows of the right operand. Then, the multiply (*) operation does a linear
algebraic multiply, yielding an object that has the same number of rows as the left operand and the
same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations”
explains in more detail how vectors and matrices are operated on."
This fix disallows a multiplication of incompatible matrices like:
mat4x3(..) * mat4x3(..)
mat4x2(..) * mat4x2(..)
mat3x2(..) * mat3x2(..)
....
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
According to the 1.1.123 spec:
"The implementation will attempt to create all pipelines, and only
return VK_NULL_HANDLE values for those that actually failed."
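The behavior described in the quote can be sketched as a loop that never stops at the first failure. The types, the callback, and the null-handle macro below are hypothetical stand-ins, not Vulkan or driver definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VkPipeline and VK_NULL_HANDLE. */
typedef int pipeline_handle;
#define NULL_HANDLE 0

typedef bool (*create_one_fn)(size_t index, pipeline_handle *out);

/* Attempts every creation. A failure nulls only that slot and is
 * remembered, but does not stop the loop. Returns true if all succeeded. */
bool
create_all(size_t count, create_one_fn create_one, pipeline_handle *out)
{
   assert(create_one != NULL && (out != NULL || count == 0));

   bool all_ok = true;
   for (size_t i = 0; i < count; i++) {
      if (!create_one(i, &out[i])) {
         out[i] = NULL_HANDLE;
         all_ok = false;
      }
   }

   return all_ok;
}

/* Example callback: fails for index 1, otherwise yields index + 1. */
bool
create_except_second(size_t index, pipeline_handle *out)
{
   if (index == 1)
      return false;

   *out = (pipeline_handle)(index + 1);
   return true;
}
```

With three requests and the example callback, the first and third slots hold valid handles and only the second is the null handle.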
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
I was inheriting the one from src/freedreno with funny tabs, while
this driver is written with normal Mesa 3-space indents.
Unfortunately I have to add both files, because I use emacs and emacs
prefers .dir-locals to .editorconfig :(
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We include shader_enums.h from freedreno's compiler for both GL and
Vulkan, and the main/config.h include resulted in polluting the
namespace with things like MAX_VIEWPORTS that other Vulkan drivers use
as their driver-specific maximums.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Without this, we were DCEing flag writes because we didn't think their
results were used because we didn't understand that an ANY32 predicate
actually read all the flags.
Fixes: df1aec763e "i965/fs: Define methods to calculate the flag..."
Reviewed-by: Matt Turner <mattst88@gmail.com>
Passes most of piglit's tests regarding arb_framebuffer_object
and unlocks some more piglit tests.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
We are using util_resource_copy_region(..) as fallback which supports
different formats for src and dst. Improves the experience when running
deqp or piglit with a debug build.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Prior to xvmc 1.0.12 libxvmc incorrectly required libxv, but that was
fixed. This results in compilation failures for the gallium xvmc tracker
and tools. This patch fixes that by explicitly linking to libxv.
Fixes: 22a817af8a
("meson: build gallium xvmc state tracker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1844
Reviewed-by: Adam Jackson <ajax@redhat.com>
We don't want to require a visual for the drawable, because there exist
fbconfigs that don't correspond to any visual (say a 565 pixmap|pbuffer
config on a depth-24 display). Fortunately, we don't need one either.
Passing the visual to XCreateImage serves only to fill in the XImage's
{red,green,blue}_mask fields, which libX11 itself never uses; they exist
only for the client's convenience, and we don't care. And we already
have the drawable depth in glx_config::rgbBits. So replace the
XVisualInfo field in the drawable private with a pointer to the
glx_config.
Having done that driswCreateGCs becomes trivial, so inline it into its
caller.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1194
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no reason to have two GCs here. The only difference between
them is that swapgc would generate graphics exposures, except we only
ever use this GC for PutImage, and PutImage doesn't generate graphics
exposures. We also don't need to explicitly ChangeGC to GXCopy, because
that's the default.
Reviewed-by: Eric Anholt <eric@anholt.net>
Looks like r16_unorm might have precision issues.
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.r16_unorm.r16_unorm.general_general
fails, but the dumped images in the xml are the same so
I'd guess the low bits are the issue.
r8_unorm and r16_uint work.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
3D blits & format reinterpretation are still TBD.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need the loop header phis for the outer exec masks. Needed for
dEQP-VK.glsl.demote.dynamic_loop_texture
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
NIR may emit a single intrinsic to load several packed varyings,
but that's suboptimal for Utgard PP for several reasons:
- varyings that are used as sampler inputs can be passed using
pipeline register with increased precision
- we have a small number of regs, so using a vec4 reg for storing
two vec2 varyings increases reg pressure.
Add NIR pass to split a single load into several loads and utilize
it in lima.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
According to radeonsi, GLM doesn't support WB alone, so
we have to set INV too when WB is set.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Avoids getting a "load_output" in a case like this:
gl_Position = ubuf.MVP * ubuf.position[gl_VertexIndex];
frag_pos = gl_Position.xyz;
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Lower these to something compatible with ir3, and save the descriptor set
and binding information.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
The old version of the iterators relies on a &iter->field != NULL check
which works fine on older GCC but newer GCC versions and clang have
optimizations that break if you do pointer math on a null pointer. The
correct solution to this is to do the null comparisons before we do any
sort of &iter->field or use rb_node_data to do the reverse operation.
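The safe ordering can be sketched as below: the NULL test is applied to the node pointer itself, before any offset arithmetic. The types are hypothetical and are not the util/rb_tree definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct node {
   struct node *next;
};

struct entry {
   int key;
   struct node node;   /* deliberately not at offset 0 */
};

/* Converts a node pointer to its containing entry. Comparing
 * &entry->node against NULL instead would require pointer math on a
 * possibly null entry, which compilers may assume never happens. */
struct entry *
entry_from_node(struct node *n)
{
   if (n == NULL)
      return NULL;

   return (struct entry *)((char *)n - offsetof(struct entry, node));
}
```

An iterator built on this helper can loop while the returned entry is non-NULL, with no arithmetic ever performed on a null pointer.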
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is based on the fix I used for the same problem on V3D. In this
case, it fixes all of the
dEQP-GLES2.functional.texture.filtering.2d.* failures except the
dEQP-GLES2.functional.texture.filtering.2d.*_npot cases.
Acked-by: Rob Clark <robdclark@chromium.org>
As Vasily discovered, the bit 7 of the word 1 of the texture descriptor
is set when reloading the framebuffer, to use framebuffer-based offset
rather than normalized one. This bit also works for regular textures to
enable accessing with non-normalized offset.
Add support for rectangle texture by setting this bit for
PIPE_TEXTURE_RECT.
Suggested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Per the valgrind output below, we were returning the pointer to freed
memory if none of the later conditional pointer assignments were
executed. This caused dEQP CI jobs to crash on certain runners,
presumably due to a double-free down the line.
Also, we were skipping to the out: label before the vendor_id & chip_id
variables used by it were initialized, resulting in broken
LIBGL_DEBUG=verbose output such as
libGL: pci id for fd 4: 51108f00:51108f00, driver radeonsi
Fixes: 5a545e355b "loader: always map the "amdgpu" kernel driver name to radeonsi (v2)"
==403== Invalid read of size 1
==403== at 0x4AFD576: surfaceless_probe_device (platform_surfaceless.c:316)
==403== by 0x4AFD915: dri2_initialize_surfaceless (platform_surfaceless.c:391)
==403== by 0x4AF5EEA: dri2_initialize (egl_dri2.c:984)
==403== by 0x4AF5EEA: dri2_initialize (egl_dri2.c:958)
==403== by 0x4AF1EEC: _eglMatchAndInitialize (egldriver.c:75)
==403== by 0x4AF1F3B: _eglMatchDriver (egldriver.c:96)
==403== by 0x4AE9367: eglInitialize (eglapi.c:617)
==403== by 0x1D99C9: tcu::surfaceless::EglRenderContext::EglRenderContext(glu::RenderConfig const&, tcu::CommandLine const&) [clone .constprop.57] (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x1DABB0: tcu::surfaceless::ContextFactory::createContext(glu::RenderConfig const&, tcu::CommandLine const&, glu::RenderContext const*) const (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x53EBD1: glu::createRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::RenderConfig const&, glu::RenderContext const*) (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x53EFE9: glu::createDefaultRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::ApiType) (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x1DE07A: deqp::gles2::Context::Context(tcu::TestContext&) (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x1DB5EF: deqp::gles2::TestPackage::init() (in /deqp/modules/gles2/deqp-gles2)
==403== Address 0x56bd340 is 0 bytes inside a block of size 4 free'd
==403== at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==403== by 0x4B01767: loader_get_driver_for_fd (loader.c:464)
==403== by 0x4AFD553: surfaceless_probe_device (platform_surfaceless.c:308)
==403== by 0x4AFD915: dri2_initialize_surfaceless (platform_surfaceless.c:391)
==403== by 0x4AF5EEA: dri2_initialize (egl_dri2.c:984)
==403== by 0x4AF5EEA: dri2_initialize (egl_dri2.c:958)
==403== by 0x4AF1EEC: _eglMatchAndInitialize (egldriver.c:75)
==403== by 0x4AF1F3B: _eglMatchDriver (egldriver.c:96)
==403== by 0x4AE9367: eglInitialize (eglapi.c:617)
==403== by 0x1D99C9: tcu::surfaceless::EglRenderContext::EglRenderContext(glu::RenderConfig const&, tcu::CommandLine const&) [clone .constprop.57] (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x1DABB0: tcu::surfaceless::ContextFactory::createContext(glu::RenderConfig const&, tcu::CommandLine const&, glu::RenderContext const*) const (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x53EBD1: glu::createRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::RenderConfig const&, glu::RenderContext const*) (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x53EFE9: glu::createDefaultRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::ApiType) (in /deqp/modules/gles2/deqp-gles2)
==403== Block was alloc'd at
==403== at 0x483577F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==403== by 0x4EE5E09: strndup (strndup.c:43)
==403== by 0x4B010B1: loader_get_kernel_driver_name (loader.c:101)
==403== by 0x4B016AF: loader_get_driver_for_fd (loader.c:462)
==403== by 0x4AFD553: surfaceless_probe_device (platform_surfaceless.c:308)
==403== by 0x4AFD915: dri2_initialize_surfaceless (platform_surfaceless.c:391)
==403== by 0x4AF5EEA: dri2_initialize (egl_dri2.c:984)
==403== by 0x4AF5EEA: dri2_initialize (egl_dri2.c:958)
==403== by 0x4AF1EEC: _eglMatchAndInitialize (egldriver.c:75)
==403== by 0x4AF1F3B: _eglMatchDriver (egldriver.c:96)
==403== by 0x4AE9367: eglInitialize (eglapi.c:617)
==403== by 0x1D99C9: tcu::surfaceless::EglRenderContext::EglRenderContext(glu::RenderConfig const&, tcu::CommandLine const&) [clone .constprop.57] (in /deqp/modules/gles2/deqp-gles2)
==403== by 0x1DABB0: tcu::surfaceless::ContextFactory::createContext(glu::RenderConfig const&, tcu::CommandLine const&, glu::RenderContext const*) const (in /deqp/modules/gles2/deqp-gles2)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This new option can help debug shader compiler problems when
there are issues with the meta shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add a function called ac_get_fs_input_vgpr_cnt which will return
the number of input VGPRs used by an AMD shader. Previously,
radv and radeonsi had the same code duplicated, but this commit also
allows them to share this code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This commit allows RADV to set the shared VGPR count according to
the shader config.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit moves ac_get_tbuffer_format, ac_get_sampler_dim and
ac_get_image_dim into ac_shader_util, thus enabling them to be used
by compilers other than LLVM.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The aim of this commit is to keep ac_shader_util LLVM-free,
since we would like to use it in ACO later.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: rename pass_temp to pass_flags
v2: also CSE reductions
v3: add ds_swizzle_b32 support
v3: check gds/offset0/offset1 fields
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
We want to generate PC files for non-glvnd builds and for builds with
old glvnd, but the current logic doesn't do that: it builds them
unconditionally, and for GLES it builds the shared libraries, which is
also not what we want. This does not generate .pc files for gles1 or
gles2, which we weren't doing before either, making this not a
regression but a return to the status quo.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1838
Fixes: 93df862b6a
("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows the result of mov and bcsel to be separately interpreted as
float or int depending on the use.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Some shaders are hurt by this change because now a
load_const(0x00000000) is not recognized as eq_zero when loaded as a
float. This behavior is restored in a later patch (nir/range-analysis:
Use types to provide better ranges from bcsel and mov).
v2: Add a comment about reinterpretation of int/uint/bool. Suggested by
Caio. Rewrite the condition to check for types being float instead of
checking for types not being all the things that aren't float.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16327543 -> 16328255 (<.01%)
instructions in affected programs: 55928 -> 56640 (1.27%)
helped: 0
HURT: 208
HURT stats (abs) min: 1 max: 16 x̄: 3.42 x̃: 3
HURT stats (rel) min: 0.33% max: 6.74% x̄: 1.31% x̃: 1.12%
95% mean confidence interval for instructions value: 3.06 3.79
95% mean confidence interval for instructions %-change: 1.17% 1.46%
Instructions are HURT.
total cycles in shared programs: 363682759 -> 363683977 (<.01%)
cycles in affected programs: 325758 -> 326976 (0.37%)
helped: 44
HURT: 133
helped stats (abs) min: 1 max: 179 x̄: 33.61 x̃: 5
helped stats (rel) min: 0.06% max: 14.21% x̄: 2.47% x̃: 0.29%
HURT stats (abs) min: 1 max: 157 x̄: 20.28 x̃: 14
HURT stats (rel) min: 0.07% max: 14.44% x̄: 1.42% x̃: 0.73%
95% mean confidence interval for cycles value: 0.38 13.39
95% mean confidence interval for cycles %-change: -0.06% 0.96%
Inconclusive result (%-change mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs: 10787433 -> 10787443 (<.01%)
instructions in affected programs: 1842 -> 1852 (0.54%)
helped: 0
HURT: 10
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.33% max: 1.85% x̄: 0.73% x̃: 0.49%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.36% 1.10%
Instructions are HURT.
total cycles in shared programs: 153724543 -> 153724563 (<.01%)
cycles in affected programs: 8407 -> 8427 (0.24%)
helped: 1
HURT: 3
helped stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18
helped stats (rel) min: 0.98% max: 0.98% x̄: 0.98% x̃: 0.98%
HURT stats (abs) min: 4 max: 18 x̄: 12.67 x̃: 16
HURT stats (rel) min: 0.21% max: 0.75% x̄: 0.56% x̃: 0.72%
95% mean confidence interval for cycles value: -21.31 31.31
95% mean confidence interval for cycles %-change: -1.11% 1.46%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on Iron Lake or GM45.
We're using vs and fs now, and adding hs, ds and gs soon. It's
confusing enough that we have both DS/TCS and HS/TES. At least for VS
and FS there doesn't have to be multiple names.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
RET as a last instruction could be safely ignored.
Remove it to prevent crashes/warnings in case underlying driver
doesn't implement arbitrary returns.
A better way would be to remove the RET after the whole shader
is parsed which will handle a possible case when the last RET is
followed by a comment.
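A minimal sketch of the idea, assuming the shader is already available as an
array of opcodes. The enum and the helper name are hypothetical and are not
taken from the actual parser:

```c
#include <stddef.h>

/* Hypothetical opcodes, for illustration only. */
enum opcode { OP_MOV, OP_ADD, OP_RET };

/* Returns the instruction count after dropping a single trailing RET.
 * A RET that ends the shader has no effect, so it can be ignored. */
static size_t strip_trailing_ret(const enum opcode *ops, size_t count)
{
    if (count > 0 && ops[count - 1] == OP_RET)
        return count - 1;
    return count;
}
```

Only the very last instruction is inspected, so a RET in the middle of the
shader is left alone.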
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
--oneline shortens hashes, while --pretty=oneline doesn't, otherwise
they are the same. Having full hashes is convenient as that is the
format that the bin/.cherry-ignore script requires to work correctly.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
The main reason to do this is that 19.2 has slipped by two weeks, and
as such the 19.3 branch is due to happen extremely close to the release of
19.2.0. I think it would be better to have a little more time between
releases for developers and for packagers.
This would still have the 19.3 release out before December, even if it
slips by 1 week.
Acked-By: Karol Herbst <kherbst@redhat.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
This is a bit counter-intuitive, but the issue is that GLVND is broken
in versions <= 1.1.1, so we need to keep wrongly providing these files
to cover up their mistake, otherwise the rest of the world ends up
broken.
Suggested-by: Dylan Baker <dylan@pnwbakers.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
We currently lower them, but nir_opt_algebraic() can add new ones because
lower_sub=true.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
When handling two variables with overlapping locations, we process the
one with lower location first, and then extend the location ->
driver_location map to guarantee that it's contiguous for the second
variable too. But the loop had the wrong bound, so we weren't extending
the map 100%, which could lead to problems later such as an incorrect
num_inputs. The loop index i is an index into the slots of the variable,
so we need to stop at the final slot of the variable (var_size) instead
of the number of unassigned slots.
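The corrected loop bound can be sketched roughly as follows. This is a
simplified model, not the actual NIR code: MAX_SLOTS, UNASSIGNED and both
function names are assumptions made for the example.

```c
#include <assert.h>

#define MAX_SLOTS 32
#define UNASSIGNED (-1)

/* Hypothetical helper: map every slot of a variable to a driver location.
 * `map` is indexed by location; `next` is the next free driver location. */
static void assign_var_slots(int map[MAX_SLOTS], int location, int var_size,
                             int *next)
{
    assert(location >= 0 && location + var_size <= MAX_SLOTS);
    /* i indexes the slots of the variable, so the bound is var_size,
     * not the number of slots that are still unassigned. */
    for (int i = 0; i < var_size; i++) {
        if (map[location + i] == UNASSIGNED)
            map[location + i] = (*next)++;
    }
}

/* Assigns two overlapping variables (slots 0-1 and 1-3) and returns the
 * number of driver locations in use afterwards. */
static int demo_overlapping_vars(int map[MAX_SLOTS])
{
    int next = 0;
    for (int i = 0; i < MAX_SLOTS; i++)
        map[i] = UNASSIGNED;
    assign_var_slots(map, 0, 2, &next);
    assign_var_slots(map, 1, 3, &next);
    return next;
}
```

With the bound set to the number of unassigned slots (2 for the second
variable here), the loop would stop before reaching slot 3 and leave it
unmapped.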
This fixes
spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-interleave-range
on radeonsi NIR.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
both clang and gcc warn with:
"moving a local object in a return statement prevents copy elision"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <dev@pmoreau.org>
This moves the fix from commit 361f3d19f1 to happen in get_param
(used now instead of get_handle by st/dri). This fixes artifacts
seen with Xorg and CCS_E.
Fixes: fc12fd05f5 "iris: Implement pipe_screen::resource_get_param"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this, we'll incorrectly round off huge values to the nearest
representable double instead of keeping it at the exact value as
we're supposed to.
Found by inspecting compiler-warnings.
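The rounding problem is easy to reproduce outside the compiler. The sketch
below is standalone C and does not use any GLSL IR code; the helper name is
made up for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when `v` can be stored in a double and read back unchanged.
 * A result of 2^64 or more is rejected first, because converting such a
 * double back to uint64_t would be undefined behaviour. */
static bool u64_survives_double_round_trip(uint64_t v)
{
    double d = (double)v;            /* may round to the nearest double */
    if (d >= 18446744073709551616.0) /* 2^64 does not fit in uint64_t */
        return false;
    return (uint64_t)d == v;
}
```

A double has a 53-bit significand, so 2^53 round-trips while 2^53 + 1 does
not. This is why 64-bit integer constants must not pass through a double.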
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 85faf5082f ("glsl: Add 64-bit integer support for constant expressions")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Currently we add dependencies in 3 cases:
1) One node consumes a value produced by another node
2) Sequence dependencies
3) Write-after-read dependencies
2) and 3) only affect scheduler decisions, since we can still use a pipeline
register if we have only 1 dependency of type 1).
Add 3 dependency types and mark dependencies as we add them.
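As a rough illustration of the three dependency kinds and the pipeline
register rule described above. The enum and function names here are
hypothetical and may not match the identifiers used in the driver:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three dependency kinds listed above. */
enum dep_type {
    DEP_SRC,              /* 1) consumer reads a value produced by a node */
    DEP_SEQUENCE,         /* 2) ordering-only dependency */
    DEP_WRITE_AFTER_READ, /* 3) a write must follow an earlier read */
};

/* A value can go through a pipeline register only if exactly one
 * dependency on it is a real data (DEP_SRC) dependency. The other two
 * kinds only constrain scheduling order and are not counted. */
static bool can_use_pipeline_reg(const enum dep_type *deps, size_t count)
{
    size_t src_deps = 0;
    for (size_t i = 0; i < count; i++) {
        if (deps[i] == DEP_SRC)
            src_deps++;
    }
    return src_deps == 1;
}
```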
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
It makes no sense to clone texture coords if it's not varying, moreover
we don't support cloning ALU nodes.
Fixes: 1c1890fa70 ("lima/ppir: clone uniforms and load_coords into each successor")
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
because vl doesn't call flush_resource and I wasn't able to find
all places where flush_resource needs to be called.
This fixes corrupted / unflushed surfaces with fullscreen videos on Raven.
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
This fixes some piglit tests on radeonsi NIR where a varying is
initialized to a constant array in the vertex shader. Varying packing
after nir_lower_io_to_temporaries creates writemasked stores which
persist after pulling the constant initialization down into the fragment
shader.
While we're here, rewrite handle_constant_store() to do the loop over
components outside the switch, so that we don't have to duplicate the
writemask checking for every bitsize.
Fixes: 1235850522 ("nir: Add a large constants optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes a compilation error when building libnouveau:
In file included from ../src/gallium/drivers/nouveau/nv50/nv50_program.c:25:
../src/compiler/nir/nir.h:1115:10: fatal error: nir_intrinsics.h: No such file or directory
#include "nir_intrinsics.h"
^~~~~~~~~~~~~~~~~~
compilation terminated.
Fixes: f014ae3c7c ("nouveau: add support for nir")
Signed-off-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Released today and hangs on RADV. We don't have the root cause yet,
but this should unblock people playing the game.
No drirc because the radv debugflags are not usable from drirc and
I want this backported.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
flrp was forgotten when already adding the rounding mode for other
instructions.
Fixes: ba1e25e1aa ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
After
1711bf6cf2 ("intel/fs: Generate better code for fsign multiplied by a value"),
the conflicts resolution for setting the rounding mode after the
fused fmul and fsign optimization is non obvious.
Basically, the optimization doesn't really result in a MUL, or any
other operation which would need to have the rounding mode set. Hence,
we set it just before the actual MUL in the treatment of fmul.
Fixes: ba1e25e1aa ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The script only handles commits with "Fixes: <sha1>" where <sha1> is
at least 8 chars long. But <sha1> can be shorter, like 7 chars.
This commit relaxes the restriction to handle a <sha1> of 4 or more chars.
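The script itself is a shell script, so the following C function is only a
stand-in that models the relaxed matching rule. The function name and the
min_len parameter are assumptions made for the example:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `line` starts with "Fixes: " followed by at least
 * `min_len` hexadecimal digits. */
static bool is_fixes_tag(const char *line, size_t min_len)
{
    static const char prefix[] = "Fixes: ";
    const size_t prefix_len = sizeof(prefix) - 1;

    if (strncmp(line, prefix, prefix_len) != 0)
        return false;

    size_t hex_len = 0;
    while (isxdigit((unsigned char)line[prefix_len + hex_len]))
        hex_len++;
    return hex_len >= min_len;
}
```

With min_len set to 8, a 7-character hash is rejected; with min_len set to 4
it is accepted.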
Fixes: 533fead423 ("bin/get-pick-list.sh: tweak the commit sha matching pattern")
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
The scheduler doesn't expect them. To do this, I had to refactor the
registration part of gpir_node_create_dest() to be separate from
creating and inserting the node, since the last two now aren't done when
handling moves. This adds more code but creates the possibility of
automatically inserting input dependencies when inserting nodes, similar
to what's done in NIR with the use-def lists (this isn't done yet).
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
We guarantee that a complex1 op is always used by postlog2 directly by
rewriting the postlog2 op to be a move when there would be a move
inserted between them. But we weren't doing this in all circumstances
where there might be a move. Move the logic to place_move() so that it
always happens. Fixes a few log tests that happened to start failing due
to changes in the register allocator leading to a different scheduling
order.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
This commit adds the framework for cross-basic-block register
allocation. Like ARM's compiler, we assume that the value registers
aren't usable across branches, which means we have to use physical
registers to store any value that crosses a basic block. There are three
parts to this:
1. When translating from NIR, we rely on the NIR out-of-ssa pass to
coalesce values into registers. We insert store_reg instructions for
values used in more than one basic block, and load_reg instructions for
values not defined in the same basic block (or defined after their use,
for loops). So by the time we've translated out of NIR we've already
split things into values (which are only used in the same basic block)
and registers (which are only used in different basic blocks than where
they're defined).
2. We allocate the registers at the same time that we allocate the
values, before the final scheduler. Unlike the values, where the
assigned color is fake, we assign the actual physical index & component
to physregs at this stage. load_reg and store_reg are treated as moves
in the allocator and when creating write-after-read dependencies.
3. Finally, in the main scheduler we have to avoid overwriting existing
live physregs when spilling. First, we have to tell the scheduler which
physical registers are live at the end of each block, to avoid
overwriting those. If a register is only live at the beginning, we can
reuse it for spilling after the last original use in the final program
happens, i.e. before any original use is scheduled, but we have to be
careful to add the proper dependencies so that the spill write is
scheduled before the original reads. To handle this we repurpose
reg_link for uses to be used by the scheduler.
A few register-related things copied over from NIR or from other
drivers can be dropped.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Because branch conditions have to be in the pass slot, there is no
unconditional branch, and realistically the pass slot has to contain a
move when branching (there's nothing it does that would be useful for
operating on booleans, so we can't use it for anything when computing
the branch condition), we put the branch instruction in the pass slot
and at codegen time turn it into a move of the branch condition. This
means that it doesn't have to be special-cased like store instructions
are in the scheduler. Because of this decision we can remove the
half-implemented BRANCH codegen slot. Finally, we (ab)use the existing
schedule_first mechanism to make sure that branches are always last in
the basic block.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
When picking a node to be scheduled, we try to schedule its children as
well. But we shouldn't try to schedule nodes which only have a fake
dependency on the original node, since this isn't the point of
scheduling children at the same time and can break some expectations of
the rest of the code.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Somewhat terrifyingly, we never sent this for direct contexts, which
means the server never knew the context/drawable bindings. To handle
this sanely, pull the request code up out of the indirect backend, and
rewrite the context switch path to call it as appropriate. This
attempts to preserve the existing behavior of not calling unbind() on
the context if its refcount would not drop to zero.
Of course, you can't just do this indiscriminately, because this is GLX
and extant X servers have bugs and everything is terrible. To wit:
- For 1.20.x prior to 1.20.6, you can bind a direct context once, but
the second time you try to modify the context's binding you will get
GLXBadContextTag. This includes unbinding the context. And "deleting"
the context will leak memory, because it will still appear to be
current.
- For 1.19 and earlier, glXMakeCurrent(dpy, None, ctx) should be legal
for GL 3.0+ contexts, but the server will throw BadMatch.
To guard against this, we only send the request for indirect contexts
unless the server is known good, and only mention one context at a time
in such a request; if switching between contexts, we first unbind the
old, and then bind the new. Note that the second VendorRelease() version
is to catch XFree86 4.x and Xorg [67].x, which almost certainly have the
above bugs. Other servers might report different version numbers here,
but we can't do direct rendering against them, so this should be safe.
Fixes glx-make-context, glx-multi-window-single-context and
glx-query-drawable-glx_fbconfig_id-window. Sufficiently old piglit will
regress on glx-make-glxdrawable-current (throwing BadMatch), which is
fixed by mesa/piglit!116.
From the MEDIA_VFE_STATE docs:
"Starting with this configuration, the Maximum Number of Threads must
be set to (#EU * 8) for GPGPU dispatches.
Although there are only 7 threads per EU in the configuration, the
FFTID is calculated as if there are 8 threads per EU, which in turn
requires a larger amount of Scratch Space to be allocated by the
driver."
It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern. The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
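The effect on the allocation size can be sketched as below. This is a
simplified model with a made-up helper name; the real driver derives the
thread count from device info rather than taking the EU count directly:

```c
#include <stdint.h>

/* Hypothetical helper: total scratch bytes for compute dispatches.
 * Per the quoted docs the FFTID is calculated as if each EU had 8
 * threads, so the allocation uses 8 even when only 7 run per EU. */
static uint64_t compute_scratch_bytes(uint32_t num_eus,
                                      uint32_t bytes_per_thread)
{
    const uint32_t fftid_threads_per_eu = 8;
    return (uint64_t)num_eus * fftid_threads_per_eu * bytes_per_thread;
}
```

For 64 EUs and 1024 bytes per thread this gives 524288 bytes, compared with
458752 bytes if the real count of 7 threads per EU were used.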
Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.
Fixes: 5ac804bd9a ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts commit 729de1488f.
It turns out that, although the register is in the logical context,
it isn't whitelisted, so we can't actually write it from userspace
batch buffers. The write just becomes a noop, which is why we saw
no performance changes.
I manually whitelisted it, and still observed no performance gains, but
it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments
on the iris driver. So we might need to fix something before enabling
this. To prevent it randomly getting turned on should the kernel ever
whitelist this register, we revert the patch for now.
'α' has never appeared in any genxml files, so there's no need to
replace it with the word "alpha".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Use VPC_SO_OVERRIDE to control whether we do streamout in binning or
draw pass. Normally we want to do streamout in binning pass, except
when there is a single tile and the binning pass is skipped.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We could be doing streamout from the binning pass. In this case we want to
use the full VS which doesn't have (potentially streamed out) varyings
stripped out.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
In a3268599f3, I attempted to fix nir_repair_ssa for unreachable
blocks. However, that commit missed the possibility that the use is in
a block which, itself, is unreachable. In this case, we can end up in
an infinite loop trying to replace a def with itself. Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it. Instead of explicitly checking for the
group of conditions, just check if the phi builder gives us a different
def. That's guaranteed to be 100% reliable and, while it lacks symmetry
with the is_valid checks, should be more reliable.
Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
pipe->clear() is not called for partial clears, which mesa emulates by
drawing a quad.
Furthermore, drivers should not use rasterizer state information for
scissor information (which was being used to handle the partial clears).
So, remove the partial clear support since it was not supposed to be
handled by pipe->clear() anyway.
This fixes issues with clearing after switching to different sized
framebuffers.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
->padded_count should be large enough to cover all vertices pointed by
the index array. Use the local vertex_count variable that contains the
updated vertex_count value for the indexed draw case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
fixes "sorry, unimplemented: non-trivial designated initializers not supported"
Fixes: deb04adf2a ("clover: add support for passing kernels as nir to the driver")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Allocating BOs is expensive, so we should avoid doing that by caching
freed BOs.
BO cache is modelled after one in v3d driver and works as follows:
- in lima_bo_create() check if we have matching BO in cache and return
it if there's one, allocate new BO otherwise.
- in lima_bo_unreference() (renamed from lima_bo_free()): put BO in
cache instead of freeing it and remove all stale BOs from cache
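The two steps above can be sketched as follows. This is a toy model, not the
lima or v3d implementation: the fixed slot array, the exact-size match and
the STALE_AFTER threshold are all assumptions made for the example, and no
kernel objects are involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_SLOTS 8
#define STALE_AFTER 2 /* assumed lifetime, in arbitrary time units */

struct bo { size_t size; uint64_t freed_at; };
struct bo_cache { struct bo *slots[CACHE_SLOTS]; };

/* Reuse a cached BO of the same size if there is one, else allocate. */
static struct bo *bo_create(struct bo_cache *cache, size_t size)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct bo *bo = cache->slots[i];
        if (bo && bo->size == size) {
            cache->slots[i] = NULL;
            return bo;
        }
    }
    struct bo *bo = calloc(1, sizeof(*bo));
    if (bo)
        bo->size = size;
    return bo;
}

/* Drop stale entries, then park the BO in the cache instead of freeing it.
 * `now` must not be earlier than any previously recorded time. */
static void bo_unreference(struct bo_cache *cache, struct bo *bo, uint64_t now)
{
    int free_slot = -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct bo *old = cache->slots[i];
        if (old && now - old->freed_at > STALE_AFTER) {
            free(old);
            cache->slots[i] = NULL;
        }
        if (!cache->slots[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0) { /* cache full: fall back to a real free */
        free(bo);
        return;
    }
    bo->freed_at = now;
    cache->slots[free_slot] = bo;
}
```

A create that follows an unreference of the same size returns the parked
object without allocating, which is the saving the cache is meant to provide.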
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
os_time_get_absolute_timeout(0) returns the current time, while the kernel
driver expects 0 as the value to poll BO status and return immediately.
Fix it by setting abs_timeout to 0 if timeout_ns is 0.
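A simplified model of the fix. The function below is hypothetical; the
`now_ns` parameter stands in for the current time that
os_time_get_absolute_timeout() would add, and overflow handling is left out:

```c
#include <stdint.h>

/* Converts a relative timeout into the absolute value passed to the
 * kernel. A relative timeout of 0 is passed through unchanged. */
static uint64_t bo_wait_abs_timeout(uint64_t timeout_ns, uint64_t now_ns)
{
    if (timeout_ns == 0)
        return 0; /* 0 tells the kernel to poll and return immediately */
    return now_ns + timeout_ns;
}
```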
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Sometimes weston sets a full damage region. It is
more efficient to use the cached pp stream instead
of dynamically creating one.
Reviewed-and-Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
This extension sets a damage region for each
buffer swap, which can be used to reduce buffer
reload cost by only feeding the tile buffer
addresses of the damage region to the PP.
Reviewed-and-Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
The PLBU expects the viewport's 4 borders' coordinates, however
currently we're feeding the coordinate of the left-bottom point and the
size to it, which leads to misrendering when the left-bottom point is
not (0,0).
Change the macros for the viewport PLBU command, and the data fed to
it. The code to calculate the 4 borders is ported from Panfrost.
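The difference between the two representations can be sketched as below.
This is an illustration only: the struct and function names are made up, and
the real code derives the borders from the gallium viewport state rather
than from an explicit origin and size.

```c
/* The four border coordinates the PLBU expects. */
struct viewport_borders { float left, right, bottom, top; };

/* Converts a left-bottom origin plus size into the four borders. */
static struct viewport_borders viewport_to_borders(float x, float y,
                                                   float width, float height)
{
    struct viewport_borders b = { x, x + width, y, y + height };
    return b;
}
```

When the origin is (0,0) the right and top borders equal the width and
height, which is why the old code only misrendered for other origins.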
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
ACO depends on C++14, but radeonsi/radv with LLVM 8,9 do not. Let us
only require it for RADV, since that is the only user.
Fixes: a70a998718 "radv/aco: Setup alternate path in RADV to support the experimental ACO compiler"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
required for OpenCL
v2: adjust to changes in previous commits
v3: properly convert to NIR in nvc0_cp_state_create
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr> (v1)
v2: minor formatting fixes
v3: call glsl_type_singleton_init_or_ref and glsl_type_singleton_decref
v4: capitalize and punctuate comments
fix text_executable -> text_intermediate in TODO
make glsl_type_singleton wrapper static
v5: rewrite how we run the nir passes
v6: fix unhandled case switch warning in st/mesa
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v4)
v2: rework arguments to compiler::compile_program
add assert to device::ir_format
v3: remove PIPE_SHADER_IR_SPIRV
change title
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v2)
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Most drivers have actually no binary format and just store the IR directly
as a single entry point blob.
v2: add a cap to switch between single or multi entry point binaries
v3: remove the entry_point field
v4: remove PIPE_CAP_MULTI_ENTRY_POINT_BINARIES
v5: remove supports_multiple_entry_points
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Changes since:
* v12:
- remove autotools (Karol Herbst)
- Remove the callback in format_validation_msg. (Francisco Jerez)
- Removed is_binary_spirv. (Francisco Jerez)
- Pass a string reference to is_valid_spirv instead of the
notification callback. (Francisco Jerez)
* v11: Fix compilation error introduced in v11.
* v10:
- Reuse format_validation_msg in is_valid_spirv.
- Remove LVL2STR macro in format_validation_msg.
* v9: Add `clover_cpp_std` to the overrides of the `libclspirv` target
in Meson.
* v7: Add DEFINES to libclspirv and libclover, in autotools, as they
would otherwise never know whether CLOVER_ALLOW_SPIRV has been
defined (Dave Airlie)
* v6: Update the dependency name (meson) and the libs variable
(Makefile) due to the replacement of llvm-spirv to the new
official SPIRV-LLVM-Translator.
* v5: Changed to match the updated “clover/llvm: Allow translating from
SPIR-V to LLVM IR” in the v6.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Changes since:
* v12 (Karol Herbst):
- rename CLOVER_ALLOW_SPIRV to HAVE_CLOVER_SPIRV
* v11 (Karol Herbst):
- only set new defines for clover to speed up recompilation
- remove autotools
* v10:
- Add a new flag (`--enable-opencl-spirv` for autotools, and
`-Dopencl-spirv=true` for meson) for enabling SPIR-V support in
clover, and never automagically enable it without that flag. (Dylan Baker)
- When enabling the SPIR-V support, the SPIRV-Tools and
SPIRV-LLVM-Translator libraries are now required dependencies.
* v7:
- Properly align LLVMSPIRVLib comment (Dylan Baker)
- Only define CLOVER_ALLOW_SPIRV when **both** dependencies are found:
autotools was only requiring one or the other.
* v6: Replace the llvm-spirv repository by the new official
SPIRV-LLVM-Translator.
* v4: Add a comment saying where to find llvm-spirv (Karol Herbst).
* v3:
- make SPIRV-Tools and llvm-spirv optional (Francisco Jerez);
- bump requirement for llvm-spirv to version 0.2
* v2:
- Bump the required version of SPIRV-Tools to the latest release;
- Add a dependency on llvm-spirv.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v10)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Gen11 doesn't require us to bypass the L2 cache for BC* images anymore.
The documentation is a bit hard to follow on this point, but the Windows
driver clearly only applies this workaround on Gen9, and their commit
history indicates that this was an intentional change to drop the
workaround for Gen11+.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently there is no way to make no context current w/gallium + osmesa.
The non-gallium version of osmesa does this if the context and buffer
passed to `OSMesaMakeCurrent` are both null. This small change makes it
so that this is also the case with the gallium version.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't really handle it in the little-core 64-bit case but it's not
really needed there. Where we really want this is for when we need to
do 16 -> 8-bit conversions.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Because byte immediates aren't a thing on GEN hardware, we return a
signed or unsigned word immediate in the byte case.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
During generate_shuffle(), when we use byte sized registers we end up
with a destination stride of 2. We don't take the stride into
consideration when selecting the group offset for the last MOV
operation, which means we end up moving things to the wrong place,
leaving the last few channels untouched. Take the destination stride
into consideration so we don't miss the last channels.
v2: Assert this is not necessary for the IVB special case (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
The new order matches that of the comparison functions accepted by the C
standard library qsort() functions. Being consistent with qsort will
hopefully help avoid developer confusion.
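For reference, the qsort() convention looks like this. The example is plain
standard C and does not use the red-black tree API; the function names are
chosen for the example:

```c
#include <stdlib.h>

/* qsort()-style comparator: negative if a < b, zero if equal,
 * positive if a > b. */
static int compare_uint(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Sorts `values` in ascending order using the comparator above. */
static void sort_uint(unsigned *values, size_t count)
{
    qsort(values, count, sizeof(values[0]), compare_uint);
}
```

The comparison is written as (x > y) - (x < y) rather than x - y so that it
cannot wrap around for unsigned inputs.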
The only current user of the red-black tree is aub_mem.c which is pretty
easy to fix up.
Reviewed-by: Lionel Landwerlin <lionel.g.lndwerlin@intel.com>
When I wrote the red-black tree implementation, I wrote tests for it but
they never got imported into mesa.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This effectively splits the instance dispatch table in two, with entry
points that take a physical device as their first argument getting their
own dispatch table.
As a result we now have to check both the instance and physical device
dispatch tables, instead of just the instance dispatch table as before.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were using the current drawable of the context to name the
appropriate screen for creating the bitmaps. But one, the current
drawable can be None, and two, it can be a GLXDrawable. Passing either
one as the second argument to XCreatePixmap will throw BadDrawable. Use
the root window of the context's screen instead.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/89
LOLed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
atof() is locale-dependent (sigh), which means 1.3 becomes 1.0 if the
locale's decimal separator isn't a full-stop. Just use the protocol
major/minor instead. This would be slightly broken if the server
generically implements 1.3+ but a particular screen is only capable of
less, but in practice no such servers exist.
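Comparing the integer major/minor pair avoids any string-to-float
conversion. A minimal sketch, with a hypothetical helper name:

```c
#include <stdbool.h>

/* Returns true if version major.minor is at least want_major.want_minor.
 * Integers are compared directly, so the locale's decimal separator
 * never comes into play. */
static bool glx_version_at_least(int major, int minor,
                                 int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```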
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/74
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant. For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry. This
can result in Z-fighting artifacts (at best). For now, disable this
optimization in these stages.
In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.
total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There's a missing prev_ldst = NULL; assignment in the new logic,
but even with this fixed it seems to regress some applications,
so let's revert the change until we find the real problem.
This reverts commit c9bebae287.
BLORP always turns off TCS/TES/GS. If regular drawing also has them
disabled (the overwhelmingly common case), then leaving them disabled
is just fine by us and we can skip dirtying them, as that would just
re-disable them a second time on the next draw.
If they are actually enabled, however, we do need to flag them.
Cuts 52% of the 3DSTATE_HS packets in an Aztec Ruins trace.
Later generations support bindless for samplers, images, and buffers and
thus per-stage descriptors are not limited by the binding table size.
However, gen8 doesn't support bindless images and thus needs to report a
lower per-stage limit so that all combinations of descriptors that fit
within the advertised limits are reported as supported by
vkGetDescriptorSetLayoutSupport.
Fixes test dEQP-VK.api.maintenance3_check.descriptor_set
Fixes: 79fb0d27f3 ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Commit d1e1563bb6 added a NULL check for eglGetSyncAttribKHR
but eglGetSyncAttrib does not do this. This patch adds the same check
to eglGetSyncAttrib.
Fixes crashes in (when exposing EGL 1.5):
dEQP-EGL.functional.fence_sync.invalid.get_invalid_value
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: mesa-stable@lists.freedesktop.org
This improves a couple of things:
1. We now only update anything if the shader actually cares.
Previously, is_indexed_draw was causing us to flag dirty vertex
buffers, elements, and SGVs every time the shader switched between
indexed and non-indexed draws. This is a very common situation,
but we only need that information if the shader uses gl_BaseVertex.
We were also flagging things when switching between indirect/direct
draws as well, and now we only bother if it matters.
2. We upload new draw parameters only when necessary.
When we detect that the draw parameters have changed, we upload a
new copy, and use that. Previously we were uploading it every time
the vertex buffers were dirty (for possibly unrelated reasons) and
the shader needed that info. Tying these together also makes the
code a bit easier to follow.
In Civilization VI's benchmark, this code was flagging dirty state
many times per frame (49 average, 16 median, 614 maximum). Now it
occurs exactly once for the entire run.
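A minimal sketch of the "upload only when the shader cares and the
values changed" idea described above. The class and field names are
hypothetical and are not the driver's real state tracking:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DrawParamTracker:
    """Hypothetical tracker, not the driver's real data structure."""
    last_params: Optional[Tuple[int, int]] = None  # (first_vertex, base_instance)
    uploads: int = 0

    def update(self, shader_uses_draw_params: bool,
               first_vertex: int, base_instance: int) -> bool:
        """Return True if a new copy of the draw parameters was uploaded."""
        if not shader_uses_draw_params:
            # The shader never reads these values, so nothing is flagged.
            return False
        params = (first_vertex, base_instance)
        if params == self.last_params:
            # Same values as the last upload: reuse the existing copy.
            return False
        self.last_params = params
        self.uploads += 1
        return True
```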
This makes use of the total job size limiting feature added in the
previous patch.
The idea is to avoid an excessive build up in memory use due to the
use of both the UTIL_QUEUE_INIT_RESIZE_IF_FULL and
UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY flags.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When both UTIL_QUEUE_INIT_RESIZE_IF_FULL and
UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY are set, we can get into a
situation where the queue never executes and grows to a huge size
due to all other threads being busy.
This is the case with the shader cache when attempting to compile a
huge number of shaders up front. If all threads are busy compiling
shaders, the cache queue's memory use can climb into the many GBs
very fast.
The use of these two flags with the shader cache is intended to
allow shaders compiled at runtime to be compiled as fast as possible.
To avoid huge memory use but still allow the queue to perform
optimally in the run time compilation case, we now add the ability
to track memory consumed by the jobs in the queue and limit it to
a hardcoded 256MB which should be more than enough.
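The size tracking can be sketched roughly as follows. This is an
illustration only: the class and method names are made up, and what the
real queue does with a job that does not fit is not stated here, so the
sketch simply reports the rejection to the caller.

```python
from collections import deque

MAX_TOTAL_JOB_SIZE = 256 * 1024 * 1024  # the 256MB cap mentioned above


class SizeLimitedQueue:
    """Hypothetical queue that tracks the memory held by pending jobs."""

    def __init__(self, max_total_size=MAX_TOTAL_JOB_SIZE):
        self.max_total_size = max_total_size
        self.total_size = 0
        self.jobs = deque()

    def add_job(self, job, job_size):
        """Queue the job, or return False if it would exceed the cap."""
        if self.total_size + job_size > self.max_total_size:
            return False
        self.jobs.append((job, job_size))
        self.total_size += job_size
        return True

    def pop_job(self):
        """Remove the oldest job and release its share of the budget."""
        job, job_size = self.jobs.popleft()
        self.total_size -= job_size
        return job
```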
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since we set the UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY flag this should
have little impact on low core systems. However just about all modern
CPUs currently available that run Mesa have *at least* 4 cores. For
these CPUs allowing more threads can result in the queue being
processed faster and avoid excessive memory use due to a backlog of
cache entries building up in the queue.
This change helps avoid a huge build up of cache entries in the queue
due to using both the UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY and
UTIL_QUEUE_INIT_RESIZE_IF_FULL flags.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The current code can create functions with a width of 32, which is not
supported by our hardware. Add some code to simplify how we express
what we want and prevent such cases.
For some unknown reason, all the tests I could run seem to work even
with these unsupported MOVs.
Fixes: b0858c1cc6 "intel/fs: Add a couple of simple helper opcodes"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
There are cases where we try to generate registers with a stride of
32, while the hardware maximum is just 16. This happens, for example,
when using 8 bit integers on SIMD32. This results in a crash because
the variable 'width' has a value of 32:
../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned
int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid
register width"' failed.
This change prevents the crash and makes the tests pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
IMHO the code is easier to understand this way, being explicit that
we're doing exactly the same thing every time.
No functional changes.
v2: Adjust the loop breaking condition (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
When dealing with uint16_t and uint8_t on SIMD32 we can do all the
operations using just 2 registers, so we don't hit the recursion at
the beginning of emit_scan(). Because of that, we need to actually
compute scan/reduce for channels 31:16.
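To illustrate why the upper half needs its own step, here is a generic
step-doubling inclusive add-scan over clusters of channels. It is not
the emit_scan() code, only a model of the data dependency: stopping at
a cluster width of 16 leaves channels 31:16 without the contribution of
channels 15:0.

```python
def inclusive_scan(values, width):
    """Inclusive add-scan within each cluster of `width` channels."""
    out = list(values)
    dist = 1
    while dist < width:
        prev = list(out)
        for ch in range(len(out)):
            # Only combine with a channel that sits in the same cluster.
            if ch % width >= dist:
                out[ch] = prev[ch] + prev[ch - dist]
        dist *= 2
    return out
```

With 32 channels all set to 1, a width of 32 yields 1..32, while a
width of 16 yields 1..16 twice.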
v2: Still missed instructions (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
We want this for tessellation eventually, but we can turn it on now.
Shader-db results:
total instructions in shared programs: 8612905 -> 8611387 (-0.02%)
instructions in affected programs: 164952 -> 163434 (-0.92%)
total dwords in shared programs: 11952000 -> 11950560 (-0.01%)
dwords in affected programs: 68096 -> 66656 (-2.11%)
total full in shared programs: 315019 -> 315009 (<.01%)
full in affected programs: 1642 -> 1632 (-0.61%)
total constlen in shared programs: 2463654 -> 2463654 (0.00%)
constlen in affected programs: 0 -> 0
total (ss) in shared programs: 152379 -> 152409 (0.02%)
(ss) in affected programs: 1503 -> 1533 (2.00%)
total (sy) in shared programs: 96473 -> 96525 (0.05%)
(sy) in affected programs: 654 -> 706 (7.95%)
total max_sun in shared programs: 1172454 -> 1172472 (<.01%)
max_sun in affected programs: 104 -> 122 (17.31%)
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also, swap the vs and fs constructor order so fs comes first.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When using xfb and rasterizing, the fragment shader may have fewer
inputs than the vertex shader outputs. We can't rely on gl_Position to
be placed at fs->total_in, but have to instead remember where we add
it in the link map and use that location.
Fixes 100+ tessellation dEQPs under
dEQP-GLES31.functional.tessellation.primitive_discard.*
dEQP-GLES31.functional.tessellation.user_defined_io.*
Reviewed-by: Eric Anholt <eric@anholt.net>
Newly added cases "stole" the previous break.
Fixes: 420ad0a1a3 ("spirv: check support for SPV_KHR_float_controls capabilities")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
If we can entirely push uniform data, we don't need a SURFACE_STATE
descriptor for pulling data. Since constant uploads are a very common
operation, and being able to push all data is also very common, we would
like to avoid the overhead in this case.
This patch defers uploading new descriptors. Instead of handling that
at iris_set_constant_buffer, we do it at iris_update_compiled_shaders,
where we can see the currently bound shader variants. If any need pull
descriptors, and descriptors are missing, we update them and flag that
the binding table also needs to be refreshed.
Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by
31.9774% +/- 1.12947% (n=15).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I would like for iris to be able to avoid setting up SURFACE_STATE
for UBOs in the common case where all constants are pushed.
Unfortunately, we don't know up front whether everything will be
pushed: the backend is allowed to demote pushed UBOs to pull loads
fairly late in the process. This is probably desirable though, as
we'd like the backend to be able to re-pull pushed data to break up
long live ranges in response to register pressure.
Here we simply add a "are there any pull loads at all" boolean to
prog_data, which is a bit crude but at least allows us to skip work
in the common "everything pushed" case. We could skip more work by
tracking exactly which UBO surfaces are pulled in a bitmask, but I
wanted to avoid bringing back the old mark_surface_used() mechanism.
Finer-grained tracking could allow us to skip a bit more work when
multiple UBOs are in use and /some/ are 100% pushed, but others are
accessed via pulls. However, I'm not sure how common this is and
it would save at most 4 pull descriptors, so we defer that for now.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We now track per-stage bind history for constant and shader buffers,
shader images, and sampler views by adding an extra res->bind_stages
field to go with res->bind_history.
This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages
involved, and also skip some CPU overhead in iris_rebind_buffer.
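A rough sketch of per-stage bind tracking with a bitmask. The stage
names, the Resource class and the helper are illustrative assumptions,
not the driver's actual fields or functions:

```python
# Hypothetical stage bits; the real driver has its own enum.
STAGE_BITS = {"vertex": 1 << 0, "fragment": 1 << 1, "compute": 1 << 2}


class Resource:
    """Stand-in for a buffer that remembers which stages it was bound to."""

    def __init__(self):
        self.bind_stages = 0

    def bind_constant_buffer(self, stage):
        self.bind_stages |= STAGE_BITS[stage]


def stages_to_flag(res):
    """Stages whose constants must be marked dirty when `res` changes."""
    return [name for name, bit in STAGE_BITS.items() if res.bind_stages & bit]
```

A buffer that was only ever bound to the fragment stage then dirties
the fragment stage alone, rather than every stage.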
Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace
on Icelake.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The underlying buffer isn't changing - so we don't need to update any
SURFACE_STATE descriptors - we just might have new constants, meaning
we need to re-emit 3DSTATE_CONSTANT_XS. On Gen9, this means we need
to update 3DSTATE_BINDING_TABLE_POINTERS_XS too, but that's now handled
by the explicit check in the previous patch.
On Gen9, this should cause us to re-emit the binding table /pointer/ on
writing to a buffer with PIPE_BIND_CONSTANT_BUFFER, rather than emitting
a whole new /table/.
On Gen8 and Gen11, this avoids binding table churn altogether.
Cuts 61% of 3DSTATE_BINDING_TABLE_POINTERS_XS packets in a Shadow of
Mordor trace on Icelake.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Right now, we usually flag both IRIS_DIRTY_{CONSTANTS,BINDINGS}_XS,
because we have SURFACE_STATE for constant buffers in case the shaders
access them via pull mode.
But this flagging is overkill in many cases. Gen8 and Gen11 don't need
it at all. Gen9 doesn't need that large of a hammer in all cases.
Just handle it explicitly so the right thing happens.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We upload a new SURFACE_STATE for the UBO/SSBO in question, which
means that we need new binding tables as well.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Apparently we already enabled it without having support ...
Not sure if we also need to set disable_start_of_prim when the PS
has memory writes, but this mirrors radeonsi.
Doubles fillrate in my dual_quad_bench from ~16 pixels/cycle to
~32 pixels/cycle on a Raven.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The pass assumed that "Most ALU ops produce an undefined result if any
source is undef" which is completely untrue. Due to how we lower if
statements to selects and then optimize on those selects later, we
simply cannot make that assumption. In particular this pass tried to
replace an ior of undef and true, which had been generated by
optimizing a select which itself came from flattening an if statement,
to undef causing a miscompilation for a CTS test with radeonsi NIR.
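The ior case can be checked by enumerating every value the undef source
may take. This is a plain boolean model, not NIR code:

```python
def ior_results_for_any_undef(known: bool):
    """All results of `undef | known` when undef may be either boolean."""
    return {undef or known for undef in (False, True)}
```

For a known source of true, the only possible result is true, so
replacing the ior with undef (which is allowed to read as false)
changes the program's behavior.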
We fix this by always doing what the non-undef path did, i.e. duplicate
the instruction twice. If there are cases where the instruction before
the loop can be folded away due to having an undef source, we should add
these to opt_undef instead.
The comment above the pass says that if the phi source from before the
loop is undef, and we can fold the instruction before the loop to undef,
then we can ignore sources of the original instruction that don't
dominate the block before the loop because we don't need them to create
the instruction before the loop. This is incorrect, because the
instruction at the bottom of the loop would get those sources from the
wrong loop iteration. The code never actually did what the comment said,
so we only have to update the comment to match what the pass actually
does. We also update the example to more closely match what most actual
loops look like after vtn and peephole_select.
There are no shader-db changes with i965, radeonsi NIR, or radv. With
anv and my vkpipeline-db there's only one change:
total instructions in shared programs: 14125290 -> 14125300 (<.01%)
instructions in affected programs: 2598 -> 2608 (0.38%)
helped: 0
HURT: 1
total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
cycles in affected programs: 36697 -> 36657 (-0.11%)
helped: 1
HURT: 0
Fixes
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
with radeonsi NIR.
Akin to 1a25980c46 ("egl: drop incorrect pkg-config file for
glvnd") and b01524fff0 ("meson: don't build libGLES*.so with
GLVND"), this removes a pkg-config file that shouldn't have been there
in the first place, but was needed because of that GLVND bug.
Now that the glvnd bug has been fixed, it is apparent that we forgot to
remove this gl.pc pkg-config file, so let's do just that :)
Suggested-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Having Python and C variables share a name in the same block of code
makes it a bit confusing to follow. Make it explicit that the
Python bit_size variable refers to the destination bit size.
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
No GS copy shader if a pipeline enables NGG GS.
This fixes
dEQP-VK.pipeline.executable_properties.graphics.*geometry_stage*.
Fixes: 86864eedd2 ("radv: Implement radv_GetPipelineExecutablePropertiesKHR.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* Set missing STENCIL_CONFIG_EXT2 bits
* Swap stencil sides when rendering CCW
Fixes following deqp tests (which were 99% failing):
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
Note: deqp tests require --deqp-gl-config-name=rgba8888d24s8ms0
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This is needed in particular to get a recent enough version of meson in
the stretch image, but should be generally beneficial.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Pros:
* Less fragile due to not mixing packages from stretch and buster
* No longer need to use third-party LLVM packages
* The buster image now uses GCC 8 for C++ as well (previously 6 for C++,
8 for C), allowing to drop some hacks
Cons:
* The stretch image now only uses GCC 6 for C as well as C++
* Need separate jobs for testing old LLVM versions
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
If installing new packages would require removing previously installed
ones, this flag causes apt-get to abort with an error instead,
preventing later obscure failures due to the missing packages.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
If we want to execute several batches in parallel they need to have
their own tiler and scratchpad BOs. Let's move those objects to
panfrost_batch and allocate them on a per-batch basis.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we want the batch dependency tracking to work correctly we must
make sure all BOs are added to the batch->bos set early enough. Adding
FBO BOs when generating the fragment job is clearly too late. Add a
panfrost_batch_add_fbo_bos helper and call it in the clear/draw path.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This helper automates the panfrost_bo_create()+panfrost_batch_add_bo()+
panfrost_bo_unreference() sequence that's done for all per-batch BOs.
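The create/add/unreference sequence can be modeled with a toy reference
count. All names below are stand-ins for illustration and do not mirror
the real panfrost signatures:

```python
class BO:
    """Toy buffer object with a reference count."""

    def __init__(self, size):
        self.size = size
        self.refcount = 1  # reference held by whoever created it


class Batch:
    """Toy batch that takes its own reference on every BO it uses."""

    def __init__(self):
        self.bos = set()

    def add_bo(self, bo):
        if bo not in self.bos:
            bo.refcount += 1
            self.bos.add(bo)


def bo_unreference(bo):
    bo.refcount -= 1


def batch_create_bo(batch, size):
    """Create a BO whose only remaining owner is the batch."""
    bo = BO(size)         # creator's reference
    batch.add_bo(bo)      # batch's reference
    bo_unreference(bo)    # drop the creator's reference
    return bo
```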
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Thanks to that we avoid the recursive call into panfrost_bo_create()
and we can get rid of panfrost_bo_release() by inlining the code in
panfrost_bo_unreference().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
panfrost_bo_unreference() should be used instead.
The only difference caused by this change is that the scratchpad,
tiler_heap and tiler_dummy BOs are now returned to the cache instead
of being freed when a context is destroyed. This is only a problem if
we care about context isolation, which apparently is not the case since
transient BOs are already returned to the per-FD cache (and all contexts
share the same address space anyway, so enforcing context isolation
is almost impossible).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Store a screen pointer in panfrost_bo so we don't have to pass a screen
object to all functions manipulating the BO.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Right now, the BO API is spread over pan_{allocate,resource,screen}.h.
Let's move all BO related definitions to a separate header file.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
pan_drm.c was only meaningful when we were supporting 2 kernel drivers
(mali_kbase, and the drm one). Now that there's no kernel-driver
abstraction, we're better off moving those functions where they belong:
* BO related functions in pan_bo.c
* fence related functions + query_gpu_version() in pan_screen.c
* submit related functions in pan_job.c
While at it, we rename the functions according to the place they're
being moved to.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
ctx is allocated with rzalloc() which takes care of zero-ing the memory
region. No need to call memset(0) on top.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
That's what we do for other per-batch BOs, and we'll soon add a helper
to automate this create_bo()+add_bo()+bo_unreference() sequence, so
let's prepare the code to ease this transition.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Some BOs are used by batches but never explicitly added to the BO set.
This is currently not a problem because we wait for the execution of
a batch to be finished before releasing a BO, but we will soon relax
this rule.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The DRM driver expects an array of u32, let's use the correct type, even
if using an int works in practice because it's still a 32-bit integer.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Looks like only HALTI2 GPUs have support for it but that is not yet
implemented so disable ARB_shadow for now.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
drmIoctl handles EAGAIN itself, and it always returns -1 on errors.
Remove the wrong handling of its return value. Also, print a warning when
it fails.
v2: - use _debug_printf instead of fprintf (Gurchetan Singh)
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
The compiler now sets the "Null Render Target" bit in the RT write
extended message descriptor, causing it to write to an implicit null
surface without us needing to set one up in the binding table.
Together with the last patch, this improves performance in Car Chase on
an Icelake 8x8 (locked to 700MHz) by 0.0445526% +/- 0.0132736% (n=832).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When there are no color regions (i.e. a depth only pass), we can set
the "Null Render Target" bit in the Gen11 RT write extended message
descriptor to indicate that it should behave as if it's writing to a
null render target, without the need for a binding table entry.
This lets drivers avoid setting up that null RT binding table entry,
but more importantly means the HW doesn't actually have to bother
looking up the surface state.
Together with the next patch, this improves performance in Car Chase on
an Icelake 8x8 (locked to 700MHz) by 0.0445526% +/- 0.0132736% (n=832).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This adds support for
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and
enables the Vulkan and SPIR-V extensions.
Also, notice that this includes the updates applied to the
VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension
VK_KHR_shader_float_controls v4 and Vulkan 1.1.116.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The remove_extra_rounding_modes() optimization will remove duplicated
rounding mode changes.
v2:
- Fix bug in the rounding mode change (Alejandro).
v3:
- Fix rounding modes.
v4:
- Updated to renamed shader info member and enum values (Andres).
v5:
- Simplify flags logic operations (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We need this function to emit code that sets up the control register
later with the defined execution mode for the shader. Therefore, we
emit it as the first instruction.
v2:
- Fix bug in setting the default mode mask in brw_rnd_mode_from_nir().
- Fix support for rounding modes in brw_rnd_mode_from_nir().
v3:
- Updated to renamed shader info member and enum values (Andres).
v4:
- Add actual emission as first instruction of emit_nir_code (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Before this commit, we had only FPRoundingMode decoration (the per
instruction one) that is applied during the SPIR-V handling. In
vtn_alu we find out the rounding mode, and generate the code
accordingly that later will be used to look for the respective
nir_op_f2f16_{rtz,rtne}.
Per-instruction gets prioritized because we make them explicit
conversions (with RTZ or RTNE nir opcodes) and they will override the
default execution mode defined with float controls. However, we need
to come back to the mode defined by float controls after the execution
of the FP Rounding instruction.
Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be
used to set the default rounding mode and denorm treatment for the
whole shader, while the pre-existing SHADER_OPCODE_RND_MODE will be
used as the prioritized rounding mode on a per-instruction basis.
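The priority between the two can be modeled like this. It is a toy
interpreter under the assumption that the default mode is re-applied
right after each overriding instruction; the names are hypothetical and
say nothing about how the backend actually emits the restore:

```python
class ControlRegister:
    """Toy control register holding only the current rounding mode."""

    def __init__(self):
        self.rounding = "rtne"

    def set_mode(self, mode):
        self.rounding = mode


def run_shader(default_mode, instructions):
    """Return (instruction, rounding mode in effect) for each instruction.

    `instructions` is a list of (name, override) pairs, where override
    is None or a per-instruction rounding mode.
    """
    cr = ControlRegister()
    cr.set_mode(default_mode)          # shader-wide default, set first
    used = []
    for name, override in instructions:
        if override is not None:
            cr.set_mode(override)      # per-instruction mode wins
            used.append((name, cr.rounding))
            cr.set_mode(default_mode)  # come back to the default
        else:
            used.append((name, cr.rounding))
    return used
```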
v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.
v3:
- Update comment (Caio).
v4:
- Split the patch into the helper and the new opcode (this
one) (Caio).
v5:
- Add an explanation on the actual purpose and priority of the newly
introduced opcode in the commit log (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.
v3:
- Update comment (Caio).
v4:
- Split the patch into the helper (this one) and the new
opcode (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The denorm mode is set in the control register, no need to do
something else.
v2:
- Add an assert to make sure that we realize if this assumption is
broken in the future (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we have fsin or fcos trigonometric operations with constant values
as inputs, we will multiply the result by 0.99997 in
brw_nir_apply_trig_workarounds, making the result wrong.
By adjusting the rules so they do not apply to constant values, we let
a later constant folding pass deal with it.
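A small numeric model of the problem, assuming the workaround is a plain
multiply of the result by 0.99997 as stated above. The function is
illustrative and is not the NIR algebraic rule itself:

```python
import math

TRIG_SCALE = 0.99997  # factor quoted in the text above


def eval_trig(op, src, src_is_const):
    """Evaluate fsin/fcos, scaling the result only for non-constant sources."""
    value = math.sin(src) if op == "fsin" else math.cos(src)
    if src_is_const:
        # Constant sources are left alone so constant folding yields the
        # exact value.
        return value
    return value * TRIG_SCALE
```

For example, fcos of a constant 0.0 folds to exactly 1.0, whereas
applying the workaround first would fold it to 0.99997.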
v2:
- Do not early constant fold but only apply the trig workaround for
non constants (Caio).
- Add fixes tag to commit log (Caio).
Fixes: bfd17c76c1 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Until now, it was using the floating point version of fmin/fmax,
instead of the double version.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Replace hard coded value with DBL_MIN (Connor).
v3:
- Take into account the FLOAT_CONTROLS_DENORM_PRESERVE_FP64
flag (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
According to VK_KHR_shader_float_controls:
"Denormalized values obtained via unpacking an integer into a vector
of values with smaller bit width and interpreting those values as
floating-point numbers must: be flushed to zero, unless the entry
point is declared with the code:DenormPreserve execution mode."
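The quoted rule can be modeled as follows for a 32-bit value unpacked
into two half floats. This is a sketch of the semantics only, with a
made-up function name, and not the NIR opcode implementation:

```python
import struct


def unpack_half_2x16(packed: int, flush_denorms: bool):
    """Unpack a 32-bit integer into (low, high) half-float values."""
    out = []
    for bits in (packed & 0xFFFF, (packed >> 16) & 0xFFFF):
        exponent = (bits >> 10) & 0x1F
        mantissa = bits & 0x3FF
        if flush_denorms and exponent == 0 and mantissa != 0:
            bits &= 0x8000  # denormal: keep only the sign bit
        # Reinterpret the 16 bits as an IEEE half float.
        out.append(struct.unpack("<e", struct.pack("<H", bits))[0])
    return tuple(out)
```

With packed = 0x3C000001 the low half is the smallest half denormal
(2^-24) and the high half is 1.0. Flushing turns the low half into 0.0
and leaves the high half untouched.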
v2:
- Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).
v3:
- Adapt to use the new NIR lowering framework (Andres).
v4:
- Updated to renamed shader info member and enum values (Andres).
v5:
- Simplify flags logic operations (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
If FLOAT_CONTROLS_SIGNED_ZERO_INF_NAN_PRESERVE or
FLOAT_CONTROLS_DENORM_FLUSH_TO_ZERO are enabled, do not apply the
inexact optimizations so the VK_KHR_shader_float_controls execution
mode is respected.
v2:
- Do not apply inexact optimizations if SHADER_DENORM_FLUSH_TO_ZERO is
enabled (Andres).
v3:
- Updated to renamed shader info member (Andres).
v4:
- Directly access execution mode instead of dragging it by parameter (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v1]
With the arrival of VK_KHR_shader_float_controls algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.
For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case
of a denorm value for the "a" parameter, we cannot return it still as
a denorm, it needs to be flushed to zero. Therefore, we mark now all
those operations as inexact.
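As an illustration, take a rule of that form such as fmin(a, a) -> a
(chosen here as an example). The helpers below model 32-bit floats with
flush-to-zero applied to the result of the operation; they are
hypothetical and only demonstrate the mismatch:

```python
import struct


def flush_denorm_f32(x: float) -> float:
    """Return x, or a same-signed zero if x is a denormal 32-bit float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:
        return -0.0 if bits >> 31 else 0.0
    return x


def fmin_flush_to_zero(a: float, b: float) -> float:
    """fmin as executed with denorm flush-to-zero on the result."""
    return flush_denorm_f32(min(a, b))


def fmin_a_a_optimized(a: float) -> float:
    """Result after the rewrite fmin(a, a) -> a: no operation is left."""
    return a
```

For a denormal input such as 2^-140, the executed fmin produces 0.0
while the rewritten expression still returns the denormal.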
Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Move the op-code specific knowledge to nir_opcodes.py even if it
means a round trip conversion (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
According to Vulkan spec, the new execution modes affect only
correctly rounded SPIR-V instructions, which includes fadd, fsub and
fmul.
v2:
- Fix fmul, fsub and fadd round-to-zero definitions, they should use
auxiliary functions to calculate the proper value because Mesa uses
round-to-nearest-even rounding mode by default (Connor).
v3:
- Do an actual fused multiply-add at ffma (Connor).
v4:
- Simplify fadd and fmul for bit sizes < 64 (Connor).
- Do not use double ffma for 32 bits float (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
f2f16's rounding modes are already handled and f2f64 doesn't need them,
as there is no floating point type with a bit size higher than 64 for
now.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In order to be coherent with the pre-existent API for half floats,
this new API for double is the one meant to be used when doing double
to float conversions. It is no more than a wrapper for the softfloat.h
API but we mean to keep that one private.
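For reference, the semantics of a round-toward-zero double to float
conversion can be sketched like this. It is not the softfloat-based
implementation: it rounds to nearest first and then steps one unit
toward zero if that overshot, and the function name is made up.

```python
import math
import struct

FLT_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127  # largest finite 32-bit float


def double_to_float_rtz(x: float) -> float:
    """Convert a double to the nearest 32-bit float not larger in magnitude."""
    if math.isnan(x) or math.isinf(x):
        return x
    if abs(x) > FLT_MAX:
        # Rounding toward zero never overflows to infinity.
        return math.copysign(FLT_MAX, x)
    # struct rounds to nearest-even when packing a 32-bit float.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    nearest = struct.unpack("<f", struct.pack("<I", bits))[0]
    if abs(nearest) > abs(x):
        # Floats are sign-magnitude, so bits - 1 is one step toward zero.
        bits -= 1
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```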
v2:
- Fix bug in _mesa_double_to_float_rtz() in the inf/nan detection
using the exponent value.
v3:
- Replace custom f64 -> f32 implementations with the softfloat
one (Andres).
v4:
- Added API usage clarifying comments (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
In order to be coherent with the pre-existent functions, this new API
is the one meant to be used when doing half float to float
conversions. It is no more than a wrapper for the softfloat.h API but
we mean to keep that one private.
v2:
- Replace custom f32 -> f16 RTZ implementation with the softfloat
one (Andres).
v3:
- Added API usage clarifying comments (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Implemented fadd, fsub, fmul and ffma for doubles and ffma for floats,
rounding to zero, using a modified implementation from Berkeley
Softfloat 3e Library.
Their implementation correctness has been checked with the Berkeley
TestFloat Release 3e tool for x86_64.
v2:
- Reuse util_last_bit64() in _mesa_count_leading_zeros64()
implementation (Connor).
v3:
- Add a specific ffma for floats version (Connor).
- Implement the ffma for doubles version (Andres).
- Lots of fixes in fadd, fsub and fmul (Andres).
- Improved documentation (Andres).
v4:
- Added f64 -> f32 conversion function (Andres).
- Added f32 -> f16 RTZ conversion function (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Tested-by: Andres Gomez <agomez@igalia.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2:
- Refactor conditions and shared function (Connor).
- Move code to nir_eval_const_opcode() (Connor).
- Don't flush to zero on fquantize2f16
From Vulkan spec, VK_KHR_shader_float_controls section:
"3) Do denorm and rounding mode controls apply to OpSpecConstantOp?
RESOLVED: Yes, except when the opcode is OpQuantizeToF16."
v3:
- Fix bit size (Connor).
- Fix execution mode on nir_loop_analyze (Connor).
v4:
- Adapt after API changes to nir_eval_const_opcode (Andres).
v5:
- Simplify constant_denorm_flush_to_zero (Caio).
v6:
- Adapt after API changes and to use the new constant
constructors (Andres).
- Replace MAYBE_UNUSED with UNUSED as the first is going
away (Andres).
v7:
- Adapt to newly added calls (Andres).
- Simplified the auxiliary to flush denorms to zero (Caio).
- Updated to renamed supported capabilities member (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v4]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2:
- Added more functions.
v3:
- Simplify most of the functions (Caio).
v4:
- Updated to renamed enum values (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
v2:
- Add support for rounding modes for each floating point bit size.
v3:
- Commit e68871f6a4 ("spirv: Handle constants and types before
execution modes") changed when the execution modes are handled,
which affects the result of the floating point constants when the
rounding mode is set in the execution mode. Moved the handling of
the rounding modes before we handle the constants.
v4:
- Rename vtn_decoration "literals" to "operands" (Andres).
- Simplify execution mode parsing util function (Caio).
- Extend the comment about the timing of the handling of the rounding
modes (Caio).
v5:
- Correct extension name (Caio).
- Rename shader info member (Andres).
- Rename float controls enum (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Yes, some tests fail, but we can turn those into XFAILs at meson time.
Better to keep the things that work working than not cover them at all.
Unfortunately XPASS results will not cause the build to fail until we
update CI to meson 0.51 or newer.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Since struct timespec's tv_sec member is of type time_t, adjust the
expected value to allow for the truncation which will occur with 32-bit
time_t.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Otherwise it never gets closed. This fixes errors seen with deqp-egl
where we end up opening 1024 files.
Fixes: 2dce0e94 ("iris: Initial commit of a new 'iris' driver for Intel Gen8+ GPUs.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In case the GLSL version is 130 or higher, we've already enabled
ARB_shader_bit_encoding a bit earlier in this same function. So this
condition will always be true.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The PLBU seems to preserve scissor state between draws, and since lima doesn't
emit PLBU_CMD_SCISSORS() if scissor test is disabled, it uses state from previous draw.
Fix it by emitting PLBU_CMD_SCISSORS() for full fb if scissor test is disabled.
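The idea can be sketched as follows. This is only an illustration with
hypothetical types and names, not the lima code: the rectangle that gets
emitted is the scissor when the test is enabled and the full framebuffer
otherwise, so stale state from a previous draw is never reused.

```c
/* Hypothetical rectangle type used only for this sketch. */
struct rect {
    unsigned minx, miny, maxx, maxy;
};

/* Pick the rectangle to emit for a draw: the user scissor if the scissor
 * test is enabled, otherwise a rectangle covering the whole framebuffer. */
static struct rect effective_scissor(int scissor_enabled, struct rect scissor,
                                     unsigned fb_width, unsigned fb_height)
{
    struct rect full = { 0, 0, fb_width, fb_height };
    return scissor_enabled ? scissor : full;
}
```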
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
SPIR-V 1.5 incorporated SPV_EXT_shader_viewport_index_layer, but
split it into the two capabilities above. Just handle them, as we
support the extension already.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We can't deref list_(first/last)_entries unless we know we have at least
one. Instead, just use our IP we've been tracking as we go to set up the
start ip, and fill in the end IP as we walk instructions.
Fixes a complaint in valgrind on
dEQP-GLES3.functional.transform_feedback.* which sometimes has an
empty main (non-END) block when the VS inputs are just directly mapped
to outputs without any ALU ops.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Table 23.54 of the OpenGL 4.5 spec lists the minimum values for
GL_POINT_SIZE_RANGE as [1, 1]. So zero is not allowed (even though
arguably this could be useful for MSAA rendering, where a sub-1px
point might cover only some samples...)
This fixes the WebGL 2.0 conformance suite's state.gl-get-calls test
on Chromium on Linux, which uses desktop OpenGL. The test checks that
the minimum value of GL_ALIASED_POINT_SIZE_RANGE is 1. Unfortunately,
that query doesn't exist in desktop GL, so it checks POINT_SIZE_RANGE,
which is the anti-aliased value. There's not really anything better
for Chromium to do here, unfortunately. When running Chromium with
--api=es3, it maps it to the correct query and the test already works.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, internalformat GL_RGBA and type GL_UNSIGNED_SHORT_5_5_5_1
was promoted to RGBA8888 as the table entry with the 5551 formats
is listed below the 8888 entry, and it also doesn't have GL_RGBA as
a possible internalformat.
Using actual 5551 fixes the following dEQP-EGL test:
- dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently scons puts them in src/mapi/glapi, while meson puts them in
src/mapi/glapi/gen. This results in some things being compilable only by
one or the other. Put them in the same place so that everyone is happy.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
It's useful for analyzing shader binaries produced by the ARM Mali offline
compiler, which outputs files in MBS (Mali binary shader) format.
Currently the parser just extracts the shader binary and ignores everything else.
Reviewed-and-tested-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Update GL headers and xml API from upstream Khronos registry (commit
3d0c3eb). Keep `BUILDING_MESA` quirk in glext.h.
mesa/extensions: Expose EXT_EGL_sync instead of MESA_EGL_sync to reflect
Khronos request of changing this extension's scope from MESA to EXT.
EXT_EGL_sync is also the name of the extension that has been merged into
the upstream Khronos GL registry.
Remove MESA_EGL_sync spec txt from Mesa tree as it is now published as
EXT by Khronos.
v1: Remove MESA_EGL_sync spec and squash commits (Eric E)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This might allow the arm64 tests to start running earlier.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2:
* Preserve setting NIR_VALIDATE=0 for all arm64_* jobs
* Preserve setting DEQP_SKIPS=deqp-default-skips.txt for
arm64_a306_gles2 jobs
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1
Reviewed-by: Eric Anholt <eric@anholt.net>
Support for multiple inheritance was added to GitLab recently.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows the arm64_a306_gles2 jobs to run as soon as the meson-arm64
job has finished.
Fixes: 6f0dc087b7 "freedreno: Introduce gitlab-based CI."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit f3e978db incorrectly assumed that the maximum number of
samplers was equal to the number of defined samplers. That does not
hold, e.g. where bindings skip slots.
This fixes an assert in si_nir_load_sampler_desc() for an
Enemy Territory: Quake Wars shader. It also fixes potential bugs with
incorrect bounds limiting in the same code for production builds
of mesa.
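To illustrate why the two quantities differ, here is a minimal sketch
with a hypothetical helper (not the radeonsi code): when bindings skip
slots, the number of slots needed is the highest binding plus one, not
the count of defined samplers.

```c
/* Hypothetical helper: number of sampler slots needed to cover every
 * binding in the list, i.e. the highest binding index plus one. */
static unsigned sampler_slots_needed(const unsigned *bindings, unsigned count)
{
    unsigned slots = 0;
    for (unsigned i = 0; i < count; i++) {
        if (bindings[i] + 1 > slots)
            slots = bindings[i] + 1;
    }
    return slots;
}
```

For three samplers bound at 0, 2 and 5 this gives 6 slots, whereas
bounding indices by the count of 3 would reject valid bindings.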
Fixes: f3e978db ("radeonsi/nir: Remove uniform variable scanning")
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
It's still disabled by default because transform feedback randomly
hangs and it seems like it's related to GDS (cf. RadeonSI).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise the next streamout operation will overwrite GDS. This
can be improved by tracking if there is a streamout operation in
flight. Currently the driver unconditionally flushes but that
doesn't matter much as NGG streamout is disabled by default.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
NGG streamout uses GDS and we have to make sure that another
process isn't going to overwrite GDS while our shaders are busy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise the wave IDs are probably 0 and it hangs. NGG_WAVE_ID_EN
generates wave IDs for GDS OA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This internal option is turned off by default because NGG streamout
still hangs. It seems like it's related to GDS, as in RadeonSI.
That option will be turned on once all issues are resolved.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Documentation for pipe_context::flush states:
"NOTE: use screen->fence_reference() (or equivalent) to transfer
new fence ref to **fence, to ensure that previous fence is unref'd"
Hence we need to unref previous out_fence.
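The pattern the documentation asks for can be sketched like this. It is
a simplified illustration with a hypothetical refcounted fence type, not
the driver's fence implementation: assigning through a reference helper
drops the reference held on whatever the destination pointed to before.

```c
#include <stdlib.h>

/* Hypothetical refcounted fence used only for this sketch. */
struct fence {
    int refcount;
};

static struct fence *fence_create(void)
{
    struct fence *f = malloc(sizeof(*f));
    if (f)
        f->refcount = 1;
    return f;
}

/* Make *dst point at src, taking a reference on src and dropping the
 * reference previously held through *dst. */
static void fence_reference(struct fence **dst, struct fence *src)
{
    struct fence *old = *dst;

    if (src)
        src->refcount++;
    *dst = src;
    if (old && --old->refcount == 0)
        free(old);
}
```

Writing "*dst = src" directly would leak the previous fence, which is
the kind of leak the unref avoids.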
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The nir_variable.location field may be equal to -1.
That may cause a copy to an invalid list-node address,
corrupting some internal fields.
This patch fixes a segfault when freeing the context, caused by a
corrupted ralloc_header.destructor address.
v2: copy data if var is constant (Connor Abbott)
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b6d4753568 (nir/large_constants: De-duplicate constants)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111676
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Vulkan applications can register with the following structure:
  typedef struct VkApplicationInfo {
      VkStructureType    sType;
      const void*        pNext;
      const char*        pApplicationName;
      uint32_t           applicationVersion;
      const char*        pEngineName;
      uint32_t           engineVersion;
      uint32_t           apiVersion;
  } VkApplicationInfo;
This enables the Vulkan implementations to apply workarounds based off
matching this description.
Here we add a new parameter for matching the driconfig options with
the following:
  <device driver="anv">
    <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42">
      <option name="blaaah" value="true" />
    </application>
  </device>
v2: switch engine name match to use regexps
v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric)
v4: Add missing bit that went to the following commit (Eric)
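A minimal sketch of the matching described above, using the POSIX
regex API. The function names are hypothetical, this is not the
driconf parser, and treating each "lo:hi" version range as inclusive is
an assumption made for the example.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: does engine_name match the regular expression? */
static bool engine_name_matches(const char *pattern, const char *engine_name)
{
    regex_t re;
    bool match;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    /* regexec() returns 0 on a match and REG_NOMATCH on match failure. */
    match = regexec(&re, engine_name, 0, NULL, 0) == 0;
    regfree(&re);
    return match;
}

/* Hypothetical helper: is version inside one of the comma-separated
 * "lo:hi" ranges? Ranges are assumed to be inclusive. */
static bool version_in_ranges(const char *ranges, uint32_t version)
{
    const char *p = ranges;

    while (*p != '\0') {
        char *end;
        unsigned long lo, hi;

        lo = strtoul(p, &end, 10);
        if (end == p || *end != ':')
            return false; /* malformed range */
        p = end + 1;
        hi = strtoul(p, &end, 10);
        if (end == p)
            return false; /* malformed range */
        if (version >= lo && version <= hi)
            return true;
        if (*end != ',')
            break;
        p = end + 1;
    }
    return false;
}
```

Note that regexec() performs an unanchored search, so a pattern has to
use ^ and $ if it should match the whole engine name.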
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Seen a couple flakes on this one so far. Not sure if it is a real
driver problem or not, but skip it to unblock things.
Signed-off-by: Rob Clark <robdclark@chromium.org>
It was calloc'd to 0 which is PIPE_PRIM_POINTS, which means that we
fail to notice an initial primitive of points being new, and fail at
updating the "primitive is points or lines" field.
We do not need to reset this on device loss because we're tracking
the last primitive mode sent to us on the CPU via draw_vbo, not the
last primitive mode sent to the GPU.
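The failure mode can be sketched as follows. The enum values and names
are hypothetical and this is not the iris code: if the "last primitive"
field starts at zero and zero is also the points primitive, a first draw
of points never looks like a change.

```c
/* Hypothetical primitive enum; the point is only that POINTS is 0. */
enum prim {
    PRIM_POINTS = 0,
    PRIM_LINES = 1,
    PRIM_TRIANGLES = 4,
    PRIM_INVALID = 0xff /* hypothetical sentinel, never a real draw mode */
};

struct draw_state {
    enum prim last_prim;
    unsigned prim_updates; /* how often dependent state was recomputed */
};

/* Start from a value no real draw can use, so the first draw is "new". */
static void draw_state_init(struct draw_state *s)
{
    s->last_prim = PRIM_INVALID;
    s->prim_updates = 0;
}

static void draw(struct draw_state *s, enum prim mode)
{
    if (s->last_prim != mode) {
        s->last_prim = mode;
        s->prim_updates++; /* e.g. update "primitive is points or lines" */
    }
}
```

With a zero-initialized state, draw(&s, PRIM_POINTS) performs no update;
after draw_state_init() it does.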
Fixes several tests:
- dEQP-GLES3.functional.clipping.point.wide_point_clip
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner
Fixes: dcfca0af7c ("iris: Set XY Clipping correctly.")
If people fix bugs without updating the expected-fails list, then we
end up with a lack of coverage of those failures in the future. Also,
some day down the line another developer ends up trying to figure out
if the bug was actually fixed or their environment is just failing to
reproduce it.
Suggested-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This hasn't failed for me in ~5 minutes of looping over
dEQP-GLES3.functional.fbo.msaa.*
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Add a ppir dummy node for nir_ssa_undef_instr, create a reg for it and mark
it as undefined, so that regalloc can set it non-interfering to avoid
register pressure.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Building w/ AOSP, I was hitting the following error:
external/mesa3d/src/amd/Android.common.mk:95: error: missing separator.
Which was due to the changes to mesa-build-with-llvm missing
a line continuation.
Fixes: 96b592696f
Signed-off-by: John Stultz <john.stultz@linaro.org>
We are about to patch panfrost_flush() to flush all pending batches,
not only the current one. In order to do that, we need to move the
'flush single batch' code to panfrost_batch_submit().
While at it, we get rid of the existing pipelining logic, which is
currently unused and replace it by an unconditional wait at the end of
panfrost_batch_submit(). A new pipeline logic will be introduced later
on.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
panfrost_flush() is about to be reworked to flush all pending batches,
but we want the fence to block on the last one. Let's move the fence
creation logic in panfrost_flush() to prepare for this situation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
panfrost_draw_vbo() might call the primeconvert/without_prim_restart
helpers, which will enter ->draw_vbo() again. Let's delay
payloads[].offset_start initialization so we don't initialize them
twice.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
panfrost_attach_vt_xxx() functions are now passed a batch, and the
generated FB desc is kept in panfrost_batch so we can switch FBs
without forcing a flush. The postfix->framebuffer field is restored
on the next attach_vt_framebuffer() call if the batch already has an
FB desc.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
So we can emit SET_VALUE jobs for a batch that's not currently bound
to the context.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We'll soon be able to flush a batch that's not currently bound to the
context, which means ctx->pipe_framebuffer will not necessarily be the
FBO targeted by the wallpaper draw. Let's prepare for this case and
use ctx->wallpaper_batch in panfrost_blit_wallpaper().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We need that if we want to upload transient buffers to a batch that's
not currently bound to the context, which in turn will be needed if we
want to relax the batch serialization we have right now (only flush
batches when we need to: on a flush request, or when one batch depends
on the result of other batches).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Rename panfrost_is_scanout() into panfrost_batch_is_scanout(), pass it
a batch instead of a context and move the code to pan_job.c.
With this in place, we can now test if a batch is targeting a scanout
FB even if this batch is not bound to the context.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Will be replaced by something similar but using BOs as keys instead
of resources.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This way we have all the fb_state information directly attached to a
batch and can pass only the batch to functions emitting CMDs, which is
needed if we want to be able to queue CMDs to a batch that's not
currently bound to the context.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
mir_foreach_instr_in_block_safe() is based on list_for_each_entry_safe()
which is designed to protect against removal of the current entry, but
removing the entry placed just after the current one will lead to a
use-after-free situation.
Luckily, the midgard_pair_load_store() logic guarantees that the
instruction being removed (if any) is never placed just after ins which
in turn guarantees that the hidden __next variable always points to a
valid object.
Took me a bit of time to realize that this code was safe, so I'm
suggesting to get rid of the inner mir_foreach_instr_in_block_from()
loop and rework the code so that the removed instruction is always the
current one (which is what the list_for_each_entry_safe() API was
initially designed for).
While at it, we also get rid of the unnecessary insert(ins)/remove(ins)
dance by simply moving the instruction around.
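For reference, the pattern that "safe" list iteration protects can be
sketched with a plain singly linked list. This is a simplified
illustration with hypothetical names, not the midgard code or the
util/list.h macros: the successor is saved before the current node may
be freed, so only removal of the current node is safe.

```c
#include <stdlib.h>

/* Hypothetical list node used only for this sketch. */
struct node {
    int value;
    struct node *next;
};

static struct node *push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (!n)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Remove every node with a negative value. Only the current node is ever
 * freed, and its successor is saved first. Freeing the saved successor
 * instead would leave "next" dangling on the following iteration. */
static struct node *remove_negative(struct node *head)
{
    struct node **link = &head;
    struct node *cur, *next;

    for (cur = head; cur != NULL; cur = next) {
        next = cur->next; /* saved before cur may be freed */
        if (cur->value < 0) {
            *link = next;
            free(cur);
        } else {
            link = &cur->next;
        }
    }
    return head;
}
```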
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
list_for_each_entry() does not allow modifying the current item pointer.
Let's rework the skip-instructions logic in schedule_block() to not
break this rule.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The V3D documentation states that primitive counters are reset when
we emit Tile Binning Mode Configuration items, which we do at the start
of each draw call, however, in the actual hardware this doesn't seem to
take effect when transform feedback is not active (this doesn't happen in
the simulator). This causes a problem in the following scenario:
glBeginTransformFeedback()
glDrawArrays()
glPauseTransformFeedback()
glDrawArrays()
glResumeTransformFeedback()
glEndTransformFeedback()
The TF pause will trigger a flush of the primitive counters, which results
in a correct number of primitives up to that point. In theory, the counter
should then be reset when we execute the draw after pausing TF, but that
doesn't happen, and since TF is enabled again by the resume command before
we end recording, by the time we end the transform feedback recording we
again check the counters, but instead of reading 0, we read again the same
value we read at the time we paused, incorrectly accumulating that value
again.
In theory, we should be able to avoid this by using the other method to
reset the primitive counters: using operation 1 instead of 0 when we
flush the counts to the buffer at the time we pause, but again, this
doesn't seem to work and we still see obsolete counts by the time we
end transform feedback.
This patch fixes the problem by not accumulating TF primitive counts
unless we know we have actually queued draw calls during transform
feedback, since that seems to effectively reset the counters. This should
also be more performant, since it saves unnecessary stalls for the
primitive counters to be updated when we know there haven't been any
new primitives drawn.
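The bookkeeping can be sketched as below. The structure and function
names are hypothetical and this is not the v3d code: the hardware
counter is only added to the running total if at least one draw was
queued with transform feedback active since the last accumulation.

```c
/* Hypothetical transform feedback bookkeeping for this sketch. */
struct tf_state {
    unsigned prims_written;     /* accumulated primitive count */
    int draws_since_accumulate; /* any TF draw queued since last read? */
};

/* Called when a draw is queued while transform feedback is active. */
static void tf_draw(struct tf_state *tf)
{
    tf->draws_since_accumulate = 1;
}

/* Called on pause/end with the value read from the hardware counter. */
static void tf_accumulate(struct tf_state *tf, unsigned hw_counter)
{
    if (!tf->draws_since_accumulate)
        return; /* counter is stale, nothing new was drawn */
    tf->prims_written += hw_counter;
    tf->draws_since_accumulate = 0;
}
```

In the begin/draw/pause/draw/resume/end sequence above, the second read
of the stale counter at the end is skipped, so it is not counted twice.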
Fixes CTS tests:
dEQP-GLES3.functional.transform_feedback.*
Reviewed-by: Eric Anholt <eric@anholt.net>
This was updating the counter for the indexed draw path only, but we are
already updating the counter for all paths a bit later, so this is only
duplicating counts for indexed paths.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of running it with the Wayland platform, which introduces
unwanted dependencies and complexity.
Makes tests run 30% faster, as well.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
streamout_buffers is assigned after that function, so the previous
fix was completely wrong. This probably fixes something when streamout
buffers and push constants are used/inlined in the same shader.
Fixes: 378e2d2414 ("radv: fix computing number of user SGPRs for streamout buffers")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
_mesa_texstore_z32f_x24s8 calculates the source rowStride in 64-bit
units, which gives an inaccurate offset if the width of the src image
is an odd number. Change the src pointer to int_32*, as the source
image format is gl_float, which is 32 bits per pixel.
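One way to see the problem, as a sketch under the assumption that the
row stride in bytes is divided by the element size of the pointer type
(the helper name is hypothetical, this is not the texstore code): for
an odd width of 32-bit pixels the division by 8 truncates.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of a row when the stride is first
 * converted to a count of elem_size-byte elements (integer division). */
static size_t row_offset_bytes(size_t row, size_t row_stride_bytes,
                               size_t elem_size)
{
    size_t stride_in_elems = row_stride_bytes / elem_size; /* truncates */
    return row * stride_in_elems * elem_size;
}
```

For a width of 3 pixels at 4 bytes each the stride is 12 bytes. Counting
in 8-byte elements puts row 1 at byte 8, while counting in 4-byte
elements puts it at byte 12 as expected. Even widths are unaffected.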
Reviewed-by: Ilia Mirkin
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The AnTuTu "garden" benchmark overflows the fixed size constbuffer
stateobject, so let's be more clever and calculate a (potentially
slightly pessimistic) actual size.
Signed-off-by: Rob Clark <robdclark@chromium.org>
fd6_blitter.c:724:31: warning: passing argument 1 of ‘fd_resource_level_linear’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since freedreno's kernel and GPU reset seem to be totally solid, we
don't need to have the complexity of the LAVA setup that panfrost has.
Instead, we can register some boards as shared gitlab runners and have
the jobs run out of a docker container just like we do for llvmpipe.
Just make sure that the DRI device node is passed through to the
containers in the gitlab config ('devices = ["/dev/dri"]' under
runners.docker).
If a runner fails (networking dies, kernel panic, etc.) it'll take out
one build but the rest can keep going since gitlab-runner is what
pulls jobs. Since the runner pulls jobs, it also means that they can
live behind firewalls instead of needing some public address to be
accessed by gitlab.fd.o.
For now, enable it just on db410c (A307) and cheza (A630) as those are
the hardware that I have plenty of. A307 is only testing GLES2 since
running all of GLES3 takes too long for the number of boards I've
brought up.
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Sometimes you just want confirmation that dEQP really picked up the
driver you thought we built. This is not as good as one might like,
because git isn't present in the cross-build image.
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
A handful of tests on freedreno have been close to the watchdog
timeout, and now sporadically fail since range analysis has slowed
down the compiler for them.
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This brings back the fallback previously present in
st_nir_lookup_parameter_index(): if there's no parameter associated
with the variable, use a parameter from a variable with the same
prefix.
We'll have to sort out something for SPIR-V, but in the meantime let's
fix GLSL.
Fixes: b6384e57f5 ("mesa/st: Lookup parameters without using names")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
As introduced in "v3d: flag dirty state when binding new sampler states"
we need to add support for compute states. New flags VC5_DIRTY_COMPTEX and
VC5_DIRTY_UNCOMPILED_CS are introduced.
Reaching 33 flags in the dirty field forces us to change the type to
uint64_t. Flags are reordered and empty contiguous bits are available
for future pipeline stages.
v2: Update flag conditions to compile cs shader. (Eric Anholt)
Now dirty flags use uint64_t and flags are reordered.
Added VC5_DIRTY_UNCOMPILED_CS flag.
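A short sketch of why the 33rd flag forces a wider type. The helper
name is hypothetical and is not one of the driver's flag macros: bit 32
does not fit in a 32-bit field, so the shift has to be done on a 64-bit
constant.

```c
#include <stdint.h>

/* Hypothetical helper: the dirty flag for a given bit index (must be < 64).
 * The constant is 64-bit so that bit indices of 32 and above are valid. */
static uint64_t dirty_flag(unsigned bit)
{
    return UINT64_C(1) << bit;
}
```

Storing dirty_flag(32) in a uint32_t would silently drop the flag, and
writing "1 << 32" with a plain int is undefined behaviour in C.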
Reviewed-by: Eric Anholt <eric@anholt.net>
Translating TGSI_INTERPOLATE_COLOR as INTERP_MODE_SMOOTH made
it impossible for drivers to have flatshaded color inputs.
Translate it to INTERP_MODE_NONE which drivers interpret as
smooth or flat depending on flatshading state.
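How a driver might resolve the unspecified mode can be sketched as
follows. The enum and function are hypothetical stand-ins, not the NIR
definitions or any driver's code: an explicit mode is kept, while "none"
on a color input follows the current flatshading state.

```c
/* Hypothetical interpolation modes for this sketch. */
enum interp_mode {
    INTERP_NONE,
    INTERP_SMOOTH,
    INTERP_FLAT
};

/* Resolve the interpolation of a color input at draw time. */
static enum interp_mode resolve_color_interp(enum interp_mode declared,
                                             int flatshade)
{
    if (declared != INTERP_NONE)
        return declared; /* an explicit qualifier always wins */
    return flatshade ? INTERP_FLAT : INTERP_SMOOTH;
}
```

If the translation had already picked smooth, the flatshade state could
no longer influence the result.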
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111467
Fixes: 770faf54 ("tgsi_to_nir: Improve interpolation modes.")
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The patch adds support for HAL_PIXEL_FORMAT_RGBA_1010102 on
Android platform.
Fixes android.media.cts.DecoderTest#testVp9HdrStaticMetadata
which failed in egl due to "Unsupported native buffer format 0x2b"
on Android.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chenglei Ren <chenglei.ren@intel.com>
We have only two defines that aren't from DRM_FORMAT_*: SARGB and
SABGR. Keep only those as __DRI_IMAGE_FOURCC and garbage collect the
rest.
While this header is also used from the X server, the X server doesn't
use any __DRI_IMAGE enums.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Taken from drm-misc-next 268de6530aa1 ("drm: mst: Fix query_payload
ack reply struct")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Previously, ReadPixels, PBO upload/download, and clears would call
cso_save_state with CSO_PAUSE_QUERIES, causing cso_context to call
pipe->set_active_query_state() twice for each operation. This can
potentially cause driver work to enable/disable statistics counters.
But often, there are no queries happening which need to be paused.
By keeping a simple tally of active queries, we can skip this work.
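The tally can be sketched like this. All names are hypothetical and
this is not the state tracker code: the pause and resume calls around
an internal operation are skipped when no query is active.

```c
/* Hypothetical context for this sketch. */
struct query_ctx {
    unsigned active_queries; /* simple tally of begun-but-not-ended queries */
    unsigned state_changes;  /* counts calls into the driver hook */
};

static void begin_query(struct query_ctx *q)
{
    q->active_queries++;
}

static void end_query(struct query_ctx *q)
{
    if (q->active_queries > 0)
        q->active_queries--;
}

/* Stand-in for the driver hook that pauses or resumes queries. */
static void set_active_query_state(struct query_ctx *q, int enable)
{
    (void)enable;
    q->state_changes++;
}

/* An internal operation (e.g. a clear) that must not be counted by queries. */
static void meta_operation(struct query_ctx *q)
{
    const int pause = q->active_queries > 0;

    if (pause)
        set_active_query_state(q, 0);
    /* ... the internal draw would happen here ... */
    if (pause)
        set_active_query_state(q, 1);
}
```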
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Initial benchmarking didn't show any performance benefits. But it might eventually.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For example, the surfaceless platform only supports pbuffers. If the
driver supports MSAA, we would still create a config, but it would have
no supported surface types. That's meaningless, so don't do it.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
To go any further than this would be to break the current version of
Android.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
[ Michel Dänzer: Dropped jessie line from debian-install.sh again ]
Upgrading to a newer g++ causes older LLVM/clang packages to be
removed.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Something seems to have changed in Debian buster causing installation
of the other foreign packages to fail without this.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
IIRC, designated initializers are not legal C++.
Fixes the MSVC build.
Fixes: 83fd1e58 ("glsl/nir: Add and use a gl_nir_link() function")
Reviewed-by: Neha Bhende <bhenden@vmware.com>
This is unsupported by meson and may become a hard error in the future.
Fixes: 5adfc8602c
("lima/ppir: move sin/cos input scaling into NIR")
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
It looks like .out_sync wasn't set in lima_submit_start(); as a result,
the submit completion fence was never signalled.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This is a pipe format, not a boolean.
Fixes: 5849e0612c ("gallium/auxiliary: Add util_format_get_depth_only() helper.")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fixes dEQP-GLES3.functional.texture.specification.texstorage3d.size.3d_2x2x2_2_levels
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If the lowest (largest) mipmap level is too small to tile, then don't
bother pretending.
Note that this requires initializing pipe->screen before
fd_resource_level_linear() is called.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This will also "unlock" OpenGL 4.6 for Iris!
v2: Also enable PIPE_CAP_GL_SPIRV_VARIABLE_POINTERS.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
Perform all the NIR linking steps in order. Change iris and i965 to
use it. Suggested by Alejandro.
v2: Add gl_nir_linker_options struct.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
The PIPE_CAP_GL_SPIRV capability enables ARB_gl_spirv and
ARB_spirv_extensions, and will make sure the corresponding SPIR-V
capabilities and extensions lists are initialized.
The additional PIPE_CAP_GL_SPIRV_VARIABLE_POINTERS capability enables
the support for Variable Pointers in SPIR-V shaders. This depends on
the driver and is not mandatory for ARB_gl_spirv support.
v2: Add a PIPE_CAP for Variable Pointers. (Marek)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
There's no such case, if we load prog->nir from the shader cache, we
shouldn't hit this path.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The SPIR-V codepath uses NIR linking, so we have to preprocess after
the linking steps, which makes things slightly different than GLSL.
To make more clear when the preprocess is happening, I've ended up
inlining st_nir_get_mesa_program() into its caller.
The goal was to make both GLSL and SPIR-V use the same preprocess
function. The exceptions are:
- The SPIR-V codepath doesn't support NIR state slots yet;
- GLSL lowers shared memory early, so we don't do the deref lowering
for those.
For now I didn't bother to rename other functions and files (now that
many of them apply to both GLSL and SPIR-V), but we should do this in
further patches.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Refactor to split the glsl_to_nir conversion from the preprocessing
NIR passes into separate functions, so we can use them in SPIR-V.
Unlike in GLSL, there we'll need to perform a few passes with the NIR
linker before doing the individual preprocess calls.
No behavior should change with this patch.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Use the new MainUniformStorageIndex field in Parameter instead. It
was added so we could match those in the SPIR-V case, where names are
optional.
v2: Use MainUniformStorageIndex for all cases.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Use the new UniformStorageIndex field in Parameter instead. This
mechanism was added so we could match those in the SPIR-V case, where
names are optional.
v2: Use UniformStorageIndex for all cases. (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When creating Parameters, fill in the associated uniform storage
indices, like it is done with the NIR linker used for SPIR-V. This
will allow later code to not rely on names (which would never work for
SPIR-V where names are optional).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The parameter lists were not being created nor filled since i965
doesn't use them. In Gallium they are used for uniform handling, so
add a way to fill them.
The gl_uniform_storage struct got two new fields that let us go
- from a Parameter to the matching UniformStorage and,
- from the variable to the *first* UniformStorage
without relying on names -- since they are optional for ARB_gl_spirv.
Later patches will make use of them.
v2: Do not fill parameters for i965. (Timothy)
Use uint32_t for the new attributes. (Marek)
v3: Serialize the new fields. (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The gl_register_file doesn't need 16 bits, so shorten it and use the
extra room for 'Padded' (also mark it as a single bit). This shrinks
the struct size from 32 bytes to 24 bytes.
See also 4794fbc86e ("mesa: reduce the size of gl_program_parameter"),
which shrank it from 40 to 24 bytes, and later 7536af670b ("glsl: fix
shader cache for packed param list"), which added `Padded`.
v2: Use just 5 bits for gl_register_file. (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Every uniform that has a "gl_" name also has some state slots. So
use the state_slots like we did in 57b6184931 ("i965: account for NIR
uniforms without name").
This removes the dependency on names, which are optional when using
ARB_gl_spirv.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Don't use the UNMAPPED_UNIFORM_LOC (-1) to set the unsigned
max_uniform_location. Those unmapped uniforms don't have to be
accounted at this point.
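The pitfall can be sketched as below. The names are hypothetical and
this is not the linker code: converting the -1 marker to unsigned gives
the largest possible value, so unmapped entries have to be skipped
before updating the maximum.

```c
/* Hypothetical marker for a uniform without an assigned location. */
#define UNMAPPED_LOC (-1)

/* Highest mapped location in the list, ignoring unmapped entries. */
static unsigned max_uniform_location(const int *locations, unsigned count)
{
    unsigned max = 0;

    for (unsigned i = 0; i < count; i++) {
        if (locations[i] == UNMAPPED_LOC)
            continue; /* would otherwise convert to UINT_MAX */
        if ((unsigned)locations[i] > max)
            max = (unsigned)locations[i];
    }
    return max;
}
```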
Fixes: 7a9e5cdfbb ("nir/linker: Add gl_nir_link_uniforms()")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This mirrors the haiku build which uses a platform.
v2: - Fix some rebase problems
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
v4: - Don't wrap a single file in a list to match mesa style
- Use null_dep instead of empty list
Reviewed-by: Eric Anholt <eric@anholt.net> (v3)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
v4: - Don't run checks on Windows that will always fail
Reviewed-by: Eric Anholt <eric@anholt.net> (v3)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Which will allow meson to build a shared glapi build with mingw.
v2: - Add symbol to symbol check test
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
It doesn't compile due to undefined symbols, which are in
libglapi_static, so I don't understand the problem.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Currently the parser for s expressions assumes that newlines will be \n,
resulting in incorrect parsing on Windows, where the newline is \r\n.
This patch just adds \r? to the regular expression used to parse the s
expressions, which fixes at least 1 test on Windows.
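The effect of an optional \r in front of \n can be sketched with POSIX
regex.h. This is only an illustration in C; the pattern and the helper
first_line_length() are assumptions and are not taken from the patched
parser.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns the length of the first line of `text`, including its
 * terminator, or -1 if no terminated line is found. The C string
 * escapes put literal CR/LF characters into the pattern, and "\r?"
 * makes the CR optional so both \n and \r\n endings match. */
int first_line_length(const char *text)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    assert(text != NULL);
    if (regcomp(&re, "^[^\r\n]*\r?\n", REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```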
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Since the system value refactor, we've accidentally only been setting
cbuf->buffer_size in the UBO case, and not in the uploaded-constants
case. We use cbuf->buffer_size to fill out the SURFACE_STATE entry,
so it needs to be initialized in both cases.
Fixes: 3b6d787e40 ("iris: move sysvals to their own constant buffer")
This fixes some interactions when NGG GS is enabled. It fixes:
- dEQP-VK.clipping.user_defined.clip_cull_distance_dynamic_index.*geom*
- dEQP-VK.tessellation.geometry_interaction.passthrough.*
For some reason, using the computed ESGS ring size randomly hangs
with CTS. For now, just use the maximum LDS size for ESGS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This shouldn't be in NIR->LLVM because ACO also needs the shader
info. This will also help for computing some NGG values that are
necessary for declaring LDS symbols.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
ac_surface computes it for amdgpu.
radeon_drm_surface computes it for radeon.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
The VBO module maps a buffer with GL_MAP_FLUSH_EXPLICIT, and keeps
appending data, and calling glFlushMappedBufferRange(). We were
invalidating the VF cache each time it flushed a new range, which
results in a ton of VF flushes.
If the contents of the destination in the target range are undefined
(never even possibly written), this patch makes us assume that it's
likely not in the cache and so cache invalidations are not required. If
the destination range is defined, we continue cache flushing as we may
need to expunge stale data.
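The decision can be sketched as a range-overlap test. This is a simplified
model that assumes the driver tracks one "possibly written" byte range per
buffer; struct valid_range and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracker for the bytes of a buffer that may have been
 * written so far. The range is empty when start >= end. */
struct valid_range {
    unsigned start, end;
};

/* True when [offset, offset + size) overlaps data that might already
 * sit in a cache, so an invalidate is still needed. */
bool flush_needs_vf_invalidate(const struct valid_range *valid,
                               unsigned offset, unsigned size)
{
    assert(size > 0);
    bool empty = valid->start >= valid->end;
    return !empty && offset < valid->end && offset + size > valid->start;
}

/* Record that the flushed range is now defined. */
void mark_range_valid(struct valid_range *valid,
                      unsigned offset, unsigned size)
{
    if (valid->start >= valid->end) {
        valid->start = offset;
        valid->end = offset + size;
        return;
    }
    if (offset < valid->start)
        valid->start = offset;
    if (offset + size > valid->end)
        valid->end = offset + size;
}
```

With an append-only pattern, each new flush lands past the end of the valid
range, so the overlap test stays false and no invalidate is issued.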
This eliminates 88% of the VF cache invalidates on Manhattan 3.0.
Improves performance in Manhattan 3.0 on my Icelake 8x8 with the GPU
frequency locked to 700 MHz by 0.376724% +/- 0.0989183% (n=10).
This cuts roughly 85% of the 3DSTATE_SAMPLER_STATE_POINTERS_PS calls in
the J2DBench images test. For some reason, the state tracker is calling
bind_sampler_state with the same sampler state in a bunch of cases.
The line stipple pattern and factor only matter if line stippling is
actually enabled. Otherwise, we can safely ignore it.
PBO upload may give us zero for line stipple information, while normal
drawing tends to give us an actual stipple pattern such as 0xffff. This
was causing us to flag IRIS_DIRTY_LINE_STIPPLE way too often, leading to
useless 3DSTATE_LINE_STIPPLE commands, which are non-pipelined and thus
very expensive.
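A minimal sketch of the comparison, assuming a rasterizer state with one
enable bit plus factor and pattern. The struct and the helper
line_stipple_dirty() are hypothetical, not the iris code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a rasterizer state. */
struct rast_state {
    bool line_stipple_enable;
    unsigned line_stipple_factor;
    unsigned line_stipple_pattern;
};

/* Factor and pattern only matter while stippling is enabled, so a
 * change in them is ignored when the new state has it disabled. */
bool line_stipple_dirty(const struct rast_state *old_state,
                        const struct rast_state *new_state)
{
    assert(old_state != NULL && new_state != NULL);

    if (old_state->line_stipple_enable != new_state->line_stipple_enable)
        return true;
    if (!new_state->line_stipple_enable)
        return false;
    return old_state->line_stipple_factor != new_state->line_stipple_factor ||
           old_state->line_stipple_pattern != new_state->line_stipple_pattern;
}
```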
Improves performance in Manhattan 3.0 on Skylake GT4e by
0.149261% +/- 0.0380796% (n=210). On an Icelake 8x8 with the GPU
frequency locked at 700 MHz, improves by 0.423756% +/- 0.222843% (n=3).
The entire point of schedule_first is that the node has to be scheduled
as soon as possible without any moves because it doesn't produce a
proper floating-point value, or its value changes depending on where you
read it. We were still introducing a move for preexp2 in some cases
though, even if it got scheduled as soon as possible, which broke some
exp() tests. Fix that.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The whole point of schedule_first nodes is that they need to be
scheduled as soon as possible, so if a schedule_first node is the
successor in a fake dependency that prevents it from being scheduled
after its parent, that can cause problems. We need to add these fake
dependencies to the parent as well, and we need to guarantee that the
pre-RA scheduler puts schedule_first nodes right before their parents in
order to prevent this from adding cycles to the dependency graph.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The idea was to make sure schedule_first nodes were always first in the
ready list. I made sure they were inserted first, but not that other
nodes wouldn't later be scheduled ahead of them. Fixes
spec@glsl-1.10@execution@built-in-functions@vs-exp-float and probably
others.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The point of the function is to avoid creating a complex move which is
used by certain slots in the next instruction, but unscheduled
successors will never be in the next instruction. Found while debugging
a crash that the previous commit fixed.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The scheduler assumes that load nodes are always duplicated so that they
can always be scheduled eventually and therefore they never need to be
spilled. But some lowerings were running after the pre-RA scheduler,
whereas duplication has to happen before then since it's needed for the
scheduler to do a better job reducing register pressure. This meant
that lowerings were introducing multiple uses of a load instruction,
which broke the scheduler's expectation and resulted in infinite loops
in situations where the only nodes available to spill were load nodes.
Spilling load nodes would be silly, so we want to fix the lowerings
rather than the scheduler. Just do all lowerings before the pre-RA
scheduler, which also helps with reducing pressure since the scheduler
can more accurately compute the pressure.
Fixes lima/mesa#104.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Change needed to fix the following building error:
In file included from external/mesa/src/intel/vulkan/anv_device.c:43:
external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found
^~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 4dcb1ff ("anv: add support for driconf")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
So we can move all the BO logic into this file instead of having it
spread over pan_resource.c, pan_drm.c and pan_bo_cache.c.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The last users have been converted to use plain BOs. Let's get rid of
this abstraction. We can always consider adding it back if we need it
at some point.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Some fields in panfrost_context are unused (probably leftovers from
previous refactor). Let's get rid of them.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
ctx->{scratchpad,tiler_heap,tiler_dummy} are allocated using
panfrost_drm_allocate_slab() but they never use any of the SLAB-based
allocation logic. Let's convert those fields to plain BOs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Right now, the transient memory allocator implements its own BO caching
mechanism, which is not really needed since we already have a generic
BO cache. Let's simplify things a bit.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
What we currently call a job is actually a batch containing several jobs
all attached to a rendering operation targeting a specific FBO.
Let's rename structs, functions, variables and fields to reflect this
fact.
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This commit follows OES_EGL_sync to universally enable use of EGL sync
objects with desktop OpenGL contexts.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The dead_cf pass calls into the CF manipulation helpers which attempt to
keep NIR's SSA form sane. However, when the only break is removed from
a loop, dominance gets messed up anyway because the CF SSA clean-up code
only looks at phis and doesn't consider the case of code becoming
unreachable. One solution to this would be to put the loop into LCSSA
form before we modify any of its contents. Another (and the approach
taken by this pass) is to just run the repair_ssa pass afterwards
because the CF manipulation helpers are smart enough to keep all the
use/def stuff sane; they just don't always preserve dominance
properties.
While we're here, we clean up some bogus indentation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111405
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111069
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
NIR currently assumes that unreachable blocks are trivially dominated by
everything. However, when considering well-formed SSA, there is no path
from any block to an unreachable block. Therefore, we can break any
use-def chains where the use is in an unreachable block. This removes
any dependencies on code created by uses in unreachable blocks and lets
DCE do a better job of cleaning it up.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already bail and don't split the vars but we were passing a NULL to
_mesa_hash_table_search which is not allowed.
Fixes: f1cb3348f1 "nir/split_vars: Properly bail in the presence of ..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
In the case where the stencil clear is nicely aligned, we can clear
stencil much more efficiently by mapping it as a wide format (say
RGBA32_UINT) and blasting out the stencil clear value with a repclear.
On Unigine Heaven, this makes one stencil clear go from non-trivial to
unnoticeable when looking at per-draw timings.
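The two ingredients, checking alignment and replicating the 8-bit clear
value across a wider texel, can be sketched as below. The 16-byte texel
size follows from RGBA32_UINT, but the alignment rule shown is a simplified
assumption and both helper names are hypothetical. Real surfaces have
tiling constraints that this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Replicate an 8-bit stencil value into every byte of one 32-bit
 * channel, e.g. 0xAB becomes 0xABABABAB. */
uint32_t replicate_stencil(uint8_t value)
{
    return value * UINT32_C(0x01010101);
}

/* Simplified check: the cleared span must start and end on a
 * 16-byte (one RGBA32_UINT texel) boundary. */
bool stencil_clear_is_wide_aligned(uint32_t x_bytes, uint32_t width_bytes)
{
    const uint32_t texel_bytes = 16;

    assert(width_bytes > 0);
    return x_bytes % texel_bytes == 0 && width_bytes % texel_bytes == 0;
}
```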
In order for this change to work properly, ANV needs to do a bit more
flushing around depth and stencil clears. i965 and iris already have
the cache tracking logic to handle this so no changes are required
there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This isn't known to fix any current bugs but it does prevent a
regression in a subsequent commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 02bc4aabb48 ('nir/lower_io_to_vector: allow FS outputs to be vectorized')
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
u_endian.h needs to be included, otherwise PIPE_ARCH_BIG_ENDIAN might not
be defined on big-endian architectures and the endian conversion macros
will be incorrect.
I don't think anything is broken because of this, I just noticed this when
looking at the file.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This bit redirects the state cache from the unified/RO sections of the
L3 cache to the "CS command buffer" section of the cache, which would
be set up via TCCNTLREG. The documentation says:
"Additionaly, this redirection should be enabled only if there is a
non-zero allocation for the CS command buffer section."
We don't allocate any cache to the CS command buffer section, so
enabling this redirection effectively disabled the state cache.
The Windows driver only sets up that section when using POSH, which
we do not currently use. So, leave it unallocated and disable the
redirection to get a functional state cache again.
Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%,
and Car Chase by 2%.
Jason pointed out that the caches likely refer to offsets from dynamic
and surface state base addresses, so when we change those, we need to
invalidate the caches.
Comment borrowed from src/intel/vulkan/genX_cmd_buffer.c.
The driver can't determine PIPE_QUERY_PRIMITIVES_GENERATED or
PIPE_QUERY_PRIMITIVES_EMITTED once we support geometry or
tessellation, since these stages add primitives at runtime. Use the
WRITE_PRIMITIVE_COUNTS event to write back the primitive counts and
implement a hw query for this.
Reviewed-by: Rob Clark <robdclark@gmail.com>
The GPU writes out streamout offsets as it goes to the FLUSH_BASE
pointer. We use that value with CP_MEM_TO_REG when appending to the
stream so that we don't have to track the offsets with the CPU in the
driver. This ensures that streamout continues to work once we enable
geometry and tessellation shader stages that add geometry.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Should fix some issues we're seeing. And use REALLOC instead of realloc.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This has lower_io_to_vector try to turn variables into arrays of 4-sized
vectors when possible and fall back to the old approach when that isn't
possible.
This is so that lower_io_to_vector can guarantee that only one variable is
used for each fragment shader output.
v2: handle dual-source blending
v3: don't try to merge structs and non-32-bit types in get_flat_type()
v3: fix per-vertex inputs
v3: fix and cleanup location advancement in get_flat_type() and its
calling code
v4: prioritize the original mode over the flat mode
v4: don't create flat variables to merge only one variable
v5: don't skip an entire slot when encountering structs in the old mode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
It shouldn't matter much because output varyings should have been
compacted during NIR shader linking but it mirrors what the driver
does when emitting NGG GS vertex parameters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the fragment shader needs the layer index, we have to allocate
one more dword in the NGG GS storage. Found by inspection. This
doesn't fix anything known.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Sometimes LAVA jobs will time out due to transient issues, and the Gitlab
job will fail in that case. Increase the timeouts to reduce the
likelihood of that happening and reduce false positives.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
So repositories don't need to be specially configured with a token to
access LAVA, store this token in a bind volume for a special runner.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
For loops whose condition is false on the first iteration, the
iteration count was miscalculated under the assumption that the
loop's condition is true until it becomes false, meaning it's true
at least once.
Now such loops are reported as having 0 iterations.
Similar to the fix e71fc7f2 done in NIR.
Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test
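For a simple counted loop, the corrected behaviour can be sketched like
this. The helper loop_trip_count() is hypothetical and only models loops of
the form "for (i = init; i < limit; i += step)", which is far narrower than
what the real loop analysis handles.

```c
#include <assert.h>

/* Trip count for: for (i = init; i < limit; i += step).
 * The condition is checked before the first iteration, so a loop
 * whose condition starts out false runs zero times. */
unsigned loop_trip_count(int init, int limit, int step)
{
    assert(step > 0);

    if (init >= limit)
        return 0;
    /* Round up; widen first to avoid overflow in the subtraction. */
    return (unsigned)(((long long)limit - init + step - 1) / step);
}
```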
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit removes the GLSL dependency in TTN by manually recording
the textures used and calling nir_lower_samplers
instead of its GL counterpart.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
keep it only in the GLSL code.
This commit introduces nir_lower_samplers to compiler/nir,
while keeping the GL-specific function too.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This fixes a memory leak in the flush code:
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 in __interceptor_realloc .../gcc-8.3.0/libsanitizer/asan/asan_malloc_linux.cc:105
#1 in si_buffer_do_flush_region src/gallium/drivers/radeonsi/si_buffer.c:573
#2 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:608
#3 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:597
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch partially reverts 20294dc ("mesa: Enable asm unconditionally, ...")
Android makefile build logic needs to disable assembler optimization
in 32bit builds to avoid text relocations for libglapi.so shared
Fixes the following build error with Android x86 32bit target:
[ 0% 4/477] target SharedLib: libglapi (out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so)
FAILED: out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so
...
prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: warning: shared library text segment is not shareable
prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: error: treating warnings as errors
clang-6.0: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 20294dc ("mesa: Enable asm unconditionally, now that gen_matypes is gone.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
The codegen handles it and it adds the correct casts. This fixes
a bunch of LLVM validation errors when enabling Wave32 for compute.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Load a 32-bit value then convert to 1-bit. Convert 1-bit to 32-bit
value, then Store it.
These cases started to appear when we changed Anvil to use derefs for
shared memory.
v2: Use `bit_size` in a couple of places we were missing. (Jason)
Reassign `value` instead of `src[0]`. (Jason)
Fixes: 024a46a407 ("anv: use derefs for shared memory access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit c0504569ea. Now that
we're doing interpolation lowering in NIR, we can continue to stride the
FS input registers directly in the brw_fs_nir code like we did before.
This fixes SIMD32 fragment shaders which broke because lower_simd_width
depended on the 0 stride to split PLN instructions correctly.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This commit does two things. First, it simplifies the way we compute
the FB write group bit. There's no reason to use a ternary because
inst->group / 16 can only be 0 or 1. Second, it fixes an order-of-
operations bug where the ternary wasn't selecting between (1 << 11) and
0 but between (1 << 11) and 0 | brw_dp_write_desc(...).
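The precedence problem is easy to reproduce in isolation. In the sketch
below the two function names are hypothetical, and the desc argument stands
in for the value returned by brw_dp_write_desc(...).

```c
#include <assert.h>
#include <stdint.h>

/* Buggy form: '|' binds tighter than '?:', so this parses as
 * cond ? (1 << 11) : (0 | desc) and drops desc when cond is true. */
uint32_t fb_write_desc_buggy(unsigned group, uint32_t desc)
{
    assert(group < 32);
    return (group / 16) ? (1u << 11) : 0 | desc;
}

/* Fixed form: group / 16 is 0 or 1, so shift it into bit 11 directly
 * and OR in the descriptor unconditionally. */
uint32_t fb_write_desc_fixed(unsigned group, uint32_t desc)
{
    assert(group < 32);
    return ((group / 16) << 11) | desc;
}
```

For group 16 and desc 0x5, the buggy form yields 0x800 while the fixed form
yields 0x805.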
Fixes: 0d9648416 "intel/compiler: Use generic SEND for Gen7+ FB writes"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Utgard PP is a vec4 architecture, so lowering phis to scalars
increases instruction count and potentially interferes with
spilling.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
For render formats, update fd2_pipe2color to only work with HW supported
render formats, and remove the format whitelist in is_format_supported.
This patch enables float render formats (which work).
For vertex/texture formats, use a generic function which translates using
the bitsize of the channels. Since we fake support for some vertex formats,
check for these in is_format_supported to avoid enabling them as sampler
formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Use fd_gmem_restore_format() to avoid trying to use unsupported Z24S8/Z16
render formats for gmem restore.
Also apply this change to gmem2mem so it doesn't depend on fd2_pipe2color
working with depth formats.
gmem2mem/mem2gmem also doesn't need to use the swap/swizzle, since dst/src
formats are the same.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Fixes failures in the following deqp tests:
dEQP-GLES2.functional.polygon_offset.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes failures in the following deqp tests:
dEQP-GLES2.functional.fragment_ops.*src_alpha_saturate*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the following deqp test:
dEQP-GLES2.functional.shaders.builtin_variable.pointcoord
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some instructions generated by int/bool float lowering need to be lowered
by opt_algebraic.
Fixes: 43dbd7d6
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Utgard PP has a vector fcsel operation, but its condition is scalar. Add
a filtering callback that checks whether the {b,f}csel condition is not
scalar, so that {b,f}csel is lowered to scalar only in this case.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
A set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has a vector conditional select operation, but its condition is
always scalar. Lowering all the vector selects to scalar increases the
instruction count, so we need a way to filter only those ops that can't
be handled in hardware.
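The shape of such a callback can be sketched as follows. All types and names
here are hypothetical stand-ins rather than the NIR API, and treating a NULL
filter as "lower everything" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ALU instruction. */
enum op { OP_FADD, OP_FCSEL };

struct alu_instr {
    enum op op;
    unsigned num_components;  /* width of the result */
    unsigned cond_components; /* width of the select condition */
};

/* Returns true when the instruction should be lowered to scalar. */
typedef bool (*lower_filter_cb)(const struct alu_instr *instr,
                                const void *data);

/* Example filter: only selects with a vector condition need lowering;
 * everything else is assumed to be handled by the vec4 hardware. */
bool filter_vector_cond_select(const struct alu_instr *instr,
                               const void *data)
{
    (void)data;
    return instr->op == OP_FCSEL && instr->cond_components > 1;
}

/* Counts how many instructions a scalarizing pass would touch. */
size_t count_lowered(const struct alu_instr *instrs, size_t n,
                     lower_filter_cb filter, const void *data)
{
    size_t lowered = 0;

    assert(instrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (instrs[i].num_components <= 1)
            continue; /* already scalar */
        if (filter && !filter(&instrs[i], data))
            continue; /* hardware can take it as a vector */
        lowered++;
    }
    return lowered;
}
```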
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This appears to work fine (with the additional constraint of keeping the
indirect load in the same block that a0.x was loaded).
We can probably lift this restriction on earlier gens after testing.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Need to use ir3_instr_set_address(), otherwise the instruction might not
get added to the indirects table. This becomes a problem when we turn
on copy propagation for relative accesses, as check_instr() in the sched
pass won't realize there is an indirect consumer of address register
load that is ready to be scheduled.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
An instruction can reference only a single address register value.
Add an assert to catch bugs.
Also, the address value should be local to the same block as the
instruction.
(The one spot where changing the instruction address is actually legit
needs to clear the address first.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
After the next patch enabling copy propagation for relative sources,
we'll need to dereference the n'th src in valid_flags(), so we actually
need to swap the sources before calling valid_flags().
But the logic was already a bit cumbersome, so move it into a helper
function.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The live_values and use_count were not being properly updated. This
starts triggering problems with the next patch, where we allow copy
propagation for RELATIV access.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move the constant part of the indirect offset into nir intrinsic base.
When we have multiple indirect accesses with different constant offsets,
this lets other opt passes clean up things to use a single address
register value.
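A toy model of the transformation, assuming an offset of the form
"address value + constant". The structs and fold_constant_offset() are
hypothetical and stand in for the NIR intrinsic and its base index.

```c
#include <assert.h>

/* Offset expression before lowering: value(addr_value_id) + constant. */
struct offset_expr {
    int addr_value_id; /* identifies the dynamic part */
    int constant;      /* compile-time part */
};

/* Access after lowering: constant part folded into the base. */
struct lowered_access {
    int base;
    int addr_value_id;
};

struct lowered_access fold_constant_offset(int base, struct offset_expr off)
{
    struct lowered_access out;

    assert(off.addr_value_id >= 0);
    out.base = base + off.constant;
    out.addr_value_id = off.addr_value_id;
    return out;
}
```

Two accesses such as c[i + 1] and c[i + 3] then differ only in their base
and share the same dynamic part, which is what lets later passes reuse one
address register value.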
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that spilling ops can be inserted into existing instructions, it
makes sense to increase cost to spill registers that would cause the
creation of a new instruction.
Experimental results showed that penalizing too much due to this caused
worse results, however it is beneficial as a tie resolver between
registers with the same number of components.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Avoid creating unnecessary instructions for the load/store temp nodes
when not required, to further reduce register pressure.
The store_temp operation seems to be unable to do any spilling.
At least the offline shader seems to never output instructions accessing
swizzled components, and attempting to output that in ppir results in
errors. So, force spilled registers to allocate a full vec4 register.
This seems to be the optimal way as it is possible to always keep stores
and temps in a single instruction that can be pipelined.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
One ssa created in the spilling code in ppir_update_spilled_src was not
properly being marked 'spilled', which made it a candidate for future
spilling attempts.
Since it was being inserted by the spilling code itself, let's mark it
unspillable to avoid an infinite spilling loop.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Shaders must not attempt to write to the register files in the last
three instructions, but that doesn't include the magic registers:
nop ; nop ; thrsw; ldtmu.- *** ERROR ***
nop ; nop
nop ; nop
v2: Simplify validation rules. (Eric Anholt)
v3: Adjust validation even more. (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
For radeonsi, we will prefer the NIR pass as it'll generate better code
(some index calculation and a single load vs. a load, then index
calculation, then another load) and oftentimes NIR optimization can kick
in and make all the access indices constant.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This prevents regressions when disabling indirect lowering. Sometimes
the only use of an input array was copying it to the array created by
nir_lower_io_to_temporaries, and without lowering indirects we wouldn't
have eliminated the temporary array until after linking, which was too
late to remove unused code in the producer.
No shader-db changes with radeonsi NIR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Setup a constant global variable that LLVM will stick in a .rodata
section and generate PC-relative loads for.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We usually use these counts as a simple way to figure out if a change
reduces the number of instructions or shrinks an instruction. However,
since .rodata sections aren't executed, we shouldn't be counting their
size for this analysis. Make the linker return the total executable
size, and use it to report the more useful size in both drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Removing GL_FRAMEBUFFER_FLIP_Y_MESA token from glheader.h as it is now
provided by glext.h
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Sync extension spec of MESA_framebuffer_flip_y to what has been merged
upstream in the GL registry. The update now carries the accepted GL
extension number.
v2: split GL headers update off to separate commit
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Integrating headers from upstream registry [0] master branch. Effective
GL registry commit integrated:
9d534f9312e56c72df763207e449c6719576fd54
Keeping the following quirks local to Mesa:
- glext.h: BUILDING_MESA guard (see !1492)
- glxext.h: glXQueryGLXPbufferSGIX: 'int' return type (Mesa) vs.
'void' (GL registry)
- glxext.h: GLX_RENDERER_ID_MESA is still expected by some mesa tests,
even though its token has been removed from the spec (see
docs/specs/MESA_query_renderer.spec)
- glxext.h: glXGetTransparentIndexSUN / PFNGLXGETTRANSPARENTINDEXSUNPROC
argument pTransparentIndex has type 'unsigned long *' (Mesa) vs. 'long
*' (GL registry)
[0] https://github.com/KhronosGroup/OpenGL-Registry
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Given that we occasionally touch this code and probably nobody really
wants to think about it, introduce a minimal test so that we know we
haven't completely broken OSMesa.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When run in optirun, applications that linked to `libGLX.so` and then
proceeded to querying Mesa for extension strings caused a SEGV in Mesa.
`glXQueryExtensionsString` was calling a chain of functions that
eventually led to `__glXQueryServerString`. This function would call
`xcb_glx_query_server_string` then `xcb_glx_query_server_string_reply`.
The latter for some unknown reason returned `NULL`. Passing this `NULL`
to `xcb_glx_query_server_string_string_length` would cause a SEGV as the
function tried to dereference it.
The reason behind the function returning `NULL` is yet to be determined,
however, simply checking that the ptr is not `NULL` resolves this. A
similar check has been added to `__glXGetString` for completeness' sake,
although not immediately necessary.
In addition to that, we stumbled into a similar problem in
`AllocAndFetchScreenConfigs` which tries to access the configs to free
them if `__glXQueryServerString` fails. This, of course, SEGVs, because the
configs are yet to have been allocated. Simply continuing past the configs
if their config ptrs are `NULL` resolves this. We also switch to `calloc`
to make sure that the config ptrs are `NULL` by default, and not some
uninitialized value.
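The allocation and cleanup pattern can be sketched on its own. The struct
and both helpers below are hypothetical stand-ins and do not reproduce the
actual `AllocAndFetchScreenConfigs` code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-screen config record. */
struct screen_configs {
    char *server_string;
};

/* calloc zero-fills the array, so every slot starts out as NULL
 * instead of holding an uninitialized pointer. */
struct screen_configs **alloc_screen_configs(size_t num_screens)
{
    assert(num_screens > 0);
    return calloc(num_screens, sizeof(struct screen_configs *));
}

/* Frees whatever was allocated and skips slots that never were.
 * Returns the number of per-screen records released. */
size_t free_screen_configs(struct screen_configs **configs,
                           size_t num_screens)
{
    size_t freed = 0;

    if (configs == NULL)
        return 0;
    for (size_t i = 0; i < num_screens; i++) {
        if (configs[i] == NULL)
            continue; /* nothing allocated for this screen yet */
        free(configs[i]->server_string);
        free(configs[i]);
        freed++;
    }
    free(configs);
    return freed;
}
```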
Cc: mesa-stable@lists.freedesktop.org
Fixes: 24b8a8cfe8 "glx: implement __glXGetString, hide __glXGetStringFromServer"
Fixes: cb3610e37c "Import the GLX client side library, formerly from xc/lib/GL/glx. Build it "
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
The precision for a function return type is now stored in
ir_function_signature. This will later be useful to implement mediump
to float16 lowering. In the meantime it is also useful to catch errors
where a function is redeclared with a different precision.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This adds the dispatch code. It creates a job for the number
of blocks in the grid, and dispatches them to the threadpool
implementation. The threadpool then calls the JIT code to
execute the coroutines.
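One way to picture the job numbering is a flat id per grid block that gets
decoded back into block coordinates. The x-fastest decoding order and the
names below are assumptions of this sketch, not taken from the driver.

```c
#include <assert.h>

/* Grid size or block coordinate, in blocks. */
struct grid {
    unsigned x, y, z;
};

/* One job per block in the grid. */
unsigned grid_num_jobs(struct grid g)
{
    return g.x * g.y * g.z;
}

/* Decode a sequential job id into block coordinates (x varies fastest). */
struct grid job_to_block(struct grid g, unsigned job_id)
{
    struct grid b;

    assert(job_id < grid_num_jobs(g));
    b.x = job_id % g.x;
    b.y = (job_id / g.x) % g.y;
    b.z = job_id / (g.x * g.y);
    return b;
}
```

For a 4x3x2 grid this gives 24 jobs, with job 5 mapping to block (1, 1, 0)
and job 23 to block (3, 2, 1).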
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This creates the coroutine execution environment and the
main compute shaders that get executed inside it.
Each compute shader block is executed in its own coroutine
execution shader, with each "thread" being a coroutine executed
inside it in sequence.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This doesn't actually build any of the shaders yet, but just
builds up the framework necessary to start building the shaders
and variants.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The compute shader will need its own context like the frag shader
has, this just introduces the framework struct and allocates/frees
for it in the right places.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When the code is executing and hits a barrier, it will suspend
the coroutine and return control to the coroutine dispatcher.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In order to efficiently run a number of compute blocks, use
a threadpool that just allows for jobs with unique sequential
ids to be dispatched.
In order to share the texture/image/sampler code with compute
shaders we need to reorg them to be at the front of context
same as draw does for vs/gs sharing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
These wrap the coroutine intrinsics and also add some higher
level wrappers around coroutine begin, end and suspend procedures.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were previously not doing at least some of the checks. This uses the
same logic that is used in glTexImage*.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Each BSD has slightly different sysctl for retrieving per-CPU times.
FreeBSD returns long while NetBSD returns uint64_t. On OpenBSD return
type differs between summation and per-CPU times. DragonFly is
compatible with FreeBSD.
Signed-off-by: Jan Beich <jbeich@FreeBSD.org>
Based on the vc4 implementation.
Fixes Android RenderEngine::flush() routine:
android.googlesource.com/platform/frameworks/native/+/refs/tags/android-o-mr1-iot-release-smart-clock-fcs/services/surfaceflinger/RenderEngine/RenderEngine.cpp#225
Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Try a more aggressive approach with cloning uniform and coord loads.
A uniform load can be inserted into any instruction, so let's do that. ARM's site
claims that the penalty for a cache miss is one clock, so we don't lose anything if
we merge it into the instruction that uses the result. As a side effect we can also
pipeline it and thus decrease reg pressure.
Do the same for varyings that hold texture coords, but for a different reason:
it looks like there's a special path for coords that increases precision if the
varying that holds them is pipelined. If we don't pipeline it and load coords
from a register, the precision is fp16 and thus only 10 bits, which is not enough
to accurately sample textures of size 1024 or larger.
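The precision argument can be checked with a little arithmetic. The sketch below assumes the standard IEEE half-precision layout (10 explicit mantissa bits) and only handles normal values; `fp16_spacing` and `fp16_steps_per_texel` are illustrative helpers, not driver code.

```c
#include <assert.h>

/* Gap between adjacent fp16 values around x, for x in the normal
 * fp16 range.  With 10 explicit mantissa bits, the gap is the
 * largest power of two not above x, divided by 1024. */
static double fp16_spacing(double x)
{
    double lo = 1.0;

    assert(x >= 1.0 / 16384.0 && x <= 65504.0);
    while (lo > x)
        lo *= 0.5;
    while (lo * 2.0 <= x)
        lo *= 2.0;
    return lo / 1024.0;
}

/* How many distinct fp16 coordinates fall inside one texel of a
 * texture that is texture_size texels wide, near a normalized
 * coordinate coord. */
static double fp16_steps_per_texel(double coord, unsigned texture_size)
{
    assert(texture_size > 0);
    return (1.0 / texture_size) / fp16_spacing(coord);
}
```

Near a coordinate of 0.75, a 256-texel texture still gets 8 representable positions per texel, while a 1024-texel texture gets only 2, which leaves almost no sub-texel precision for filtering.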
Since an instruction can hold only one uniform load and one varying load,
node_to_instr now creates a move using the helper introduced in the previous commit if
the slot is already taken. As a side effect of this change we can also try to
pipeline texture loads and create a move if the attempt fails.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
It can load value from varying directly as well. Also load_regs is the
only op that has a source, so add src_num field to load node and set it
accordingly.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
When lowering from ubo, use the constant base field in the load_uniform
instruction for the constant part of the offset. Doesn't change much
for constant indexing, but this will help for indirect indexing because
constant-folding can't completely clean up the result.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This looks like clear copy-and-pasteos, and fixes:
dEQP-GLES2.functional.draw.random.40
(on A307 and A630, both tested in the new CI farm)
Reviewed-by: Rob Clark <robdclark@chromium.org>
We can get all the information we need from NIR. It's slightly less
accurate, but radeonsi doesn't use the extra information. The old code
also overcounted atomic counters, which led to problems when everything
was used at once.
Fixes KHR-GL45.compute_shader.resources-max.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Otherwise it's impossible to know the maximum SSBO index for both
internal TGSI shaders from TTN (which don't have any notion of atomic
counters and no offset) as well as shaders from GLSL.
I fixed everything I could find while grepping for num_ssbos and
num_abos, which hopefully is everything (iris was the only user I could
find that uses it in a meaningful way).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds a bit of unnecessary code on radeonsi, since whether
unnormalized coordinates are used is known at compile time with GL, but
I wasn't sure if it was worth the few instructions to plumb everything
through, especially for something so rare -- my shader-db doesn't have
any instances where this changes anything.
Fixes CTS tests I created at
https://github.com/cwabbott0/VK-GL-CTS/tree/unnorm-gather-tests
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The workaround was originally written based on amdgpu-pro traces, but
since then radeonsi has got its own slightly different version. Use the
radeonsi version instead, to be consistent and because it'll be slightly
more convenient for handling unnormalized coordinates.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Technically, the user might have set EGL_DISPLAY instead of
EGL_PLATFORM, but since the former is deprecated let's just mention the
latter in the warning message.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This routine was made obsolete over a series of reworks of memory
allocation; Tomeu's changes to shader memory allocation finally made
this unused as cppcheck noted.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
swr_shader.cpp: In function ‘void (* swr_compile_gs(swr_context*, swr_jit_gs_key&))(HANDLE, HANDLE, SWR_GS_CONTEXT*)’:
swr_shader.cpp:732:44: error: ‘make_unique’ was not declared in this scope
ctx->gs->map.insert(std::make_pair(key, make_unique<VariantGS>(builder.gallivm, func)));
^~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
While the documentation for _BitScanReverse64 on MSDN says that it's
available on ARM, this isn't true. It's only available on ARM64. So
let's match reality.
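A guard along these lines would restrict the intrinsic to the 64-bit MSVC targets. This is a sketch under assumptions: `highest_bit64` and `HAVE_BITSCANREVERSE64` are hypothetical names, and the loop fallback is only there to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Only use the intrinsic where MSVC really provides it: x64 and
 * ARM64, but not 32-bit ARM. */
#if defined(_MSC_VER) && (defined(_M_AMD64) || defined(_M_ARM64))
#include <intrin.h>
#define HAVE_BITSCANREVERSE64 1
#endif

/* Index of the highest set bit of a non-zero 64-bit value. */
static unsigned highest_bit64(uint64_t v)
{
    assert(v != 0);
#ifdef HAVE_BITSCANREVERSE64
    unsigned long index;
    _BitScanReverse64(&index, v);
    return (unsigned)index;
#else
    /* Portable fallback: shift until nothing is left. */
    unsigned index = 0;
    while (v >>= 1)
        index++;
    return index;
#endif
}
```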
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This code generates CVTSD2SI, which requires SSE2. So let's fix the
required SSE-version.
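For illustration, a conversion that ends up as CVTSD2SI has to be guarded by an SSE2 check rather than an SSE one. The function name `round_double_to_int` and the non-SSE2 fallback are assumptions for this sketch and do not reproduce the actual util code.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Round a double to the nearest int. */
static int round_double_to_int(double d)
{
    assert(d > -2147483648.0 && d < 2147483647.0);
#if defined(__SSE2__)
    /* _mm_cvtsd_si32 maps to CVTSD2SI, which needs SSE2. */
    return _mm_cvtsd_si32(_mm_set_sd(d));
#else
    /* Simple fallback; may differ from the SSE2 path for inputs
     * exactly halfway between two integers. */
    return (int)(d >= 0.0 ? d + 0.5 : d - 0.5);
#endif
}
```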
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5de29ae (util: try to use SSE instructions with MSVC and 32-bit gcc)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This has been unused since 183db3a645 ("glsl: move half<->float
convertion to util"), Oct 10 2015. Let's drop needlessly including it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Lionel found actual documentation for this at long last. Apparently
it actually is a sampler cache limitation that was mostly fixed on
Icelake. Unfortunately, it seems there are still issues with ASTC
and non-ASTC sampler views. Still, we can lessen the flush condition
from "format mismatch" to "ASTC mismatch", which eliminates most of
the flushing here.
We also update the documentation to refer to the workaround name.
strchrnul is not available on macOS.
pipe_loader.c:141:14: error: implicit declaration of function 'strchrnul' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
next = strchrnul(library_paths, ':');
^
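One possible way to avoid the dependency is a small fallback built from `strchr` and `strlen`. The names `fallback_strchrnul` and `count_path_entries` are hypothetical; this sketch does not show how pipe_loader.c was actually changed.

```c
#include <assert.h>
#include <string.h>

/* Like strchrnul(): return a pointer to the first occurrence of c,
 * or to the terminating NUL if c does not occur. */
static const char *fallback_strchrnul(const char *s, int c)
{
    const char *p;

    assert(s != NULL);
    p = strchr(s, c);
    return p ? p : s + strlen(s);
}

/* Count the non-empty ':'-separated entries of a search path. */
static unsigned count_path_entries(const char *library_paths)
{
    unsigned count = 0;
    const char *next;

    for (const char *path = library_paths; *path; path = next) {
        next = fallback_strchrnul(path, ':');
        if (next != path)
            count++;
        if (*next == ':')
            next++; /* step over the separator */
    }
    return count;
}
```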
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The majority of these only apply the start argument to the input, but a
few of them also apply it to the output array. util_primconvert, the only
user of this argument that passes a non-zero start argument, does not
expect it to be applied to the output; if it is, it will write
outside of allocated memory, leading to VRAM corruption.
The reason this doesn't seem to have been noticed before is that no
driver currently uses util_primconvert to convert a primitive type to
itself, which is the case where this was broken. But for Zink, this
will no longer be true, because we need to eliminate the use of 8-bit
index buffers.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 28f3f8d413 ("gallium/auxiliary/indices: add start param")
Reviewed-by: Rob Clark <robdclark@chromium.org>
Commit 6f7306c029 ("swr/rast: Refactor memory API between rasterizer
core and swr") unintentionally removed changes for llvm-9.0.
Fixes: 6f7306c029 ("swr/rast: Refactor memory API between rasterizer core and swr")
Fixes: 5dd9ad1570 ("swr/rasterizer: Better implementation of scatter")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
We already had a perfectly cromulent pass for this, but one landed in
common NIR code so let's switch and lighten our tree.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This optimization depended on RA running before scheduling. It therefore
no longer applies and is now unused.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a tradeoff.
Scheduling before RA means we don't do RA on what-will-become pipeline
registers. Importantly, it means the scheduler is able to reorder
instructions, as registers have not been decided yet.
Unfortunately, it also complicates register spilling, since the spills
themselves won't get bundled optimally and we can only spill twice per
ALU bundle (only one spill per bundle allowed here). It also prevents us
from eliminating dead moves introduced by register allocation, as they
are not dead before RA. The shader-db regressions are from poor spilling
choices introduced by the new bundling requirements. These could be
solved by the combination of a post-scheduler (to combine adjacent
spills into bundles) with a VLIW-aware spill cost calculation.
Nevertheless, the change is small enough that I feel it's worth it to
eat a tiny shader-db regression for the sake of flexibility.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than using a pile of hacks and awkward constructs in MIR to
ensure the writeout parameter gets written into r0, let's add a
dedicated shadow register class for writeout (interfering with work
register r0) so we can express the writeout condition succinctly and
directly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There's no slot for it; you'll end up writing into the void and
clobbering stuff. Don't do it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
When running the register allocator after scheduling, the MIR looks a
little different, so we need to extend the RA to handle a few of these
extra cases correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
After scheduling, we still have valid MIR, but we have additional
bundling annotations which we would like to be able to debug, so print these.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than a vague "br.??" line, annotate the branch with its target
type (useful for disambiguating discards) and whether it was inverted.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
I'm not sure if this is strictly necessary but it makes debugging easier
and minimizes the diff with the experimental scheduler.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Scheduling occurs on a per-block basis, strongly assuming that a given
block contains at most a single branch. This does not always map to the
source NIR control flow, particularly when discard intrinsics are
involved. The solution is to allow scheduling barriers, which will
terminate a block early in code generation and open a new block.
To facilitate this, we need to move some post-block processing to a new
pass, rather than relying hackily on the current_block pointer.
This allows us to clean up some logic analyzing branches in other parts
of the driver as well, now that the MIR is much more well-formed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's sometimes convenient to call this with no instruction specified. By
definition, a missing instruction cannot reference any argument, so
let's check for NULL and short-circuit to false.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The branch has the writeout specified in its source list, making this
special even if it's not explicitly part of r0.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In order to run register allocation after scheduling, it is sometimes
necessary to be able to insert instructions into an already-scheduled
program. This is suboptimal, since it forces us to do a worst-case
scheduling, but it is nevertheless required for correct handling of
spills/fills. Let's add helpers to insert instructions as standalone
bundles for use in spilling code.
These helpers are minimal -- they *only* work on load/store ops or
moves. They should not be used for anything but register spilling; any
other instructions should be added prior to the schedule.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
While it doesn't matter with an unconditional move to the conditional
register (r31), when we try to elide that move we'll need to track the
swizzle explicitly, and there is no slot for that yet since ALU ops are
normally binary.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Oh boy. Midgard scheduling is crazy... These are all just the
requirements, not even the algorithm yet.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This does not affect shaders in any way. Rather, it makes the shader-db
instruction count recorded in the compiler accurate with the in-order
scheduler, matching up with what we calculate from pandecode.
Though shaders are the same, instruction counts cannot be compared
across this commit for this reason.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Allow a direct link to the PDF itself from the authors themselves,
rather than a paywall splash page.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
The GLX extension strings are independent of any context, so abusing the
direct_support bit to control this extension's visibility is wrong.
This reverts commit 079d0717fc896bc8086b037d0ed22642274986c7.
Reported-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Memory allocated through panfrost_allocate_transient() is likely to
come from the transient pool. Let's add the BO backing the allocated
memory region to the job batch so the kernel can retain this BO while
jobs are executed.
In practice that has never been a problem because the transient pool
is never shrunk, and even if it were, we still control the lifetime of
the job, so there's no reason for this BO to be freed before the GPU is
done executing the batch. But it still makes sense to add the BO for
debugging purposes.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This fixes two dEQP tests for virgl:
dEQP-EGL.functional.image.create.gles2_cubemap_positive_x_rgba_texture
dEQP-EGL.functional.image.render_multiple_contexts.gles2_cubemap_positive_x_rgba8_texture
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* Fix 2D/2DArray/3D tiling parameters:
There is a bottom threshold for width and height.
* Re-enable tiling for Cubemap, after setting the right parameters.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Equivalent of 0c1dd9dee "broadcom/vc4: Allow importing linear BOs with
arbitrary offset/stride." for v3d.
Allows YUV buffers with a single buffer and plane offsets to be
passed in.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Input to GS is just a set of attributes, so remove explicit setup of
'position' which is meaningless for GS input processing.
Reviewed-by: Alok Hota <alok.hota@intel.com>
RADV no longer uses specific LLVM options compared to the common code.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The patch adds support for 64 bit HAL_PIXEL_FORMAT_RGBA_FP16
for android platform.
Fixes android.graphics.cts.BitmapColorSpaceTest#test16bitHardware
which failed in egl due to "Unsupported native buffer format 0x16"
on chromebooks.
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
v2: Update several of the comments. Drop some redundant uses of
ASSERT_UNION_OF_OTHERS_MATCHES_UNKNOWN_*_SOURCE source. Suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
One shader from Metro Last Light and the rest from Rochard. In the
Rochard cases, something like:
min(1.0, max(pow(saturate(x), y), z))
was transformed to
saturate(max(pow(saturate(x), y), z))
because the result of the pow must be >= 0.
The Metro Last Light case was similar. An instance of
min(pow(abs(x), y), 1.0)
became
saturate(pow(abs(x), y))
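The reasoning behind both rewrites is that clamping only from above equals a full saturate whenever the value is known to be non-negative. A small sketch, where `p` stands in for the pow() result (assumed >= 0, NaN ignored) and all helper names are illustrative:

```c
#include <assert.h>

static float min1(float v)          { return v < 1.0f ? v : 1.0f; }
static float maxf(float a, float b) { return a > b ? a : b; }
static float saturate(float v)      { return v < 0.0f ? 0.0f : min1(v); }

/* Original form: min(1.0, max(p, z)). */
static float before(float p, float z)
{
    assert(p >= 0.0f);
    return min1(maxf(p, z));
}

/* Rewritten form: saturate(max(p, z)).  Since p >= 0, max(p, z) is
 * also >= 0, so the lower clamp in saturate() never triggers. */
static float after(float p, float z)
{
    assert(p >= 0.0f);
    return saturate(maxf(p, z));
}
```

Without the range information the two forms differ: for a negative input, `min1` keeps the value while `saturate` returns 0.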
v2: Fix some comments. Suggested by Caio.
v3: Fix setting is_integral when the exponent might be negative. See
also Mesa MR !1778.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16280670 -> 16280659 (<.01%)
instructions in affected programs: 1130 -> 1119 (-0.97%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.72% max: 1.43% x̄: 1.03% x̃: 0.97%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.19% -0.86%
Instructions are helped.
total cycles in shared programs: 367168430 -> 367168270 (<.01%)
cycles in affected programs: 10281 -> 10121 (-1.56%)
helped: 10
HURT: 1
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.31% max: 2.43% x̄: 1.79% x̃: 1.70%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 3.10% max: 3.10% x̄: 3.10% x̃: 3.10%
95% mean confidence interval for cycles value: -20.06 -9.04
95% mean confidence interval for cycles %-change: -2.36% -0.32%
Cycles are helped.
I discovered this while looking at a shader that was hurt by some other
work I'm doing. When I examined the changes, I was confused that one
instance of a comparison that was used in a discard_if was (incorrectly)
eliminated, while another instance used by a bcsel was (correctly) not
eliminated. I had to use NIR_PRINT=true to see exactly where things
went wrong.
A bunch of shaders in Goat Simulator, Dungeon Defenders, Sanctum 2, and
Strike Suit Zero were impacted.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16280659 -> 16281075 (<.01%)
instructions in affected programs: 21042 -> 21458 (1.98%)
helped: 0
HURT: 136
HURT stats (abs) min: 1 max: 9 x̄: 3.06 x̃: 3
HURT stats (rel) min: 1.16% max: 6.12% x̄: 2.23% x̃: 2.03%
95% mean confidence interval for instructions value: 2.93 3.19
95% mean confidence interval for instructions %-change: 2.08% 2.37%
Instructions are HURT.
total cycles in shared programs: 367168270 -> 367170313 (<.01%)
cycles in affected programs: 172020 -> 174063 (1.19%)
helped: 14
HURT: 111
helped stats (abs) min: 2 max: 80 x̄: 21.21 x̃: 9
helped stats (rel) min: 0.10% max: 4.47% x̄: 1.35% x̃: 0.79%
HURT stats (abs) min: 2 max: 584 x̄: 21.08 x̃: 5
HURT stats (rel) min: 0.12% max: 17.28% x̄: 1.55% x̃: 0.40%
95% mean confidence interval for cycles value: 5.41 27.28
95% mean confidence interval for cycles %-change: 0.64% 1.81%
Cycles are HURT.
Found by inspection. I tried really, really hard to make a test case
that would trigger this problem, but I was unsuccessful. It's very hard
to get an instruction to produce a ne_zero result without ne_zero
sources. The most plausible way is using bcsel. That proves
problematic because bcsel interprets its sources as integers, so it
cannot currently be used to "clean" values for floating point
instructions.
No shader-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Fixes piglit tests (new in piglit!110):
- fs-underflow-fma-compare-zero.shader_test
- fs-underflow-mul-compare-zero.shader_test
v2: Add back part of comment accidentally deleted. Noticed by
Caio. Remove is_not_zero function as it is no longer used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: fa116ce357 ("nir/range-analysis: Range tracking for ffma and flrp")
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Gen7+ platforms** had similar results. (Ice Lake shown)
total instructions in shared programs: 16278465 -> 16279492 (<.01%)
instructions in affected programs: 16765 -> 17792 (6.13%)
helped: 0
HURT: 23
HURT stats (abs) min: 7 max: 275 x̄: 44.65 x̃: 8
HURT stats (rel) min: 1.15% max: 17.51% x̄: 4.23% x̃: 1.62%
95% mean confidence interval for instructions value: 9.57 79.74
95% mean confidence interval for instructions %-change: 1.85% 6.61%
Instructions are HURT.
total cycles in shared programs: 367135159 -> 367154270 (<.01%)
cycles in affected programs: 279306 -> 298417 (6.84%)
helped: 0
HURT: 23
HURT stats (abs) min: 13 max: 6029 x̄: 830.91 x̃: 54
HURT stats (rel) min: 0.17% max: 45.67% x̄: 7.33% x̃: 0.49%
95% mean confidence interval for cycles value: 100.89 1560.94
95% mean confidence interval for cycles %-change: 0.94% 13.71%
Cycles are HURT.
total spills in shared programs: 8870 -> 8869 (-0.01%)
spills in affected programs: 19 -> 18 (-5.26%)
helped: 1
HURT: 0
total fills in shared programs: 21904 -> 21901 (-0.01%)
fills in affected programs: 81 -> 78 (-3.70%)
helped: 1
HURT: 0
LOST: 0
GAINED: 1
** On Broadwell, a shader was hurt for spills / fills instead of
helped.
No changes on any earlier platforms.
Fix the a / b ordering in some compares. Delete duplicate patterns.
Add a table explaining things. While I was cleaning this up, I managed
to confuse myself. The table helped sort that out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This didn't fix bug #111308, but it was found while trying to find the
actual cause of that bug.
Fixes piglit tests (new in piglit!110):
- fs-fract-of-NaN.shader_test
- fs-lt-nan-tautology.shader_test
- fs-ge-nan-tautology.shader_test
No shader-db changes on any Intel platform.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: b77070e293 ("nir/algebraic: Use value range analysis to eliminate tautological compares")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We enabled fast clears at level > 0, but didn't minify the dimensions
when comparing the box size, so we always thought it was a partial
clear and as a result never actually enabled any.
This eliminates some slow clears in Civilization VI, but they are mostly
during initialization and not the main rendering.
Thanks to Dan Walsh for noticing we had too many slow clears.
Fixes: 393f659ed8 ("iris: Enable fast clears on other miplevels and layers than 0.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Otherwise it doesn't exist and can't be parsed, so everything dies at
screen init time.
Fixes: 6dc4ddc5f8 ("iris: use driconf for 'bo_reuse' parameter")
Some functionality has been added to deqp-volt to only print
regressions, so update our version of it and use the new options.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
LLVM 7.0 ditched the pmulu intrinsics.
This is only a trivial patch to use the fallback code instead.
It'll likely produce atrocious code since the pattern doesn't match what
llvm itself uses in its autoupgrade paths, hence the pattern won't be
recognized.
Should fix https://bugs.freedesktop.org/show_bug.cgi?id=111496
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes a possible data race spotted while debugging on other EGL
related failures where glFinish and eglCreateContext are going on at
the same time:
==11558== Possible data race during read of size 1 at 0x5E78CD0 by thread #23
==11558== Locks held: 1, at address 0x5E77CA8
==11558== at 0x61B71D4: bo_alloc_internal (brw_bufmgr.c:639)
==11558== by 0x61B7328: brw_bo_alloc (brw_bufmgr.c:669)
==11558== by 0x61EF975: recreate_growing_buffer (intel_batchbuffer.c:231)
==11558== by 0x61EFAAE: intel_batchbuffer_reset (intel_batchbuffer.c:255)
==11558== by 0x61EFB85: intel_batchbuffer_reset_and_clear_render_cache (intel_batchbuffer.c:280)
==11558== by 0x61F0507: brw_new_batch (intel_batchbuffer.c:551)
==11558== by 0x61F12C1: _intel_batchbuffer_flush_fence (intel_batchbuffer.c:888)
==11558== by 0x61BDD6B: intel_glFlush (brw_context.c:296)
==11558== by 0x61BDDB9: intel_finish (brw_context.c:307)
==11558== by 0x623831B: _mesa_Finish (context.c:1906)
==11558== by 0x46D556: deqp::egl::GLES2ThreadTest::Operation::execute(tcu::ThreadUtil::Thread&)
==11558== by 0x721502: tcu::ThreadUtil::Thread::run()
==11558==
==11558== This conflicts with a previous write of size 1 by thread #26
==11558== Locks held: 1, at address 0x5D09878
==11558== at 0x61B98A9: brw_bufmgr_enable_reuse (brw_bufmgr.c:1541)
==11558== by 0x61BF09D: brw_process_driconf_options (brw_context.c:854)
==11558== by 0x61BF6CA: brwCreateContext (brw_context.c:993)
==11558== by 0x621181F: driCreateContextAttribs (dri_util.c:473)
==11558== by 0x53FE87B: dri2_create_context (egl_dri2.c:1388)
==11558== by 0x53EE7BE: eglCreateContext (eglapi.c:807)
==11558== by 0x5C8AB9: eglw::FuncPtrLibrary::createContext(void*, void*, void*, int const*) const
==11558== by 0x46E027: deqp::egl::GLES2ThreadTest::CreateContext::exec(tcu::ThreadUtil::Thread&)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When u_upload_mgr fills up a buffer, it unmaps and destroys it. Our
unmap function was automatically performing the equivalent of a
FlushMappedBufferRange call in this case. Because the buffer mapping
is persistent and coherent, we don't actually do any flushing when we
do the rest of the writes to the buffer - we were just doing one final
one at the end. But we would be using the uploaded contents on the
GPU the whole time.
This certainly shouldn't be necessary for streaming buffers, and if
such flushing and dirtying is necessary for coherent buffers, this is
wildly insufficient.
Drops a small number of constant packets and PIPE_CONTROL flushes from
most benchmarks that I've looked at. Doesn't seem to make much of an
impact on performance, however.
Thanks to Felix Degrood for noticing that we were emitting more
3DSTATE_CONSTANT_* packets than we needed to.
NIR shaders use GLSL types (note: these live outside libglsl), and
nine needs to properly initialize these just like the other state
trackers. This fixes an assertion failure when TTN is used.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes:
dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_vertex
dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_fragment
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While resolving jumps to skip intermediate jumps from the structured
CFG, maintain the successors and predecessors correctly.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the ability for intel devices that:
* Only load on i965
* Only load on iris
* First attempt i965, and try iris next
* First attempt iris, and try i965 next
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If a field name differs slightly between two generations then this
change will still add the fields into the same group.
For example, these will be treated as equal:
* "Software Exception" and "Software Exception"
* "Per Thread" and "Per-Thread"
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were clamping the LOD to force non-mipmap filtering, but that means
that the HW doesn't get to select between the min and mag filters.
Setting MIPFILTER_LINEAR_FAR appears to force non-mipmap filtering.
Fixes all failures in dEQP-GLES2.functional.texture.filtering.2d.*
Reviewed-by: Rob Clark <robdclark@chromium.org>
See the previous commit for the explanation of the Fixes tag.
Hurts 21 shaders in shader-db. All of the hurt shaders are in Unreal
Engine 4 tech demos.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3 ("nir: Add lowering for nir_op_bitfield_reverse.")
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend. This was encountered in some Unreal4 tech
demos in shader-db. The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.
The fixes tag is a bit of a lie. The actual bug was introduced about
26,000 commits earlier in 371c4b3c48 ("nir: Recognize open-coded
bitfield_reverse."). Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist. Hopefully nobody will care to
fix this on an earlier Mesa release.
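For context, an open-coded bit reversal usually looks something like the following. This is one common formulation, written in C for illustration; whether it matches the exact pattern that the NIR optimization recognizes is an assumption.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value by swapping progressively
 * larger groups: single bits, pairs, nibbles, bytes, halves. */
static uint32_t bitfield_reverse(uint32_t v)
{
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
    v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);
    return (v >> 16) | (v << 16);
}
```

An optimizer that collapses this sequence into a single bitfield_reverse op is only a win when the backend has a matching instruction, which is the problem described above for Sandybridge.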
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3 ("nir: Add lowering for nir_op_bitfield_reverse.")
Reduces the size of the u_format_table.c file by 140k (out of 1.64M)
and makes me less confused about endianness in gallium.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The formats affected are:
- LA x (16_FLOAT, 32_FLOAT, 32_UINT, 32_SINT)
- R8G8B8 x (UNORM, SNORM, SRGB, USCALED, SSCALED, UINT, SINT)
- RG/RGB/RGBA x (64_FLOAT, 32_FLOAT, 16_FLOAT, 32_UNORM, 32_SNORM,
32_USCALED, 32_SSCALED, 32_FIXED, 32_UINT, 32_SINT)
- RGB/RGBA x (16_UNORM, 16_SNORM, 16_USCALED, 16_SSCALED,
16_UINT, 16_SINT)
- RGBx16 x (UNORM, SNORM, FLOAT, UINT, SINT)
- RGBx32 x (FLOAT, UINT, SINT)
- RA x (16_FLOAT, 32_FLOAT, 32_UINT, 32_SINT)
The updated st_formats.c unit test checks that the formats affected by
this change are all array formats in the equivalent Mesa format (if
any). Mesa's array format definition is clear: the value stored is an
array (increasing memory address) of values of the channel's type.
It's also the only thing that makes sense for the RGB types, or very
large types like RGBA64_FLOAT (A should not move to the low address
because the cpu is BE).
Acked-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Tested-by: Matt Turner <mattst88@gmail.com> (unit tests on BE)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nothing accessed the .value field, just the .chan. Unwrap all the
code from the union, for clarity (and 13k less generated code).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Shaves 30k off of the 1.6M .c file, and makes for less noise for me
trying to understand how gallium formats actually work.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instructions attached to blocks are never explicitly freed. Let's
use ralloc() to attach those objects to the compiler context so that
they are automatically freed when the ctx object is freed.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
I don't know how Meson didn't hit this issue, when it too already uses
-Werror=incompatible-pointer-types
Fixes: 3dd299c3d5 ("glx: Sync <GL/glxext.h> with Khronos")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Looks like the initial RE was wrong and some fields have a different purpose.
For example, there's no "disable_mipmap" field; it's actually part of another
field that selects mipmap filtering.
Also fix layout position.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This fixes the following CTS test on 32-bit systems:
GTF-GL46.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_init
It does glGetTexImage of a 16-bit SNORM image, requesting 32-bit UNORM
data. In get_tex_rgba_uncompressed, we round trip through float to
handle image transfer ops for clamping. _mesa_format_convert does:
_mesa_float_to_unorm(0.571428597f, 32)
which translated to:
_mesa_lroundevenf(0.571428597f * 0xffffffffu)
which produced different results on 64-bit and 32-bit systems:
64-bit: result = 0x92492500
32-bit: result = 0x80000000
This is because the size of "long" varies between the two systems, and
0x92492500 is too large to fit in a signed 32-bit integer. To fix this,
we switch to the new _mesa_i64roundevenf function which always does the
64-bit operation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104395
Fixes: 594fc0f859 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
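The overflow described in the commit above can be reproduced with a small sketch. The helper names are hypothetical, and the rounding here is a simple round-half-up in 64-bit arithmetic rather than the round-to-even used by the real helper; the point is only that the intermediate value needs more than 31 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical float -> 32-bit UNORM conversion. The intermediate is
 * kept in int64_t so that values above INT32_MAX survive. */
static uint32_t float_to_unorm32_sketch(float x)
{
   if (!(x > 0.0f))
      return 0;
   if (x >= 1.0f)
      return UINT32_MAX;
   /* (float)UINT32_MAX rounds to 2^32, as in the float multiply above. */
   float scaled = x * (float)UINT32_MAX;
   int64_t rounded = (int64_t)(scaled + 0.5f);
   return (uint32_t)rounded;
}

/* True if v could be stored in a 32-bit signed "long". */
static int fits_in_int32(int64_t v)
{
   return v >= INT32_MIN && v <= INT32_MAX;
}
```

For the value from the failing test, `float_to_unorm32_sketch(0.571428597f)` yields 0x92492500, and `fits_in_int32()` reports that this does not fit in a 32-bit signed integer.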
This always returns an int64_t, translating to _mesa_lroundevenf on
systems where long is 64-bit, and llrintf where "long long" is needed.
Fixes: 594fc0f859 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
GLX_EXT_import_context operates only on indirect contexts, a direct
context cannot possibly support it. Without this change the extension
will appear in the combined GLX extension string even if it is missing
from the server string, indicating a lack of required server support.
At least on Linux, we can use the ELF auxiliary vector to
detect the presence of AltiVec, VSX and other CPU features
without having to go through handling SIGILL, which has
various problems of its own.
A similar thing is already being done for ARM to detect NEON.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
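A minimal sketch of the auxiliary-vector approach from the commit above, assuming Linux on PowerPC. The struct and function names are hypothetical, and the fallback bit values are assumed to match the Linux powerpc UAPI headers; on other platforms the sketch simply reports no features.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__linux__) && defined(__powerpc__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Fallback definitions; values assumed from the powerpc UAPI headers. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000
#endif
#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX 0x00000080
#endif

struct ppc_caps {
   bool has_altivec;
   bool has_vsx;
};

/* Query CPU features without executing a probe instruction, so no
 * SIGILL handler is needed. */
static struct ppc_caps detect_ppc_caps(void)
{
   struct ppc_caps caps = { false, false };
#if defined(__linux__) && defined(__powerpc__)
   unsigned long hwcap = getauxval(AT_HWCAP);
   caps.has_altivec = (hwcap & PPC_FEATURE_HAS_ALTIVEC) != 0;
   caps.has_vsx = (hwcap & PPC_FEATURE_HAS_VSX) != 0;
#endif
   return caps;
}
```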
Gen11 adds support for specifying the render target index and src0
alpha present bits in the extended message descriptor. Previously,
we had to use a message header for this, requiring extra instructions
to write the fields, and two registers of extra payload.
Improves performance on my ICL 8x8 frequency locked to 700MHz, on iris:
GfxBench5 Manhattan 3.0: 2.13635% +/- 0.159859% (n=5)
GfxBench5 Aztec Ruins: 1.57173% +/- 0.128749% (n=5)
Synmark2 OglDeferred: 2.86914% +/- 0.191211% (n=10)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This takes care of generate_fb_write/fire_fb_write/brw_fb_WRITE's stuff
earlier in the visitor. It will also make it easier to generate SENDSC
messages with indirect extended descriptors in a few patches.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Annoyingly, these bits exist in some extended message descriptors
(in particular render target writes), but they don't have any
corresponding bits in the ISA encoding. So we can't use an immediate
and have to fall back to an indirect extended descriptor.
Thanks to Jason Ekstrand for reminding me that you can still set these
bits via an indirect descriptor, even if they don't exist in the ISA.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
src0 vstride and type overlap with bits of the extended descriptor.
brw_set_desc() also sets the extended descriptor to 0. So by setting
the descriptor, then setting src0, we were accidentally setting a bunch
of extended descriptor bits unintentionally.
When using this infrastructure for framebuffer writes (in a future
patch), this ended up setting the extended descriptor bit 20, which is
"Null Render Target" on Icelake, causing nothing to be written to the
framebuffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
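The ordering problem in the commit above can be illustrated with a toy model. The bit positions and function names below are hypothetical and do not reflect the real instruction encoding; the sketch only shows that when two fields overlap, the write that happens last wins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the extended descriptor occupies bits 16..31,
 * and the src0 region/type fields occupy bits 18..22 of the same word. */
#define EX_DESC_MASK 0xffff0000u
#define SRC0_MASK    0x007c0000u

static void set_src0_fields(uint32_t *word, uint32_t fields)
{
   *word = (*word & ~SRC0_MASK) | ((fields << 18) & SRC0_MASK);
}

/* Setting the descriptor also zeroes the extended descriptor. */
static void set_desc(uint32_t *word)
{
   *word &= ~EX_DESC_MASK;
}

/* Returns the extended descriptor bits left in the word after
 * encoding in the given order. */
static uint32_t encode(int desc_first, uint32_t src0_fields)
{
   uint32_t word = 0;
   if (desc_first) {
      set_desc(&word);
      set_src0_fields(&word, src0_fields);   /* clobbers ex_desc bits */
   } else {
      set_src0_fields(&word, src0_fields);
      set_desc(&word);                       /* ex_desc ends up zero */
   }
   return word & EX_DESC_MASK;
}
```

In this model, `encode(1, 0x4)` leaves bit 20 set in the extended descriptor, while `encode(0, 0x4)` leaves it clear.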
We need two different values of the register, one for NGG and one for
legacy, in order to fix edge flags for the legacy pipeline.
Passing the ngg flag to emit_clip_regs would be too complicated,
so CONTEXT_REG_RMW is used for partial register updates.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Right now we're leaking all block and instruction objects allocated by
the compiler. Let's clean things up before leaving
midgard_compile_shader_nir().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
src/gallium/winsys/svga/drm/vmw_screen.c: In function ‘vmw_dev_compare’:
src/gallium/winsys/svga/drm/vmw_screen.c:48:12: warning: implicit declaration of function ‘major’ [-Wimplicit-function-declaration]
48 | return (major(*(dev_t *)key1) == major(*(dev_t *)key2) &&
| ^~~~~
src/gallium/winsys/svga/drm/vmw_screen.c:49:12: warning: implicit declaration of function ‘minor’ [-Wimplicit-function-declaration]
49 | minor(*(dev_t *)key1) == minor(*(dev_t *)key2)) ? 0 : 1;
| ^~~~~
That file (and many others) already has the proper #include with their
respective guards, but scons wasn't defining them, resulting in implicit
functions being used instead (and an always-true check that's probably
breaking something down the line).
Note that I'm cheating a bit here because Scons doesn't seem to have
a clean way to detect the existence of major() et al. as functions or
macros, so I'm taking the shortcut of just detecting the presence of the
header and assuming its contents are what we expect.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-By: Jose Fonseca <jfonseca@vmware.com>
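For context, a minimal sketch of a device comparison with the header included explicitly, assuming a Linux libc where major(), minor() and makedev() live in <sys/sysmacros.h>. The function name is hypothetical; the return convention (0 for equal) follows the snippet quoted in the warning above.

```c
#include <assert.h>
#include <sys/types.h>
#if defined(__linux__)
#include <sys/sysmacros.h>   /* major(), minor(), makedev() */
#endif

/* Compare two device numbers; returns 0 when they match. */
static int dev_compare(const dev_t *a, const dev_t *b)
{
   return (major(*a) == major(*b) && minor(*a) == minor(*b)) ? 0 : 1;
}
```

Without the include, a C compiler that still accepts implicit declarations treats major() and minor() as unknown functions returning int, which is what the warnings point at.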
This mirrors the vs/gs keys, and will be needed when adding images
support.
The const changes also mirror how the draw code works (as is needed
when we add images).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Not sure how I missed this before, but compswap was hitting an
assert here as it is its own special case.
Fixes: b5ac381d8f ("gallivm: add buffer operations to the tgsi->llvm conversion.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Looks like a copy/paste error. This patch prevents a segfault when
running the following on BDW:
INTEL_DEBUG=no8,no16,do32 ./deqp-vk -n \
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_dvec4
For the curious, the message we're getting is:
CS compile failed: Failure to register allocate. Reduce number
of live scalar values to avoid this.
Fixes: 864737ce6c ("i965/fs: Build 32-wide compute shader when needed.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
No driver implements them yet, but this is a long way toward gallium
having matching format enums for Mesa formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To add ASTC 3D compression formats, we need to be able to express the
block depth. While I'm touching every line, line up the columns of
the CSV again as they've drifted over time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When moving constants, if switching to a floating-point representation
doesn't break anything, we'd rather have an fmov than an imov,
permitting inlining the constant in many circumstances.
total quadwords in shared programs: 3408 -> 3366 (-1.23%)
quadwords in affected programs: 1188 -> 1146 (-3.54%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 9.65% x̃: 11.11%
95% mean confidence interval for quadwords value: -1.07 -0.98
95% mean confidence interval for quadwords %-change: -11.38% -7.93%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Storing constants as float doesn't make sense when we have integer
instructions; better to switch to be integer natively and coerce to/from
float rather than the opposite.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This fixes dEQP-GLES3.functional.texture.specification subtests on iris:
- texsubimage3d_depth.depth24_stencil8_2d_array
- texsubimage3d_depth.depth32f_stencil8_2d_array
- texsubimage3d_depth.depth_component32f_2d_array
- texsubimage3d_depth.depth_component24_2d_array
- texstorage2d.format.depth24_stencil8_2d
- texstorage2d.format.depth32f_stencil8_2d
- texstorage2d.format.depth_component24_2d
- texstorage2d.format.depth_component32f_2d
- texstorage3d.format.depth24_stencil8_2d_array
- texstorage3d.format.depth32f_stencil8_2d_array
- texstorage3d.format.depth_component24_2d_array
- texstorage3d.format.depth_component32f_2d_array
Here, something appears to be going wrong with having this bit set
during blorp_copy operations for texture upload, which override the
format to R8G8B8A8_UINT.
AFAICT this bit should have no effect for integer surfaces, as it has
to do with blending, and integer blending is not a thing. So it should
be harmless to disable it.
The Windows driver appears to be setting this bit universally, so
I am unclear why we would need to. Perhaps they simply haven't run
into this issue.
Fixes: f741de236b ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason suggested I remove this in review, and he's right. AFAICT this
affects blending, and that just isn't going to happen on buffers.
Fixes: f741de236b ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For each mipmap level, the driver will store the clear values (8 bytes)
and the TC-compat zrange value (4 bytes).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The workaround was entirely in common code, and it's needed in radeonsi
too so just always do it when necessary. Fixes
KHR-GL45.shader_image_load_store.advanced-allStages-oneImage on gfx9
with LLVM 8.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
GL and Vulkan allow you to bind a single layer of a 3D texture to a 2D
image, and we weren't implementing a workaround for that on gfx9 that
TGSI was. Copy it over.
Fixes KHR-GL45.shader_image_load_store.non-layered_binding with radeonsi
NIR.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
16-bit and 32-bit values match hardware values but 8-bit doesn't.
This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index.
Fixes: 372c3dcfdb ("radv: implement VK_EXT_index_type_uint8")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The virgl formats are fixed in time snapshots of the gallium ones,
we just need to provide a translation table between them when
we enter the hardware.
This fixes a regression since Eric renumbered the gallium table.
Fixes: c45c33a5a2 (gallium: Remove manual defining of PIPE_FORMAT enum values.)
Bugzilla: https://bugs.freedesktop.org/111454
v1 by Dave Airlie <airlied@redhat.com>
v2: virgl: Add a number of formats to the table that are used, e.g. for vertex
attributes
v3: cover some more missing formats from a piglit run
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
pp has vector units and some operations can be optimized when bundled
together.
Benchmarking this with piglit shaders shows that the instruction count
can be greatly reduced on many examples with vectorize.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
nir vec4 fcsel assumes that each component of the condition will be used
to select the same component from the options, but pp can't implement
that since it only has 1 component for the condition.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
The previous spill stack was fixed and too small, and caused instability
in programs requiring spilling for roughly more than one value.
This patch adds a dynamic calculation of the buffer size based on stack
utilization and switches it to a separate allocation at flush time that
will fit the shader that requires the largest buffer.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
It's not used by anything anymore now that so much lowering has been
moved into NIR. Sadly, we still need one in brw_compile_gs() for
geometry shaders on Sandy Bridge. Short of a lot of pointless work,
that one's probably not going away.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create a unified table to handle pipe format to texture
and render target format lookup.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.
When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.
Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.
On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(Bas: CCing this to 19.2-rc due to high impact and limited complexity)
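The effect of the ordering in the commit above can be sketched with the usual "first matching type" search that applications perform. The flag values, struct, and function below are hypothetical stand-ins, not the Vulkan definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for memory property flags. */
#define MEM_HOST_VISIBLE  0x2u
#define MEM_HOST_COHERENT 0x4u

struct mem_type {
   uint32_t flags;
   const char *name;
};

/* Return the lowest index whose flags contain all required bits and
 * which is allowed by type_bits, or -1 if none matches. */
static int find_memory_type(const struct mem_type *types, uint32_t count,
                            uint32_t type_bits, uint32_t required)
{
   for (uint32_t i = 0; i < count; i++) {
      if ((type_bits & (1u << i)) &&
          (types[i].flags & required) == required)
         return (int)i;
   }
   return -1;
}
```

When two types advertise identical flags, this search always returns the lower index, so listing visible VRAM before GTT makes such applications pick the faster type.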
Add better liveness analysis that was modelled after the one in vc4.
It uses live ranges and is aware of multiple blocks, which is a prerequisite
for adding CF support.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Mali4x0 supports only gl_FragColor. gl_FragDepth is not supported.
Check that we don't get anything but gl_FragColor in shader outputs.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
We don't have a special OP to store color in PP; all we need to do is
store gl_FragColor into reg0, so it's just a mov and therefore an ALU node.
Yet we still need to indicate that it's a store_color op so regalloc ignores
its destination.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Create ppir block for each corresponding NIR block and populate
its successors. It will be used later in liveness analysis and
in CF support
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
We can get the following from NIR:
(1) r1 = r2
(2) r2 = ssa1
Note that r2 is read before it's assigned, so there's no node for
it in comp->var_nodes. We need to create a dummy node in this case
whose sole purpose is to hold ppir_dest with reg in it.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
There can be several root nodes, i.e.:
(1) r0 = r1
(2) r2 = r3
(3) branch if (ssa1)
We need to make (3) depend on (1) and (2). The old code added a
dependency only for (2), and (1) was kept as a root node since there
is no branch/discard or store color between the two movs.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
ppir_lower_load() and ppir_lower_load_texture() assume that a node
is in the same block as its successors. Fix it by cloning each
ld_uni and ld_tex to every block.
It also reduces register pressure since values never cross block
boundaries and thus never appear in live_in or live_out of any block,
so do it for varyings as well.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Const nodes are now cloned for each user, i.e. const is guaranteed to have
exactly one successor, so we can use ppir_do_one_node_to_instr() and
drop insert_to_each_succ_instr()
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
On commit f6e7de41d7, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.
This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.
v2: Improve commit message and add documentation about skipped checks
(Jason)
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
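The "return only what changed" idea from the commit above can be sketched as follows. The struct, flag names, and function are hypothetical and much smaller than the real dynamic state; this is an illustration of the technique, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define DYN_LINE_WIDTH   (1u << 0)
#define DYN_LINE_STIPPLE (1u << 1)

struct dyn_state {
   float line_width;
   uint32_t stipple_factor;
   uint16_t stipple_pattern;
};

/* Copy the states selected by copy_mask from src to dst and return a
 * mask of the states whose values actually differed. */
static uint32_t dyn_state_copy(struct dyn_state *dst,
                               const struct dyn_state *src,
                               uint32_t copy_mask)
{
   uint32_t changed = 0;

   if ((copy_mask & DYN_LINE_WIDTH) &&
       dst->line_width != src->line_width) {
      dst->line_width = src->line_width;
      changed |= DYN_LINE_WIDTH;
   }

   if ((copy_mask & DYN_LINE_STIPPLE) &&
       (dst->stipple_factor != src->stipple_factor ||
        dst->stipple_pattern != src->stipple_pattern)) {
      dst->stipple_factor = src->stipple_factor;
      dst->stipple_pattern = src->stipple_pattern;
      changed |= DYN_LINE_STIPPLE;
   }

   return changed;
}
```

A caller would mark only the returned bits dirty, so binding a pipeline with an unchanged stipple setting would not trigger a re-emit of the non-pipelined packet.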
We can statically determine from the disassembly if helper invocations
will be needed, so we can validate the corresponding bit in the
cmdstream and thus avoid printing the bit itself in the decode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We check for texture ops which calculate derivatives (either explicitly
via dFd* or implicitly) and mark the shader as requiring helper
invocations.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The variables level and start_layer are not initialized, then
initialized if we have a BUFFER_BIT_DEPTH set. We assert on them
later using the same check. This should be enough but GCC 9.1.1 is
not convinced, so let's initialize the variables.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Code that used it was removed in 4ebe6b2e72 ("tgsi: Drop the SSE2
constants setup that's been dead code since 2011.")
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Initialize `next_batch_addr` and `second_level`. If the batch is well
formed, those values will be overridden; if not, they are as good as
uninitialized garbage.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The helper check_node_type() is only used when DEBUG is set (in the
function below), but the ASSERTED macro uses NDEBUG. So just guard the
helper with #ifdef. If we see more such cases we might consider an
ASSERTED-like macro for the DEBUG case.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compiler can't see that d is initialized.
../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
984 | d = MAX2(-d, d);
Assert that we expect at least one component -- hence d is going to be
set. That by itself is not enough, so also zero initialize the
variable.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
...by copying the implementation of anv_get_absolute_timeout().
Appears to fix a CTS test with 32-bit builds:
GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush
Fixes: f459c56be6 ("iris: Add fence support using drm_syncobj")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
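A minimal sketch of converting a relative timeout to an absolute one without overflowing a signed 64-bit kernel argument. This is written from the description above, not copied from either driver; the function name is hypothetical and the current time is passed in as a parameter to keep the sketch testable.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a relative timeout in nanoseconds to an absolute deadline,
 * saturating at INT64_MAX. A zero timeout stays zero (poll). */
static uint64_t absolute_timeout(uint64_t now_ns, uint64_t rel_ns)
{
   assert(now_ns <= (uint64_t)INT64_MAX);

   if (rel_ns == 0)
      return 0;

   uint64_t max_rel = (uint64_t)INT64_MAX - now_ns;
   if (rel_ns > max_rel)
      rel_ns = max_rel;

   return now_ns + rel_ns;
}
```

All arithmetic is done in uint64_t, so the result does not depend on the width of long on the build target.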
Rafael Antognolli tracked down a performance gap between i965 and iris
in Synmark2's OglCSDof microbenchmark, noting that iris was performing
substantially more memory reads and writes, with substantially fewer
L3 hits. He suggested that something might be wrong with MOCS, or L3
configs, at which point I came up with a theory...
It would appear that the STATE_BASE_ADDRESS command updates the MOCS
settings for various base addresses even if you don't specify the
"Modify Enable" bit for that address. Until now, we had been setting
only the MOCS for bases we intended to change, leaving the others
"blank" which is MOCS table entry 0, which is uncached.
Most data access has a more specific MOCS (e.g. in SURFACE_STATE),
but scratch access uses the Stateless Data Port Access MOCS from
STATE_BASE_ADDRESS. So this meant all scratch access was uncached.
Improves performance in Synmark2's OglCSDof by 2x, bringing iris
on par with the existing i965 driver.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fix this build error on macOS.
../src/glx/apple/glx_empty.c:158:4: error: void function 'glXQueryGLXPbufferSGIX' should not return a value [-Wreturn-type]
return 0;
^ ~
Fixes: 3dd299c3d5 ("glx: Sync <GL/glxext.h> with Khronos")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Similarly to before, this didn't properly handle varying structs with
doubles in them.
This doesn't fix any tests, but was noticed while looking at the code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The old version wasn't as accurate as it could be, and didn't handle
double variables inside structs correctly. Walk the path to compute the
actual components affected.
In combination with the previous commit fixes
KHR-GL45.enhanced_layouts.varying_structure_locations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is already done in get_deref_offset() in the common code. We were
adding it twice accidentally.
Fixes KHR-GL45.enhanced_layouts.varying_array_locations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some users of this function (e.g. GS inputs) currently only work with
constant offsets. We got lucky since all the tests used an array index
of 0, so the non-constant part was always 0. But we still need to handle
this.
This doesn't fix any CTS test, but was noticed while debugging one.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that LLVM 9 will be released soon, we will only support
LLVM 8, 9 and master (10).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes errors seen with eglSetBlobCacheFuncsANDROID on Android when
running dEQP that terminates and reinitializes a display.
Fixes: 6f5b57093b "egl: add support for EGL_ANDROID_blob_cache"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We were always resolving the buffer as if we were accessing it via
CPU maps, which don't understand any auxiliary surfaces. But we often
copy to a temporary using BLORP, which understands compression just
fine. So we can avoid the resolve, and accelerate the copy as well.
Fixes: 9d1334d2a0 ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This doesn't work for compressed formats, as the source texture and
temporary texture would have different block sizes. (Forcing the driver
to always take the GPU path would expose the bug.) Instead, just use
the source format for the temporary, and let blorp_copy deal with
overrides.
The one case where we can't do this is ASTC, because isl won't let us
create a linear ASTC surface. Fall back to the CPU paths there for now.
Fixes: 9d1334d2a0 ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Gen11 stores the fast clear color in an "indirect clear buffer", as
a packed pixel value. Gen9 hardware stores it as a float or integer
value, which is interpreted via the format. We were trying to store
that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM
it from there to the actual SURFACE_STATE bytes where it's stored.
This unfortunately doesn't work for blorp_copy(), which does bit-for-bit
copies, and overrides the format to a CCS-compatible UINT format. This
causes the clear color to be interpreted in the overridden format.
Normally, we provide the clear color on the CPU, and blorp_blit.c:2611
converts it to a packed pixel value in the original format, then unpacks
it in the overridden format, so the clear color we use expands to the
bits we originally desired.
However, BLORP doesn't support this pack/unpack with an indirect clear
buffer, as it would need to do the math on the GPU. On Gen11+, it isn't
necessary, as the hardware does the right thing.
This patch changes Gen9 to stop using an indirect clear buffer and
simply do PIPE_CONTROLs with post-sync write immediate operations
to store the new color over the surface states for regular drawing.
BLORP continues streaming out surface states, and handles fast clear
colors on the CPU.
Fixes: 53c484ba8a ("iris: blorp using resolve hooks")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
For renderable surfaces, we allocate SURFACE_STATEs for each bit in
res->aux.possible_usages. Sampler views use res->aux.sampler_usages.
When pinning buffers, we call surf_state_offset_for_aux() to calculate
the offset to the desired surface state. surf_state_offset_for_aux()
took an aux_modes parameter, which should be one of those two fields.
However...it was not using that parameter. It always used the broader
res->aux.possible_usages field directly.
One of the callers, update_clear_value(), was passing incorrect masks
for this parameter. It iterated through the bits in order, using
u_bit_scan(), which destructively modifies the mask. So each time we
called it, the count of bits before our selected mode was 0, which would
cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE,
rather than updating each in turn. This was hidden by the earlier bug
where surf_state_offset_for_aux() ignored the parameter.
Fixes: 7339660e80 ("iris: Add aux.sampler_usages.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
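The interaction of the two bugs in the commit above can be sketched with plain bit operations. The helpers below are hypothetical stand-ins for u_bit_scan() and surf_state_offset_for_aux(), use GCC/Clang builtins, and return an index rather than a byte offset.

```c
#include <assert.h>
#include <stdint.h>

/* Return the lowest set bit's index and clear it from the mask. */
static unsigned bit_scan(uint32_t *mask)
{
   assert(*mask != 0);
   unsigned i = (unsigned)__builtin_ctz(*mask);
   *mask &= *mask - 1;
   return i;
}

/* Index of a usage's state: number of enabled modes below it. */
static unsigned state_index(uint32_t modes, unsigned usage)
{
   return (unsigned)__builtin_popcount(modes & ((1u << usage) - 1u));
}

/* Buggy: passes the partially consumed mask, so the count is always 0. */
static void visit_buggy(uint32_t modes, unsigned *out)
{
   unsigned n = 0;
   uint32_t m = modes;
   while (m) {
      unsigned usage = bit_scan(&m);
      out[n++] = state_index(m, usage);
   }
}

/* Fixed: iterates over a copy and passes the original mask. */
static void visit_fixed(uint32_t modes, unsigned *out)
{
   unsigned n = 0;
   uint32_t m = modes;
   while (m) {
      unsigned usage = bit_scan(&m);
      out[n++] = state_index(modes, usage);
   }
}
```

With modes 0xB (bits 0, 1 and 3), the fixed loop visits indices 0, 1, 2, while the buggy loop visits index 0 three times.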
This is genxml, we can compile out this code.
Fixes: 2660667284 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Rather than passing through the transformed gl_Position, we can use the
hardware-level varying for this, which will correctly handle
gl_FragCoord.w
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The offset is added to the base address, so we need to subtract it from
the size to maintain the same end address and thus prevent a buffer
overflow:
end_address = start_address + size
start_address' = start_address + offset
size' = size - offset
end_address' = start_address' + size'
= (start_address + offset) + (size - offset)
= (start_address + size) + (offset - offset)
= start_address + size
= end_address
QED.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
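The derivation above translates directly into code. This is a small sketch with a hypothetical struct and function name, showing that the end address is preserved when the offset is moved from the size to the base.

```c
#include <assert.h>
#include <stdint.h>

struct buf_range {
   uint64_t base;   /* start address */
   uint64_t size;   /* bytes addressable from base */
};

/* Advance the base by offset and shrink the size by the same amount,
 * so base + size (the end address) is unchanged. */
static struct buf_range apply_offset(struct buf_range r, uint64_t offset)
{
   assert(offset <= r.size);
   r.base += offset;
   r.size -= offset;
   return r;
}
```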
We need a special path for special varyings so we parse them correctly
instead of throwing an error when they inevitably point to bad memory.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The hardware doesn't care, and a lot of Panfrost code relies on an
oversized buffer. The important part is that (stride *
padded_num_vertices) is no greater than size, which we'll need to check
once we validate instancing.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't need to dump the contents necessarily, but having the stub with
the address is useful.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
I don't know who thought this mask was a good idea but unfortunately it
must have been me.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we permit more $whatever through than the shader needs, that's a bit
of a waste, but it isn't an error.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't actually care about the *contents* of the index buffer, but we
would rather like to ensure it is present and of the correct size.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We can infer these stats in many cases from the disassembly, so we
should try to sanity check where we can. We may need to be fuzzy about
analysis, since analysis gives us a bound but we don't mind if it's not
used fully by the shader.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We could do better by forcing the checks to *equal* zero (right now, an
indeterminate answer will pass the checks), but this is a start to guard
against some egregious cases.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There are a number of conditions we need to test for to statically check
for TILE_RANGE_FAULTs, but once these checks are in order, we can print
as-is.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These tags need to match up with what's actually described by the MFBD,
so check this. Once this is checked, since the type and contents of the
FBD are obvious from printing above, there's no need to explicitly mark
off the framebuffer line.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
For shaders using exclusively direct attribute/varyings, we can work
this out statically. For shaders with indirect access, we just set an
upper bound of 16 (the max attributes/varyings we support) and the
actual count will be reported regardless.
We proceed similarly for textures/samplers, as well as for UBOs. While
UBOs can be *indexed* indirectly, the *UBO itself* -- which is what we
count in the shader descriptor (rather than the UBO descriptors) -- is
statically determinable.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This one is a little tricky, but the idea is that:
r16-r23 are always uniforms
r8-r15 are sometimes work, sometimes uniforms...
...but as work, they are always written before use
...and as uniforms, they are never written before use
So we use that heuristic to determine the count to feed the machine.
We'll record work register use in the next commit.
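The heuristic above can be sketched roughly as follows. This is an illustrative Python model, not the decoder's code: the function name and the `(reads, writes)` instruction shape are hypothetical. Turning the resulting set into the count fed to the machine is left out, since the text does not spell out that mapping.

```python
def infer_uniform_registers(instructions):
    """Return the set of registers that behave as uniforms.

    `instructions` is a list of (reads, writes) tuples of register numbers,
    in program order.
    """
    written = set()
    uniforms = set()
    for reads, writes in instructions:
        for reg in reads:
            if 16 <= reg <= 23:
                # r16-r23 are always uniforms.
                uniforms.add(reg)
            elif 8 <= reg <= 15 and reg not in written:
                # Read before any write: treat it as a uniform.
                uniforms.add(reg)
        # An instruction's reads are considered before its writes land.
        written.update(writes)
    return uniforms
```

With this model, a register in r8-r15 that is written and then read is classified as work, while one that is read first is classified as a uniform.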
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Panfrost is the only user of the macro; we are better off expanding than
having random stuff in nir.h.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Right now it always returns zero, but as of:
commit a48a6b8a40
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Nov 14 15:13:05 2017 -0500
glx: Prepare driFetchDrawable for no-config contexts
We were hoping it would return true if the drawable could actually be
looked up. It wasn't, so that didn't go very well. With the most recent
update to <GL/glxext.h> glXQueryGLXPbufferSGIX (correctly) returns void,
so there's no longer anything else besides driFetchDrawable that depends
on the return value from __glXGetDrawableAttribute.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Minor fixups required to keep the prototypes matching and to remove
mention of retired enums.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We added this utility for vulkan where all timeouts are given as
uint64_t values. We can switch from signed to unsigned as this is the
only user and if we ever deal with signed integers somewhere else
we'll have to be careful to use the corresponding
timespec_(add|sub)_msec and always pass absolute values.
v2: Forgot to drop the test calling add_nsec() with a negative number
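As a rough illustration of the unsigned-only contract, here is a small Python model of adding nanoseconds to a `(sec, nsec)` pair. It is not the C helper itself: the tuple-based signature is an assumption made for the sketch, and rejecting negative input stands in for the unsigned parameter type.

```python
NSEC_PER_SEC = 1_000_000_000


def timespec_add_nsec(sec, nsec, delta_nsec):
    """Add a non-negative nanosecond count to a (sec, nsec) pair."""
    if delta_nsec < 0:
        # Models the unsigned parameter: negative deltas are not accepted.
        raise ValueError("delta_nsec must be non-negative")
    total = nsec + delta_nsec
    # Carry whole seconds out of the nanosecond field.
    return sec + total // NSEC_PER_SEC, total % NSEC_PER_SEC
```

A caller that needs to go backwards in time would use a separate subtract helper with an absolute value rather than passing a negative delta.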
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: d2d70c3bb5 ("util: add a timespec helper")
Acked-by: Daniel Stone <daniels@collabora.com>
This fixes a regression introduced with scan&reduce operations
on GFX10. Note that some subgroup CTS tests still fail on GFX10, but
I assume it's a different issue.
This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive*.
Fixes: 227c29a80d "amd/common/gfx10: implement scan & reduce operations"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Commit fixes current crashes with Vulkan applications on Android.
Fixes: c0376a1234 "util: add anon_file.h for all memfd/temp file usage"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
v2: Pass through to oscreen rather than faking it (review from Marek).
Fixes: 0346b70083 ("gallium/screen: Add pipe_screen::resource_get_param")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In compressed_tex_sub_image we can only obtain the tex_object after
compressed_subtexture_target_check has validated the target for
TEX_MODE_CURRENT. So if the target is wrong, the error is raised to the user.
This completes the fix for the regression introduced by "mesa: refactor
compressed_tex_sub_image function", covering the remaining failing tests:
dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d
dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.compressedtexsubimage3d
v2: Fix warning that texObj might be used uninitialized (Gert Wollny)
Fixes: 7df233d68d ("mesa: refactor compressed_tex_sub_image function")
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Expose configs when allow_fp16_configs has been enabled and
DRI_LOADER_CAP_FP16 is set in the loader.
Also, make kms_swrast_dri respect format bpp, to allow for allocating
buffers wider than 32 bpp.
Make fp16 opt-in for gallium.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Expose configs when allow_fp16_configs has been enabled and
DRI_LOADER_CAP_FP16 is set in the loader.
Also, define a new dri configuration option so users can disable exposure of
fp16 formats. Make fp16 opt-in for i965.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Add dri formats for RGBA ordered 64 bpp IEEE 754 half precision floating
point. Leverage existing offscreen render support for
MESA_FORMAT_RGBA_FLOAT16 and MESA_FORMAT_RGBX_FLOAT16.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In the case that __DRI_ATTRIB_FLOAT_BIT is set in the dri config, set
EGL_COLOR_COMPONENT_TYPE_FLOAT_EXT in the egl config. Add a field to the
platform driver visual to indicate if it has components that are in floating
point form.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In order to handle pixel formats that consist of floating point data, enable
floatMode field in the dri config, and set __DRI_ATTRIB_FLOAT_BIT in the
render type attribute.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Change dri2_add_config to take arrays of shifts and sizes, and compare with
those set in the dri config. Convert all platform driver masks
to shifts and sizes.
In order to handle older drivers, where shift attributes aren't available,
we fall back to the mask attributes and compute the shifts with ffs.
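The fallback can be sketched like this, in Python for illustration. The helper names are hypothetical. Returning -1 for an empty mask and deriving the size with a population count are assumptions of the sketch, not something this commit message states.

```python
def shift_from_mask(mask):
    """Return the bit offset of the lowest set bit, or -1 for an empty mask."""
    if mask == 0:
        return -1
    # (mask & -mask) isolates the lowest set bit; this equals C's ffs(mask) - 1.
    return (mask & -mask).bit_length() - 1


def size_from_mask(mask):
    """Return the channel width in bits (population count of the mask)."""
    return bin(mask).count("1")
```

For a red mask of 0x00FF0000 this yields a shift of 16 and a size of 8.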
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
bitcount is free from the pipe header dependencies that make u_math.h hard
to include by non-gallium specific code, so move it to bitscan.h. bitscan.h
is included by u_math.h so existing references will continue working.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The existing mask attributes can only support up to 32 bpp. Introduce
per-channel SHIFT attributes that indicate how many bits, from lsb towards
msb, the bit field is offset. A shift of -1 will indicate that there is no
bit field set for the channel.
As old loaders will still be looking for masks, we set the masks to 0 for
any formats wider than 32 bpp.
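A minimal sketch of how shifts and sizes relate to the legacy masks, in Python for illustration. The function name and argument layout are hypothetical; the 32 bpp cutoff and the meaning of a -1 shift come from the text above.

```python
def legacy_masks(bpp, shifts, sizes):
    """Return per-channel masks for old loaders, or zeros if they cannot fit."""
    if bpp > 32:
        # Masks cannot describe formats wider than 32 bpp.
        return [0] * len(shifts)
    return [
        # A shift of -1 means the channel has no bit field at all.
        (((1 << size) - 1) << shift) if shift >= 0 else 0
        for shift, size in zip(shifts, sizes)
    ]
```

A 64 bpp half-float format therefore reports all-zero masks, while its shifts still describe the real layout.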
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The driver checks dri config options and loader caps to filter out certain
formats during config creation. Fold 4 call sites under a single helper
function.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This debug option allows vkGet[Instance/Device]ProcAddr() to succeed
even if the extension associated with the requested entrypoint was not
enabled.
This has come in handy in a few instances when debugging VR
applications, so I thought it would be good to have a cleaned up version
upstreamed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This should fix glDepthRangef issues. Eventually, something similar
should allow implementing the depth bounds test.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A pair of special flags can turn the texture/sampler handle fields into
register selects. This means code like:
texture(uTextures[hr28.w], ...)
can be compiled to something like:
texture ..., fsampler[hr28.w], texture[hr28.w]
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This data structure is shared in other parts of the texture word, so
let's streamline printing.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows nodes to be unsigned and prevents a class of weird
signedness bugs identified by Coverity.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This path shouldn't be possible for in-spec shaders, but let's be
defensive. (Because security, right? Mostly because Coverity.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode. Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle. In SPIR-V, signed min/max are separate
opcodes from unsigned.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can smush this into one-line per record as per usual. We still need
more validation and cleaning this up, especially around instancing. But
for LINEAR records, it works okay already.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This consolidates texture format and dimensionality into something simple:
tiled rgba8_unorm.rgb1: 512x512
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Textures of a smaller dimension don't need higher dimensions printed.
This allows us to be more compact, while enforcing verification that
higher dimensions must be zero.
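Roughly, the compact printing with verification could look like the following Python sketch. The function name and the tuple of already-decoded fields are hypothetical; how the raw fields are encoded in the descriptor is not modelled here.

```python
def format_dimensions(nr_dimensions, fields):
    """Format only the dimensions a texture uses, e.g. "512x512".

    `fields` holds the decoded width, height and depth. Entries beyond
    `nr_dimensions` must be zero, otherwise the record is suspect.
    """
    used, unused = fields[:nr_dimensions], fields[nr_dimensions:]
    if any(value != 0 for value in unused):
        raise ValueError("unused higher dimensions must be zero")
    return "x".join(str(value) for value in used)
```

A 2D texture then prints as `512x512`, and a stray non-zero depth is flagged instead of silently printed.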
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
unknown3A I think I've actually seen on T6xx but.. we'll see what
happens in traces going forward. We don't want the zero noise normally,
and if they show up in the wild, we want to draw attention to them.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This dramatically reduces visual clutter: now an entire
attribute/varying record looks something like:
rgba32f attribute_0[16].bgra;
which is equivalent to the raw structure:
{
.index = 0,
.format = MALI_FORMAT_RGBA32F,
.swizzle = (MALI_CHANNEL_BLUE << 9) | ....,
.src_offset = 16,
}
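For illustration, a small Python sketch of the mapping from the raw structure to the one-line form. The function name is hypothetical, and the swizzle is passed in as an already-decoded string because the bit encoding is elided above.

```python
FORMAT_PREFIX = "MALI_FORMAT_"


def format_record(kind, index, format_name, swizzle, src_offset):
    """Render one attribute/varying record in the compact one-line form."""
    # "MALI_FORMAT_RGBA32F" -> "rgba32f"
    if format_name.startswith(FORMAT_PREFIX):
        format_name = format_name[len(FORMAT_PREFIX):]
    return f"{format_name.lower()} {kind}_{index}[{src_offset}].{swizzle};"
```

Calling `format_record("attribute", 0, "MALI_FORMAT_RGBA32F", "bgra", 16)` reproduces the example line shown above.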
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We want to make sure we don't access a component in the swizzle that
doesn't exist in the format, since that is (as far as I know) undefined.
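The check can be sketched as below, in Python for illustration. The function name, the string form of the swizzle, and the treatment of "0"/"1" as constant selectors are assumptions of the sketch.

```python
CHANNEL_INDEX = {"r": 0, "g": 1, "b": 2, "a": 3}


def swizzle_is_valid(swizzle, nr_components):
    """Return True if every swizzled channel exists in the format.

    Constant selectors ("0" and "1") never read a component, so they are
    always accepted.
    """
    for selector in swizzle:
        if selector in ("0", "1"):
            continue
        if CHANNEL_INDEX[selector] >= nr_components:
            return False
    return True
```

Under these assumptions, `rgb1` is fine for a three-component format, while `rgba` is not.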
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We've never seen them, so if they come up in trace, we want to draw
attention to that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Varying discard is not used by Panfrost, but the blob uses it sometimes
to have some padding in the varyings table, probably to minimize
per-draw overhead. (...We should maybe consider this ourselves!)
Let's check for this and ensure the rest of the record is consistent
with a discarded varying.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's a legacy GL thing... we don't really need to handle it *right* now,
but we shouldn't crash..
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This CAP controls a desktop-only extension. If the corresponding support
exists in the hardware, we don't know how to use it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Subtle issue masked by how we emitted SET_VALUE jobs, but this case can
and does occur, so let's fix it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This normalizes the printed format. It also makes it easier for the
future when we may introduce semantic _warn and _error handlers.
A tripped zero is essentially a hazard to check for.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If this bit is clear, MFBD preload will be enabled, and you.. don't want
that. (At least, when the bit is clear, the old contents of the
framebuffer will be preserved. I'm assuming this is what "MFBD preload"
refers to in kbase.)
Validate that this bit is always set.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There is no "chunknown" structure; that part of the union is an artefact
from falsely believing vertex/tiler MFBDs could have render targets
attached (they can't). These are just plain old AFBC fields, and if
there is no AFBC, it's an error to set these fields.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
For our purposes of driver debugging, the contents of uniform buffers
are rarely interesting; we're more concerned about the metadata setting
them up.
We do need to be careful to validate the sizes of both uniforms and
uniform buffers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Many structures in the command stream have a GPU address and size
determined statically. We should check that the pointers we are passed
are valid and the buffers they point to are big enough for the given
size. If they're not, an MMU fault would be raised.
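A minimal sketch of such a bounds check, in Python for illustration. The function name and the `(base, length)` mapping list are hypothetical stand-ins for whatever the tracer records about mapped buffers.

```python
def find_containing_mapping(mappings, gpu_va, size):
    """Return the (base, length) mapping that fully contains the range, or None.

    `mappings` is a list of (base, length) tuples for the tracked buffers.
    """
    for base, length in mappings:
        # Both the start and the end of the range must fall inside the buffer.
        if base <= gpu_va and gpu_va + size <= base + length:
            return (base, length)
    return None
```

A `None` result corresponds to the case where the hardware would raise an MMU fault, either because the pointer is unmapped or because the structure runs past the end of its buffer.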
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Verify sizes / masks / etc against something logical to cull down the
trace space and automatically guard against a number of potential
hazards.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
While the algorithm for computing the header size has been correct for a
while, we used a major hack to conservatively guess the body size. Let's
scrap that and figure out the algorithm we actually need to use to be
bit-identical with what the hardware expects.
We do have to be careful to add the header size to the total computed BO
size.
It's not clear how big the polygon list needs to be in practice -- but
it has to be somewhat bigger than the polygon list itself. This needs
more investigation. If we size the polygon list exactly based on the
polygon_list_size field, we get faults like:
[ 1224.219886] panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x000000001BDE8000
Reason: TODO
raw fault status: 0x660003C3
decoded fault status: SLAVE FAULT
exception type 0xC3: TRANSLATION_FAULT_LEVEL3
access type 0x3: WRITE
source id 0x6600
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The other commented lines just add noise/entropy we don't want, and can
in fact crash the trace due to asserts failing.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The polygon sizes are computed from the width/height/flags, so we can
reverse the computation and use our computation to verify the two
computation algorithms are bit-identical. If they are, we can omit the
computed fields.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We have the BOs available; ensure that the bounds specified in the
command stream are actually the correct bounds.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows the caller to call track_mmap multiple times for the same
gpu_va for the purpose of updating the mmap. This is used to trace
invisible BOs with kbase and doesn't apply to native traces.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to catch a class of errors (for negative offsets, etc)
automatically.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The on-the-wire representation of workgroups is not 1:1 to the decoded
Gallium-level workgroups (there are multiple valid encodings; see the
previous commit). Nevertheless, since we're now bit-identical in packing
vs the blob, we can check for a canonical form and only print the
verbose trace if we fail the canonical form.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a blob quirk; insofar as I know, the hardware doesn't care.
But we're trying to be bit-identical to take as much entropy out of
traces as possible, so let's introduce the quirk.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The routines in this file have no dependency on Gallium. Let's share
them so they can be used for a theoretical future Vulkan driver or, more
immediately, consulted when tracing.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's obvious that it's linked by virtue of us printing the struct it
links against. No need to repeat ourselves.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The last remaining stuff was ARB_gl_spirv and ARB_spirv_extensions.
Note that it is really likely that we can enable it for some Gen7 (as
4.5 was), but it was not tested yet.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
To help make sure we are running tests in the ideal number of threads,
print load stats to make obvious when there's a problem with
utilization.
This will be especially useful when we run tests on a wider variety of
devices.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Some runners may be configured such that the qemu binary might not be
available by the time we need to start running commands within the
chroot.
So make sure that it's there to avoid surprising problems in that case.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A number of things can go wrong when building the rootfs from within a
non-native chroot, so make sure to print the bootstrap.log so we can
tell what's going on.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's able to run tests in parallel, fully utilizing the HW and
considerably shortening the time it takes.
Needed to disable tests in RK3288 for now because Volt doesn't support
armhf yet, though this should be fixed soon.
Tests are now run with --deqp-gl-config-name=rgba8888d24s8ms0, so we are
hitting a few more failures in tests that previously were being skipped.
The time to run the tests decreases from around 8 minutes to 1:45
minutes, allowing for extending coverage without increasing CI times too
much.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This gives a nice boost, +20% at this time on my Vega 56. Shader
ballot should be enabled by default at some point but it reduces
performance a bit (-6%) with Wolfenstein II. Enable it only for
Youngblood at the moment, like what we did for Talos in the past.
As a bonus point, it gets rid of some minor artifacts that only
happen when ballot is disabled for some reason.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Loops like:
block block_0:
vec1 32 ssa_2 = load_const (0x00000020)
vec1 32 ssa_3 = load_const (0x00000001)
loop {
vec1 32 ssa_7 = phi block_0: ssa_3, block_4: ssa_9
vec1 1 ssa_8 = ige ssa_2, ssa_7
if ssa_8 {
break
} else {
}
vec1 32 ssa_9 = iadd ssa_7, ssa_1
}
were treated as having more than 1 iteration and produced wrong results
after unrolling, even though such a loop exits during the first
iteration when it is not unrolled.
So we check whether the loop will actually loop.
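To make the example concrete, here is a brute-force Python sketch that counts how many times a simple counting loop completes its body before the break fires. It is not how the analysis pass works; the function name is hypothetical, and the step value is an assumption because ssa_1 is not shown in the dump above.

```python
def count_iterations(init, step, should_break, max_iterations=64):
    """Count how many times the loop body completes before the break fires.

    Returns None if no exit is found within `max_iterations`.
    """
    value = init
    for completed in range(max_iterations):
        # The break is tested at the top of the body, as in the dump above.
        if should_break(value):
            return completed
        value += step
    return None
```

For the loop above, the induction variable starts at 1 and the break condition is `0x20 >= value`, so the break fires before the body completes even once.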
Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The comments say that we should remove a continue if it is the last
instruction in a loop; however, we remove any kind of jump.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Otherwise hangs are possible. This register was already set for
GS and NGG.
Fixes: 5eaed7ecfc "radv/gfx10: enable support for NAVI10, NAVI12 and NAVI14"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Should take the max of the 2.
Fixes: ea337c8b7e "radv/gfx10: fix VS input VGPRs with the legacy path"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
An application quitting before destroying its GL context and
binding a NULL context might still have a radeonsi compiler thread
running and potentially still accessing the types.
Therefore take a reference for the duration of the threads' lifetime.
v2: Only ref the glsl types, the builtins should be used by the time
shader data gets to a gallium driver.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The issue we're running into when running CTS is that glsl types are
deleted while builtins depending on them are not.
This happens because on one hand we have glsl types ref counted, but
builtins are not. Instead builtins are destroyed when unloading libGL
or explicitly calling glReleaseShaderCompiler().
This change removes almost entirely any dealing with glsl types
ref/unref by letting the builtins deal with it instead. In turn we
introduce a builtin ref count mechanism. Each GL context takes a
reference on the builtins when compiling a shader for the first time.
It releases the reference when the context is destroyed. It can also
explicitly release those when glReleaseShaderCompiler() is called.
Finally we also take a reference on the glsl types when loading libGL
to avoid recreating glsl types too often.
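The reference counting scheme described above can be modelled roughly as follows. This is an illustrative Python sketch, not the compiler's code: the class name is hypothetical and the `initialized` flag is a stand-in for actually building and destroying the builtins.

```python
import threading


class BuiltinRegistry:
    """Shared state that is built on first use and torn down on last release."""

    def __init__(self):
        self._lock = threading.Lock()
        self._users = 0
        self.initialized = False

    def ref(self):
        with self._lock:
            if self._users == 0:
                self.initialized = True  # stand-in for building the builtins
            self._users += 1

    def unref(self):
        with self._lock:
            assert self._users > 0, "unref without a matching ref"
            self._users -= 1
            if self._users == 0:
                self.initialized = False  # stand-in for destroying them
```

In this model each context would call `ref()` the first time it compiles a shader and `unref()` when it is destroyed, so the shared state outlives any single context but not the last one.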
v2: Ensure we take a reference if we don't have one in link step (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110796
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The compute paths in vl are a bit AMD-specific. For example, on nouveau
they try to use a BGRX8 image format, which is not supported.
Fixing all this is probably possible, but since the compute paths aren't
in any way better, it's difficult to care.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Fixes: 9364d66cb7 (gallium/auxiliary/vl: Add video compositor compute shader render)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
# Install python wheels, necessary to install SCons via pip
- python -m pip install wheel
# Install SCons
- python -m pip install scons==3.0.1
- scons --version
# Install flex/bison
- set WINFLEXBISON_ARCHIVE=win_flex_bison-%WINFLEXBISON_VERSION%.zip
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://github.com/lexxmark/winflexbison/releases/download/v%WINFLEXBISON_VERSION%/%WINFLEXBISON_ARCHIVE%"
- 7z x -y -owinflexbison\ "%WINFLEXBISON_ARCHIVE%" > nul
- set Path=%CD%\winflexbison;%Path%
- win_flex --version
- win_bison --version
# Download and extract LLVM
- if not exist "%LLVM_ARCHIVE%" appveyor DownloadFile "https://people.freedesktop.org/~jrfonseca/llvm/%LLVM_ARCHIVE%"
<h2>January 28, 2020</h2>
<p><a href="relnotes/19.3.3.html">Mesa 19.3.3</a> is released. This is a bug fix release.</p>
<h2>January 9, 2020</h2>
<p><a href="relnotes/19.3.2.html">Mesa 19.3.2</a> is released. This is a bug fix release.</p>
<h2>December 18, 2019</h2>
<p><a href="relnotes/19.2.8.html">Mesa 19.2.8</a> is released. This is a bug fix release.</p>
<h2>December 18, 2019</h2>
<p><a href="relnotes/19.3.1.html">Mesa 19.3.1</a> is released. This is a bug fix release.</p>
<h2>December 12, 2019</h2>
<p><a href="relnotes/19.3.0.html">Mesa 19.3.0</a> is released. This is a new development release. See the release notes for more information about this release.</p>
<h2>December 4, 2019</h2>
<p><a href="relnotes/19.2.7.html">Mesa 19.2.7</a> is released. This is a bug fix release.</p>
<h2>November 21, 2019</h2>
<p><a href="relnotes/19.2.6.html">Mesa 19.2.6</a> is released. This is a bug fix release.</p>
<h2>November 20, 2019</h2>
<p><a href="relnotes/19.2.5.html">Mesa 19.2.5</a> is released. This is a bug fix release.</p>
<h2>November 13, 2019</h2>
<p><a href="relnotes/19.2.4.html">Mesa 19.2.4</a> is released. This is an emergency bugfix release; all users of 19.2.3 are recommended to upgrade immediately.</p>
<h2>November 6, 2019</h2>
<p><a href="relnotes/19.2.3.html">Mesa 19.2.3</a> is released. This is a bug fix release.</p>
<h2>October 24, 2019</h2>
<p><a href="relnotes/19.2.2.html">Mesa 19.2.2</a> is released. This is a bug fix release.</p>
<h2>October 21, 2019</h2>
<p>
<a href="relnotes/19.1.8.html">Mesa 19.1.8</a> is released.
This is a bug-fix release.
</p>
<p>
NOTE: It is anticipated that 19.1.8 will be the final release in the
19.1 series. Users of 19.1 are encouraged to migrate to the 19.2
series in order to obtain future fixes.
</p>
<h2>October 9, 2019</h2>
<p><a href="relnotes/19.2.1.html">Mesa 19.2.1</a> is released. This is a bug fix release.</p>
<h2>September 25, 2019</h2>
<p>
<a href="relnotes/19.2.0.html">Mesa 19.2.0</a> is released.
This is a new development release. See the release notes for more
information about this release.
</p>
<h2>September 17, 2019</h2>
<p>
<a href="relnotes/19.1.7.html">Mesa 19.1.7</a> is released.
This is a bug-fix release.
</p>
<h2>September 3, 2019</h2>
<p>
<a href="relnotes/19.1.6.html">Mesa 19.1.6</a> is released.
This is a bug-fix release.
</p>
<h2>August 23, 2019</h2>
<p>
<a href="relnotes/19.1.5.html">Mesa 19.1.5</a> is released.
This is a bug-fix release.
</p>
<h2>August 7, 2019</h2>
<p>
<a href="relnotes/19.1.4.html">Mesa 19.1.4</a> is released.
@@ -1603,7 +1640,7 @@ shading language and built-in functions.
<h2>April 4, 2007</h2>
<p>
Thomas Hellström of Tungsten Graphics has written a whitepaper
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109630">Bug 109630</a> - vkQuake flickering geometry under Intel</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110395">Bug 110395</a> - Shadows are flickering in SuperTuxKart</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111113">Bug 111113</a> - ANGLE BlitFramebufferTest.MultisampleDepthClear/ES3_OpenGL fails on Intel Ubuntu19.04</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111267">Bug 111267</a> - [CM246] Flickering with multiple draw calls within the same graphics pipeline if a compute pipeline is present</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (4):</p>
<ul>
<li>radv: Do non-uniform lowering before bool lowering.</li>
<li>ac/nir: Use correct cast for readfirstlane and ptrs.</li>
<li>radv: Avoid binning RAVEN hangs.</li>
<li>radv: Avoid VEGA/RAVEN scissor bug in binning.</li>
</ul>
<p>Danylo Piliaiev (1):</p>
<ul>
<li>i965: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>util: fix mem leak of program path</li>
</ul>
<p>Erik Faye-Lund (2):</p>
<ul>
<li>gallium/dump: add missing query-type to short-list</li>
<li>gallium/dump: add missing query-type to short-list</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111411">Bug 111411</a> - SPIR-V shader leads to GPU hang, sometimes making machine unstable</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110814">Bug 110814</a> - KWin compositor crashes on launch</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111069">Bug 111069</a> - Assertion fails in nir_opt_remove_phis.c during compilation of SPIR-V shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111271">Bug 111271</a> - Crash in eglMakeCurrent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111401">Bug 111401</a> - Vulkan overlay layer - async compute not supported, making overlay disappear in Doom</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111405">Bug 111405</a> - Some infinite 'do{}while' loops lead mesa to an infinite compilation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111467">Bug 111467</a> - WOLF RPG Editor + Gallium Nine Standalone: Rendering issue when using Iris driver</li>
<li><a href="https://gitlab.freedesktop.org/mesa/mesa/issues/1878">Issue #1878</a> - meson.build:1447:6: ERROR: Problem encountered: libdrm required for gallium video statetrackers when using x11</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>docs: Update bug report URLs for the gitlab migration</li>
</ul>
<p>Alan Coopersmith (5):</p>
<ul>
<li>c99_compat.h: Don't try to use 'restrict' in C++ code</li>
<li>util: Make Solaris implemention of p_atomic_add work with gcc</li>
<li>util: Workaround lack of flock on Solaris</li>
<li>meson: recognize "sunos" as the system name for Solaris</li>
<li>intel/common: include unistd.h for ioctl() prototype on Solaris</li>
</ul>
<p>Andreas Gottschling (1):</p>
<ul>
<li>drisw: Fix shared memory leak on drawable resize</li>
</ul>
<p>Andres Gomez (3):</p>
<ul>
<li>docs: Add the maximum implemented Vulkan API version in 19.1 rel notes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103674">Bug 103674</a> - u_queue.c:173:7: error: implicit declaration of function 'timespec_get' is invalid in C99</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104395">Bug 104395</a> - [CTS] GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels tests fail on 32bit Mesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110765">Bug 110765</a> - ANV regression: Assertion `pass->attachment_count == framebuffer->attachment_count' failed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110814">Bug 110814</a> - KWin compositor crashes on launch</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111069">Bug 111069</a> - Assertion fails in nir_opt_remove_phis.c during compilation of SPIR-V shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111213">Bug 111213</a> - VA-API nouveau SIGSEGV and asserts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111248">Bug 111248</a> - Navi10 Font rendering issue in Overwatch</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111271">Bug 111271</a> - Crash in eglMakeCurrent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111308">Bug 111308</a> - [Regression, NIR, bisected] Black squares in Unigine Heaven via DXVK</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111401">Bug 111401</a> - Vulkan overlay layer - async compute not supported, making overlay disappear in Doom</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111405">Bug 111405</a> - Some infinite 'do{}while' loops lead mesa to an infinite compilation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111411">Bug 111411</a> - SPIR-V shader leads to GPU hang, sometimes making machine unstable</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111414">Bug 111414</a> - [REGRESSION] [BISECTED] Segmentation fault in si_bind_blend_state after removal of the blend state NULL check</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111467">Bug 111467</a> - WOLF RPG Editor + Gallium Nine Standalone: Rendering issue when using Iris driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111490">Bug 111490</a> - [REGRESSION] [BISECTED] Shadow Tactics: Blades of the Shogun - problems rendering water</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111493">Bug 111493</a> - In the game The Surge (378540) - textures disappear then appear again when I change the camera angle view</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111509">Bug 111509</a> - [regression][bisected] piglit.spec.ext_image_dma_buf_import.ext_image_dma_buf_import-export fails on iris</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111522">Bug 111522</a> - [bisected] Supraland no longer start</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111566">Bug 111566</a> - [REGRESSION] [BISECTED] Large CS workgroup sizes broken in combination with FP64 on Intel.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111576">Bug 111576</a> - [bisected] Performance regression in X4:Foundations in 19.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111676">Bug 111676</a> - Tropico 6 apitrace throws error into logs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=111734">Bug 111734</a> - Geometry shader with double interpolators fails in LLVM</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>docs: Update bug report URLs for the gitlab migration</li>
</ul>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Change memory type order for GPUs without dedicated VRAM</li>
</ul>
<p>Alyssa Rosenzweig (1):</p>
<ul>
<li>pan/midgard: Fix writeout combining</li>
</ul>
<p>Andres Gomez (1):</p>
<ul>
<li>docs: Add the maximum implemented Vulkan API version in 19.2 rel notes</li>
</ul>
<p>Andres Rodriguez (1):</p>
<ul>
<li>radv: additional query fixes</li>
</ul>
<p>Arcady Goldmints-Orlov (1):</p>
<ul>
<li>anv: fix descriptor limits on gen8</li>
</ul>
<p>Bas Nieuwenhuizen (6):</p>
<ul>
<li>radv: Use correct vgpr_comp_cnt for VS if both prim_id and instance_id are needed.</li>
<li>radv: Emit VGT_GS_ONCHIP_CNTL for tess on GFX10.</li>
<li>radv: Disable NGG for geometry shaders.</li>
<li>Revert "ac/nir: Lower large indirect variables to scratch"</li>
<li>tu: Set up glsl types.</li>
<li>radv: Add workaround for hang in The Surge 2.</li>
</ul>
<p>Caio Marcelo de Oliveira Filho (2):</p>
<ul>
<li>nir/lower_explicit_io: Handle 1 bit loads and stores</li>
<li>glsl/nir: Avoid overflow when setting max_uniform_location</li>
</ul>
<p>Connor Abbott (1):</p>
<ul>
<li>radv: Call nir_propagate_invariant()</li>
</ul>
<p>Danylo Piliaiev (3):</p>
<ul>
<li>nir/loop_unroll: Prepare loop for unrolling in wrapper_unroll</li>
<li>nir/loop_analyze: Treat do{}while(false) loops as 0 iterations</li>
<li>tgsi_to_nir: Translate TGSI_INTERPOLATE_COLOR as INTERP_MODE_NONE</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>virgl: fix format conversion for recent gallium changes.</li>
<li>gallivm: fix atomic compare-and-swap</li>
</ul>
<p>Dave Stevenson (1):</p>
<ul>
<li>broadcom/v3d: Allow importing linear BOs with arbitrary offset/stride.</li>
</ul>
<p>Dylan Baker (9):</p>
<ul>
<li>bump version to 19.2-rc2</li>
<li>nir: Add is_not_negative helper function</li>
<li>Bump version for rc3</li>
<li>meson: don't generate file into subdirs</li>
<li>add patches to be ignored</li>
<li>Bump version for 19.2.0-rc4</li>
<li>cherry-ignore: Add patches</li>
<li>rehardcode from origin/master to upstream/master</li>
<li>bin/get-pick-list: use --oneline=pretty instead of --oneline</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>Update version to 19.2.0-rc1</li>
</ul>
<p>Eric Engestrom (14):</p>
<ul>
<li>ttn: fix 64-bit shift on 32-bit `1`</li>
<li>egl: fix deadlock in malloc error path</li>
<li>util/os_file: fix double-close()</li>
<li>anv: fix format string in error message</li>
<li>freedreno/drm-shim: fix mem leak</li>
<li>nir: fix memleak in error path</li>
<li>anv: add support for driconf</li>
<li>wsi: add minImageCount override</li>
<li>anv: add support for vk_x11_override_min_image_count</li>
<li>amd: move adaptive sync to performance section, as it is defined in xmlpool</li>
<li>radv: add support for vk_x11_override_min_image_count</li>
<li>drirc: override minImageCount=2 for gfxbench</li>
<li>gl: drop incorrect pkg-config file for glvnd</li>
<li>meson: re-add incorrect pkg-config files with GLVND for backward compatibility</li>
</ul>
<p>Erik Faye-Lund (2):</p>
<ul>
<li>gallium/auxiliary/indices: consistently apply start only to input</li>
<li>util: fix SSE-version needed for double opcodes</li>
</ul>
<p>Haihao Xiang (1):</p>
<ul>
<li>i965: support AYUV/XYUV for external import only</li>
</ul>
<p>Hal Gentz (2):</p>
<ul>
<li>glx: Fix SEGV due to dereferencing a NULL ptr from XCB-GLX.</li>
<li>gallium/osmesa: Fix the inability to set no context as current.</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>v3d: make sure we have enough space in the CL for the primitive counts packet</li>
</ul>
<p>Ian Romanick (8):</p>
<ul>
<li>nir/algebraic: Don't optimize open-coded bitfield reverse when lowering is enabled</li>
<li>intel/compiler: Request bitfield_reverse lowering on pre-Gen7 hardware</li>
<li>nir/algebraic: Mark some value range analysis-based optimizations imprecise</li>
<li>nir/range-analysis: Adjust result range of exp2 to account for flush-to-zero</li>
<li>nir/range-analysis: Adjust result range of multiplication to account for flush-to-zero</li>
<li>nir/range-analysis: Fix incorrect fadd range result for (ne_zero, ne_zero)</li>
<li>nir/range-analysis: Handle constants in nir_op_mov just like nir_op_bcsel</li>
<li>nir/algebraic: Do not apply late DPH optimization in vertex processing stages</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>gallium/vl: use compute preference for all multimedia, not just blit</li>
</ul>
<p>Jason Ekstrand (9):</p>
<ul>
<li>anv: Bump maxComputeWorkgroupSize</li>
<li>nir: Handle complex derefs in nir_split_array_vars</li>
<li>nir: Don't infinitely recurse in lower_ssa_defs_to_regs_block</li>
<li>nir: Add a block_is_unreachable helper</li>
<li>nir/repair_ssa: Repair dominance for unreachable blocks</li>
<li>nir/repair_ssa: Insert deref casts when needed</li>
<li>nir/dead_cf: Repair SSA if the pass makes progress</li>
<li>intel/fs: Handle UNDEF in split_virtual_grfs</li>
<li>nir/repair_ssa: Replace the unreachable check with the phi builder</li>
</ul>
<p>Jonathan Marek (1):</p>
<ul>
<li>freedreno/a2xx: ir2: fix lowering of instructions after float lowering</li>
</ul>
<p>Jose Maria Casanova Crespo (1):</p>
<ul>
<li>mesa: recover target_check before get_current_tex_objects</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>bin/get-pick-list.sh: sha1 commits can be smaller than 8 chars</li>
</ul>
<p>Kenneth Graunke (20):</p>
<ul>
<li>gallium/ddebug: Wrap resource_get_param if available</li>
<li>gallium/trace: Wrap resource_get_param if available</li>
<li>gallium/rbug: Wrap resource_get_param if available</li>