We depend on BLORP to convert the clear color and write it into the
clear color buffer for us. However, we weren't bothering to call blorp
in the case where the state is ISL_AUX_STATE_CLEAR. This leads to the
clear color not getting properly updated if we have back-to-back clears
with different clear colors. Technically, we could go out of our way to
set the clear color directly from iris in this case but this is a case
we're unlikely to see in the wild so let's not bother. This matches
what we already do for color surfaces.
Cc: mesa-stable@lists.freedesktop.org
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4073>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4073>
(cherry picked from commit 9d07d59842)
This reverts commit f04d7439a0.
It no longer helps performance and the current vbo implementation is
faster anyway.
The app that hit this was a CAD program called Spazio3D. It made pretty
terrible use of the OpenGL API and we sent them some tips for improvements.
I'm assuming they've fixed this by now.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4052>
(cherry picked from commit df3891e74a)
This reverts commit 35fc7bdf0e.
Unfortunately mentioned commit introduced a memory leak because
`driwindowsMapConfigs` and `createDriMode` functions allocate
small memory portions for each element:
21,576 (232 direct, 21,344 indirect) bytes in 1 blocks are definitely lost in loss record 1,411 of 1,414
at 0x483A7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D4AA09: createDriMode (dri_common.c:291)
by 0x5D4ABF5: driConvertConfigs (dri_common.c:310)
by 0x5D58414: dri3_create_screen (dri3_glx.c:945)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
by 0x4F83038: glXQueryExtensionsString (in /usr/local/lib/libGL.so.1.7.0)
by 0x4F2EA6B: ??? (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.6.0)
by 0x4F2A0D7: waffle_display_connect (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.6.0)
by 0x498F42A: wfl_checked_display_connect (piglit-util-waffle.h:74)
There is one more thing which disallow us to easily fix it are different element sizes
for instance: `glx_config_create_list` allocates memory just for `glx_config`,
`driwindowsMapConfigs` for `driwindows_config` and
`createDriMode` for `__GLXDRIconfigPrivate`.
Yes it is possible but size of such fix
will be more big and complex than original one.
So it make sense only if the malloc overhead
really is a big problem there.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3406>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3406>
(cherry picked from commit 311c82e192)
Our current GPGPU_WALKER code only supports up to 64 threads.
On HSW we could use up to 70 and TGL up to 112, but only if the walker
is adjusted so the width does not exceed 64. Work to support this is
in progress.
Previous to this change, we might try to downgrade to SIMD8 if the
SIMD16 shader spilled. Since HSW and TGL have the max number of
threads above 64, we would then try to emit an invalid GPGPU walker
command.
Fixes: 932045061b ("i965/cs: Emit compute shader code and upload programs")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit cf12faef61)
The __DRI_IMAGE_FORMAT_* part wants to be handled for the *101010
type formats as well. Factor out a common function for that task.
That again makes the piglit egl_ext_device_base test work again
for hardware drivers.
v2: Factor out a common function for that task.
v3: dri2_pbuffer_visuals -> dri2_pbuffer_visuals
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 9acb94b623 "egl: Enable 10bpc EGLConfigs for platform_{device,surfaceless}"
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3790>
(cherry picked from commit d32c458de7)
When a resource is written by a compute shader and then used by a
non-compute stage we sync on last compute job to guarantee that the
resource has been completely written when the next stage reads resources.
In the other cases how flushes are done guarantee the serialization of
the writes and reads.
To reproduce the failure the following tests should be executed in batch
as last test don't fail when run isolated:
KHR-GLES31.core.shader_image_load_store.basic-allFormats-load-fs
KHR-GLES31.core.shader_image_load_store.basic-allFormats-loadStoreComputeStage
KHR-GLES31.core.shader_image_load_store.basic-allTargets-load-cs
KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
v2: Use fence dep instead of bo_wait (Eric Anholt)
v3: Rename struct names (Iago Toral)
Document why is not needed on graphics->compute case. (Iago Toral)
Follow same code pattern of the other update of in_sync_bcl.
v4: Fixed comments style. (Iago Toral)
Fixes KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
CC: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2700>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2700>
(cherry picked from commit 01496e3d1e)
When os_memory_debug.h was promoted to src/util, this source-file on
which it depends on when the debug-flag is set on windows was left
out. So let's move this also.
It doesn't seem there's any way of triggering this issue right now, but
it seems better to correct this to avoid this from biting us in the ass
in the future.
Fixes: 88c4680b5a ("util: promote u_memory to src/util")
Reviewed-by: Dylan Baker <dylan@pnwbakers>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3844>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3844>
(cherry picked from commit 2e3318b151)
The other source of the multiply will be interpreted as a uint32_t in an
XOR instruction. Any source modifiers with either not be interpreted at
all or will be misinterpreted due to the differing types.
If the other operand of the multiplication has a source modifier, just
emit an extra move to resolve the source modifiers.
The negation source modifier problem is difficult to reproduce due to an
algebraic optimization that changes (-a*b) to -(a*b). However, changes
in MR !1359 push the negations back down.
On Gen7+ it might be possible to do slightly better for an abs() source
modifier by using BFI2 as a glorified copysign().
On Gen8+ it might be possible to do slightly better for a neg() source
modifier by emitting (~a ^ b).
There were no shader-db changes on any Intel platform, so I think we can
deal with that problem when it arises.
See also piglit!224.
Fixes: 06d2c11641 ("intel/fs: Add a scale factor to emit_fsign")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
(cherry picked from commit 273b8cd1ca)
The interpretation of the fields is different depending whether the
instruction is a SEND/MATH or not.
This fixes the disassembly output for non-SEND/MATH instructions that
have both in-order and out-of-order dependencies. Their dependencies
were wrongly represented as `@A $B` when the correct would be `@A
$B.dst`.
Fixes: 6154cdf924 ("intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen.")
Fixes: 83612c0127 ("intel/disasm/gen12: Disassemble software scoreboard information.")
Acked-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
(cherry picked from commit 79788b8f7f)
Found by inspection. Existing code was trying to avoid assuming that
an SBID had been assigned to the virtual instruction, but
synchronizing the header setup with respect to the previous SIMD16
SEND by using SYNC.ALLRD doesn't really seem possible unless the SEND
instruction had been assigned an SBID. Assert-fail instead if no SBID
has been allocated.
Fixes: 15e3a0d9d2 "intel/eu/gen12: Set SWSB annotations in hand-crafted assembly."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4e4e8d793f)
This reverts the functional part of commit
d17ff2f7f1, leaving the unit test for
mesa/pipe agreement on what's an array.
The issue is that the util_channel_desc.shift values on array formats are
not used for bit addressing in memory, they're bit addressing within a
word treating a pixel of the format as a native type, as seen by
llvmpipe's use of the values to do shifts (see
lp_build_unpack_arith_rgba_aos() for example). This means the values are
nonsensical for 3-byte RGB, but then llvmpipe doesn't expose those formats
so it works out.
I still want to clean up our big-endian format handling at some point, but
let's fix the s390x regression first, sort out our format unit tests in
CI, then be able to refactor with confidence.
Fixes: d17ff2f7f1 ("gallium: Fix big-endian addressing of non-bitmask array formats.")
Closes: #2472
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
(cherry picked from commit 1886dbfe73)
Since we are now using sparse matrix for format_conversion_table,
we have to make sure we have last entry in table which gives the
sense of required size of format_conversion_table
Fixes: 84db6ba7 ("svga: Drop unsupported formats from the format table")
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit 470e73e7f8)
The previous commit leads to match immed values unexpectedly.
This makes constlen for each shader including bvert wrong.
Also fixes atan2 for mediump deqp tests.
Fixes: cbd1f47433 ("freedreno/ir3: convert back to 32-bit values for half constant registers.")
v2: Move conversion up above fabs/fneg modifier handling as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
(cherry picked from commit 260bd32b58)
Allocation all the bo as ALWAYS_VALID means they must all fit in memory
(vram + gtt) at each command submission.
This causes some trouble when the total allocated memory is greater than
the available memory.
Possible solutions:
- being able to tag/untag a bo as ALWAYS_VALID: would require kernel changes
- disable VM_ALWAYS_VALID when memory usage is more than a percentage of the
available memory
- disable VM_ALWAYS_VALID entirely
v1 of this patch implemented option 2. v2 (this version) implements option 3.
Related issues:
- https://gitlab.freedesktop.org/drm/amd/issues/607
- https://gitlab.freedesktop.org/mesa/mesa/issues/1257
It also helps with some piglit tests (-t maxsize -t "max[_-].*size" -t maxuniformblocksize):
instead of crashing the machine, the tests fail cleanly.
(cherry-pick from ab54624d0d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3709>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3709>
The extra bits in CB_SHADER_MASK break dual source blending in
SkQP on a Stoney device. However:
- As far as I can tell, some other dual source blend tests are passing
before and after the change.
- A hacked around skqp passes on my Vega desktop and Raven laptop
- Getting Skqp to give any useful info or to run it outside of Android
on ChromeOS is proving difficult.
I have confirmed 3 strategies that seem to work:
- The old radv behavior of setting CB_SHADER_MASK to 0xF
- AMDVLK: CB_SHADER_MASK = 0xFF, and the 3 RB+ regs
are 0.
- radeonsi: CB_SHADER_MASK = 0xFF, but does not set DISABLE
bits in SX_BLEND_OPT_CONTROL for CB 1-7.
Let us use the radeonsi solution as that solution also seems like the correct
thing to do for holes. I have tested on my Raven laptop that setting the high
surfaces to not disabled and downconvert to 32_R does not imply a performance
penalty.
Fixes: e9316fdfd4 "radv: fix setting CB_SHADER_MASK for dual source blending"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3670>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3670>
(cherry picked from commit 65a6dc5139)
When the amdgpu_screen_winsys uses the same FD as the amdgpu_winsys
(which is always the case for the first amdgpu_screen_winsys), we can
just use bo->u.real.kms_handle.
v2:
* Also only create the kms_handles hash table if the
amdgpu_screen_winsys fd is different from the amdgpu_winsys one.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(Cherry picked from commit c6468f66c7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3693>
The assumption being that KMS handles are only retrieved for relatively
few BOs, so hash tables should be efficient both in terms of performance
and memory consumption.
We use the address of struct amdgpu_winsys_bo as the key and its
kms_handle field (the KMS handle valid for the DRM file descriptor
passed to amdgpu_device_initialize) as the hash value.
v2:
* Add comment above amdgpu_screen_winsys::kms_handles (Pierre-Eric
Pelloux-Prayer)
v3:
* Protect kms_handles hash table with amdgpu_winsys::sws_list_lock
mutex.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(Cherry picked from commit 24075ac60f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3693>
In the long term the goal of this script is to nearly completely
automate the process of picking stable nominations, in a well tested
way.
In the short term the goal is to provide a better, faster UI to interact
with stable nominations.
This commit makes two changes:
1. We set pending_pipe_bits instead of emitting PIPE_CONTROL directly
for the flush at the end of cmd_buffer_begin_subpass.
2. Because BLORP ops such as vkCmdClearAttachments may come in the
middle of a render pass, we have to also flag the need for a cache
flush after the blorp op.
Fixes: 185630c6bc "anv/blorp: Do the gen11 BTI flush"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
(cherry picked from commit c70a786c77)
Conflicts:
src/intel/vulkan/genX_cmd_buffer.c
A shader might require vgpr spilling but not require sgpr spilling. In
that case, the spiller lowers the sgpr target by 5 which could mean sgpr
spilling is then required. Then the vgpr target has to be lowered to make
space for the linear vgprs. Previously, space wasn't make for the linear
vgprs.
Found while testing the spiller on the pipeline-db with a lowered limit
Fixes: a7ff1bb5b9
('aco: simplify calculation of target register pressure when spilling')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
(cherry picked from commit 590c26beab)
Conflicts:
src/amd/compiler/aco_spill.cpp
Otherwise, code like this will be broken:
loop {
if (...) {
break;
} else {
break;
}
}
The continue_or_break block doesn't have any logical predecessors but it's
a logical predecessor of the header block. This liveness error breaks the
spiller in init_live_in_vars() (under "keep variables spilled on all
incoming paths") and eventually creates garbage reloads.
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
(cherry picked from commit d282a292ec)
Currently, fetching the partial results (VK_QUERY_RESULT_PARTIAL_BIT)
of an unavailable occlusion query via vkGetQueryPoolResults can
return invalid values. anv returns slot.end - slot.begin, but in the
case of unavailable queries, slot.end is still at the initial value
of 0. If slot.begin is non-zero, the occlusion count underflows to
a value that is likely outside the acceptable range of the partial
result.
This commit fixes vkGetQueryPoolResults by always returning 0 if the
query is unavailable and the VK_QUERY_RESULT_PARTIAL_BIT is set.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
(cherry picked from commit af92ce50a7)
We were applying row pitch constraint of CCS surfaces to linear
surfaces. But CCS is only supported in linear tiling under some
condition (more on that in the following commit). So let's drop that
requirement for now.
Fixes a bunch of crucible assert where the byte size of a linear image
is expected to be similar to the byte size of buffer for the same
extent in the following category :
func.miptree.r8g8b8a8-unorm.aspect-color.view-2d.*download-copy-with-draw.*
v2: Move restriction to isl_calc_tiled_min_row_pitch()
v3: Move restrinction to isl_calc_row_pitch_alignment() (Jason)
v4: Update message (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 07e16221d9 ("isl: Round up some pitches to 512B for Gen12's CCS")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3551>
(cherry picked from commit a3f6db2c4e)
Decompress resources properly but don't do it inside launch_grid
to prevent recursion.
(cherry-picked from df34fa14bb)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is not always ->rgbBits, because there are cases where that could
be 32 but we're (legally) bound to a depth-24 pixmap. The important
thing to have match here is the actual server-side notion of depth. You
can look this up (at modest expense) from the xlib visual info if the
fbconfig has a visual. But it might not, so if not, fetch it (at
slightly greater expense) from XGetGeometry. Do this at GLX drawable
creation so you don't have to do it on the SwapBuffers path.
Apparently this fixes glx/glx-swap-singlebuffer, which is unintentional
but quite pleasant.
Fixes: mesa/mesa#2291
Fixes: 90d58286 ("drisw: Fix and simplify drawable setup")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
(cherry picked from commit 2fc11e8a05)
Conflicts:
.gitlab-ci/piglit/quick_gl.txt
This testing doesn't exist in the 19.3 branch, so I've deleted the file.
Dylan
When VK_DESCRIPTOR_TYPE_SAMPLER is provided, it doesn't need to be
counted as a buffer count. Otherwise it leads to mismatch of allocated
buffer size, hitting VK_ERROR_OUT_OF_POOL_MEMORY finally.
Fixes: c39afe68f0
Also fixes amber tests:
./tests/cases/address_modes_float.amber
./tests/cases/address_modes_int.amber
./tests/cases/magfilter_linear.amber
./tests/cases/magfilter_nearest.amber
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
(cherry picked from commit 26d93a7495)
Some formats, in particular YCbCr formats and ASTC have additional
restrictions. We already whack ASTC formats to RGBA32_UINT because the
hardware doesn't allow LINEAR with ASTC. However, we need to fix YCbCr
formats as well because they come with alignment restrictions that we
can't guarantee are satisfied. We're using blorp_copy to do the copies
so we may as well just stomp formats for everything.
Fixes: b24b93d584 "anv: enable VK_KHR_sampler_ycbcr_conversion"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
(cherry picked from commit dd92179a72)
The pre-tag right before is a block-level tag, which means it implicitly
terminates the paragraph. So there's no paragraph to close after this.
Instead, move the paragraph-closing before the pre-tag, to explicitly
close the paragraph.
Fixes: 41b3eb08d9 "docs: update meson docs for windows"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>
(cherry picked from commit b387f68f49)
Fixes the following valgrind error:
Invalid read of size 16
at 0x28F458A1: si_set_sampler_view_desc (in radeonsi_drv_video.so)
by 0x28F4657E: si_set_sampler_views (in radeonsi_drv_video.so)
by 0x28D62BF5: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Address 0x18142a10 is 0 bytes inside a block of size 48 free'd
at 0x48369AB: free (vg_replace_malloc.c:540)
by 0x28D62D51: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Block was alloc'd at
at 0x4837B65: calloc (vg_replace_malloc.c:762)
by 0x28EFB2EC: si_create_sampler_state (in radeonsi_drv_video.so)
by 0x28D62C30: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Fixes: 69430d7e59 ("va: use a compute shader for the blit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2321
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
(cherry picked from commit 5b1c4e1b75)
This patch changes lower_to_cssa to be much more conservative
about assumptions which phi operands might interfere.
Previously, this pass wasn't exhaustive and could miss some corner cases.
v2: remove optimizations to find better insertion points as it's hard
to guarantee that they are always correct and have overall no benefit.
Fixes: 0b8216b2cd ('aco: Lower to CSSA')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>
(cherry picked from commit d098024c40)
get_nir_image_intrinsic_image() was incorrectly mutating the value held
by the register which holds the intrinsic's first source (image index).
If this happened to be the register for an SSA def which is also used
elsewhere in the program, this meant that we would clobber that value
in subsequent uses.
Note that this only affects i965, because neither anv nor iris use the
binding table start sections, so nothing is ever added here.
Fixes KHR-GL46.compute_shader.resources-max on i965 with Eric Anholt's
MR !3240 applied. That MR reorders SSBOs and ABOs, so that test uses
image 0 and SSBO 0, causing this code to brilliantly add binding table
index 45 to both the image (correct) and the SSBO (bzzt, wrong!).
Fixes: 09f1de97a7 ("anv,i965: Lower away image derefs in the driver")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404>
(cherry picked from commit 0a1c47074b)
This fixes YUYV support on etnaviv.
Fixes: 7404833c "gallium: add handling for YUV planar surfaces"
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
We don't create atomics with definitions if they are not used in NIR, but
our own DCE can remove the uses if an export turns out to be null.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
(cherry picked from commit 69bed1c918)
Conflicts resolved by Dylan Baker
Conflicts:
src/amd/compiler/aco_opcodes.py
Fixes: b390ff3517 ("intel/fs: Add support for SLM fence in Gen11")
Fixes: e142061399 ("intel/fs: Implement scoped_memory_barrier")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit edf6a40cb2)
The problem occured when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message. In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function. This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.
Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.
The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0703eab012)
Prevents invalid code from being emitted for ROR/ROL instructions in
SIMD32 shaders.
The problem can be reproduced with the following tests while forcing
SIMD32 to be used for fragment shaders:
piglit.shaders.glsl-rotate-left
piglit.shaders.glsl-rotate-right
However the issue could occur in production already with compute
shaders and a workgroup size large enough to trigger SIMD32 dispatch.
Fixes: 83fdec0f0d "intel/compiler: Enable the emission of ROR/ROL instructions"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0a6e46d44d)
The current implementation was broken for any integers between 2^24
and 2^30 (it would return zero for me on ICL). The reason is that for
such integers we wouldn't take the 'if (0 <= shiftCount)' early return
path, however 'shiftCount + 7' would be positive, leading to a
negative 'count' argument passed to __shift64RightJamming(), which
would give undefined results.
This reworks the affected conversion functions to use either
__shortShift64Left() or __shift64RightJamming() based on the sign of
the final shift count, which should avoid the problem. In addition
this should qualify as a clean-up/optimization -- This implementation
of the conversion functions translates to 7 instructions less than the
original on Intel hardware.
This fixes the 'KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot'
conformance tests on soft fp64 hardware with large enough subgroup
size (>16).
Fixes: d5cf6e92b4 "glsl: Add built-in functions to do uint64_to_fp32(uint64_t)"
Fixes: c9d333a6b7 "glsl: Add built-in functions to do int64_to_fp32(int64_t)"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
(cherry picked from commit a30bb25a7a)
This patch prevents memory leak in get_version function in st_manager.c
This issue was found by valgrind:
16 bytes in 1 blocks are definitely lost in loss record 6 of 1,418
at 0x483CD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x63D9476: st_init_extensions (st_extensions.c:1679)
by 0x63B803B: get_version (st_manager.c:1271)
by 0x63B8124: st_api_query_versions (st_manager.c:1289)
by 0x63266EF: dri_init_screen_helper (dri_screen.c:583)
by 0x6321B12: dri2_init_screen (dri2.c:2110)
by 0x631AACC: driCreateNewScreen2 (dri_util.c:155)
by 0x5D58192: dri3_create_screen (dri3_glx.c:897)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
Fixes: eca8032f20 ("gallium: Add ARB_gl_spirv support")
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
(cherry picked from commit ebaab89761)
Fixes: 9b331e462e ("radeonsi: use compute shaders for clear_buffer & copy_buffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a5fe84aefb)
Fixes: 1b25d340b7 ("radeonsi: use compute for resource_copy_region when possible")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 1acf714d57)
Fixes: 984fd73515 ("radeonsi: use compute for clear_render_target when possible")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e1e87466ae)
Fixes: 095a58204d ("radeonsi: expand FMASK before MSAA image stores are used")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6c019e28ca)
The availability is not written at the location changed in
ee6fbb95a74d...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ee6fbb95a7 ("anv: Properly handle host query reset of performance queries")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 60e0db3bfb)
When decoding using VDPAU, the _MaxLevel value becomes -1 due to
NumLevels being equal to 0 at a certain point, and decoding fails
due to an assertion later on.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3a4f8c8158)
e5167a9276 disabled SDMA for gfx8.
This caused 3 piglit arb_sparse_buffer tests (basic, buffer-data
and commit) to crash on GFX8.
Reported-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: e5167a9276 ("radeonsi: disable SDMA on gfx8 to fix corruption on RX 580")
(cherry picked from commit 5f8daae4d8)
Conflicts resolved by Dylan Baker
Conflicts:
src/gallium/drivers/radeonsi/si_buffer.c
From issue 10 of the OES_EGL_image_external_essl3:
A limited set of use-cases is enabled by making glBindImageTexture
accept external textures. Shaders can access such external textures
using the existing <image2D> sampler type.
Fixes: 02a6d901ee ("mesa: add OES_EGL_image_external_essl3 support")
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit ed43dd62ac)
Our barrier instruction does not implicitly do a memory fence but the
GLSL barrier() intrinsic is supposed to. The easiest back-portable
solution is to just add the NIR barriers. We'll sort this out more
properly in later commits.
Cc: mesa-stable@lists.freedesktop.orgCloses: #2138
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 803fad43c3)
Fixes a hang on Raven with Resident Evil 2.
I did not find anything more restricted to fix it:
- Setting persistent_states_per_bin to 1 fixes it too,
but likely does an internal break on any descriptor set changes
too.
- Only breaking the batch when cb_target_mask changes does not fix
it (and looking at AMDVLK comments, I suspect the code in radeonsi
should really be doing a FLUSH_DFSM).
- Always doing a FLUSH_DFSM on shader switch helps, but that is more
often than this and I don't think we should be doing that when DFSM
is disabled.
- Also emitting the existing break on framebuffer change when DFSM is
disabled does not fix the issue.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2315
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 7cc0702bbb)
When SSBO array is used with packed layout, both IR tree
and as a result, NIR tree will be incorrect.
In fact, the SSBO dereference indices won't
match the array size in some cases like the following:
"layout(packed, binding=1) buffer SSBO { vec4 a; } ssbo[3];
out vec4 color;
void main() {
color = ssbo[2].a;
}"
After linking the IR and then NIR will have an SSBO array
definition with size 1 but dereference still will have index 2
and linked_shader->Program->sh.ShaderStorageBlocks
will contain just SSBO with name "SSBO[2]"
So this line should be removed at least as a workaround for now
to avoid error like:
Failed to find the block by name "SSBO[0]"
Fixes: 810dde2a "glsl/nir: Add a pass to lower UBO and SSBO access"
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit be6d51e1e3)
This is needed to be in agreement with spec requirements:
https://github.com/KhronosGroup/OpenGL-API/issues/46
Piers Daniell:
"We discussed this in the OpenGL/ES working group meeting
and agreed that eliminating unused elements from the interface
block array is not desirable. There is no statement in the spec
that this takes place and it would be highly implementation
dependent if it happens. If the application has an "interface"
in the shader they need to match up with the API it would be
quite confusing to have the binding point get compacted.
So the answer is no, the binding points aren't affected by
unused elements in the interface block array."
v2: - 'original_dim_size' field moved above to keep
the struct packed better on 64-bit
- added a comment for 'total_num_array_elements' field
- fixed a binding point calculations for SSBOs array of arrays
( Ian Romanick <ian.d.romanick@intel.com> )
- fixed binding point calculations for non-packed SSBOs
v3:
- rename 'total_num_array_elements' to 'aoa_size'
( Jason Ekstrand <jason@jlekstrand.net> )
- rename 'boffset' to 'binding_stride'
( Alejandro Piñeiro <apinheiro@igalia.com> )
Fixes: 8cf1333b "glsl: link uniform block arrays of arrays"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532
Reported-By: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 4beb0a2308)
The tiled-case is non-sensical for non-base mips, but Vulkan requires
that this function handles it but at the same time does not require
returning anything useful. So we can basically return anything.
Correct tiled pitch and offset are still required for our own WSI and
in the future getting the layouts of images with DRM format modifiers.
Both don't have to deal with images with more than 1 level though.
Fixes: 824bd0830e "radv: return the correct pitch for linear mipmaps on GFX10"
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2301
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2304
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 17741a0a05)
Fixes: 624789e370 "compiler/glsl: handle case where we have multiple users for types"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 96c9483ccf)
Conflicts:
src/compiler/spirv/spirv2nir.c
There are only 13 bits available to store the line width, hence
it can't be larger than 8191
v2: Add Fixes tag
v3: - Unify value since for all r600 archs (Konstantin Kharlamov)
- Correct the value the line width value is emitted as a 12.4
fixed point value of 1/2 line width on r600-r700 and as
8 * line width on Evergreen and newer.
Fixes: 06bfb2d28f
r600: fork and import gallium/radeon
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3286>
(cherry picked from commit e8559ae448)
According to the description of VkGraphicsPipelineCreateInfo(),
pViewportState, pMultisampleState, pDepthStencilState and
pColorBlendState must be ignored when rasterization is not enabled.
This avoids potentially invalid pointers being dereferenced when
rasterization is disabled. Tested with `demos_x64 VK_Parameter_Zoo`
from Renderdoc repository.
v2: Don't store the `raster_enabled` as part of anv_pipeline, just
query it from the create info. This avoids storing a state that's
only used during pipeline creation. (Jason)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2258
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch> [v1]
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 75a19186b2)
Fixes 240 failing test cases in dEQP-VK.spirv_assembly which
were failing due to a bad s_ashr_i32 instruction. This commit
fixes the instruction format along with the definitions of the
instruction.
Fixes: 11f43caaec
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 11e62a9734)
We support the same set of samples for integer color formats as for
non-integer. We've been advertising it wrong since before the initial
Vulkan 1.0 release. :-(
Fixes: d689745303 "vk/0.210.0: Rework device features and limits"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit ac70442ce1)
Currently piglit spec@arb_occlusion_query@occlusion_query_conform
spins for ever as the resource status is never reset. See
etna_hw_get_query_result(..) for more details.
Fixes: 1456aa61cc ("etnaviv: Rework resource status tracking")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Marek Vasut <marex@denx.de>
(cherry picked from commit 6e75f2172b)
Using a drm syscall layer faking a kernel driver :
==581460== Conditional jump or move depends on uninitialised value(s)
==581460== by 0x48A4C2B: close (drm-hooks.cpp:185)
==581460== by 0x5A815F1: dri3_alloc_render_buffer (loader_dri3_helper.c:1469)
==581460== by 0x5A82050: dri3_get_buffer (loader_dri3_helper.c:1827)
==581460== by 0x5A82662: loader_dri3_get_buffers (loader_dri3_helper.c:2028)
==581460== by 0x6C78109: intel_update_image_buffers (brw_context.c:1870)
==581460== by 0x6C77805: intel_update_renderbuffers (brw_context.c:1499)
==581460== by 0x6C7789D: intel_prepare_render (brw_context.c:1520)
==581460== by 0x6C773D4: intelMakeCurrent (brw_context.c:1341)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 069fdd5f9f ("egl/x11: Support DRI3 v1.1")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3152>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3152>
(cherry picked from commit fc2552b644)
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older
platforms that don't support MUL with 32x32 types and use vec4.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 766fdeccf9)
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms
that don't support MUL with 32x32 types, including ICL and TGL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 2137be22fa)
We appear to have got lucky that the only type of temporary fence
payload we could have was a syncobj and that would only happen when
the type of the permanent payload was also a syncobj.
This code was broken if that assumption changed and it did in commit
f9a3d9738b.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 52bc235f2a)
Neither Mutter nor KWin's wayland compositors appear to use modifiers.
In the non-modifier case, iris was still trying to use Y-tiling for
scan-out surfaces, leading to this error:
(gnome-shell:7247): mutter-WARNING **: 09:23:47.787: meta_drm_buffer_gbm_new failed: drmModeAddFB failed: Invalid argument
We now fall back to the historical X-tiling for scanout buffers, which
ought to work everyone, at lower performance. To regain that, we need
to ensure modifiers are actually supported in environments people use.
Fixes: fbf3124771 ("iris: Rework tiling/modifiers handling")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit dcb4230e5e)
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.
Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 5fdea9f401)
Global load/store instructions can't know if the offset is
out-of-bound because they don't use descriptors (no range).
Fix this by clamping the offset for arrays that are indexed
with a non-constant offset that's greater or equal to the array
size.
This fixes VM faults and GPU hangs with Dead Rising 4.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
Fixes: 71a6794200 ("ac/nir: Enable nir_opt_large_constants")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit a0f1a5fa05)
vl functions moved from radeonsi to gallium/auxiliary/vl have left
android build of radeonsi in broken state.
libmesa_galliumvl static is need to build readeonsi,
gallium_dri building rules are reworked to avoid multiple symbols
and libmesa_galliumvl static dependency is needed in radeonsi.
Here is the changelog:
- android: gallium/auxiliary: add libmesa_galliumvl static
- android: gallium_dri: move libmesa_gallium to static to prevent multiple symbols
- android: radeonsi: fix build after vl refactoring
Fixes the following building error:
external/mesa/src/gallium/drivers/radeonsi/si_uvd.c:47:
error: undefined reference to 'vl_video_buffer_create_as_resource'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 86e60bc ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 96aef08dc6)
Conflicts Resolved by Dylan Baker
Conflicts:
src/gallium/targets/dri/Android.mk
Panfrost is not enabled for android in 19.3, and the series is a bit
bigger than I'd like to pull into the stable branch for a .0 release
Multi-planar surfaces are allowed to have modifiers. Don't require
DRM_FORMAT_MOD_INVALID in order to create a surface for each plane
defined by the format.
Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 21376cffb3)
When importing a dmabuf with a specified tiling, the dmabuf user
should always try to set the tiling mode because: 1) the exporter
can set tiling AFTER exporting/importing. 2) a dmabuf could be
exported from a kernel driver other than i915, in this case the
dmabuf user and exporter need to set tiling separately.
This patch fixes a problem when running vkmark under weston with
iris on ICL, it crashed to console with the following assert. i965
doesn't have this problem as it always tries to set the specified
tiling mode.
weston: ../src/gallium/drivers/iris/iris_resource.c:990: iris_resource_from_handle: Assertion `res->bo->tiling_mode == isl_tiling_to_i915_tiling(res->surf.tiling)' failed.
Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit b6d45e7f74)
This format will be used to properly handle planar images with modifiers
in iris.
Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 51ee8fff9b)
This is correct per the Vulkan spec format equivalence table.
Fixes: f36b52740a "radv/android: Add android hardware buffer queries."
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 2e44bfc14f)
When using 3 planes, the sequence produces this chain:
plane0 -> plane2
This commit fixes this to produce:
plane0 -> plane1 -> plane2
Fixes: 86e60bc265 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e3e91cebcd)
u_decomposed_prims_for_vertices cannot support POLYGON, but POLYGON is
trivial to support as a special case directly (since we have the number
of vertices directly).
Fixes aborts in Panfrost in apps using GL_POLYGON.
Fixes: e881aa8c12 ("gallium/util: Add u_stream_outputs_for_vertices helper")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Revewied-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit a37822f5f7)
It's a very odd case to hit in the real world. However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.
Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 0f60aa4037)
With the addition of the planar formats helper, the
planar formats no longer have a valid block.bits field.
Calling util_format_get_blocksize therefore asserts.
Reorder the check to see if the format is supported
before doing the query to get the blocksize.
Fixes: 20f132e5ef ("gallium/util: add planar format layouts and helpers")
Signed-off-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
(cherry picked from commit c496d44284)
The commit noted below assumed and enforced that DRM_MOD_INVALID was the
only valid modifier for multi-planar imported images. Due to that, it
required that modifier on multi-planar images to:
1. Allow multiple planes.
2. Perform YUV format lowering and extent adjustments.
3. Use buffer_index to correctly map the given planes.
Fix these issues by removing or updating the code built on that
assumption.
Fixes: 2066966c10 ("gallium/dri2: Support creating multi-planar modifier images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d5c857837a)
Iterate the system values list when adding varyings to the program
resource list in the NIR linker. This is needed to avoid CTS
regressions when using the NIR to build the GLSL resource list in
an upcoming series. Presumably it also fixes a bug with the current
ARB_gl_spirv support.
Fixes: ffdb44d3a0 ("nir/linker: Add inputs/outputs to the program resource list")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 1abca2b3c8)
Without looking at the assembly or something, I'm not sure what the
compiler does here. The brw_reg_type enum is marked packed, so I'm
guess that it gets represented as a uint8_t. That's the only reason I
could think that comparing with -1 would be always true.
This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.
src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1515 | return *type != -1;
| ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1518 | return *type != -1;
| ^~
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
(cherry picked from commit 668635abd2)
Somehow adjusting maxloc based on existing outputs got lost, resulting
in the clipdist varying clobbering the position varying. Causing a
shader that had no position output in freedreno/ir3, which triggers GPU
hangs in neverball.
Fixes: d0f746b645 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 372ed42d22)
This is a more accurate description of what happens in processing the
OA reports.
Previously we only had a somewhat difficult to parse state machine
tracking the context ID.
What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :
* whether the r0 is tagged with the context ID relevant to us
* if r0 is not tagged with our context ID and r1 is: does r0 have a
invalid context id? If not then we're in a case where i915 has
resubmitted the same context for execution through the execlist
submission port
v2: Update comment (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8c0b058263)
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b364e920bf)
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.
We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d0a5c817c)
Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit acea59dbf8)
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit b1f37688ba)
LLVM and the proprietary compiler seem to do this
Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit a9fc81b098)
brw_performance_query_metrics.h was removed in
134e750e16 and
brw_performance_query.h was removed in
8ae6667992
remove reference to these files from Makefile.sources
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Fixes: 8ae6667992 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 34dda0ca65)
We must reset the damage info of our render targets here even though a
damage reset normally happens when the DRI layer swaps buffers. That's
because there can be implicit flushes the GL app is not aware of, and
those might impact the damage region: if part of the damaged portion
is drawn during those implicit flushes, you have to reload those areas
before next draws are pushed, and since the driver can't easily know
what's been modified by the draws it flushed, the easiest solution is
to reload everything.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 65ae86b854 ("panfrost: Add support for KHR_partial_update()")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
(cherry picked from commit c6e2096c47)
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
Let's delay the ->set_damage_region() call until the attachments are
updated when we're in that case.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b196e1a8cf)
Was totally broken ...
Removed two if(point) {} because point is always non-NULL and we
were counting on that already for counting, since we NULL our
references to semaphores without active point earlier.
Fixes: 4aa75bb3bd "radv: Add wait-before-submit support for timelines."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2137
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 48fc65413c)
pthread_mutex_unlock() when unlocked is documented by posix as
being undefined behaviour. On OpenBSD pthread_mutex_unlock() will call
abort(3) if this happens.
This occurs in amdgpu_winsys_create() after
cb446dc0fa
winsys/amdgpu: Add amdgpu_screen_winsys
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 3fe3bde4f2)
They were out of sync. Besides syncing, lets ensure they never diverge
again.
Fixes: 8d2654a419 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4cde0e04e3)
When drawing the main character in Shadow of Mordor, the game appears
to draw Talion with one vertex shader, and the Wraith with another.
If the compiler optimizes those in different ways which lead to slight
imprecisions, then the resulting positions may not line up, leading to
Z-fighting occurring as the game decides which of the two are in front.
brw_nir_opt_peephole_ffma looks at usages of multiply adds across the
entire shader, and may make different decisions between the two, leading
to such imprecisions and Z-fighting. This started happening recently
after a NIR change to eliminate unnecessary MOVs (7025dbe7), but that
change simply exposed the existing problem.
Improves performance on Skylake GT4e by 1.22945% +/- 0.398672% (n=3),
likely due to the fixed rendering.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1985
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 51cc380894)
Many applications use multi-pass rendering and require their vertex
shader position to be computed the same way each time. Optimizations
may consider, say, fusing a multiply-add based on global usage of an
expression in a shader. But a second shader with the same expression
may have different code, causing that optimization to make the other
choice the second time around.
The correct solution is for applications to mark their VS outputs
'invariant', indicating they need multiple shaders to compute that
output in the same manner. However, most applications fail to do so.
So, we add a new driconf option - vs_position_always_invariant - which
forces the gl_Position output in vertex shaders to be marked invariant.
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 9b577f2a88)
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.
Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 86a5fbfd4a)
Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail
on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't
propely handle this. Disabling this format also for buffers fixes the
issue.
In addition, disabling also the related RGB8 integer formats for buffers
fixes some deqp-gles3 tests:
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d
Fixes: 0899bf55
st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM
Closes#2118
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit e41958e344)
In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.
Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.
Fixes: cff53da3 ("radv: enable secure compile support")
(cherry picked from commit f54c4e85ce)
This will be used to create a communication pipe between the user
facing device and a freshly forked (per pipeline compile) slim copy
of that device.
We can't use pipe() here because the fork will not be a direct fork
of the user facing process. Instead we use a previously forked
copy of the process that was forked at device creation in order to
reduce the resources required for the fork and avoid performance
issues.
Fixes: cff53da374 ("radv: enable secure compile support")
(cherry picked from commit 1663bb1f77)
In the following commits we want to be able to fork an existing lightweight
fork created at device creation time. In order for the user facing process
to communicate with this new fresh fork we create some members here to hold
FIFO file descriptors and a unique id.
Here we also add a new fork enum that we use to tell the lightweight
process to create a fresh fork.
For more information on why we create a fresh fork see the following
commits.
(cherry picked from commit ef54f15da9)
Specifically when we are in non-uniform control flow, as we would need
to set the condition for the last instruction. If (for example) a
image atomic load stores directly their value on a NIR register,
last_inst would be a nop, and would fail when set the condition.
Fixes piglit test:
spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test
Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
v2: (Changes suggested by Eric Anholt)
* Cover all sig.ld* signals, not just ldunif and ldtmu, as all of
them have the same restriction.
* Update comment explaining why we add a MOV in that case
* Tweak commit message.
v3:
* Drop extra set of parens (Eric)
* Add missing ld signal to is_ld_signal to fix shader-db regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b4bc59e37e)
When the scratch ringbuffer settings are changed, the shader unit has
to be idle or we will have shaders using old and new settings.
That combination is not supported on the HW (likely the offset is
ringbuffer idx * WAVESIZE * 1024).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4eb2a1dc6f)
Two files exist in that directory:
- vulkan_xlib_randr.h
- vulkan_xlib_xrandr.h
Both were imported in 205c271562 ("vulkan: Update the XML and
headers to 1.1.70") with identical contents (ie. the
VK_EXT_acquire_xlib_display extension), but the former was never
included anywhere and can't be found upstream [1], while the latter is
included in vulkan.h and found upstream.
[1] https://github.com/KhronosGroup/Vulkan-Headers/tree/master/include/vulkan
Fixes: 205c271562 ("vulkan: Update the XML and headers to 1.1.70")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 344859c32d)
The bounds checking is actually less safe than just pushing the data.
If the bounds checking actually ever kicks in and it's not on the last
UBO push range, then the shrinking will cause all subsequent ranges to
be pushed to the wrong place in the GRF. One of the behaviors we
definitely don't want is for OOB UBO access to result in completely
unrelated UBOs returning garbage values. It's safer to just push the
UBOs as-requested. If we're really concerned about robustness, we can
emit shader code to do bounds checking which should be stupid cheap (a
CMP followed by SEL).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
A security advisory (TALOS-2019-0857/CVE-2019-5068) found that
creating shared memory regions with permission mode 0777 could allow
any user to access that memory. Several Mesa drivers use shared-
memory XImages to implement back buffers for improved performance.
This path changes the shmget() calls to use 0600 (user r/w).
Tested with legacy Xlib driver and llvmpipe.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 02c3dad0f3)
This reverts commit f30c256ec0.
See 088a2a4cab031f1505d531698109f330f94f3072
Fixes: f30c256ec0 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting
3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in
SuperTuxCart and Tropico 6 which was seen only on Haswell.
The reason for this is unknown and fix was found empirically.
The closest mention in PRM is that it should improve performance.
From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
"When the BLEND_STATE pointer changes but not the CC_STATE pointer,
driver needs to force a CC_STATE pointer change to improve
blend performance in pixel backend."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 6f17fe0606)
Prefetch only supports the basic 2D texture case, checking is_array is
needed because 1d array textures pass the coord num_components==2 test.
Fixes: 2a0d45ae ("freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 0f5743429c)
Large programs, e.g. gnome-shell and firefox, may tax the
addressability of the Medium code model once a (potentially unbounded)
number of dynamically generated JIT-compiled shader programs are
linked in and relocated. Yet the default code model as of LLVM 8 is
Medium or even Small.
The cost of changing from Medium to Large is negligible:
- an additional 8-byte pointer stored immediately before the shader entrypoint;
- change an add-immediate (addis) instruction to a load (ld).
Testing with WebGL Conformance
(https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html)
yields clean runs with this change (and crashes without it).
Testing with glxgears shows no detectable performance difference.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1753327, 1753789, 1543572, 1747110, and 1582226
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/223
Co-authored by: Nemanja Ivanovic <nemanjai@ca.ibm.com>, Tom Stellard <tstellar@redhat.com>
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
(cherry picked from commit 9c3be6d21f)
Conflicts resolved Dylan (PIPE_ARCH -> UTIL_ARCH rename)
This reverts commit 7520478461.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 1a093a06d6)
This reverts commit 1d1b457821.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 7951eb146c)
This reverts commit 1d122c104a.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 25f596e6ba)
This reverts commit 34b1aa957a.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit ff05f16c99)
This reverts commit c1c574fdf1.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit e64b91e34a)
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.
This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
This can also be verified by simply changing fix_byte_src() to apply
on all platforms.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit eb6352162d)
This was causing uninitialized value to end up propagated to the
3DSTATE_DEPTH_BOUNDS packet, leading to asserts on packet
building due to the value being greater than 1.
Fixes: 939ddccb7a ("anv: Add support for depth bounds testing.")
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
(cherry picked from commit 0aaf47f7cd)
This prevents some additional optimizations that would change the
original result. This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => b >= a. Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.
This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958. However, the
problem in that issue was Chrome or Angle is replacing calls to isnan()
with some stuff that we (correctly) optimize to false. If they had left
the calls to isnan() alone, everything would have just worked.
No shader-db changes on any Intel platform.
I also tried marking the comparison generated by the isnan() function
precise. The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().
I also considered adding a new ir_unop_isnan opcode that would implement
the functionality. During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the samething would need
to happen in SPIR-V translation).
This approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.
Fixes: d55835b8bd ("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 9be4a422a0)
For pre-fs-dispatch texture fetch, we need to assign bary_ij to r0.x,
even if it is not used in the shader (ie. only varying use is for tex
coords). But if, for example, gl_FragCoord is used, it could get
assigned on top of bary_ij, resulting in a GPU hang.
The solution to this is two-fold: (1) the inputs/outputs rework has the
benefit of making RA realize bary_ij is a vec2, even if there are no
split/collect instructions (due to no varying fetches in the shader
itself). And (2) extend the live ranges of meta:input instructions to
the first non-input, to prevent RA from assigning the same register to
multiple inputs.
Backport note: because of (1) above, a better solution for 19.3 would be
to revert f30c256ec0.
Fixes: f30c256ec0 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b22617fb57)
This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.
v2: Found a second occurence
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 935f8f0e56)
We always close the in_fence at the end the anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.
Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!
v2: drop redundant fd = -1 (Jason)
v3: Update commit message (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 048f0690ee)
The change made in 88d665830f ("mesa: check draw buffer completeness
on glClearBufferfi/glClearBufferiv") correctly updated the state prior
to checking the framebuffer completeness on glClearBufferiv but not in
glClearBufferfi.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 88d665830f ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/2072
(cherry picked from commit f93bb90302)
Move differences in eglextchromium.h header file, then provide the same header than libglvnd-1.2
So program that omit to include eglextchromium.h will fail to build with both mesa and libglvnd headers.
Fixes: a0a8109f "include: add the definition of EGL_EXT_image_flush_external"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 57acf921e2)
When using packed vulkan-formats on little-endian systems, we need to
swap the components for the gallium formats. And since Zink isn't
big-endian safe yet, little-endian is the only endianess we care about
right now.
This fixes a bunch of piglit tests, amongs others:
- spec@arb_depth_texture@depth-level-clamp
- spec@arb_depth_texture@depthstencil-render-miplevels * d=z24
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
(cherry picked from commit b4d47e21d7)
The stage specific fields of shader_info are in an union. We've
likely been lucky that this value was either overwritten or ignored by
other stages. The recent change in shader_info layout in commit
84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
made this issue visible.
Fixes: cf2257069c ("nir/spirv: Set a default number of invocations for geometry shaders")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 087ecd9ca5)
It happens that some games try to access a vertex buffer without
a valid format. This case was incorrectly handled by
ac_get_tbuffer_format which made ACO emit an invalid instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 911a826141)
The workaround got accidentally moved to the wrong place
Fixes: 08d510010b aco: increase accuracy of SGPR limits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a47e232ccd)
This fix wrong color when playing video under Android + virgl
configuration.
Fixes: 2decad495f ("gallium/dri2: Support images with multiple planes for modifiers")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Lepton Wu <lepton@chromium.org>
(cherry picked from commit 5a40e153fd)
We don't support nir_texop_txd, which is required by this cap. So let's
disable it for now.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
(cherry picked from commit b385ad0c75)
We do not support them yet, so let's not pretend.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
(cherry picked from commit a32a92f53a)
There's no good way to know if a texture-view will be created, so we
just have to accept it for all resources.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
(cherry picked from commit ca87a53b46)
We should use the format derived from the image-view here, not from the
image itselt. Otherwise, we'll end up with incompatible render-passes.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
(cherry picked from commit f3a72fd61c)
Use unsigned values otherwise signed extension will produce a 64 bits value where
the 32 left-most bits are 1.
Fixes: 307e5cc8fd ("radeonsi: tell the shader disk cache what IR is used")
Otherwise if glvnd is not installed systemwide, but only in a prefix,
it's headers wont be found. This happens because if it's headers are in
/usr/include/ then another dependence will provide the necessary -I
arguments and compilation will work.
Fixes: 035ec7a2bb
("meson: Add support for EGL glvnd")
Acked-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 5d085ad052)
Commit 5847de6e9a implemented a restriction that applies to ICL, but
wrongly marked it as also applying to GLK. Reviewers or MR !1125
pointed this, and the commit history shows removal of GLK to parts of
the patch, but it turns there was still a left-over GLK check in the
code.
This code was breaking some of the i8vec2 tests on GLK, for example:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
Removing the GLK check solves the issue for GLK. I don't see a reason
on why implementing this restriction would actually break GLK, so
there's still more to investigate here since this bug may be affecting
ICL+, but let's apply the real GLK fix while we analyze and discuss
the other possible issues.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1
on ICL")
BSpec: 3017
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit b57383a944)
The host query reset entry point didn't use the availability offset
for performance queries.
To fix this, reorder the availability of performance queries to match
other queries.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2b5f30b1d9 ("anv: implement VK_INTEL_performance_query")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit ee6fbb95a7)
In 2ca0d913ea, we began updating cso_fb->layers to the actual layer
count, rather than 0. This fixed cases where we were setting "Force
Zero RTA Index Enable" even when doing layered rendering. Sadly, it
also broke the check entirely: cso_fb->layers is now 1 for non-layered
cases, but the Force Zero RTA Index check was still comparing for 0.
Fixes: 2ca0d913ea ("iris: Fix framebuffer layer count")
(cherry picked from commit fc7b748086)
Python has the identity operator `is`, and the equality operator `==`.
Using `is` with strings sometimes works in CPython due to optimizations
(they have some kind of cache), but it may not always work.
Fixes: 96c4b135e3
("nir/algebraic: Don't put quotes around floating point literals")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 717606f9f3)
If an app first creates a compute pipeline with
VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT set, then re-compile it
without that flag, the driver should re-compile the compute shader.
Otherwise, it will return the unoptimized one.
Fixes: ce188813bf ("radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9ab27647ff)
The seccomp filter allows read/write, let us make sure nobody can
do anything with this.
Fixes: cff53da374 "radv: enable secure compile support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 8efb8f55a6)
When switching this to dynamic state, I forgot that this also needs to
be emitted when we use a polygon-mode set to lines.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 6d30abb4f1 ("zink: use dynamic state for line-width")
(cherry picked from commit b7674829a1)
Build failure reported by i965 CI, triggered by building dynamic
pipeloaders with kmsro drivers (besides 'frost). At this point, there's
no reason to actually do that -- mesa CI didn't mind -- but let's not
break the build.
v2: Simplify script. Add extra dependencies for v3d.
Fixes: afb0d08cb0 ("pipe-loader: Default to kmsro if probe fails")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
(cherry picked from commit bf15318991)
Otherwise relocations just up and crash.
Fixes: a3153162a9 "anv: Delay allocation of relocation lists"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9ef198c59a)
Got some int->pointer warnings and 20 is not a valid pointer ....
Fixes: 2e3a635ee6 "radv: Add an early exit in the secure compile if we already have the cache entries."
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 6ced684e27)
Until 8bef4df196 the IR (TGSI or NIR) was used in disk_cache driver_flags.
This commit restores this features to avoid crashing when switching from
one IR to the other.
As radeonsi's default is TGSI, I used "driver_flags & 0x8000000 = 0" for TGSI
to keep the same driver_flags.
Fixes: 8bef4df196 ("radeonsi: add si_debug_options for convenient adding/removing of options")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 2afeed3010)
Fixes the following building error:
external/mesa/src/amd/compiler/aco_spill.cpp:1768:
error: undefined reference to 'aco::lower_to_cssa(aco::Program*, aco::live&, radv_nir_compiler_options const*)'
Fixes: 0b8216b ("aco: Lower to CSSA")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
(cherry picked from commit d688e4166c)
When starting a BLORP operation, we do the BTI-change flush. However,
when ending it and transitioning back to regular drawing, we change the
render target again - without a set_framebuffer_state() call. We need
to do the BTI flush there too. BLORP flags IRIS_DIRTY_RENDER_BUFFER
now, which will cause the next draw to get the BTI flush again.
(explanation of fix by Ken)
Fixes: 2b956a093a ("iris: totally untested icelake support")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bb0c5c487e)
This make shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect in in vec4 shaders too. When it didn't find
it, the script would fail with:
Traceback (most recent call last):
File "./report.py", line 351, in <module>
main()
File "./report.py", line 182, in main
before_count = before[p][m]
KeyError: 'sends'
Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 7b3f38ef69)
libdrm returns -errno instead of directly the ioctl ret of -1.
Fixes: 1c3cda7d27 "radv: Add syncobj signal/reset/wait to winsys."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ec770085c2)
Observed an issue when looking at the code generatedy by the
image-vertex-attrib-input-output piglit test. Even though the test
itself worked fine (due to TIC 0 being used for the image), this needs
to be fixed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1b9d1e13d8)
Unfortuantely we don't know if a particular load is a real 2d image (as
would be a cube face or 2d array element), or a layer of a 3d image.
Since we pass in the TIC reference, the instruction's type has to match
what's in the TIC (experimentally). In order to properly support
bindless images, this also can't be done by looking at the current
bindings and generating appropriate code.
As a result all plain 2d loads are converted into a pair of 2d/3d loads,
with appropriate predicates to ensure only one of those actually
executes, and the values are all merged in.
This goes somewhat against the current flow, so for GM107 we do the OOB
handling directly in the surface processing logic. Perhaps the other
gens should do something similar, but that is left to another change.
This fixes dEQP tests like image_load_store.3d.*_single_layer and GL-CTS
tests like shader_image_load_store.non-layered_binding without breaking
anything else.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "20.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 869e32593a)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.