The commit caused a regression in i965 (and possibly others) since it
didn't implement v4 of DRI2's flush extension.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Addressed an earlier commit [0567ab0407] which did not land in
branch. This will be backported with a stable specific patch.
Signed-off-by: Andres Gomez <agomez@igalia.com>
MSVC has been including a xtime definition in thr/xtimec.h ever since
MSVC 2013 (which is the minimum we require for building Mesa), and
including it prevents duplicate definitions when it gets included by
LLVM.
In fact, it looks that MSVC has been including a partial C11 threads
implementation too for some time, which we should consider migrating to
once we eliminate the use of _MTX_INITIALIZER_NP in our tree.
Thanks to the anonymous helper from
https://bugs.freedesktop.org/show_bug.cgi?id=100201#c4 for spotting
this.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100201
CC: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ecfafdcbf5)
The programming note that says we need to do this still exists in the
SkyLake PRM and, from looking at the bspec, seems like it may apply to
all hardware generations SNB+. Unfortunately, this isn't particularly
clear cut since there is also language in the bspec that says you can
skip the flushing and stall to get better throughput. Experimentation
with the "Car Chase" benchmark in GL seems to indicate that some form of
flushing is still needed. This commit makes us do the full set of
flushes regardless of hardware generation. We can always reduce the
flushing later.
Reported-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6baae9625d)
A bunch of code was indented in such a way that it looked like it went
with the if statement above but it definitely didn't.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0fe3dcce4c)
This fixes rendering issues in the Vulkan port of skia on some hardware.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 01a65dc43b)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/intel/vulkan/genX_cmd_buffer.c
When using an overlayfs system (like a Docker container), rmrf_local()
fails because part of the files to be removed are in different mount
points (layouts). And thus cache-test fails.
Letting crossing mount points is not a big problem, specially because
this is just for a test, not to be used in real code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit caa616ccc4)
While it's legal to have an active blocks count > 0 on link failure.
Unless we actually assign memory for the blocks array we can end up
segfaulting in calls such as glUniformBlockBinding().
To avoid having to NULL check these api calls we simply reset the
block count to 0 if the array was not created.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit bf15b2b515)
It reads @ writes the DB cache, and we haven't flushed dst caches yet,
so DB cache may be stale. Also the user might be shader read (and probably is),
so also flush after.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
(cherry picked from commit a8c51b1cd9)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/amd/vulkan/radv_cmd_buffer.c
Previously we would just escape the loop and move everything
following the loop inside the if to the else branch of a new if
with a return flag conditional. However everything outside the
if the loop was nested in would still get executed.
Adding a new return to the then branch of the new if fixes this
and we just let a follow pass clean it up if needed.
Fixes:
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop.shader_test
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop2.shader_test
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit c1096b7f1d)
SEL can only convert between a few integer types, which we basically
never do.
Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 7dccd38b40)
We should use anv_get_layerCount() to access layerCount of VkImageSub-
resourceRange in anv_CmdClearColorImage and anv_CmdClearDepthStencil-
Image, which handles the VK_REMAINING_ARRAY_LAYERS (~0) case.
Test: Sample multithreadcmdbuf from LunarG can run without crash
Signed-off-by: Xu Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 004468de14)
Surfaces and Volumes can be freed in the worker thread.
Without this patch, pending_uploads_counter could be non-zero
in the Surfaces or Volumes dtor, leading to deadlock.
Instead decrease properly the counter before releasing the
item.
Also avoid another potential deadlock if the item is not
properly unlocked: Do not call UnlockRect which will cause deadlock,
but free directly using the deadlock safe
nine_context_get_pipe_multithread.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99246
CC: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: James Harvey <lothmordor@gmail.com>
(cherry picked from commit bd85bb51c7)
Otherwise blitter would still hold a ref to, for example, sampler-
views.
To reproduce:
glmark2 -b desktop:duration=2 --run-forever
Fixes: a8e6734 ("freedreno: support for using generic clear path")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit c03f6f12bb)
Function::getArgumentList() doesn't exist anymore, switch to using
arg_begin() (existed back to at least llvm-3.6.0).
Reviewed-by: Vedran Miletić <vedran@miletic.net>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 08f864abd9)
primcount must be a GLsizei as in the signature for MultiDrawElements
or bad things can happen.
Furthermore, an error should be flagged when primcount is negative.
Curiously, this code used to work somewhat correctly even when primcount
was negative, because the loop that checks count[i] would iterate out of
bounds and almost certainly hit a negative value at some point.
Found by an ASAN error in
GL45-CTS.gtf32.GL3Tests.draw_elements_base_vertex.draw_elements_base_vertex_primcount
Note that the OpenGL spec seems to have s/primcount/drawcount/ at some
point, and the code still reflects the old language.
v2: provide the correct spec quotes (pointed out by Ian)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit c11dcfb5e9)
In commit d2590eb65f I enabled GL 4.5
on Haswell...but failed to check if we could do indirect compute
shader dispatch...and query buffer objects.
Indirect compute shader dispatch requires command parser version 5
(kernel commit 7b9748cb513a6bef4af87b79f0da3ff7e8b56cd8, which is in
Linux v4.4). On earlier kernels we would have disabled
ARB_compute_shader, which is a mandatory part of OpenGL 4.3+.
Query buffer objects currently require MI_MATH and MI_LOAD_REGISTER_REG,
which mean command parser version 7 (Linux v4.8). On earlier kernels
we would have disabled ARB_query_buffer_object, which is a mandatory
part of OpenGL 4.4+.
The new version support looks like:
- Kernel 4.1 and older => OpenGL 3.3
- Kernel 4.2-4.3 => OpenGL 4.2
- Kernel 4.4-4.7 => OpenGL 4.3
- Kernel 4.8+ => OpenGL 4.5
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
(cherry picked from commit 9b324e4dca)
The crash is due to NULL pColorBlendState, which is legal if the
pipeline has rasterization disabled or if the subpass of the render pass
the pipeline is created against does not use any color attachments.
Test: Sample subpasses from LunarG can run without crash
Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 57595cb073)
This was meant to be checking the index type to get the correct
index not the last emitted one. This fixes:
dEQP-VK.pipeline.input_assembly.primitive_restart.index_type_uint32.triangle_strip_with_adjacency
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d06e168b87)
Helps mainly Feral-ported games, due to their use of fma()
shader-db changes:
total instructions in shared programs : 3901147 -> 3842505 (-1.50%)
total gprs used in shared programs : 471258 -> 467359 (-0.83%)
total local used in shared programs : 27405 -> 27361 (-0.16%)
total bytes used in shared programs : 35749888 -> 35214176 (-1.50%)
local gpr inst bytes
helped 17 1829 4091 4091
hurt 4 44 3 3
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 09f16de7e6)
The Vulkan spec is fairly clear about when we should and should not
write query pool results. We're also supposed to return VK_NOT_READY if
VK_QUERY_RESULT_PARTIAL_BIT is not set and we come across any queries
which are not yet finished. This fixes rendering corruptions on The
Talos Principle where geometry flickers in and out due to bogus query
results being returned by the driver. These issues are most noticable
on Sky Lake GT4 2hen running on "ultra" settings.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100182
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 08df015b9d)
[Andres Gomez: use anv_query.c instead of genX_query.c]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/intel/vulkan/genX_query.c
This reverts commit cce43f6d8c.
Redundant, as the flush already happens at si_cp_dma_prepare.
Acked-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ad4dee521d)
The index passed to get_shared_memory_ptr is an attribute slot index,
i.e. the index of a vec4 within LDS. Therefore this must be scaled by
sizeof(vec4) to give the LDS byte offset.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ce4058dafd)
Avoid a buffer overflow in ac_nir_to_llvm.c's create_function when
using more than 4 descriptor sets. radv claims support for 8.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit e88cac1df0)
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image instrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.
Although, shader-db results are good I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.
This also fixes a rendering issue with the speedometer in Dirt Rally.
More information can be found here https://reviews.llvm.org/D26348.
Thanks to Dave Airlie for the patch.
v2: - add a FIXME comment
- use if (HAVE_LLVM >= 0x0400) instead
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 7751ed39e4)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).
This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e0cc32b85b)
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass. Once we've done so, we are no longer have a
fast-cleared surface so we can just set aux_usage to NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 273b720310)
When binding as textures, the alignment can be 16. However when binding
as an image, the address has to be aligned to 256. (Also when binding as
an RT, but that can't happen with GL or current gallium APIs.)
Reported-by: Roy Spliet <nouveau@spliet.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 32dd8d59b6)
surf_usage is only useful to image views that may use HiZ buffers.
Storage image views don't use HiZ buffers.
v2: Update commit message and add an assertion.
Fixes: 055ff2ec52 ("anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ")
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 258af3a856)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/intel/vulkan/anv_image.c
Ported from radeonsi, pointed out by Tom.
"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b8ee70384a)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/amd/common/ac_nir_to_llvm.c
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards. It appears
that the same is needed for fast-clears as well. This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't
been demonstrated on any other hardware however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense. The issues with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6b644e571e)
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things. Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5e44ef4a76)
The dynamic_offset_offset in the descriptor set binding layout is
relative to the dynamic_offset_start for the set in the pipeline
layout.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 162beb2abb)
If we have any pending flushes on the primary command buffer, these
must be performed before executing the secondary buffer.
This fixes potential corruption when the contents of a subpass which
clears any of its render targets are given in a secondary buffer: the
flushes after a fast clear would not have been performed until the
vkCmdEndRenderPass call.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 290d7e892d)
See detailed explanation of why this is needed in commit eb60a89bc3.
This spot was missed/overlooked. Basically as a result of the fact
that BEGIN_* ends up calling PUSH_SPACE, which in turn adds an extra 8
to the requested amount, we have to be mindful of that when doing bare
nouveau_pushbuf_space calls.
Reportedly this fixes some crashes when replaying a hitman trace taken
on radeonsi.
Fixes: eb60a89bc3 ("nouveau: take extra push space into account for pushbuf_space calls")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 8e6d67685e)
The header of ralloc needs to be aligned, because the compiler assumes
that malloc returns will be aligned to 8/16 bytes depending on the
platform, leading to degraded performance or alignment faults with ralloc.
Fixes SIGBUS on Raspberry Pi at high optimization levels.
This patch is not perfect for MSVC, as maybe in the future the alignment
for the most demanding data type might change to more than 8.
v2: Commit message reword/typo fix, and add a bigger explanation in the
code (by anholt)
Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit cd2b55e536)
Squashed with
ralloc: don't leave out the alignment factor
Experimentation shows that without alignment factor gcc and clang choose
a factor of 16 even on IA-32, which doesn't match what malloc() uses (8).
The problem is it makes gcc assume the pointer is 16 byte aligned, so
with -O3 it starts using aligned SSE instructions that later fault,
so always specify a suitable alignment factor.
Cc: Jonas Pfeil <pfeiljonas@gmx.de>
Fixes: cd2b55e5 "ralloc: Make sure ralloc() allocations match malloc()'s alignment."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100049
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Tested by: Mike Lothian <mike@fireburn.co.uk>
Tested by: Jonas Pfeil <pfeiljonas@gmx.de>
(cherry picked from commit ff494fe999)
Even though compute shaders cannot access the framebuffer, there is a
synchronization issue when a compute dispatch accesses a texture that
was previously bound and drawn to as a framebuffer.
Section 9.3 (Feedback Loops Between Textures and the Framebuffer) of
the OpenGL 4.5 spec rather implicitly clarifies that undefined behavior
results if the texture is still attached to the currently bound
framebuffer. However, the feedback loop is broken when the application
changes the framebuffer binding before a compute dispatch, and the
state tracker needs to let the driver known about this.
Fixes GL45-CTS.compute_shader.pipeline-post-fs on SI family Radeons.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 40c77bbf83)
exec_node::get_prev() does not guard against going past the beginning
of the list, so we need to add explicit checks here.
Found by ASAN in piglit arb_shader_storage_buffer_object-rendering.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 911391bd70)
When generating the MOV INDIRECT instruction, the source type is ignored
and it is set to destination's type. However, this is going to change in a
later patch, so we need to explicitly set the proper source type.
brw_vec8_grf() creates an float type's fs_reg by default, when the
ICP handle is actually unsigned. This patch fixes these cases before
applying the aforementioned patch.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit d8122128bc)
The lowered BSW/BXT indirect move instructions had incorrect
source types, which luckily wasn't causing incorrect assembly to be
generated due to the bug fixed in the next patch, but would have
confused the remaining back-end IR infrastructure due to the mismatch
between the IR source types and the emitted machine code.
v2:
- Improve commit log (Curro)
- Fix read_size (Curro)
- Fix DF uniform array detection in assign_constant_locations() when
it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
v3:
- Move changes in assign_constant_locations() to other patch.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 56266df7ed)
Previously, if we had accesses with different sizes to the same uniform, we might not
push it aligned with the bigger one. This is a problem in BSW/BXT when we access
an array of DF uniform with both direct and indirect addressing because for the latter
we use 32-bit MOV INDIRECT instructions. However this problem can happen with other
generations and bitsizes.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit a497ab6838)
Not all clear colors are valid. In particular, on Broadwell and
earlier, only 0/1 colors are allowed in surface state. No CTS tests are
affected outright by this because, apparently, the CTS coverage for
different clear colors is pretty terrible. However, when multisample
compression is enabled, we do hit it with CTS tests and this commit
prevents regressions when enabling MCS on Broadwell and earlier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 42b10b175d)
The wl_drm interface (akin to X11's DRI2) uses the standard set of DRM
FourCC format codes. wl_shm copies this, except for ARGB8888/XRGB8888,
which use their own definitions.
Make sure we only use wl_shm format codes when we're working with
wl_shm. Otherwise, using swrast with 32bpp formats would fail with an
error.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Daniel Stone <daniels@collabora.com> (v1)
Fixes: cb5e799448 ("egl/wayland: unify dri2_wl_create_surface implementations")
v2: [Emil Velikov: move to dri2_wl_create_window_surface]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (IRC)
(cherry picked from commit a1727aa75e)
Drop all -m*, -W*, -O*, -g* and -f* flags, with the exception of
-fno-rtti, which must be used if it's part of the llvm-config --cxxflags
output. We don't want LLVM to dictate the flags we use, and it can even
cause build failures, e.g. if LLVM and Mesa are built with different
compilers.
While we're at it, eat any whitespace preceding dropped flags as well.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 0f53404565)
Nominated-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/100028
If i-th thread could not be created it means we have i threads,
not i+1, because we start from 0.
Fixes: 404d0d5 "gallium/u_queue: add an option to have multiple worker threads"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 7f268cf12b)
Commit 4aea8fe ("gallium/u_queue: fix random crashes when the app calls
exit()") added a atexit handler which calls
util_queue_killall_and_wait() for each queue to stop the threads.
However the app is also free to use atexit handlers to clean up things,
leading to util_queue_destroy() call which will also call
util_queue_killall_and_wait() for the same queue again, causing threads
being joined twice, and that is undefined. This happens with libglut,
for example. A simple fix is to just set num_threads to 0 as there are
no more valid threads after util_queue_killall_and_wait() returns.
Fixes: 4aea8fe "gallium/u_queue: fix random crashes when the app calls exit()"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 9936121935)
For blitting we need to use the depth or stencil format, never
the combined.
This fixes:
dEQP-VK.texture.shadow.2d.nearest.less_or_equal_d32_sfloat_s8_uint
and a few others.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 800b82ea13)
Per spec, VK_QUERY_RESULT_64_BIT specifies the integer size and the
availability flag is an integer. We apparently handled this correctly
already for the copy to buffer case.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 43d833ae97)
PKT3_OCCLUSION_QUERY hangs when used in a nested IB. This only
calls it when in a primary command buffer and we change
GetQueryPoolResults to not need it. CmdCopyQueryPoolResults
still needs it so we break that behavior for secondary command buffers.
However, that would hang already and using an unitialized value is
better than a hang.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8ea34a98c0)
Otherwise the configuration fails when building independant libs
like vdpau, vaapi or omx
Fixes: 1ac40173c2 ("configure.ac: simplify EGL requirements for
drivers dependent on EGL")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5398d006de)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
configure.ac
generated-sources-dir-for macro replaces intermediates-dir-for
and LOCAL_MODULE_CLASS is defined as required by new macro,
in order to avoid the following building error:
external/mesa/src/gallium/drivers/radeonsi/si_debug.c:29:10: fatal error: 'sid_tables.h' file not found
^
1 error generated.
Fixes: 730574c58e ("android: ac/debug: move sid_tables.h generation and
IB decode to amd/common")
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 933988901a)
Earlier changes introduced is_ycrcb flag which checks the component
order of u and v components. Condition for setting the flag was
incorrect, with ycrcb we are supposed to have cr before cb.
This patch (together with a fix in our gralloc) fixes corrupted
rendering from 'test-opengl-gl2_yuvtex' native test and corrupted
gallery thumbnail in application switcher on Android-IA.
Fixes: 51727b1cf5
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
(cherry picked from commit 0a2dcd3a8a)
The get_variable_being_redeclared() function can free 'var' because
a re-declaration of an unsized array variable can establish the size, so
we set the array type to the 'earlier' declaration and free 'var' as it is
not needed anymore.
However, the same 'var' is referenced later in ast_declarator_list::hir().
This patch fixes it by picking the ir_variable_mode from the proper
ir_variable.
This error was detected by Address Sanitizer.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99677
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a73a618933)
This fixes:
vdpauinfo: ../lib/CodeGen/TargetPassConfig.cpp:579: virtual void
llvm::TargetPassConfig::addMachinePasses(): Assertion `TPI && IPI &&
"Pass ID not registered!"' failed.
v2: use list_head, switch the call order in destroy
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 4aea8fe7e0)
This validation was added before the etnaviv drm driver landed in
the linux kernel. Due some pre-merge API changes we had to fix-up
this value but with a mainline kernel this is not a problem anymore.
Lets remove that validation which also gets rid of problem caught
by Coverity, reported to me by imirkin.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit e8d600710c)
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.
Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 066a117be7)
Also handle the GL_ARB_indirect_parameters case where the count itself
is in a buffer.
Use transfers rather than mapping the buffers directly. This anticipates
the possibility that the buffers are sparse (once ARB_sparse_buffer is
implemented), in which case they cannot be mapped directly.
Fixes GL45-CTS.gtf43.GL3Tests.multi_draw_indirect.multi_draw_indirect_type
on <= CIK.
v2:
- unmap the indirect buffer correctly
- handle the corner case where we have indirect draws, but all of them
have count 0.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
(cherry picked from commit 6a1d9684f4)
If llvm::sys::getHostCPUName() returns "generic", override
it with "pwr8" (on PPC64LE).
This is a work-around for a bug in LLVM: a table entry for "POWER8NVL"
is missing, resulting in (big-endian) "generic" being returned on
little-endian Power8NVL systems. The result is that code that
attempts to load the least significant 32 bits of a 64-bit quantity in
memory loads the wrong half.
This omission should be fixed in the next version of LLVM (4.0),
but this work-around should be left in place in case some
future version of POWER<n> also ends up unrepresented in LLVM's table.
This workaround fixes failures in the Piglit arb_gpu_shader_fp64 conversion
tests on POWER8NVL processors.
(V4: add similar comment in the code.)
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Cc: 12.0 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b934aae364)
Allocating huge buffers in VRAM is not a problem, but when those buffers
start being migrated, the kernel runs into errors because it cannot split
those buffer up for moving through GTT.
This should fix intermittent failures of
GL45-CTS.texture_buffer.texture_buffer_max_size
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 550125e1e7)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
start can only be non-zero with MultiDrawElements, which is unlikely
to occur with UNSIGNED_BYTE indices.
v2: Also fix the util_shorten_ubyte_elts_to_userptr call.
Tested with the new piglit.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit a264fee624)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeonsi/si_state_draw.c
We only use the freed ones after all free space has been used. If
the app only allocates small descriptor sets, we might go over
max_sets before the memory is full.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79
(cherry picked from commit f448701622)
We can only do the optimization if the source *is* SSA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a4393bd97f)
Squashed with commit:
i965/fs: Remove the inline pack_double_2x32 optimization
It's broken in a number of ways. In particular, a bunch of the
conditions are backwards so it doesn't actually detect what it's
supposed to detect. Since it's been broken, it hasn't actually been
helping anything so just deleting it isn't a regression.
This (and removing another optimization) were done on master in commit
b073811617.
Cc: "Kenneth Grunke" <kenneth@whitecape.org>
Cc: "Mark Janes" <mark.a.janes@intel.com>
[Emil Velikov: patch is a backport of the below "cherry pick"]
Fixes: a4393bd97f ("i965/fs: Fix the inline nir_op_pack_double optimization")
(cherry picked from commit b073811617)
The currently used range HEAD..origin/master is far too broad. It looks
for nominations within the already_landed list (branchpoint..HEAD).
Similarly we look for already_landed whiting the [possible] nominations
Rand branchpoint..origin/master.
Improve things by limiting the look ups to the branch point.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit d292f12d94)
Currently we loop (git log --grep) to check if the fix has landed. We
can simplify and make things faster by storing the already_picked list
and grep ping through it.
Slim down the message while we're here.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 71e00d62ed)
vkQueuePresentKHR() takes VkPresentInfoKHR pointer and includes a
pResults fields which must holds the results of all the images
requested to be presented. Currently we're not filling this field.
Also as a side effect we probably want to go through all the images
rather than stopping on the first error.
This commit also makes the QueuePresentKHR() implementation return the
first error encountered.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0fcb92c17d)
Commit 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
fixed the sorting of the array initializers in g_glxglvnddispatchfuncs.c
because FindGLXFunction's binary search needs these to be sorted
alphabetically.
That commit also mostly fixed the sorting of the DI_foo defines in
g_glxglvnddispatchindices.h, which is what actually matters as the
arrays are initialized using "[DI_foo] = glXfoo," but a small error
crept in which at least causes glXGetVisualFromFBConfigSGIX to not
resolve, breaking games such as "The Binding of Isaac: Rebirth" and
"Crypt of the NecroDancer" from Steam not working and possible causes
other problems too.
This commit fixes the last of the sorting errors, fixing these mentioned
games not working.
Fixes: 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 4c66f529a8)
Even though we supported both coherent and non-coherent memory types, we
effectively forced apps to use the coherent types by accident. Found by
inspection, only compile tested.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6319bfc2a6)
It's trivial to swizzle clear colors on the CPU, easily deals with the
hardware restrictions for render target swizzles, and makes swizzled
clears work on all hardware as opposed to just HSW+.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e233db6e93)
Now that we have OES_tessellation_shader, the same situation can occur
in ES too, not just GL core profile.
Having a TCS but no TES may confuse drivers - i965 crashes, for example.
This prevents regressions in
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
with some SSO pipeline validation changes I'm making.
v2: Add an ES spec citation (suggested by Alejandro)
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 05a56893aa)
From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
"If an array is declared as the last member of a shader storage block
and the size is not specified at compile-time, it is sized at run-time.
In all other cases, arrays are sized only at compile-time."
In desktop GLSL it is allowed to have unsized-arrays that are
not last, as long as we can determine that they are implicitly
sized, which is detected at link-time.
With this patch Mesa reports a compilation error as glslang does with
the following shader:
buffer SSBO { vec4 data[]; vec4 moreData;};
void main (void)
{
}
Fixes:
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 5bc222ebaf)
The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp rare is enough that it's not worth
making the kernel let us.
Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a tempoary.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b230939303)
Reenable the PPC64LE Vector-Scalar Extension for LLVM versions >= 3.8.1,
now that LLVM bug 26775 and its corollary, 25503, are fixed.
Amendment: remove extraneous spaces in macro def & invocations.
We would prefer a runtime check, e.g. via an LLVMQueryString
(analogous to glGetString, eglQueryString) or LLVMGetVersion API,
but no such API exists at this time.
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
[Emil Velikov: remove LLVM_VERSION macro]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 3f1b6ef2aa)
Earlier refactoring commits changed from one, dare I say it, broken
behaviour to another. Namely:
Before, as you explicitly --enable-gallium-llvm your selection was
ignored when llvm-config was not present/detected.
Today, the "auto" heuristics enables gallium llvm regardless if you have
llvm/llvm-config available or not.
Rework the auto-detection to attribute for llvm's presence.
v2: Set enable_gallium_llvm=no when LLVM is not found.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit d4840c0c26)
Already implicitly handled throughout, but keep it clear and disable
gallium-llvm. This change should be a no-op.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
(cherry picked from commit ce65cc1f1f)
Earlier refactoring commits started setting the above regardless if LLVM
is used or not. Move them to the respective section to restore the
original functionality.
Since we require the preprocessor flags (includes in particular) for the
header version parsing keep those as-is. They are not used outside of
configure.ac thus should not cause any side-effects.
As-is adding the C/CXXFLAGS can lead to build issues on when
cross-compiling.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
(cherry picked from commit 4d8bb9cf8c)
Conflicts:
configure.ac
Set FOUND_LLVM only when LLVM is present (checking for exact version/etc
is deferred) and use enable-gallium-llvm to indicate the global LLVM
status.
Renaming the latter is not appropriate for stable patches, so we'll
address it with a later commit.
Loosely based on work by Tobias.
v2: Check FOUND_LLVM if enable_gallium_llvm is set.
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
(cherry picked from commit 04377cbdcf)
... to where it's applicable.
Since we effectively made --enable-gallium-llvm mean --enable-llvm with
earlier commits, we need to move the requirement to guard the compnents
added for the LLVM draw.
Otherwise we'll error (as below) when building RADV w/o gallium drivers.
configure: error: --enable-gallium-llvm is required when building radv
v2: Don't remove but move the dependency (Tobias).
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
(cherry picked from commit 5869a7db75)
With this change we effectively require --enable-gallium-llvm when
building RADV. This should be perfectly safe since the gallium radeonsi
driver already explicitly requires it.
The "gallium" part in --enable-gallium-llvm is about to be removed soon
(not in stable), but until then make sure that things can build.
To reflect the requirement (as opposed to check previously) we rename
llvm_check_version_for to llvm_require_version
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
(cherry picked from commit a66ffcd736)
Drop the gallium prefix since we're about it use it throughout the
configure.
Note we do want to check for enable_gallium_llvm check since (as
explicitly requested) the toggle should mean --enable-llvm. Latter of
which to be resolved with later patches.
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
(cherry picked from commit 514a494415)
This is actually not needed because the version is checked later.
Around line 2380
if test "x$enable_gallium_llvm" == "xyes"; then
llvm_check_version_for $LLVM_REQUIRED_GALLIUM "gallium"
llvm_add_default_components "gallium"
fi
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
v2: [Emil Velikov: rebase/respin series order]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit f64d4d82bd)
We want cached GTT for all non-persistent read mappings.
Set level = 0 on purpose.
Use dma_copy, because resource_copy_region causes a failure in the PBO
read of piglit/getteximage-luminance.
If Rocket League used the READ flag, it should get cached GTT.
v2: mask out UNSYNCHRONIZED
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit d86099df0a)
The instruction has an associated label when Instruction.Label == 1,
as can be seen in ureg_emit_label() or tgsi_build_full_instruction().
This fixes dump generating extra :0 labels on conditionals, and virgl
parsing more than the expected tokens and eventually reaching "Illegal
command buffer" (when parsing more than a safety margin of 10 we
currently have).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit dc2d9b8da1)
We just increased the max UBO, so we should also increase the clamp that
we do for robustness. Similarly, as we're including the fileIndex in the
new indirect value, we should reset fileIndex to 0 so that it is not
added in a second time.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c95f821cb4)
Kepler and up unfortunately only support up to 8 constbufs. We work
around this by loading from constbufs as if they were storage buffers.
However we were not consistently applying limits to loads from these
buffers. Make sure to do the same thing we do for storage buffers.
Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1acdd62847)
Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but
it's unclear what it refers to). We need to expose an extra binding
point for the "program parameters", which means this must be 15. Remove
the last vestige of the "use c14 for immediates" idea.
Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 59ca352fc5)
As per the spec -
"The functions memoryBarrierShared() and groupMemoryBarrier() are
available only in compute shaders; the other functions are available
in all shader types."
Conform to this by adding another delegate to check for compute
shader support instead of only whether the current stage is compute
This allows some fragment shaders in Dirt Rally to compile
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 21efe2528c)
This reverts commit 0bac2551e4.
Now that we position the guardband correctly (applying translations
in addition to scaling) and made it as large (or larger) than the
render target, this shouldn't be necessary.
Now we leave guardband clipping enabled 100% of the time, like the
Windows driver does.
Fixes GL45-CTS.gtf21.GL2FixedTests.clip.clip. It tries to draw a
16384x64 rectangle, and it appears that some kind of numerical
imprecisions in the clipper result in some edge pixels going missing.
The Windows driver passes this test because of guardband clipping.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit ce8a63de6d)
Previously we disabled the guardband when the viewport was smaller than
the framebuffer on Gen6-7.5, to prevent portions of primitives from
being draw outside of the viewport. On Gen8+, we relied on the viewport
extents test to effectively scissor this away for us.
We can simply always enable scissoring instead. We already include the
viewport in the scissor rectangle, so this will effectively do the
viewport extents test for us. (The only difference is that the scissor
rectangle doesn't support sub-pixel values. I think that's okay.)
Given that the viewport extents test is essentially a second scissor,
and is enabled for basically all 3D drawing on Gen8+, it stands to
reason that scissoring is cheap. Enabling the guardband reduces the
cost of clipping, which is expensive.
The Windows driver appears to never disable guardband clipping, and
appears to use scissoring in this case. I don't know if they leave
it on universally though.
This fixes misrendering in Blender, where the "floor plane" grid lines
started rendering at wrong angles after I disabled XY clipping of line
primitives. Enabling the guardband seems to solve the issue.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99339
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit ece0e535a4)
(Patch co-authored by Jason and Ken.)
We scaled the guardband based on the viewport size, but failed to
take into account the translation portion of the viewport transform.
This meant the guardband was always centered around the origin.
We want it to be centered around the screen-space drawing area,
which is the intersection of the viewport and the render target.
At best, getting this wrong would reduce the guardband's effectiveness
in some cases. At worst, it might break things - objects outside of the
guardband are trivially rejected, so getting the guardband in the wrong
place and leaving guardband clipping enabled could cause problems.
v2: drop clamping of positive maximums.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit f3c068c5c8)
The next patch will make the guardband calculation dependent on the
transformation matrix. Instead of computing it in both atoms, just
combine them into a single atom.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 89ad7f1be6)
CMASK alignment can be greater than image data alignment, so pass
it to the app so that it knows what alignment to backing memory
should have.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit eb01b20cc4)
The GLX specification says about glXDestroyPixmap:
"The storage for the GLX pixmap will be freed when it is not current
to any client."
We're not really following this language to the letter: some of the storage
is freed immediately (in particular, the dri3_drawable, which contains both
GLXDRIdrawable and loader_dri3_drawable). So we NULL out the pointers to
that freed storage; the previous patches added the corresponding NULL-pointer
checks.
This fixes memory corruption in piglit
./bin/glx-visuals-depth/stencil -pixmap -auto
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7be0e602ed)
The GLX specification says about glXDestroyPixmap:
"The storage for the GLX pixmap will be freed when it is not current
to any client."
So arguably, functions like glXSwapIntervalMESA can be called after
glXDestroyPixmap has been called for the currently bound GLXPixmap.
In that case, the GLXDRIDrawable no longer exists, and so we just skip
those calls.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit f446f3fb33)
With a subsequent patch, we might see NULL loaderPrivates, e.g. when
a DRIdrawable is flushed whose corresponding GLXDRIdrawable was destroyed.
This resulted in a crash, since the loader vs. DRI3 drawable structures
have a non-zero offset.
Fixes glx-visuals-{depth,stencil} -pixmap
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 40c304fc06)
If we have an indirect index here we need to scale it by attribute slots
e.g. is this is vec2[256] then we get an indir_index in the 0.255 range
but the vec2 are aligned inside vec4 slots. So scale the indir index,
then extract the channels.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 106a51440d)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/amd/common/ac_nir_to_llvm.c
cs can be NULL when it comes from r600_buffer_map_sync_with_rings()
to avoid doing the same checks. It was checked for write mappings
but not for read mappings.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit af303abcdb)
This fixes
GL45-CTS.tessellation_shader.tessellation_shader_tessellation.max_in_out_attributes
on nouveau. We only support 30 patch varyings (as 2 vec4 slots end up
being used for tess level settings), but were getting 32 exposed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7d3f9ed71c)
Commit 968ffd6c86 stored the last subpass
index of all the attachments but that of the depth-stencil attachment.
This could cause depth buffers used in multiple subpasses not to be in
the requested final layout. Fix this error.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 043d92fef9)
This fixes "st/va: delay calling begin_frame until we have all parameters".
v2: call begin frame after decoder (re)creation as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
(cherry picked from commit 1338d912f5)
PresentPixmap only works if the pixmap depth matches with the
window depth, otherwise it returns a BadMatch protocol error.
Even if the depths match, the result won't look correctly
if the VDPAU RGB component order doesn't match the X11 one so
we only allow the X11 format.
For other buffers we copy them to a buffer which is send to X.
v2: only send buffers with format VDP_RGBA_FORMAT_B8G8R8A8
v3: reword commit message
v4: add comment explaining the code
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 31908d6a4a)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99637
Nominated-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Nominated-by: Michel Dänzer <michel.daenzer@amd.com> (IRC)
It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS
actually is. We know from experimentation that we need to flush the
render cache prior to emitting STATE_BASE_ADDRESS and invalidate the
texture cache afterwards. The only thing the PRM says is that, on gen8+
we're supposed to invalidate the state cache after STATE_BASE_ADDRESS
but experimentation has indicated that doing so does nothing whatsoever.
Since we don't really know, let's do just a bit more flushing in the
hopes that this won't be a problem again. In particular:
1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't
really know whether or not it's pipelined.
2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS
is a compute shader.
3) Invalidate the state and constant caches after STATE_BASE_ADDRESS
because the state may be getting cached there (we don't really know).
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 92128590bc)
We had no good reason for *not* doing this on gen7 before but we didn't
know it was needed. Recently, when trying update to Vulkan CTS version
1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear
to be STATE_BASE_ADDRESS related. This commit fixes them.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f1f9794118)
This fixes rendering of full-screen quads (and other screen-filling
geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op
on other hardware.
- It looks like SE_CLIP registers were not set at all.
I'm amazed that rendering worked without them. Emit them to
avoid issues on gc3000.
- Define constants
ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119)
ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111)
ETNA_SE_CLIP_MARGIN_RIGHT (0xffff)
ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff)
These demarcate the margin (fixp16) between the computed sizes and the
value sent to the chip. I have set these to the numbers used by the
Vivante driver for gc2000. I am not sure whether any old hardware was
relying on the old numbers, or whether those were just a guess. But if
so, these need to be moved to the _specs structure.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit 56314f5baf)
Shaders using sin/cos instructions were not working on GC3000.
The reason for this turns out to be that these chips implement sin/cos
in a different way (but using the same opcodes):
- Need their input scaled by 1/pi instead of 2/pi.
- Output an x and y component, which need to be multiplied to
get the result.
- tex_amode needs to be set to 1.
Add a new bit to the compiler specs and generate these instructions
as necessary.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit fe3bb8cdb5)
Commit 2852efcda4 moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.
Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 33e0c5d003)
Exposing rb swapped (or other swizzled) formats for rendering would
involve swizzing in the pixel shader. This is not the case at the
moment, so reject requests for creating such surfaces.
(GPUs that need an extra resolve step anyway due to multiple pixel
pipes, such as gc2000, might also do this swap in the resolve operation.
But this would be tricky to keep track of)
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit 658568941d)
Use of unsigned loop control variable with '>= 0' would lead
to infinite loop.
Reported by clang:
etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression
>= 0 is always true [-Wtautological-compare]
for (unsigned sp = c->frame_sp; sp >= 0; sp--)
~~ ^ ~
v2: Simply use the same datatype as c->frame_sp is using.
CC: <mesa-stable@lists.freedesktop.org>
Reported-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
(cherry picked from commit 82fe240a99)
This fixes a bunch of buffer related:
dEQP-VK.memory.pipeline_barrier.*
tests, that were crashing in LLVM due to this being missing.
Reviewed-by: Andres Rodriguez<andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0ecd426490)
This fixes a bug uncovered by the 17-part patch series, specifically:
"gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"
If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit a0740d59aa)
Commit 7b5878ee04 increased number of
outputs to 64, but left output array intact. This caused stack overflow
when number of outputs is bigger then 32. Found by ASAN.
Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit a41f2527ae)
At this point, the pitch is in bytes. We haven't yet divided the pitch
by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This
means the bit 15 trick won't work.
The caller now has signed integers anyway, so just pass those through
and do the obvious check.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 02216a1ddf)
Applications may delete a shader program, create a new one, and bind it
before the next draw. With terrible luck, malloc may randomly return a
chunk of memory for the new gl_program that happened to be the exact
same pointer as our previously bound gl_program. In this case, our
logic to detect new programs in brw_upload_pipeline_state() would break:
if (brw->vertex_program != ctx->VertexProgram._Current) {
brw->vertex_program = ctx->VertexProgram._Current;
brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM;
}
Because the pointer is the same, we'd think it was the same program.
But it could be wildly different - a different stage altogether,
different sets of resources, and so on. This causes utter chaos.
As unlikely as this seems, I believe I hit this when running a subset
of the CTS in a loop, in a group of tests that churns through simple
programs, deleting and rebuilding them. Presumably malloc uses a
bucketing cache of sorts, and so freeing up a gl_program and allocating
a new one fairly quickly causes it to reuse that memory.
The result was that brw->vertex_program->info.num_ssbos claimed the
program had SSBOs, while brw->vs.base.prog_data.binding_table claimed
that there were none. This was crazy, because the binding table is
calculated from info.num_ssbos - the shader info appeared to change
between shader compile time and draw time. Careful use of watchpoints
revealed that it was being clobbered by rzalloc's memset when building
an entirely different program...
Fortunately, our 0xd0d0d0d0 canary for unused binding table entries
caused us to crash out of bounds when trying to upload SSBOs, or we
may have never discovered this heisenbug.
Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked
cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5
at 64x64 in a loop with 100 iterations.
Cc: "17.0 13.0 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 7c5629a269)
The variable replacement was unused when building w/o
ENABLE_SHADER_CACHE. Since we can mix variable declarations and code,
move it to where its used.
Fixes: 9f8dc3bf03 "utils: build sha1/disk cache only with
Android/Autoconf"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 6a5850b04a)
b3119a3 introduced a strict LLVM requirement for r300 on all
architectures and thus configure fails on architectures where LLVM is
not available or buggy.
r300 doesn't strictly require LLVM, but for performance reasons we
highly recommend LLVM usage. So require it at least on x86 and x86_64
architectures as we have done before b3119a3.
Fixes: b3119a3 ("configure.ac: Check gallium LLVM version in gallium_require_llvm")
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 1f2a890ace)
The size of the pool is slightly smaller than the size of the
structure containing the whole pool. We need to take that into account
on when setting up the internals.
Fixes a crash due to out of bound memory access in:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
v2: Drop debug traces (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c3421106ec)
When trying to blit larger tiled surfaces, the pitch can be larger than
32768 bytes, which means it won't fit in a GLshort. Passing it in will
truncate the stride to 0, which has...surprising results.
The pitch can be up to 32,768 DWords, or 128kB. We measure it in bytes,
but divide by 4 when programming it. So we need to handle values up to
131,072. Switch from GLshort to int32_t to avoid the truncation.
Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage
at widths greater than 8192.
v2: Use int32_t as negative values can be used (Jason).
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit f8f7ea508b)
SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
using a source of g127 for the single register. With a UD type, this
supposedly could read g128, which doesn't exist, causing the simulator
to get cranky. Use a UW type to avoid this.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit fcf723b647)
According to GL_KHR_vulkan_glsl, the signature of subpassLoad() is:
gvec4 subpassLoad(gsubpassInput subpass);
gvec4 subpassLoad(gsubpassInputMS subpass, int sample);
So the multisampled case always receives an explicit sample index that we
should use. The current implementation was ignoring this parameter
and using gl_SampleID value instead.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9b25769da6)
I hadn't bothered to set this bit because I figured it would just
paper over us getting the rectangle wrong. But it turns out that
there is a legitimate reason to use it, so let's do so.
The alternative would be to chop up 16k clears to multiple 8k clears,
which is pointlessly painful.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 5106df85da)
The spec section 5.2 says:
"vkAllocateCommandBuffers can be used to create multiple command
buffers. If the creation of any of those command buffers fails, the
implementation must destroy all successfully created command buffer
objects from this command, set all entries of the pCommandBuffers
array to VK_NULL_HANDLE and return the error."
Fixes:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 25e21cb8d0)
Along the lines of what
3b804819 anv: Default PointSize to 1.0 if not written by the shader
does for anv, program a default point size in the hw of 1.0.
This preempt fixes a bunch of geom shader tests.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2ab2be092d)
If radeonsi starts compiling an optimized shader variant asynchronously
with a GL debug callback set and the application destroys the GL context,
radeonsi crashes when trying to write shader stats into the debug output
of a non-existent context after compilation, because st/mesa was destroyed
before pipe_context.
Firefox with WebGL2 enabled hits this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99456
v2: protect against a double destroy in st_create_context_priv and callers.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit d9ef549238)
OpenGL ES implementations are not allowed to ship ARB extensions, and
OpenGL implementations are not allowed to ship OES extensions.
The functionality is also included in GL_ARB_ES2_compatibility. Ever
OpenGL core-profile driver currently exposes both extensions. I don't
know of any applications that explicitly check for GL_OES_read_format,
so removing it seems very unlikely to cause problems. No functionality
is removed.
I have left this extension in place for compatibility profile. There
are still OpenGL 1.x drivers in Mesa, and adding code to check for
compatibility profile and not GL_ARB_ES2_compatibility for
GL_IMPLEMENTATION_COLOR_READ_TYPE and GL_IMPLEMENTATION_COLOR_READ_FORMAT
just feels dumb.
Three other other alternatives considered:
- Remove the string from compatibility profile drivers but leave the
functionality in place.
- Add a flag to expose the extension string, and set it in every OpenGL
driver that does not expose GL_ARB_ES2_compatibility (and those
drivers only). I tried this. You can't have two instances of an
extension in the extension table (one dummy_true for ES1 and one with
a flag for compatibility profile), so the implementation requires a
bit of effort.
- Only expose the extension in compatibility if the version is less
than 2.0. I didn't see an easy way to do this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c4a0c1efff)
we can't use the cpu implementation of fdiv, as this one uses different
lp_build_context, which causes assertion failure.
Just use default fdiv action (there is no fast rcp for doubles which we
could potentially use anyway).
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 25208949d7)
In brw_blorp_copyteximage, we use the format from the render buffer.
This could be a combined depth/stencil format. In this case, we handle
stencil properly but we give blorp the wrong ISL format. Specifically,
we would give blorp ISL_FORMAT_R32G32B32A32_FLOAT which is the wrong
size was causing GPU hangs.
Fixes: GL45-CTS.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_copyteximage
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4c180f9633)
Blits do not need any special treatment as the target buffer
object is added to render cache just as one does for normal draw.
Color clears and resolves in turn require explicit "end of pipe
synchronization". It is not clear what this means exactly but the
assumption is that render cache flush with command stream stall
should be sufficient.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 180653c357)
Current blorp logic issues unconditional "flush everything"
(see brw_emit_mi_flush()) after each render. For example, all
blits issue this unconditionally which shouldn't be needed if
they set render cache properly so that subsequent renders do
necessary flushing before drawing.
In case of piglit:
ext_framebuffer_multisample-accuracy all_samples depth_draw small
intel_hiz_exec() is always preceded by blorb blit and the
unconditional flush looks to hide the lack of stall and flushes
in depth clears. By removing the brw_emit_mi_flush() I get gpu
hangs.
This patch adds the stalls and flushes mandated by the spec
and gets rid of those hangs.
v2 (Jason, Ken): Document the rational for separating
depth cache flush and stall on Gen7.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit e6da6943fe)
Some CIK-VI docs say this is the default behavior on SI. That doesn't
answer whether it's also the default behavior on CIK-VI.
Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 573bf0940a)
Some query results struct contents are declared as cache line aligned.
Use aligned malloc, and align the whole struct, to be safe.
Fixes crash when compiling with clang.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit 00847e4f14)
CalculateProcessorTopology tries to figure out system topology by
parsing /proc/cpuinfo to determine the number of threads, cores, and
NUMA nodes. There are some architectures where the "physical id" begins
with 1 rather than 0, which was creating and empty "0" node and causing a
crash in CreateThreadPool.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97102
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b829206b07)
It seems clear that trying to multiply two pairs of doubles would result
in the temporary register getting overwritten by the second pair. So
make the code more explicit.
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 76b02d2fe1)
What a3xx docs call IJPERSPCENTERREGID.. the xy coord passed into
bary.f. We were incorrectly setting both this and gl_FragCoord.xy to
the same register resulting in all sorts of hilarity.
Fixes stk, vdrift, 0ad, probably a bunch others.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8d6af93e76)
The previous code always compared integers as 64-bit. Due to variations
in sign-extension in the code generated by nir_opt_algebraic.py, this
meant that nir_search doesn't always do what you want. Instead, 32-bit
values should be matched as 32-bit and 64-bit values should be matched
as 64-bit. While we're here we unify the unsigned and signed paths.
Now that we're using the right bit size, they should be the same since
the only difference we had before was sign extension.
This gets the UE4 bitfield_extract optimization working again. It had
stopped working due to the constant 0xff00ff00 getting sign-extended
when it shouldn't have.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bb96b03461)
In order to handle CCS_E, we stomp the image format to a UINT format and
then do some bitcasting logic in the shader. This works fine since SKL
render compression only considers the channel layout of the format and
not the format itself. In order for this to work on images that have
been fast-cleared, we need to also convert the clear color so that, when
interpreted as UINT, it provides the same bit value as it would have in
the original format. This fixes a bunch of OpenGL ES CTS tests for
copy_image when we start using CCS more aggressively.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 817f9e3b17)
In situations where libdrm_amdgpu and mesa are installed to the same
location, the mesa installed headers will take precedence over the git
source headers.
This is due to the AMDGPU_CFLAGS containing the install directory.
This situation can cause build errors if the git version of a header is
newer than the currently installed version of a header (e.g. git pull
updates vulkan.h)
Note: using the same install prefix for mesa and libdrm is probably a
common occurrence since it is described in the radeonBuildHowTo wiki:
https://www.x.org/wiki/radeonBuildHowTo/
v2: added sign-off
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit a3ad6a34c6)
Blorp can deal with depth/stencil surfaces blits/copies without the
render target requirement. Also having both render target and
depth/stencil requirement is incompatible from isl's point of view.
This fixes an image creation issue in the high level quality settings
of the Unity3D player, which requires a depth texture with src/dst
transfer & 4x multisampling.
v2: Simply aspect checking condition (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 74c23bde5b)
At least on VI, texture gather doesn't work with a 24_8 data format, so
use 8_8_8_8 and a modified swizzle instead.
A bit of background: When creating a GL_STENCIL_INDEX8 texture, we select
the X24S8 pipe format because we don't support stencil-only render targets
properly. With mip-mapping this can lead to a setup where the tiling is
incompatible with stencil texturing, and a flushed stencil texture is
used. For the flushed stencil, a literal X24S8 is used because there were
issues with an 8bpp DB->CB copy.
Longer term, it would be good if we could get away from these workarounds,
i.e. properly support an S8 format for stencil-only rendering and flushed
stencil. Since stencil texturing is somewhat rare, it's not a high
priority.
Fixes GL45-CTS.texture_cube_map_array.sampling.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
(cherry picked from commit 3cd092c415)
Earlier commit imported a SHA1 implementation and relaxed the SHA1 and
disk cache handling, broking the Windows builds.
Restrict things for now until we get to a proper fix.
Fixes: d1efa09d34 "util: import sha1 implementation from OpenBSD"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 9f8dc3bf03)
2017-01-18 20:11:20 +00:00
13223 changed files with 1223170 additions and 5030547 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.