Tested with a low-latency handwriting application on Android Nougat on
the Chrome OS Pixelbook (codename Eve) with Kabylake.
BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.
Change-Id: Ia816fa6b0a1158f81e5b63477451bf337c2001aa
Specifically, implement the extension DRI_MutableRenderBufferLoader.
However, the loader enables EGL_KHR_mutable_render_buffer only if the
DRI driver implements its half of the extension,
DRI_MutableRenderBufferDriver.
BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.
Change-Id: I7fe68a5a674d1707b1e7251d900b3affd5dd7660
A follow-up patch enables EGL_KHR_mutable_render_buffer for Android.
This patch is separate from the Android patch because I think it's
easier to review the platform-independent bits separately.
BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.
Change-Id: I07470f2862796611b141f69f47f935b97b0e04a1
If set, then the config will have __DRI_ATTRIB_MUTABLE_RENDER_BUFFER,
which translates to EGL_MUTABLE_RENDER_BUFFER_BIT_KHR.
Not used yet.
BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.
Change-Id: Icdf35794f3e9adf31e1f85740b87ce155efe1491
Define extensions DRI_MutableRenderBufferDriver and
DRI_MutableRenderBufferLoader. These are the two halves for
EGL_KHR_mutable_render_buffer.
Outside the DRI code there is one additional change. Add
gl_config::mutableRenderBuffer to match
__DRI_ATTRIB_MUTABLE_RENDER_BUFFER. Neither are used yet.
BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.
Change-Id: I4ca03d81e4557380b19c44d8d799a7cc9365d928
This pulls an 'else' block into the function's main body, making the
code easier to follow.
Without this change, the upcoming EGL_KHR_mutable_render_buffer patch
transforms dri2_make_current() into spaghetti.
BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.
Change-Id: I26be2b7a8e78a162dcd867a44f62d6f48b9a8e4d
Replace it with two fields in _EGLSurface, RequestedRenderBuffer and
ActiveRenderBuffer. (_EGLSurface::RequestedRenderBuffer replaces
_EGLSurface::RenderBuffer).
There exist *two* queryable EGL_RENDER_BUFFER states in EGL:
eglQuerySurface(EGL_RENDER_BUFFER) and
eglQueryContext(EGL_RENDER_BUFFER). _EGLContext::WindowRenderBuffer was
related to eglQueryContext but not eglQuerySurface. Post-patch,
RequestedRenderBuffer is related to eglQuerySurface and
ActiveRenderBuffer is related to eglQueryContext.
The implementation of eglQuerySurface(EGL_RENDER_BUFFER) contained
abstruse logic which required comprehending the specification
complexities of how the two EGL_RENDER_BUFFER states interact. Sometimes
it returned _EGLContext::WindowRenderBuffer, sometimes
_EGLSurface::RenderBuffer. Why? The function tried to encode the actual
logic in the EGL spec. When did the function return which variable? Go
study the EGL spec, hope you understand it, then hope Mesa mutated the
EGL_RENDER_BUFFER state in all the correct places. Have fun.
I got a headache from the mental gymnastics.
To simplify eglQuerySurface(EGL_RENDER_BUFFER), and to improve
confidence in its correctness, flatten its indirect logic. For pixmap
and pbuffer surfaces, return a hard-coded literal value, as the spec
suggests. For window surfaces, simply return ActiveRenderBuffer.
Nothing difficult here.
These changes eliminate potentially very fragile code in the upcoming
EGL_KHR_mutable_render_buffer implementation.
BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.
Change-Id: Ic5f2ab1952f26a87081bc4f78bc7fa96734c8f2a
Instead of handling software fallback inside platform_android, just
let EGL framework to handle it. With this, we still can fall back
to software driver when hardware driver actually doesn't work.
BUG=b:77302150
TEST=manual - make sure betty still boot without virgl driver.
Change-Id: I5d514f67c9dc6f68661e03fd9fc9546acd7277bd
Reviewed-on: https://chromium-review.googlesource.com/1004006
Commit-Ready: Lepton Wu <lepton@chromium.org>
Tested-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
From android cts 8.0_r4, a new test case checks if all the required egl
extensions are exposed. In the current implementation we expose KHR_image
if KHR_image_base and KHR_image_pixmap are supported but KHR_image spec
does not mandate the existence of both the extensions.
This patch preserves the current check and also provides the backend
with an option to expose the KHR_image extension.
Test: run cts -m CtsOpenGLTestCases -t \
android.opengl.cts.OpenGlEsVersionTest#testRequiredEglExtensions
Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Tapani Plli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 96fc5fbf23)
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
BUG=b:77786960
TEST=android.opengl.cts.OpenGlEsVersionTest#testRequiredEglExtensions
passes on CTS 8.1
Change-Id: I7c057ea4aa00f99885259ca0a97cac4554551c80
Reviewed-on: https://chromium-review.googlesource.com/1002796
Commit-Ready: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
My refactor in 47273d7312 missed this early return; because
of it, setting UseFallback one layer above actually prevented the
software path from being used.
Remove this early return and let each platform's dri2_initialize_*()
decide what it can do with the LIBGL_ALWAYS_SOFTWARE restriction.
platform_{surfaceless,x11,wayland} were already handling it themselves.
Fixes: 47273d7312 "egl: set UseFallback if LIBGL_ALWAYS_SOFTWARE is set"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Brendan King <Brendan.King@imgtec.com>
(cherry picked from commit 2f421651ac)
BUG=b:77302150
TEST=manual - make sure betty still boots without virgl driver.
Change-Id: I5e2ddfbd7a72bf04d83cac8f08fafbe81a77e66c
Reviewed-on: https://chromium-review.googlesource.com/1004005
Commit-Ready: Lepton Wu <lepton@chromium.org>
Tested-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
When compiling with LLVM 6.0, the test fails to detect that
-latomic is actually required, as the atomic call is inlined.
In the code itself (src/util/disk_cache.c), we see this pattern:
p_atomic_add(cache->size, - (uint64_t)size);
where cache->size is an uint64_t *, and results in the following
link time error without -latomic:
src/util/disk_cache.c:628: error: undefined reference to '__atomic_fetch_add_8'
Fix the configure test to replicate this pattern, which then
correctly realizes the need for -latomic.
BUG=b:76397110
TEST=cros_workon_make --board=caroline-arcnext --reconf arc-mesa
(am from https://patchwork.freedesktop.org/patch/213657/, dropped
meson.build change)
Change-Id: I9cdad5fd32879a3577d6ef42e278960a934b23fb
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/985676
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
It did not signal syncobjs in the fence, and also signalled too early
if there was work on the queue already, as we have to wait till that
work is done.
Fixes: d27aaae4d2 "radv: Add external fence support."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 0347a83bbf)
BUG=b:73102056
TEST=run nougat-mr1-cts-dev deqp vulkan tests.
Change-Id: I887768385833112effa1e8d414bf171640bb3564
Reviewed-on: https://chromium-review.googlesource.com/913504
Commit-Ready: Bas Nieuwenhuizen <basni@chromium.org>
Tested-by: Bas Nieuwenhuizen <basni@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Passes
dEQP-VK.api.smoke.*
dEQP-VK.wsi.android.*
with android-cts-7.1_r12 .
Unlike the initial anv implementation this does
use syncobjs instead of waiting on the CPU.
This is missing meson build coverage for now.
One possible todo is that linux 4.15 now has a
sycall that allows us to export amdgpu fence to
a sync_file, which allows us not to force all
fences and semaphores to use syncobjs. However,
I had trouble with my kernel crashing regularly
with NULL pointers, and I'm not sure how beneficial
it is in the first place given that intel uses
syncobjs for all fences if available.
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b1444c9ccb)
BUG=b:73102056
TEST=run nougat-mr1-cts-dev deqp vulkan tests.
Change-Id: I002abe9e44ceba89c15f503d8c9fa3419aa2803e
Reviewed-on: https://chromium-review.googlesource.com/913503
Commit-Ready: Bas Nieuwenhuizen <basni@chromium.org>
Tested-by: Bas Nieuwenhuizen <basni@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Per spec:
"Additionally, exporting a fence payload to a handle with copy transference has the same side effects
on the source fences payload as executing a fence reset operation. If the fence was using a
temporarily imported payload, the fences prior permanent payload will be restored."
And similar for semaphores:
"Additionally, exporting a semaphore payload to a handle with copy transference has the same side
effects on the source semaphores payload as executing a semaphore wait operation. If the
semaphore was using a temporarily imported payload, the semaphores prior permanent payload
will be restored."
Fixes: 42bc25a79c "radv: Advertise sync fd import and export."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b9f4c615f8)
BUG=b:73102056
TEST=run nougat-mr1-cts-dev deqp vulkan tests.
Change-Id: I5c79d6b69bb76b8f97018a4726d10f6b0d740350
Reviewed-on: https://chromium-review.googlesource.com/913500
Commit-Ready: Bas Nieuwenhuizen <basni@chromium.org>
Tested-by: Bas Nieuwenhuizen <basni@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm. The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets. This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.
The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks. This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.
The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms). This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.
According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue). On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.
v2: Initialize live-out set to the universal set to avoid rather
pessimistic dataflow estimation in shaders with cycles (Addresses
performance regression reported by Eero in GpuTest Piano).
Performance numbers given above still apply. No shader-db changes
with respect to master.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace <chadversary@chromium.org>
Archived-At: https://lists.freedesktop.org/archives/mesa-dev/2017-December/180489.html
(am from https://patchwork.freedesktop.org/patch/194420/)
BUG=b:67394445
TEST=No regressions in Android CTS, GLES tests.
Fixes timeouts in dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.5
on Brasswell boards.
Change-Id: I0d666c23693246b8d4fe8988f228f8c4ed7425f6
Reviewed-on: https://chromium-review.googlesource.com/862007
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly. In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.
This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads. This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.
No changes in shader-db.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 4d1959e693)
This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
simplify the copy propagation dataflow logic".
BUG=b:67394445
TEST=No regressions in Android CTS, GLES tests.
Change-Id: I949f6f4e0127fec93d890e7669f870872f097a58
Reviewed-on: https://chromium-review.googlesource.com/862006
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch. Helps a few
programs, and avoids a handful shader-db regressions from the next
patch.
shader-db results on ILK:
total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
instructions in affected programs: 360 -> 336 (-6.67%)
helped: 8
HURT: 0
LOST: 0
GAINED: 10
shader-db results on BDW:
total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
instructions in affected programs: 7730 -> 7289 (-5.71%)
helped: 5
HURT: 2
LOST: 0
GAINED: 4
shader-db results on SKL:
total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
instructions in affected programs: 10364 -> 9293 (-10.33%)
helped: 5
HURT: 2
LOST: 0
GAINED: 2
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 9355116bda)
This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
simplify the copy propagation dataflow logic".
BUG=b:67394445
TEST=No regressions in Android CTS, GLES tests.
Change-Id: I8719e67ac14d3db8a7d6989d127ca4222cbdbfe4
Reviewed-on: https://chromium-review.googlesource.com/862005
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.
This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations. The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.
This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable. Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.
According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).
No significant change in the running time of shader-db (with 5%
statistical significance).
shader-db results on IVB:
total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
cycles in affected programs: 16335482 -> 16159558 (-1.08%)
helped: 5121
HURT: 4347
total spills in shared programs: 1309 -> 1288 (-1.60%)
spills in affected programs: 249 -> 228 (-8.43%)
helped: 3
HURT: 0
total fills in shared programs: 1652 -> 1597 (-3.33%)
fills in affected programs: 262 -> 207 (-20.99%)
helped: 4
HURT: 0
LOST: 2
GAINED: 209
shader-db results on BDW:
total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
cycles in affected programs: 23397142 -> 23141100 (-1.09%)
helped: 8045
HURT: 6488
total spills in shared programs: 1456 -> 1252 (-14.01%)
spills in affected programs: 465 -> 261 (-43.87%)
helped: 3
HURT: 0
total fills in shared programs: 1720 -> 1465 (-14.83%)
fills in affected programs: 471 -> 216 (-54.14%)
helped: 4
HURT: 0
LOST: 2
GAINED: 162
shader-db results on SKL:
total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
cycles in affected programs: 22560936 -> 22369874 (-0.85%)
helped: 8457
HURT: 6247
total spills in shared programs: 437 -> 437 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 870 -> 854 (-1.84%)
fills in affected programs: 16 -> 0
helped: 1
HURT: 0
LOST: 0
GAINED: 107
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit c3c1aa5aeb)
This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
simplify the copy propagation dataflow logic".
BUG=b:67394445
TEST=No regressions in Android CTS, GLES tests.
Change-Id: Icbe71f099618e45098a61502b79f3694bcc49877
Reviewed-on: https://chromium-review.googlesource.com/862004
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
This should allow the post-RA scheduler to do a slightly better job at
hiding latency in presence of instructions incurring bank conflicts.
The main purpuse of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.
Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit acf98ff933)
This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
simplify the copy propagation dataflow logic".
BUG=b:67394445
TEST=No regressions in Android CTS, GLES tests.
Change-Id: Ie10e8bf2116b28a637fd7a3829a44a00b2867f11
Reviewed-on: https://chromium-review.googlesource.com/862003
Commit-Ready: Chad Versace <chadversary@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput. This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation. It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.
In a shader-db run on SKL this helps roughly 46k shaders:
total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
conflicts in affected programs: 816222 -> 407702 (-50.05%)
helped: 46234
HURT: 72
The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%1.13% with n=20 due to the additional work done by the
compiler back-end.
On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies. E.g. for a shader-db run on SNB:
total conflicts in shared programs: 944636 -> 623185 (-34.03%)
conflicts in affected programs: 853258 -> 531807 (-37.67%)
helped: 31052
HURT: 19
And on BDW:
total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
conflicts in affected programs: 1179787 -> 748933 (-36.52%)
helped: 47592
HURT: 70
On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
0.33% with n=16.
NOTE: This patch intentionally disregards some i965 coding conventions
for the sake of reviewability. This is addressed by the next
squash patch which introduces an amount of (for the most part
boring) boilerplate that might distract reviewers from the
non-trivial algorithmic details of the pass.
The following patch is squashed in:
SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit af2c320190)
This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
simplify the copy propagation dataflow logic".
BUG=b:67394445
TEST=No regressions in Android CTS, GLES tests.
Change-Id: I21b0563b3855434a702989fbc947b786c486f7e3
Reviewed-on: https://chromium-review.googlesource.com/862002
Commit-Ready: Chad Versace <chadversary@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Before, we always left workgroup variables as shared nir_variables and
let the driver call nir_lower_io. This adds an option to do the
lowering directly in spirv_to_nir. To do this, we implicitly assign the
variables a std430 layout and then treat them like a UBO or SSBO and
immediately lower all the way to an offset.
As a side-effect, the spirv_to_nir pass now handles variable pointers
for workgroup variables.
Archived-At: https://lists.freedesktop.org/archives/mesa-dev/2017-October/173534.html
(am from https://patchwork.freedesktop.org/patch/183827/)
Needed for SPIR-V VariablePointers capability.
BUG=b:68708929
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: Ibdfb71194fd43fe899a71e7da162ee0633d2d11a
Reviewed-on: https://chromium-review.googlesource.com/799680
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Commit-Queue: Kristian H. Kristensen <hoegsberg@chromium.org>
There is no API available to properly query the IMPLEMENTATION_DEFINED
format. As a workaround we rely here on gralloc allocating either
an arbitrary YCbCr 4:2:0 or RGBX_8888, with the latter being recognized
by lock_ycbcr failing.
(replaces commmit b0147e6603835a2cc64a99c5a6caa3316d6c2172 from
arc-12.1.0-pre2 branch / CL:367216)
BUG=b:28671744
BUG=b:33533853
BUG=b:37615277
TEST=android.view.cts.WindowTest#testSetLocalFocus
TEST=No CTS regressions on cyan and reef.
TEST=Camera preview on Poppy looks correctly
Change-Id: Ifca4a7f82a6d04ccb50e0ee17f1998ffb243f85f
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/566793
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 8e0cdc96548416708890eee94b6cff6cd68e5ca5)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I226caf644e34312628a7606fbcf65b567cf338d9
Reviewed-on: https://chromium-review.googlesource.com/780840
Commit-Queue: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
As said in the EGL_KHR_platform_android extensions
For each EGLConfig that belongs to the Android platform, the
EGL_NATIVE_VISUAL_ID attribute is an Android window format, such as
WINDOW_FORMAT_RGBA_8888.
Although it should be applicable overall.
Even though we use HAL_PIXEL_FORMAT here, those are numerically
identical to the WINDOW_FORMAT_ and AHARDWAREBUFFER_FORMAT_ ones.
Barring the said format of course. That one is only listed in HAL.
Keep in mind that even if we try to use the said format, you'll get
caught by droid_create_surface(). The function compares the format of
the underlying window, against the NATIVE_VISUAL_ID of the config.
Unfortunatelly it only prints a warning, rather than error out, likely
leading to visual corruption.
While SDL will even call ANativeWindow_setBuffersGeometry() with the
wrong format, and conviniently ignore the [expected] failure.
Cc: mesa-stable@lists.freedesktop.org
Cc: Chad Versace <chadversary@google.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tomasz Figa <tfiga@chromium.org>
(am from https://patchwork.freedesktop.org/patch/166176/)
(tfiga: Remove only respective EGL config, leave EGL image as is.)
BUG=b:33533853
TEST=dEQP-EGL.functional.*.rgba8888_window tests pass on eve
Change-Id: I8eacfe852ede88b24c1a45bff1445aacd86f6992
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/582263
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit cc1bc630a2b17693a6e8b93a6193b415b3859297)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I04bc06eeffcd4bd0cee64c55f10f0a039b6f2d73
Reviewed-on: https://chromium-review.googlesource.com/780798
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This patch adds support for opening render nodes directly from within
display initialization, Instead of relying on private interfaces
provided by gralloc.
In addition to having better separation from gralloc and being able to
use different render nodes for allocation and rendering, this also fixes
problems encountered when using the same DRI FD for gralloc and Mesa,
when both stepped each over another because of shared GEM handle
namespace.
BUG=b:29036398
TEST=No significant regressions in dEQP inside the container
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/367215
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
(cherry picked from commit 4471713aa71d83943eb195868707ebe4e6515bb6)
BUG=b:32077712
BUG=b:33533853
TEST=No CTS regressions on cyan and reef.
Change-Id: I7f901eb9dadbfc2200484666fdc6a2bc0ca42a0c
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/558138
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit d4c3c3b5b0a9a834736323dbcd43a424e9033fa2)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I050b5ba258f175d3d2543582963e4296c56df5ee
Reviewed-on: https://chromium-review.googlesource.com/780797
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Hiz causes GPU hangs on braswell, so let's disable it.
BUG=b/35570762, b/35574152
TEST=run graphics_GLBench on 3 * kefka for a total of 45 hours, no GPU hangs observed
(applied manually from src/third_party/media-libs/mesa/files)
BUG=b:33533853
TEST=No CTS regressions on Cyan and Reef.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Change-Id: I57402696fb0e970f0a38d87a33f2179b294a2cf1
Reviewed-on: https://chromium-review.googlesource.com/558133
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit ffdf27b84904d4c4e8294ce22e5fd9c423cf0d7c)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I3fd81db442bc8e97b5d44f3f03b35358dbf11318
Reviewed-on: https://chromium-review.googlesource.com/780796
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
GLSL ES 320 technically allows #line to have arbitrary expression trees
rather than integer literal constants, unlike the C and C++ preprocessor.
This is likely a completely unused feature that does not make sense.
However, Android irritatingly mandates this useless behavior, so this
patch implements a hack to try and support it.
We handle a single expression:
#line <line number expression>
but we avoid handling the double expression:
#line <line number expression> <source string expression>
because this is an ambiguous grammar. Instead, we handle the case that
wraps both in parenthesis, which is actually well defined:
#line (<line number expression>) (<source string expression>)
With this change following tests pass:
dEQP-GLES3.functional.shaders.preprocessor.builtin.line_expression_vertex
dEQP-GLES3.functional.shaders.preprocessor.builtin.line_expression_fragment
dEQP-GLES3.functional.shaders.preprocessor.builtin.line_and_file_expression_vertex
dEQP-GLES3.functional.shaders.preprocessor.builtin.line_and_file_expression_fragment
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
BUG=b:33352633
BUG=b:33247335
TEST=affected tests passing on CTS 7.1_r1 sentry
Reviewed-on: https://chromium-review.googlesource.com/427305
Tested-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Commit-Queue: Haixia Shi <hshi@chromium.org>
Trybot-Ready: Haixia Shi <hshi@chromium.org>
[chadv: Cherry-picked from branch arc-12.1.0-pre2]
(cherry picked from commit 18675d69bcd2a66483fcfc15f4c5fa5db4c257af)
(applied manually from src/third_party/media-libs/mesa/files)
BUG=b:33533853
TEST=No CTS regressions on Cyan and Reef.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Change-Id: I7afbbb386bd4a582e3f241014a83eaccad1d50d9
Reviewed-on: https://chromium-review.googlesource.com/558132
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit f0e7e697a8403e1bdf56b6f555d9488fe4f620ad)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I010627302e20d7748fb2ef2b1bcdd1ef48811072
Reviewed-on: https://chromium-review.googlesource.com/780795
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Replaces one undefined behavior with another, slightly more friendly,
undefined behavior.
This changes glScissor() behavior on i965 to clamp instead of truncate
out-of-range scissors. Technically either behavior is acceptable, but
clamping has more predictable results on out-of-range scissors.
BUG=chromium:360217
TEST=Watched some Youtube on Link; can't reproduce original bug as reported.
Signed-off-by: Corbin Simpson <simpsoco@chromium.org>
Signed-off-by: Prince Agyeman <prince.agyeman@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
(applied manually from src/third_party/media-libs/mesa/files)
BUG=b:33533853
TEST=No CTS regressions on Cyan and Reef.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Change-Id: I475deb2c102dd1b563ae4cc05f9fae5906c5c094
Reviewed-on: https://chromium-review.googlesource.com/558128
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit f390b920db89fd332f6c1398c886de43cdc4e868)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I3869ed8bc8e7e4551ff75c8e1c3fb1ec7ccb784f
Reviewed-on: https://chromium-review.googlesource.com/780791
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Since we can't repro this bug, it's hard to track it down, but it
looks like there are multiple issues with the workaround, which this
patch tries to fix.
This fixes two corner cases with the workaround:
- Fix the case where there is a depth but no stencil
- Fix the case there the depth mt hasn't been created
BUG=chromium:423546
TEST=builds and runs on link
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Prince Agyeman <prince.agyeman@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
(applied manually from src/third_party/media-libs/mesa/files)
BUG=b:33533853
TEST=No CTS regressions on Cyan and Reef.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Change-Id: Ib2813252dc825443470f67b6214c16d38981cda5
Reviewed-on: https://chromium-review.googlesource.com/558127
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 1d04841335da04fa7b97cf105ebf1514f081f7d9)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I2255687f69808e5ecb8eb9eb461921df4698108d
Reviewed-on: https://chromium-review.googlesource.com/780790
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
If we have no miptree (irb->mt == NULL) we still go ahead and look at
the stencil miptree, which causes crashes. Instead, let's return NULL if
we don't have a miptree, which will be correctly handled later.
BUG=chromium:387897
TEST=can't reproduce the bug, but compiles and runs
Signed-off-by: Prince Agyeman <prince.agyeman@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
(applied manually from src/third_party/media-libs/mesa/files)
BUG=b:33533853
TEST=No CTS regressions on Cyan and Reef.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Change-Id: Ief9c1fabd393b19a47f212885aff9e333c577785
Reviewed-on: https://chromium-review.googlesource.com/558126
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit ecfcf4db4ff27e0ec91fb31da3c49601b968346a)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I0fe67275f1227eaa4434d82594491fc27a9bd731
Reviewed-on: https://chromium-review.googlesource.com/780789
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Avoid crash on surface/sampler_view destruction when the context is gone
When we delete the context, sometimes there are pending surfaces and
sampler view left. Since mesa doesn't properly refcount them, the
context can go away before its resources. Until mesa is fixed to
properly refcount all these resources, let's just carry the destroy
function on the resource itself, which gives us a way to free it.
BUG=none
TEST=compile
Signed-off-by: Prince Agyeman <prince.agyeman@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
(applied manually from src/third_party/media-libs/mesa/files)
BUG=b:33533853
TEST=No CTS regressions on Cyan and Reef.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Change-Id: Ibfd5d2de9606beedb8275979c0f155eee61f51fb
Reviewed-on: https://chromium-review.googlesource.com/558120
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 8fb20c7216b406b560565ed23209d71dc2826a97)
BUG=b:69553386
TEST=No regressions on Eve in `cts-tradefed run cts -m CtsDeqpTestCases`.
Change-Id: I299169e3b5204b590ce6d3e4385a3dbb8bdda4fb
Reviewed-on: https://chromium-review.googlesource.com/780783
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
This build system is rather incomplete in the 17.3 branch, with multiple
bugs and user facing changes already addressed in master.
It's not shipped in the tarball and we don't want to receive bug reports
about 17.3, 18.0 is the release that I hope to have the meson build in
shape for.
Simply error() out, if anyone tries to use it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This reverts commit e8c9e65185.
With the actual bug fixed (by commit 6ac2d16901), this is not
necessary. I'm doubtful of its correctness in any case.
(cherry picked from commit a31d038208)
Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
(cherry picked from commit cfcfa0b9cd)
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.
For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.
Fixes the following test on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 6ac2d16901)
Otherwise we leak it.
Fixes: eaa56eab6d "radv: initial support for shared semaphores (v2)"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 7c25578863)
It turned out that with recent changes that call into dri3 from glFinish(),
it appears like different thread end up waiting for X events simultaneously,
causing deadlocks since they steal events from eachoter and update the dri3
counters behind eachothers backs.
This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread intends to
do the same, it's put to sleep until the first thread finishes waiting, and
then it rechecks counters and optionally retries the waiting. Threads that
poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.
This does not make dri3 drawables completely thread-safe but at least it's a
first step.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102358
Fixes: d5ba75f888 "st/dri2 Plumb the flush_swapbuffer functionality through to dri3"
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 54a58b2856)
We renamed "Function Enable" to "Enable", which broke our detection
of whether shaders are enabled or not. So, we'd see a bunch of HS/DS
packets with program offsets of 0, and think that was a valid TCS/TES.
Fixes: c032cae9ff (genxml: Rename "Function Enable" to "Enable".)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9a0465b3a3)
We want to emit invariant state at the start of a render batch. In the
past, this more or less happened: a new batch flagged BRW_NEW_CONTEXT
(because we don't have hardware contexts), which triggered the
brw_invariant_state atom. So, it would be emitted before any 3D
drawing. (Technically, there might be some BLT commands in the batch
because Gen4-5 have a single combined render/BLT ring, but that should
be harmless).
With the advent of BLORP, this broke. The first item in a batch might
be a BLORP operation, which bypasses the normal draw upload path. So,
we need to ensure invariant state happens first. To do that, we just
upload it when creating a new batch. On Gen6+ we'd need to worry about
whether it's a RENDER or BLT batch, but because we have a combined ring,
this approach should work fine on Gen4-5.
Seems to fix GPU hangs when playing hardware accelerated video with
mpv -hwdec=vaapi on Ironlake.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103529
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 8f91aa35a5)
...and provide a better citation for the existing one.
v2:
- Apply the workaround to Gen8 too, as intended (caught by Topi).
- Restructure to add bits instead of an extra flush (based on a similar
patch by Rafael Antognolli).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 8d48671492)
We were already using PTE for all render targets in case one happened to
get scanned out. However, this still wasn't 100% correct because there
are still possibly cases where we may want to texture from an external
buffer even though we don't know the caching mode. This can happen, for
instance, on buffers imported from another GPU via prime.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d7a19d69eb)
As per radeonsi, the tess factor components for isolines
are reversed.
Fixes: tests/spec/arb_tessellation_shader/execution/isoline.shader_test
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f3f8615d76)
r0 in input into vertex shaders contains things like vertexid,
we need to reserve it even if we have no inputs.
This fixes a bunch of tessellation piglits.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 50330d7115)
Gather operations in both GLSL and SPIR-V require a sampler. Fixes
gathers returning garbage when using separate texture/samplers (on AMD,
was using an invalid sampler descriptor).
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 4122d00846)
We should use the result type of the OpSampledImage opcode, rather than
the type of the underlying image/samplers.
This resolves an issue when using separate images and shadow samplers
with glslang. Example:
layout (...) uniform samplerShadow s0;
layout (...) uniform texture2D res0;
...
float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
For this, for the combined OpSampledImage, the type of the base image
was being used (which does not have the Depth flag set, whereas the
result type does), therefore it was not being recognised as a shadow
sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit e9eb3c4753)
Commit 259fc50545 added linker error for
mismatching uniform precision, as required by GLES 3.0 specification and
conformance test-suite.
Several Android applications, including Forge of Empires, have shaders
which violate this rule, on a dead varying that will be eliminated.
The problem affects a big number of applications using Cocos2D engine
and other GLES implementations accept this, this poses a serious
application compatibility issue.
Starting from GLSL ES 3.0, declarations with conflicting precision
qualifiers are explicitly prohibited. However GLSL ES 1.00 does not
clearly specify the behavior, except that
"Uniforms are defined to behave as if they are using the same storage in
the vertex and fragment processors and may be implemented this way.
If uniforms are used in both the vertex and fragment shaders, developers
should be warned if the precisions are different. Conversion of
precision should never be implicit."
The word "used" is not clear in this context and might refer to
1) declared (same as GLES 3.x)
2) referred after post-processing, or
3) linked after all optimizations are done.
Looking at existing applications, 2) or 3) seems to be widely adopted.
To avoid compatibility issues, turn the error into a warning if GLSL ES
version is lower than 3.0 and the data is dead in at least one of the
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97532
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0886be093f)
The L3 configuration code already considers the TCS and TES programs,
but failed to listen for TCS/TES program changes.
This was somehow missing.
Fixes: e9644cb1f9 ("i965: Consider tessellation in get_pipeline_state_l3_weights.")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit b8d42cccd0)
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.
Fixes: 700bebb958 ("i965: Move the back-end compiler to src/intel/compiler")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 854455498c)
13b303ff92 added the actual enums but
didn't remove the already existing XXXX ones. (And also duplicated
the "fragment" names instead of using the "vertex" names.)
Fixes: 13b303ff92 "docs: Update the list of used MESA GL enums."
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit dd38a4ee0d)
... as can happen with various types like mat4, or else we'll smash the
stack writing past the end of components_local[].
Fixes: 5a0d3e1129 ("nir: Print the components referenced for split or
packed shader in/outs.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 77a63d190a)
Targets such as omx and va can work w/o anything X related. Mandate the
xcb* dependencies only when the X11 platform is selected.
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: 63e11ac2b5 ("configure: error out if building VA w/o supported
platform")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
(cherry picked from commit 85a017230c)
Currently we error out when building GLVND w/o GLX.
That was the original premice before we had EGL. As the commit says,
that error should be reworked to honour both - do so.
v2: Drop noop *);; (Eric)
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
(cherry picked from commit b4967561c0)
Commit 05fc62d89f sets the variable, yet it forgot the update the
existing reference to append (instead of assign).
Thus as-is the expat library was discarded from the link chain when
building with Android.
Fixes: 05fc62d89f ("automake: intel: move expat handling where it's
used")
Cc: Hongxu Jia <hongxu.jia@windriver.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit ba414dba4f)
Fixes:
make[2]: Leaving directory '/home/local/mesa/mesa-17.4.0-devel/_build/sub/src'
make[2]: *** No rule to make target '../../../src/git_sha1.h.in', needed by 'git_sha1.h'. Stop.
Makefile:660: recipe for target 'all-recursive' failed
Fixes: 16be271c6e "git_sha1_gen: use git_sha1.h.in on all build systems"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit e17e8934f9)
The GL spec will soon be revised to clarify that a buffer binding for
a transform feedback buffer is only required if a variable is actually
defined to use the buffer binding point. Previously a declaration for
the default transform buffer would make it require a binding even if
nothing was declared to use the default buffer.
Affects:
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list_and_api
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4dc8458cd1)
Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't
lower indirects on the fragment shader which is wrong. Instead of using
a single indirect mask, take advantage of our new little helper.
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 951a5dc4cc)
When I introduced gl_shader_program_data one of the intentions was to
fix a bug where a failed linking attempt freed data required by a
currently active program. However I seem to have failed to finish
hooking up the final steps required to have the data hang around.
Here we create a fresh instance of gl_shader_program_data every
time we link. gl_program has a reference to gl_shader_program_data
so it will be freed once the program is no longer active.
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102177
(cherry picked from commit 6a72eba755)
This has a bit of a surprising effect:
For the render pipeline, the upload_sampler_state_table atom emits
3DSTATE_BINDING_TABLE_POINTERS_XS. It tries to avoid this for compute:
if (GEN_GEN >= 7 && stage_state->stage != MESA_SHADER_COMPUTE) {
/* Emit a 3DSTATE_SAMPLER_STATE_POINTERS_XS packet. */
genX(emit_sampler_state_pointers_xs)(brw, stage_state);
} ...
However, we were failing to initialize brw->cs.base.stage, so it was
left as 0 (MESA_SHADER_VERTEX), causing this condition to break. We
then emitted 3DSTATE_SAMPLER_STATE_POINTERS_VS in GPGPU mode, when
trying to upload CS samplers. Nothing good can come of this.
Found by inspection while debugging a GPU hang. Jordan believes this
helps the Deus Ex: Mankind Divided benchmark mode's stability when
running with shader cache.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a16dc04ad5)
Use $(sysconfdir) instead of hardcoding /etc.
While the OpenCL spec expects the file in /etc, people building their
stack can override that, esp. !Linux users.
Furthermore this removes a fundamental violation, which results in the
system file being overwritten even as one explicitly sets --prefix
and/or DESTDIR.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-By: Aaron Watry <awatry@gmail.com>
(cherry picked from commit 0cd0958544)
Originally we tried to handle this case based on slots_valid. However,
there are a number of ways that this can go wrong. For one, we throw
away any trailing slots which either aren't written or are set to
VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not
actually write anything there. Between the lot of these, it was
possible to end up in a case where we tried to do a regular URB write
but ended up with a length of 1 which is invalid. This commit moves it
to the end and makes it based on a new boolean flag urb_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7a82ad54bb)
This isn't often a problem , when we're in a compute shader, we must
push the thread local ID so we decrement the amount of available push
space by 1 and it's no longer even and 64-bit data can, in theory, span
it. By marking those uniforms contiguous, we ensure that they never get
split in half between push and pull constants.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 25f7453c9e)
The same workaround we need for 64-bit values on little core also takes
care of the Ivy Bridge problem and does so a bit more efficiently so we
can drop that code while we're here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fd1bcccc2d)
For some reason, the any/all predicates don't work properly with SIMD32.
In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read
the correct subset of the flag register and you end up getting garbage
in the second half. Work around this by using a pair of 1-wide MOVs and
scattering the result. This fixes the any/all instructions for SIMD32.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1b8ef49f48)
The any/all intrinsics return a boolean value so D or UD is the correct
type. Unfortunately, get_nir_dest has the annoying behavior of
returnning a float type by default. This causes format conversion which
gives us -1.0f or 0.0f in the register. If the consumer of the result
does an integer comparison to zero, it will give you the right boolean
value but if we do something more clever based on the 0/~0 assumption
for booleans, this will give the wrong value.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1f41663007)
In fragment shaders f0.1 is used for discards so doing ballot after a
discard can potentially cause the discard to not happen. However, we
don't support SIMD32 fragment shaders yet so this isn't a problem.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 6c00240bc6)
We have ANY/ALL32 predicates and, for the most part, they work just
fine. (See the next commit for more details.) Also, due to the way
that flag registers are handled in hardware, instruction splitting is
able to split the CMP correctly. Specifically, that hardware looks at
the execution group and knows to shift it's flag usage up correctly so a
2H instruction will write to f0.1 instead of f0.0.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit def013a863)
Before, we were careful to place the zip after the last of the split
instructions but did unzip on-demand. This changes things so that the
unzips go before all of the split instructions and the unzip comes
explicitly after all the split instructions. As a side-effect of this
change, we now emit the split instruction from highest SIMD group to
lowest instead of low to high. We could have kept the old behavior, but
it shouldn't matter and this made the code easier.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0d905597fe)
This makes it far more explicit where we're inserting the instructions
rather than the magic "before and after" stuff that the emit_[un]zip
helpers did based on block and inst.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fcd4adb9d0)
Register strides higher than 4 are uncommon but they can happen. For
instance, if you have a 64-bit extract_u8 operation, we turn that into
UB -> UQ MOV with a source stride of 8. Our previous calculation would
try to generate a stride of <32;8,8>:ub which is invalid because the
maximum horizontal stride is 4. To solve this problem, we instead use a
stride of <8;1,0>. As noted in the comment, this does not work as a
destination but that's ok as very few things actually generate that
stride.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit e8c9e65185)
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.
Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.
Fixes a memory leak with multithreading demo.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f0ae06a13c)
This patch modifies the ARB_indirect_parameters logic in
brw_draw_prims, so that our implementation isn't affected if
another application attempts to use predicates. Previously we
were using a predicate with a DELTAS_EQUAL comparison operation
and relying on the MI_PREDICATE_DATA register being 0. Our code
to initialize MI_PREDICATE_DATA to 0 was incorrect, so we were
accidentally using whatever value was written there. Because the
kernel does not initialize the MI_PREDICATE_DATA register on
hardware context creation, we might inherit the value from whatever
context was last running on the GPU (likely another process).
The Haswell command parser also does not currently allow us to write
the MI_PREDICATE_DATA register. Rather than fixing this and requiring
an updated kernel, we switch to a different approach which uses a
SRCS_EQUAL predicate that makes no assumptions about the states of any
of the predicate registers.
Fixes Piglit's spec/arb_indirect_parameters/tf-count-arrays test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103085
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 048d4c45c9)
We need to validate some structs exist before we dirty the states, and
avoid the problem in some other places.
Fixes: e027935a7 ("st/mesa: don't update unrelated states in non-draw calls such as Clear")
(cherry picked from commit cc69f2385e)
This would cause the read of the metadata content to fail, which would
prevent the linking from being skipped.
Seen on Rocket League with i965 shader cache.
Fixes: b86ecea344 "util/disk_cache: write cache item metadata to disk"
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e5b141634c)
The shared si_create_shader_selector() code already offsets the mask.
Fixes the following piglit tests:
arb_cull_distance/clip-cull-3.shader_test
arb_cull_distance/clip-cull-4.shader_test
Fixes: 29d7bdd179 (radeonsi: scan NIR shaders to obtain required info)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e80bbd6f52)
Otherwise we will leak them, load duplicates from disk rather
than memory and never write items loaded from disk to the apps
pipeline cache.
Fixes: fd24be134f 'radv: make use of on-disk cache'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 1e84e53712)
Squashed with commit:
radv: use correct alloc function when loading from disk
Fixes regression in:
dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
Fixes: 1e84e53712 "radv: add cache items to in memory cache when reading from disk"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit e92405c55a)
It confuses CTS. This pregenerates the heap info into the
physical device, so we can use it for translating contiguous
indices into our "standard" ones.
This also makes the WSI a bit smarter in case the first preferred
heap does not exist.
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 806721429a)
GC3000 resolve-in-place assumes that the TS state is configured.
If it is not, this will result in MMU errors. This is especially
apparent when using glGenMipmaps().
Fixes: 78ade65956 ("etnaviv: Do GC3000 resolve-in-place when possible")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
(cherry picked from commit 8fbd82f464)
It is possible that the optimizer ends up in an infinite loop in
post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group()
does not find a proper scheduling. This can be deducted from
pending.count() being larger than zero and not getting smaller.
This patch works around this problem by signalling this failure so that the
optimizers bails out and the un-optimized shader is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 69eee511c6)
Previously the values were calculated by just shifting ~0 by the
invocation ID. This would end up including bits that are higher than
gl_SubGroupSizeARB. The corresponding CTS test effectively requires that
these high bits be zero so it was failing. There is a Piglit test as
well but this appears to checking the wrong values so it passes.
For the two greater-than bitmasks, this patch adds an extra mask with
(~0>>(64-gl_SubGroupSizeARB)) to force these bits to zero.
Fixes: KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680#c3
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit b697ece10a)
Only use CCS_E to render to a texture that is CCS_E-compatible with the
original texture's miptree (linear) format. This prevents render
operations from writing data that can't be decoded with the original
miptree format.
On Gen10, with the new CCS_E-enabled formats handled, this enables the
driver to pass the arb_texture_view-rendering-formats piglit test.
v2. Add a TODO for texturing. (Jason)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 9e849eb8bb)
Having moved gallium_dri.so library to /vendor/lib/dri
also symlinks need to be coherently created using TARGET_OUT_VENDOR instead of TARGET_OUT
or all non Intel drivers will not be loaded with Android N and earlier,
thus causing SurfaceFlinger SIGABRT
(v2) simplification of post install command
Fixes: c3f75d483c ("Android: move libraries to /vendor")
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Rob Herring <robh@kernel.org> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7dae419aa7)
Patch uses mem_ctx for allocation to ensure param array gets freed
later.
==6164== 48 bytes in 1 blocks are definitely lost in loss record 61 of 193
==6164== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299)
==6164== by 0x12E31C6C: ralloc_size (ralloc.c:121)
==6164== by 0x130189F1: fs_visitor::assign_constant_locations() (brw_fs.cpp:2095)
==6164== by 0x13022D32: fs_visitor::optimize() (brw_fs.cpp:5715)
==6164== by 0x13024D5A: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6229)
==6164== by 0x1302549A: brw_compile_fs (brw_fs.cpp:6570)
==6164== by 0x130C4B07: blorp_compile_fs (blorp.c:194)
==6164== by 0x130D384B: blorp_params_get_clear_kernel (blorp_clear.c:79)
==6164== by 0x130D3C56: blorp_fast_clear (blorp_clear.c:332)
==6164== by 0x12EFA439: do_single_blorp_clear (brw_blorp.c:1261)
==6164== by 0x12EFC4AF: brw_blorp_clear_color (brw_blorp.c:1326)
==6164== by 0x12EFF72B: brw_clear (brw_clear.c:297)
Fixes: 8d90e28839 ("intel/compiler: Allocate pull_param in assign_constant_locations")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 446c5726ec)
Fixes intermittent GPU hangs on Broxton with an Intel internal
test case.
There are plenty of similar fragment shaders in piglit that do
not use any varyings and any uniforms. According to the
documentation special timing is needed between pipeline stages.
Apparently we just don't hit that with piglit. Even with the
failing test case one doesn't always get the hang.
Moreover, according to the error states the hang happens
significantly later than the execution of the problematic shader.
There are multiple render cycles (primitive submissions) in between.
I've also seen error states where the ACTHD points outside the
batch. Almost as if the hardware writes somewhere that gets used
later on. That would also explain why piglit doesn't suffer from
this - most tests kick off one render cycle and any corruption
is left unseen.
v2 (Ken): Instead of enabling push constants, enable one of the
inputs (PSIZ).
v3 (Ken, Jason): Use LAYER instead making vulkan emit_3dstate_sbe()
happy.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 97e01adfd5)
Mesa's DEBUG and assert's NDEBUG are not tied to each other, so we need
to explicitly compile this code out.
Fixes: 3df7892878 "vc4: Drop reloc_count tracking for debug
asserts on non-debug builds."
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 5d44e35a8f)
It appears that flushing the DB metadata is actually not sufficient
since the driver uses the new VS blit shaders. This looks quite
strange though, but it seems like we need to flush DB for fixing
the corruption.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102955
Fixes: 69ccb9dae7 (radeonsi: use new VS blit shaders (VS inputs in SGPRs)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit dd79aa4ad3)
The PRM says "The execution size must be 1." In 73137997e2, the
execution size was set to 1 when it should have been BRW_EXECUTE_1
(which maps to 0). Later, in dc2d3a7f5c, JMPI was used for
line AA on gen6 and earlier and we started manually stomping the
exeution size to BRW_EXECUTE_1 in the generator. This commit fixes the
original bug and makes brw_JMPI just do the right thing.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 73137997e2
(cherry picked from commit 562b8d458c)
Going from binary to hex has a 2x blowup.
Fixes: 1421625292 'radv: create on-disk shader cache'
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 5bfbab2fdc)
gcc is throwing this warning in my meson build:
../src/intel/compiler/brw_eu_validate.c:50:11: warning
argument 1 null where non-null expected [-Wnonnull]
return memmem(haystack.str, haystack.len,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
needle.str, needle.len) != NULL;
~~~~~~~~~~~~~~~~~~~~~~~
The first check for CONTAINS has a NULL error_msg.str and 0 len. The
glibc implementation will exit without looking at any haystack bytes if
haystack.len < needle.len, so this was safe, but silence the warning
anyway by guarding against implementation variablility.
Fixes: 122ef3799d ("i965: Only insert error message if not already present")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit e91c3540fc)
We now have linking optimisations so we want to delay dumping the
nir until after these are complete.
Fixes: 06f05040eb (radv: Link shaders)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit f0a2bbd1a4)
Squashed with commit:
radv: print NIR before LLVM IR and disassembly
It's still printed after linking, but it makes more sense to
have SPIRV->NIR->LLVM IR->ASM.
Fixes: f0a2bbd1a4 (radv: move nir print after linking is done)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9711979df0)
This fixes a regression I introduced refactoring this code,
I managed to invert range twice, I moved the inversion into
the common code, but forgot to stop doing it in the callee.
Fixes: GL45-CTS.multi_bind.dispatch_bind_buffers_base
Fixes: 35ac13ed3 (mesa/bufferobj: consolidate some codepaths between ubo/ssbo/atomics.)
Reported-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 11d688d9f0)
The IR is reused in different pipeline combinations so we need
to clone it to avoid link time optimistaions messing up the
original copy.
Fixes: 06f05040eb (radv: Link shaders)
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 013313cf89)
Meson 0.43 added the ability to pass nested lists to
include_directories, so the code that we have works for 0.43, but not
for 0.42. This patch changes the include_directories list to be flat so
it works with 0.42
fixes: 108d257a16 ("meson: build libEGL")
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
(cherry picked from commit 77f7ef0287)
According to the ARB_ES3_1_compatibility specification,
glGetFramebufferAttachmentParameteriv is supposed to accept BACK,
and it behaves exactly like BACK_LEFT.
Fixes a GL error in GFXBench 5 Aztec Ruins.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4f538c3f99)
Valgrind shows that leak is caused by gen6_upload_push_constant, add
unref push_const_bo per stage to destructor to fix this (like done for
scratch_bo).
==10952== 144 bytes in 1 blocks are definitely lost in loss record 44 of 66
==10952== at 0x4C30A1E: calloc (vg_replace_malloc.c:711)
==10952== by 0x8C02847: bo_alloc_internal.constprop.10 (brw_bufmgr.c:344)
==10952== by 0x8C425C4: intel_upload_space (intel_upload.c:101)
==10952== by 0x8C22ED0: gen6_upload_push_constants (gen6_constant_state.c:154)
v2: remove if conditions, brw_bo_unreference handles NULL (Ken, Emil)
Fixes: 24891d7c05 ("i965: Store per-stage push constant BO pointers.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0b131ca427)
Not all rendering matches the miptree format. We allow rendering to
texture views so there are cases where it may not match. In those
cases, our current scheme of just passing the value of ctx->sRGBEnabled
isn't viable. Instead, just do what we do for texturing and pass the
view format in directly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 39c5c12f8f)
It's rather surprising that we've never actually hit this before.
Aparently, Ian's SPIR-V generator currently claims the Simple when you
don't do anything complex. We really shouldn't assert-fail on it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8ab9820d34)
See my LLVM patch which fixes the root cause.
Users have to apply this patch and then they have 2 choices:
- Downgrade to LLVM 5.0
- Update to LLVM git after my LLVM patch is pushed.
It won't be possible to use current and earlier development version
of LLVM 6.0.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3f8e3c2bd8)
We currently have a bug where nir_lower_system_values gets called before
nir_lower_var_copies so it will miss any system value uses which come
from a copy_var intrinsic. Moving it to after brw_preprocess_nir fixes
this problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 279f8fb69c)
In order to implement the ballot intrinsic, we do a MOV from flag
register to some GRF. If that GRF is used in a SEL, cmod propagation
helpfully changes it into a MOV from the flag register with a cmod.
This is perfectly valid but when lower_simd_width comes along, it simply
splits into two instructions which both have conditional modifiers.
This is a problem since we're reading the flag register. This commit
makes us check whether or not flags_written() overlaps with the flag
values that we are reading via the instruction source and, if we have
any interference, will force us to emit a copy of the source.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fa6e74e33e)
This was the actual cause of GPU hangs fixed by 0fdd531457 ("radv:
Fix pipeline cache locking issues"), since multiple threads would end
up trying to create the variants for a single entry.
Now that we're locking around the whole of this function, this isn't
really necessary (we either create all or none of the variants), but
fix this anyway in case things change later.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: 17.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit fee9d05e21)
The kernel doesn't initialize the value of the INSTPM or CS_DEBUG_MODE2
registers at context initialization time. Instead, they're inherited
from whatever happened to be running on the GPU prior to first run of a
new context. So, when we started setting these, other contexts in the
system started inheriting our values. Since this controls whether
3DSTATE_CONSTANT_* takes a pointer or an offset, getting the wrong
setting is fatal for almost any process which isn't expecting this.
Unfortunately, VA-API and Beignet don't initialize this (nor does older
Mesa), so they will die horribly if we start doing this. UXA and SNA
don't use any push constants, so they are unaffected.
Until we have some kind of solution to this problem, I'm going to revert
this patch and abandon using the feature for now. It will lead to fewer
pushed UBO ranges on Broadwell+, which may lead to lower performance,
though I don't have any data on the impact.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102774
(cherry picked from commit 013d331220)
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) happening in the same shader as the main
function, and secondly it doesn't work for variable group sizes.
In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.
While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.
Also the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with main function. As a result, the lowering
code has to insert its own copies of the system values if needed.
Reported-by: Stephane Chevigny <stephane.chevigny@polymtl.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4d24a7cb97)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.