SVGA_3D_CMD_DX_GENRATE_MIPMAP & SVGA_3D_CMD_DX_SET_PREDICATION commands
are not presents in fedora 24 kernel module. Because of this
reason application like supertuxkart are not running.
v2: Add few comments and code modifications suggested by Brian P.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit 7988513ac3)
Since the function is exported like any other
public api function and put in the header
as if you could link against it, export it also
from shared objects.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 13affe0d3f)
Some games are sloppy.. perhaps because it is defined behavior for DX or
perhaps because nv blob driver defaults things to zero.
So add driconf param to force uninitialized variables to default to zero.
This issue was observed with rust, from steam store. But has surfaced
elsewhere in the past.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit f78a6b1ce3)
Devices with smaller GMEM size need more tiles. On db410c at 2048x1152,
glmark2 shadow needed ~330 tiles for fullscreen. Lets bump it up to
512. (Maybe with MRT you could end up needing more, but at that point
things are probably going to be painfully slow.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 7295428e41)
Atm the actual rule will expand to foo.o which is used for static
libraries only.
Thus the automake manual recommendation [to use OBJEXT] won't help us,
since since we're working with a shared library.
Thus let's 'demote' the file and add it back to BUILT_SOURCES. This will
manage all the complexity for us, at the (existing expense) of working
only with the all, check and install targets.
The crazy (why the issue was hard to spot):
If the dependencies (.deps/*.Plo) are already created one can alter the
anv_device.$(OBJEXT) line and/or nuke it all together. That won't lead
to any warnings/issues, even though the Makefile is regenerated.
Moral of the story:
Always rm -rf top_builddir or don't resolve the dependencies manually
and use BUILT_SOURCES.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96825
Fixes: d7a604c3f7a ("anv: use cache uuid based on the build timestamp.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit 9618e2a24c)
It has proven subtle to get it right both from the build side POV (see
commit list below) and builders due to their varying workflows.
Furthermore it does not fully fulfil the reason why it was enforced -
to detect uniqueness between different builds, in order to distinguish
and invalidate Vulkan/GL caches.
With that having a much better solution (previous commit) we can drop
this solution.
This effectively reverts the following commits:
359d9dfec3 ("mesa: automake: add directory prefix for git_sha1.h")
2c424e00c3 ("mesa: automake: ensure that git_sha1.h.tmp has the right
attributes")
b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building OOT")
8229fe68b5 ("automake: get in-tree `make distclean' working again.")
Cc: Timo Aaltonen <tjaalton@debian.org>
Cc: Haixia Shi <hshi@chromium.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 22e9357028)
Do not rely on the git sha1:
- its current truncated form makes it less unique
- it does not attribute for local (Vulkand or otherwise) changes
Use a timestamp produced at the time of build. It's perfectly unique,
unless someone explicitly thinkers with their system clock. Even then
chances of producing the exact same one are very small, if not zero.
v2: Remove .tmp rule. Its not needed since we want for the header to be
regenerated on each time we call make (Eric).
v3:
- Honour SOURCE_DATE_EPOCH, to make the build reproducible (Michel)
- Replace the generated header with a define, to prevent needless
builds on consecutive `make' and/or `make install' calls. (Dave)
v4:
- Keep the timestamp generation at make time. (Jason)
v5:
- Ensure that file is regenerated on incremental builds.
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit addb099ce8)
This reverts commit 27d456cc87.
DOH, what seems right and what is right with fp64 are always
two different things.
This regressed:
spec@arb_gpu_shader_fp64@shader_storage@layout-std140-fp64-mixed-shader
on radeonsi
Reported-by: Michel Dänzer <michel@daenzer.net>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cb728df967)
In presence of an indirect image access, the base offset should be
zeroed because the stride will be computed twice. This is a pretty
rare situation but it can happen when tex.r > 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f3b9fff3c3)
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.
This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cb828b7b18)
The image usage specified by the caller of vkCreateSwapchainKHR should be
passed onto the internal image creation. Otherwise the driver might later
crash when the user tries to use the image as a combined sampler even though
the creation was explicitly created with VK_IMAGE_USAGE_TRANSFER_SRC_BIT.
Leaving the previous VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT as this might be
expected even if the swapchain is created without any flag.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96791
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit dbbc4fb4cc)
Immediates are stored into a separate table, and are
consolidated, so if we get an immediate we don't need
to offset it as the index it has is correct.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 27d456cc87)
Fixes a regression introduced by commit 42624ea83 which triggered
an assertion in
dEQP-GLES2.functional.texture.completeness.cube.not_positive_level_0
While stImage must have a non-zero size as verified by the caller, we also
look at the size of the base image in an attempt to make a better guess at
the level0 size (this is important when the base image size is odd). However,
the base image may have a zero size even when it exists.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96629
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 0ba053b34c)
This aligns the 4-element color float array to 16 byte boundaries. This
should allow compiler vectorizers to generate better optimizations.
Also fixes broken vectorization generated by Intel compiler.
v2: Fixed indentation and added a lengthy comment explaining the
reason for the alignment.
Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit d8d6091a84)
Encapsulate the test for which flags are needed to get a compiler to
support certain features. Along with this, give various options to try
for AVX and AVX2 support. Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different. This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.
This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.
v2: Pass preprocessor-check argument as true-state instead of
false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__. Additional defines suchas
__FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
w.r.t thier availability.
v4: Fix C++11 flags being added globally and add more logic to
swr_require_cxx_feature_flags
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@Intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
(cherry picked from commit c1bf6692be)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Khronos recommends that the GLES 3.1 library also be called libGLESv2.
It also requires that functions be statically linkable from that
library.
NOTE: Mesa has supported the EGL_KHR_get_all_proc_addresses extension
since at least Mesa 10.5, so applications targeting Linux should use
eglGetProcAddress to avoid problems running binaries on systems with
older, non-GLES 3.1 libGLESv2 libraries.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Cc: Mike Gorchak <mike.gorchak.qnx@gmail.com>
Reported-by: Mike Gorchak <mike.gorchak.qnx@gmail.com>
Acked-by: Chad Versace <chad.versace@intel.com>
(cherry picked from commit 5921f372c8)
Because Polaris uvd fw interface changes, the driver need to check fw version
to apply right interface. This change is to add uvd fw version.
Signed-off-by: sonjiang <sonny.jiang@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 28f85eab49)
[Emil Velikov: resolve trivial s/bool/boolean/ conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeon/radeon_winsys.h
Rely on the existence of a second destination when emitting a setcond
flag is dangerous, because this doesn't mean that the flag has been
correctly set. Instead rely on flagsDef like what emitX() does
for flagsSrc.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cc97b6a34a)
emit_urb_writes() contains code to emit an EOT write with no actual
data when there are no output varyings. This makes sense for the VS
and TES stages, where it's called once at the end of the program.
However, in the geometry shader stage, emit_urb_writes() is called once
for every EmitVertex(). We explicitly emit a URB write with EOT set at
the end of the shader, separately from this path. So we'd better not
terminate the thread. This could get us into trouble for shaders which
do EmitVertex() with no varyings followed by SSBO/image/atomic writes.
It also caused us to emit multiple sends with EOT set, which apparently
confuses the register allocator into not using g112-g127 for all but
the first one. This caused EU validation failures in OglGSCloth
shaders in shader-db. (The actual application was fine, but shader-db
thinks there are no outputs because it doesn't understand transform
feedback.)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 7e7e501acf)
The only part of an ir_texture which can be an array is the
offsets array in textureGatherOffsets() calls. We don't want
to lower those, because they're required to remain constants.
Fixes textureGatherOffsets with Gallium drivers such as llvmpipe,
which commit ef78df8d3b regressed.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit a36a73a7b8)
These need to be passed from the host in caps structure if they
are larger, this fixes a bunch of tests on Intel hw, that I'd
put the limits too high for.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c7cc264ca9)
On MRF platforms, we need to set base_mrf to the first MRF value we'd
like to use for the message. On send-from-GRF platforms, we set it to
-1 to indicate that the operation doesn't use MRFs.
As MRF platforms are becoming increasingly a thing of the past, we've
forgotten to bother with this. It makes more sense to set it to -1 by
default, so we don't have to think about it for new code.
I searched the code for every instance of 'mlen =' in brw_fs*cpp, and
it appears that all MRF-based messages correctly program a base_mrf.
Forgetting to set base_mrf = -1 can confuse the register allocator,
causing it to think we have a large fake-MRF region. This ends up
moving the send-with-EOT registers earlier, sometimes even out of
the g112-g127 range, which is illegal. For example, this fixes
illegal sends in Piglit's arb_gpu_shader_fp64-layout-std430-fp64-shader,
which had SSBO messages with mlen > 0 but base_mrf == 0.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 3e04e3758e)
We currently use CL_INVOCATION_COUNT for the GL_PRIMITIVES_GENERATED
query, which involves passing all primitives to the clipper. When
rasterizer discard is enabled, we program the clipper in REJECT_ALL
mode, rather than using the SOL stage's "Rendering Disable" feature.
See commit f09b91f782 for an explanation
of why we implement GL_PRIMITIVES_GENERATED this way.
Apparently the SOL stage's "Rendering Disable" feature is a lot faster
than having the clipper reject all primitives. It's safe to use when
no GL_PRIMITIVES_GENERATED query is active, as we don't care about
CL_INVOCATION_COUNT incrementing.
This patch makes us use SO_RENDERING_DISABLE when no query is active,
but continues falling back to the clipper in REJECT_ALL mode when the
queries are enabled. It brings back the perf_debug for the clipper
case (which I removed in commit 1f9445ff57, thinking it wasn't useful).
Improves performance in Gl32GSCloth by 84.8303% +/- 2.07132% (n = 10)
on my Broadwell GT2 laptop.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit b0629e6894)
Constant propagation on arrays doesn't make a lot of sense. If the
array is only accessed with constant indexes, then opt_array_splitting
would split it up. Otherwise, we have variable indexing. If there's
multiple accesses, then constant propagation would end up replicating
the data.
The lower_const_arrays_to_uniforms pass creates uniforms for each
ir_constant with array type that it encounters. This means that it
creates redundant uniforms for each copy of the constant, which means
uploading too much data. It can even mean exceeding the maximum number
of uniform components, causing link failures.
We could try and teach the pass to de-duplicate the data by hashing
constants, but it makes more sense to avoid duplicating it in the first
place. We should promote constant arrays to uniforms, then propagate
the uniform access.
Fixes the TressFX shaders from Tomb Raider, which exceeded the maximum
number of uniform components by a huge margin and failed to link.
On Broadwell:
total instructions in shared programs: 9067702 -> 9068202 (0.01%)
instructions in affected programs: 10335 -> 10835 (4.84%)
helped: 10 (Hoard, Shadow of Mordor, Amnesia: The Dark Descent)
HURT: 20 (Natural Selection 2)
loops in affected programs: 4 -> 0
The hurt programs appear to no longer have a constarray uniform, as
all constants were successfully propagated. Apparently before this
patch, we successfully unrolled a loop containing array access, but
only after promoting constant arrays to uniforms. With this patch,
we unroll it first, so all array access is direct, and the array
is split up, and individual constants are propagated. This seems
better.
Cc: mesa-stable@lists.freedesktop.org
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit fb857b5eea)
There's really no point in looking at ir_dereference_array of a
constant. It also misses cases like:
(assign () (var_ref tmp) (constant (array ...) ...))
No changes in shader-db, but keeps it working after the next commit.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit ef78df8d3b)
The scalar backend currently doesn't support variable indexing on
temporary arrays, but it does support it on uniform arrays, and
some stages support it for input arrays. Make sure these are
propagated through before exploding indirects into piles of
if-ladders unnecessarily.
On Broadwell, no instruction count change in shader-db.
total cycles in shared programs: 80675652 -> 80674928 (-0.00%)
cycles in affected programs: 649972 -> 649248 (-0.11%)
helped: 386
HURT: 165
This will help avoid code quality regressions in a future commit.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit f7741c5211)
Previously, we failed to split constant arrays. Code such as
int[2] numbers = int[](1, 2);
would generates a whole-array assignment:
(assign () (var_ref numbers)
(constant (array int 4) (constant int 1) (constant int 2)))
opt_array_splitting generally tried to visit ir_dereference_array nodes,
and avoid recursing into the inner ir_dereference_variable. So if it
ever saw a ir_dereference_variable, it assumed this was a whole-array
read and bailed. However, in the above case, there's no array deref,
and we can totally handle it - we just have to "unroll" the assignment,
creating assignments for each element.
This was mitigated by the fact that we constant propagate whole arrays,
so a dereference of a single component would usually get the desired
single value anyway. However, I plan to stop doing that shortly;
early experiments with disabling constant propagation of arrays
revealed this shortcoming.
This patch causes some arrays in Gl32GSCloth's geometry shaders to be
split, which allows other optimizations to eliminate unused GS inputs.
The VS then doesn't have to write them, which eliminates the entire VS
(5 -> 2 instructions). It still renders correctly.
No other change in shader-db.
v2: Drop !AOA check and improve a comment (feedback from Tim Arceri).
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit c264fdbc07)
opt_constant_propagation.cpp contains constant folding code which can
actually do constant propagation in some cases. It was happily
propagating constants into the left-hand-side of assignments.
For example,
(assign () (var_ref temp) (constant ...))
would brilliantly be turned into:
(assign () (constant ...) (constant ....))
This is a bigger hammer than necessary - it prevents propagation
into the left-hand-side altogether. We could certainly do better
someday. Notably, the constant propagation pass itself already
takes this approach - it's just the constant propagation pass's
built-in constant folding code (which actually propagates, too)
that was broken.
No change in shader-db, but prevents regressions after future commits.
It seems plausible that this could be hit today, but I haven't seen it
happen.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit acf5444044)
Earlier MSVC 2013 releases have troubles compiling some of our C99 code,
so make sure we have Update 4 to avoid confusion.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 805dbdf06d)
This solves a race condition where we can end up having different stages
stomp on each other because they're all trying to scratch in the same BO
but they have different views of its layout.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c2f2c8e407)
While we're here, we also fixup MEDIA_VFE_STATE and rename the field in
3DSTATE_VS on gen6-7.5 to be consistent with the others.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 45c0f60999)
The pack header generation scripts can't handle the case where you have
two addresses in the same dword; they just take whatever is the last one.
This meant that the MCS address wasn't properly getting handled. Since we
don't care about append counters, we can just re-arrange the XML for now.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 89ded099f8)
ISL was being a bit too clever for its own good and lowering the format for
us. This is all well and good *if* we always want to lower it. However,
the GL driver selectively lowers the format depending on whether the
surface is write-only or not.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d82322eb18)
This field is ignored by the hardware in this case and, on very large 1-D
textures, it can end up being larger than the maximum allowed value.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ce24097abe)
This matches better what happens on gen8 where the "Tiled Surface" and
"Tile Walke" bits are combined into a single two-bit value. This is also
more consistent with what the GL driver does.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f47e23a8b6)
For depth/stencil 1-D textures on SKL, we want them layed out in the old
format that has been used since gen4. In order for the surface state
fill-out code to handle, this it needs to distinguish based on layout
rather than just dimensionality.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5d24e9cfa1)
The docs specify that this only matters for render targets and surfaces
used with typed dataport messages. On some platforms (gen4-6) the Depth
field has more bits than RenderTargetViewExtent so we can have textures
with more levels than we can render to.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 215282c9f4)
According to the PRM, you can't set SurfaceArray for 3D or buffer textures.
There doesn't seem to be a good reason not to set it when we can. On the
other hand, if we don't set it we can end up getting strange results for
1-layer array textures such as textureSize() returning the wrong results.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bb326f7b01)
We already set the bit in the few cases where it's required by the docs so
there's no need to set it all the time. This has no noticable perf impact
for Dota 2 on Vulkan with the time demo I have.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d050ffbce9)
This commit switches clear colors to use #if's instead of a C if. This
lets us properly handle SNB where the clear color field doesn't exist.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 87f0ffa646)
This moves the #if's around so that halign and valign have different sets
of #if conditions. This also prepares us for SNB because isl_to_gen_halign
is not defined at all on gen6.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 62a5e6e031)
While designated initializers are nice, they also force us to put some
things in the initializer and some things later. Surface state setup is
complicated enough that this really hurts readability in the long run.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit caf2af4181)
The PRM states that the values put in Width, Height, and Depth should be
various bits from the value size - 1. We seem to have done this wrong
more-or-less from the start.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2a1cc94d27)
Previously, we were incrementing length but not actually putting anything
in the Y coordinate. This meant that 1-D TXF operations had a garbage
array index. If the surface is emitted as 1-D non-array, the coordinate
gets discarded and it works fine. If it happens to be bound as an array
surface, it may count as an out-of-bounds array access and you get zero.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0195299c86)
The RenderTargetViewExtent field of RENDER_SURFACE_STATE is supposed to be
set to the depth of a 3-D texture when rendering. Unfortunatley, that
field is only 9 bits on Sandy Bridge and prior so we can't actually bind
a 3-D texturing for rendering if it has depth > 512. On Ivy Bridge, this
field was bumpped to 11 bits so we can go all the way up to 2048. On Iron
Lake and prior, we don't support layered rendering and we use OffsetX/Y
hacks to render to particular layers so 2048 is ok there too.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6ba88bce64)
There's special logic around finding gl_FragData. It latches onto any
array with FRAG_RESULT_DATA0. However gl_SecondaryFragDataEXT[], added
by GL_EXT_blend_func_extended, fits those parameters as well. The real
frag data array should have index 0 though, so we can use that to
distinguish them.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96617
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 36ed1b695e)
Ever since c2581a9375, the binding table layout has depended on the
pipeline. This means that whenever we change pipelines we also need to
re-emit binding tables for the new layout.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 35b53c8d47)
SPIR-V treats it as an input but NIR wants the system value. This
shouldn't have been too much of a surprise given that we have to do the
same conversion in the GLSL IR to NIR pass.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 295e03c980)
As far as I can tell, a sequence of glBitmap followed by texture functions
that refer to a texture bound as the framebuffer is well within what should
be allowed.
Found by inspection.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e7fff3cfe1)
In the unlikely case that a program uses glBitmap to render to a framebuffer
whose texture is bound in a compute shader.
Found by inspection.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit c542b7e43d)
Cherryview and Broxton don't support DW x DW multiplication. We have
piles of code to handle this, but apparently weren't retyping in the
immediate case.
For example,
tests/spec/arb_tessellation_shader/execution/dvec3-vs-tcs-tes
makes the simulator angry about instructions such as:
mul(8) r18<1>:D r10.0<8;8,1>:D 0x00000003:D
Just retype to W or UW. It should be safe on all platforms.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit cd89c834a8)
Just setting builder->exact isn't sufficient because that only applies to
instructions that are built with the builder but instructions created
manually and only inserted using the builder are left alone.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bec07b7292)
This pass is similar to propagate_invariance in the GLSL compiler. The
real "output" of this pass is that any algebraic operations which are
eventually consumed by an invariant variable get marked as "exact".
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 202751fbb7)
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original. For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite. However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).
This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2. This is done 6 or 8 times and summed to produce
the final result which is written to a red texture. In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 68e308d853)
Apparently, these are deprecated. There's some AutoUpgrade feature which
is supposed to promote these to cmp/select, which apparently doesn't work
with jit code. It is possible it's not actually even meant to work (see
the bug filed against llvm which couldn't provide an answer neither)
but in any case this is meant to be only temporary unless the intrinsics
are really illegal. So, just use the fallback code (which should be cmp/select,
we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages
to optimize this back to pmin/pmax in the end).
This addresses https://llvm.org/bugs/show_bug.cgi?id=28176
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Aaron Watry <awatry@gmail.com>
(cherry picked from commit b0cf99165a)
This makes the check match up what we do on nv50 as well - there's no
point in switching over the push path if everything's in managed
buffers. This can happen when a shader uses a vertex without an enabled
array - we end up passing it a constant attribute.
This also has the effect of "fixing" some flickering in Talos. I have no
idea why. I've stared at the push logic forwards, backwards, and
sideways. By always forcing the push path (which is slow), the
flickering also goes away, but other rendering is still wrong
(specifically draw 383068 as identified in the bug). However by not
switching over to the push path, draw 383068 is correct.
Note that other flickering remains in Talos, like the red/green
walls/floors. This takes care of the shadow flickering though.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90513
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 154c0a42a2)
If we have a loop, instructions before the tex might be added as tex
uses, and those may in fact dominate all other uses of the tex results.
This however doesn't mean that we don't need a texbar after the tex.
Only check if uses dominate each other they are dominated by the tex.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96565
Fixes: 7752bbc44 (gk104/ir: simplify and fool-proof texbar algorithm)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1804aa0b80)
From the Cherryview's PRM, Volume 7, 3D Media GPGPU Engine, Register Region
Restrictions, page 844:
"When source or destination datatype is 64b or operation is integer DWord
multiply, indirect addressing must not be used."
v2:
- Fix it for Broxton too.
v3:
- Simplify code by using subscript() and not creating a new num_components
variable (Kenneth).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bdab572a86)
From the Cherryview PRM, Volume 7, 3D Media GPGPU Engine,
Register Region Restrictions:
"When source or destination is 64b (...), regioning in Align1
must follow these rules:
1. Source and destination horizontal stride must be aligned to
the same qword.
(...)"
v2:
- Fix it for Broxton too.
v3:
- Remove inst->regs_written change as it is not necessary (Ken)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0177dbb6c2)
We still need to recompile the passthrough shader when this value
changes, as it also affects the output vertex count. But otherwise,
we can eliminate recompiles on Gen8+.
We probably want to do this for Gen7 as well, but that requires
rewriting the input release code to use a loop, which is a trade-off
I'd need to consider in more detail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c319512e16)
Fixes three GL44-CTS.tessellation_shader subtests:
- max_patch_vertices
- single.max_patch_vertices
- tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
These use gl_PatchVerticesIn in the TES, but don't link against a
TCS (which would allow the linker to lower it to a constant). We had
no handling for the system value in the backend, so it would just
assert fail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1bc194cd64)
i965 has no special hardware for this, so we need to pass this value in
as a uniform (unless the TES is linked against a TCS, in which case the
linker can just replace this with a constant).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0be2105137)
We've had some trouble in the past with copying integers around via
float pointers, as the C compiler sometimes uses x87 floating point
registers to load values on 32-bit systems. Passing the
gl_constant_value union should be safer.
To avoid churn, this patch creates a "GLfloat *value" variable so
existing uses can stay the same.
Not observed to fix anything, but I was in the area adding more integer
state vars, and thought it'd be wise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8b408972ff)
When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.
This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a64c7cd2ba)
Back-ported from commit a64c7cd2ba:
- include util/u_format.h
- code was extracted to si_set_shader_image in master, move it back
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
--
src/gallium/drivers/radeonsi/si_descriptors.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1f895caba0)
The images struct is an uninitialized local variable on the stack. If the
callback returns 0, the struct might not have been updated and so should
be considered uninitialized. Currently the code ignores the return value,
which (depending on stack contents) might end up in reading a non-zero
value from images.image_mask and dereferencing further fields.
Another solution would be to initialize image_mask with 0, but checking
the return value seems more sensible and it is what Gallium is doing.
v2: fix typos in commit message,
fix indentation,
remove unnecessary parentheses and pointer dereference to keep line
length reasonable.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e7ab358e81)
This replaces the current bash generator with a python based generator
using mako. It's quite fast and works with both python 2.7 and python
3.5, and should work with 3.3+ and maybe even 3.2.
It produces an almost identical file except for a minor layout changes,
and the addition of a "generated file, do not edit" warning.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 5a87bc7181)
This fixes a problem with the CE preamble and restoring only stuff in the
preamble when needed.
To illustrate suppose we have two graphics IB's 1 and 2, which are submitted in
that order. Furthermore suppose IB 1 does not use CE ram, but IB 2 does, and we
have a context switch at the start of IB 1, but not between IB 1 and IB 2.
The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not load into CE RAM.
Fix this by always restoring the entire CE RAM.
v2: - Just load all descriptor set buffers instead of load and store the entire
CE RAM.
- Leave the ce_ram_dirty tracking in place for the non-preamble case.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note: This commit differs from the one in master - 54f755fa0f
("radeonsi: Reinitialize all descriptors in CE preamble.")
Pulling DF uniforms from pull constant buffer generates messages like:
send(4) g12<1>DF g12<0,1,0>F
sampler ld SIMD4x2 Surface = 1 Sampler = 0 mlen 1 rlen 1
which produces GPU hangs in Cherryview/Braswell:
"For 64-bit Align1 operation or multiplication of dwords in CHV,
source horizontal stride must be aligned to qword."
This seems to be documented in the Cherryview PRM, Volume 7, Page 843:
"When source or destination datatype is 64b or operation is integer
DWord multiply, regioning in Align1 must follow these rules:
1. Source and Destination horizontal stride must be aligned to the
same qword."
We should set the destination type to UD, D, or F so that
the register stride checker doesn't notice. The destination type of
send messages is basically irrelevant anyway.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a0ed8503b7)
Pulling DF inputs from the URB generates messages like:
send(8) g23<1>DF g1<8,8,1>UD
urb 3 SIMD8 read mlen 1 rlen 2 { align1 1Q };
which makes the simulator angry:
"For 64-bit Align1 operation or multiplication of dwords in CHV,
source horizontal stride must be aligned to qword."
This seems to be documented in the Cherryview PRM, Volume 7, Page 823:
"When source or destination datatype is 64b or operation is integer
DWord multiply, regioning in Align1 must follow these rules:
1. Source and Destination horizontal stride must be aligned to the
same qword."
Setting the source horizontal stride to QWord is insane, as it's the
message header containing 8 URB handles in a single 32-bit DWord.
Instead, we should whack the destination type to UD, D, or F so that
the register stride checker doesn't notice. The destination type of
send messages is basically irrelevant anyway.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit ed3ba651f6)
Cherryview/Broxton annoyingly have a minimum number of VS URB entries
of 34, which is not a multiple of 8. When the VS size is less than 9,
the number of VS entries has to be a multiple of 8.
Notably, BLORP programmed the minimum number of VS URB entries (34), with
a size of 1 (less than 9), which is invalid.
It seemed like this could be a problem in the regular URB code as well,
so I went ahead and updated that to be safe.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 9f37df06da)
Change MESA into Mesa in CL_PLATFORM_VERSION and CL_DEVICE_VERSION. For
both, always append git version suffix from git_sha1.h.
v5: move semicolon to same line as MESA_GIT_SHA1.
v4: drop #ifdef guards.
v3: add missing include.
v2: change CL_DEVICE_VERSION as well.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 4825264f75)
Squashed with commit
clover: Include generated sources in AM_CPPFLAGS
git_sha1.c is generated in $(top_builddir)/src.
Fixes out-of-tree builds since 4825264f75.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96516
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit fafe026dbe)
ISTR having suggested this during review of the recent FP64 changes to
the SIMD lowering pass, but it doesn't look like it was taken into
account in the end. Using the fs_reg::component_size helper instead
of this open-coded variant makes sure that the stride is taken into
account correctly. Fixes at least the following piglit tests with
spilling forced on (since otherwise regs_written would be calculated
incorrectly and the spilling code would be rather confused about how
much data needs to be spilled):
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit bd9f972651)
I haven't found any mention of this in the hardware docs, but
experimentally what seems to be going on is that when the per-thread
scratch slot size is changed between two pipelined draw calls, shader
invocations using the old and new scratch size setting may end up
being executed in parallel, causing their scratch offset calculations
to be based in a different partitioning of the scratch space, which
can cause their thread-local scratch space to overlap leading to
cross-thread scratch corruption.
I've been experimenting with alternative workarounds, like emitting a
PIPE_CONTROL with DC flush and CS stall between draw (or dispatch
compute) calls using different per-thread scratch allocation settings,
or avoiding reuse of the scratch BO if the per-thread scratch
allocation doesn't exactly match the original. Both seem to be as
effective as this workaround, but they have potential performance
implications, while this should be basically for free.
Fixes over 40 failures in our CI system with spilling forced on
(including CTS, dEQP and Piglit failures) on a number of different
platforms from Gen4 to Gen9. The 'glsl-max-varyings' piglit test
seems to be able to reproduce this bug consistently in the vertex
shader on at least Gen4, Gen8 and Gen9 with spilling forced on.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a84b5d43e2)
This will be used to find out what per-thread slot size a previously
allocated scratch BO was used with in order to fix a hardware race
condition without introducing additional stalls or memory allocations.
Instead of calling brw_get_scratch_bo() manually from the various
codegen functions, call a new helper function that keeps track of the
per-thread scratch size and conditionally allocates a larger scratch
BO.
v2: Handle BO allocation manually instead of relying on
brw_get_scratch_bo (Ken).
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d960284e44)
The bitwise arithmetic trick used in brw_get_scratch_size() to clamp
the scratch allocation to 1KB has the unintended side effect that it
will cause us to allocate 2x the required amount of scratch space if
the original per-thread scratch size happened to be already a power of
two. Instead use the obvious MAX2 idiom to clamp the scratch
allocation to the expected range.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 013ae4a70a)
In the Vulkan driver, we have the generation number (a compile time
constant) but not necessarily the brw_device_info struct. I meant
to rework the function to take a generation number instead of a
brw_device_info pointer to accomodate this. But I forgot, and left
it taking a brw_device_info pointer, while making Vulkan pass the
generation number (8, 9, ...) directly. This led to crashes.
Brown paper bag fix for commit 87d062a940.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96504
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 5a0d294d38)
Add guards to prevent dereferencing NULL dynamic pipeline state. Asserts
of pCreateInfo members are moved to the earliest points at which they
should not be NULL.
This fixes a segfault seen in the McNopper demo, VKTS_Example09.
v3 (Jason Ekstrand):
- Fix disabled rasterization check
- Revert opaque detection of color attachment usage
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a4a5917248)
To reduce confusion, clarify that the state being copied is not dynamic.
This agrees with the Vulkan spec's usage of the term. Various sections
specify that the various pipeline state which have VkDynamicState enums
(e.g. viewport, scissor, etc.) may or may not be dynamic.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a0d84a9ef9)
We already check that the address is not "too far", but we should also
clamp the UBO index in order to avoid looking at the wrong place in the
driver cb. This is a pretty rare situation though.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7f257abc1b)
Removes the need to set LIBVA_DRIVER_NAME=gallium for supported targets and is
consistent with vdpau and general gallium drivers.
Note: some versions of libva can detect the gallium name and use the
backend. Although that behaviour seems inconsistent since it only works
for some platforms/backends.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 0c0f841e5d)
When building from a release tarball (where the generated/built files
are in srcdir) in an OOT fashion we need to have both builddir and
srcdir in the includes list.
Otherwise we'll error out, as the file (header gen_knobs.h in this case)
won't be in the location where we are looking.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fcb5a75a66)
With earlier commit we've handled the `make distclean' out of tree
build, yet we failed to attribute that for in-tree builds the test
condition will return 1. Thus effectively the target will be considered
as "failed".
Fixes: b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building
OOT")
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 8229fe68b5)
We were programming the number of threads per subslice, when we should
have been programming the total number of threads on the GPU as a whole.
Thanks to Curro and Jordan for helping track this down!
On Skylake GT3e:
- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
- Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.
On Broadwell GT2:
- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
- Improves performance in Synmark's Gl43GSCloth by 1.47035% +/-
0.255654% (n=25).
On Haswell GT3e:
- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
by roughly 1.10x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
- Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/-
0.432771% (n=64).
On Ivybridge GT2:
- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
by roughly 1.03x.
- Improves performance in Synmark's G/43CSDof by roughly 1.25x.
- No change in Synmark's Gl43CSCloth (n=28).
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 0fb85ac08d)
I don't know that anything actually guarantees this, but if we exceed
the limits, we may end up overflowing and trashing random buffers that
happen to be nearby in the VMA space, leading to rendering corruption,
hangs, or worse.
We should really fix this properly. However, the pitfall has existed
for ages, so for now we should at least detect it.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 1db37ebecf)
Most scratch stages use power of two sizes, in kilobytes, where
0 means 1kB. But compute shaders on Haswell have a minimum of 2kB,
and use a representation where 0 = 2kB.
This meant that we were effectively telling the hardware to allocate
each thread twice as much space as we meant to, while simultaneously
not allocating that much space in the buffer, leading to overflows.
Note that the existing code is completely wrong for Ivybridge,
but that will take additional work to sort out, so I've left it
as is for now. A subsequent commit will take care of that.
Together with the previous patches, this fixes rendering corruption
on Synmark's Gl43CSDof on Haswell.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 147a90d82a)
Curro figured this out by investigating the simulator. Apparently
there's also a workaround in the Windows driver. I'm not sure it's
actually documented anywhere.
We were underallocating the scratch buffer by a factor of 128/70.
v2: Rename threads_per_subslice to scratch_ids_per_subslice
(suggested by Jordan Justen).
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a7d029d3df)
We were allocating enough space for the number of threads per subslice,
when we should have been allocating space for the number of threads in
the entire GPU.
Even though we currently run with a reduced thread count (due to a bug),
we might still overflow the scratch buffer because the address
calculation is based on the FFTID, which can depend on exactly which
threads, EUs, and threads are executing. We need to allocate enough
for every possible thread that could run.
Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+.
Earlier platforms need additional bug fixes.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 2213ffdb4b)
This was fixed in revision 47 of the ARB_dsa spec in Oct 22, 2015. Since
it's horrible to have differing APIs across library versions, we should
attempt to minimize the impact by backporting it as far as possible and
hope no one notices.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7d7e015381)
This brings in the fixed glClearNamedFramebufferfi definition, as well
as a lot of GLsizei -> GLsizeiptr changes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 92351a71a8)
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."
So, the rectangles sharing just an edge shouldn't overlap.
-----------
| |
------- ---
| | |
| | |
------- ---
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 466b320163)
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."
So, the rectangles sharing just an edge shouldn't overlap.
-----------
| |
------- ---
| | |
| | |
------- ---
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit f8679badd4)
This reworks the #if guards a bit. When Emil originally wrote them, he
just guarded everything. However, part of what anv_entrypoints_gen.py
generates is a hash table for looking up entrypoints based on their name.
This table *cannot* get out of sync between C and python regardless of
preprocessor flags. In order to prevent this, this commit makes us use
void pointers in the dispatch table for those entrypoints which aren't
available. This means that the dispatch table size and entry order is
constant and it should never get out-of-sync with the python.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8d37556ec9)
When an applications specifies mip levels _before_ setting a mipmap texture
filter, we will initially guess a single texture level. When the second level
image is created, we try to allocate the full texture -- however, we get the
base level size guess wrong if that size is odd. This leads to yet another
re-allocation of the texture later during st_finalize_texture.
Even worse, this re-allocation breaks a (reasonable) assumption made by
st_generate_mipmaps, because the re-allocation in the finalization call will
again allocate a single-level pipe texture (based on the non-mipmap texture
filter!). As a result, mipmap generation fails in interesting ways.
All of this can be avoided by just using the fact that we already know the
size of the base level.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95529
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 42624ea837)
At this point, the limits are probably more-or-less correct. If there is
an invalid limit, that's a bug not a FINSHME.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a1e69930e4)
This way the the bind map (which we're caching) is mostly independent of
the pipeline layout. The only coupling remaining is that we pull the array
size of a binding out of the layout. However, that size is also specified
in the shader and should always match so it's not really coupled. This
rendering issues in Dota 2.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a1a25db699)
Since applications are allowed to specify some set of bindings which need
not be dense they also need not be in order. For most things, this doesn't
matter, but it could result getting the wrong dynamic offsets. This adds a
quick-and-dirty sort to ensure that everything is always in increasing
order of binding index.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c13c5ac561)
With glx of gstreamer-vaapi, the temporary pixmap for front buffer gets
renewed in each frame, so when we receive a new pixmap, should get a new
front buffer for it.
This also fixes Totem player playback corruption.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2ad443e4cc)
From original commit, the macro "if HAVE_DRI3" was in Makefile.sources,
this file is shared with SCons, SCons is not able to parse this marco,
the SCons build failed. Jose quickly gave two approaches and quick fix
with his second approach, thanks Jose for the solutions and fixes.
This patch is Jose's first approach, and it's more proper, because the
dri3 c file should not be included to build when DRI3 is not enabled.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0ef8500aab)
GLX documentation states:
glXCreateNewContext can generate the following errors: (...)
GLXBadFBConfig if config is not a valid GLXFBConfig
Function checks if the given config is a valid config and sets proper
error code.
Fixes currently crashing glx-fbconfig-bad Piglit test.
v2: coding style cleanups (Emil, Topi)
use DefaultScreen macro (Emil)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cf804b4455)
When Kristian implemented GL_TEXTURE_EXTERNAL_OES, he hooked it up for gen8
but not for gen7 or earlier. It all works, we just need to emit the states
for the extra planes.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 037ce5d734)
When calling virgl_fence_wait() with timeout=0,
virgl_{drm,vtest}_resource_is_busy() is called. However, it returns TRUE
for a busy resource, whereace virgl_fence_wait() should return TRUE for
a completed (non-busy) resource.
This fixes running supertuxkart in a VM (I could not reproduce locally
with vtest though there is a similar fix)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit dc81b3ad43)
The width0/height0/depth0 on stObj may not have been set at this point.
Observed in a trace that set up levels 2..9 of a 2d texture, and set the base
level to 2, with height 1. This made the guess logic always bail.
Originally investigated by Ilia Mirkin, this patch gets rid of the somewhat
redundant storage of width0/height0/depth0 and makes sure we always compute
pipe texture sizes that are compatible with the base level image of the
GL texture.
Fixes the gl-1.2-texture-base-level piglit test provided by Brian Paul.
v2:
- try to re-use an existing pipe texture when possible
- handle a corner case where the base level is not level 0 and it is of
size 1x1x1
v3:
- ptHeight = ptWidth in cube map 1x1 case (suggested by Brian)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit bd5c41fe5f)
We were previously unconditionally doing this for arrays and ubo's, and
ignoring texture/storage/atomic buffers. Instead use the usage history
to determine which atoms need to be revalidated.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6e6fd911da)
R9G9B9E5 is the only uncompressed one hopefully.
This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit d4d733e39d)
A texture may be redefined with _NEW_TEXTURE, which might have been
bound to a shader image slot. We have to revalidate the image atoms to
pick up on the new resource.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c81b090c92)
Sometimes a register source can actually be double- or even quad-wide.
We must make sure that the inserted texbars take that width into
account.
Based on an earlier patch by Samuel Pitoiset.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 71ad8a173f)
Like floats, we should use the round toward 0 mode instead of the
nearest one (which is the default) for doubles to integers.
This fixes all arb_gpu_shader_fp64 piglits which convert doubles to
integers (16 tests).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 08ddfe7b2f)
The GL4.5 spec quote seems clear on this:
"The value -1 will be returned by either command if an error occurs,
if name does not identify an active variable on programInterface,
or if name identifies an active variable that does not have a valid
location assigned, as described above."
This fixes:
GL45-CTS.program_interface_query.output-built-in
[airlied: use _mesa_program_resource_location_index as
suggested by Eduardo]
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 07403014c3)
When we are not packing a double input varying, we might need to
read its data in a non-aligned to 64-bit offset, so we read
the wrong data. This is happening when using explicit locations
in varyings because Mesa disables packing varying for that case.
const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, const_index value could be not aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2d6f82a294)
Data starts at suboffet 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is is a double input varying, read it as
vector of floats with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cb30727648)
From GLSL 4.5 spec, "4.4.2.3 Geometry Outputs".
"all geometry shader output vertex count declarations in a
program must declare the same count."
Fixes:
GL45-CTS.geometry_shader.output.conflicted_output_vertices_max
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4c86399378)
For 3D textures we shouldn't be using NumLayers, we need
to get it from the depth.
This fixes:
GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ff2e569153)
With tessellation shaders we can have cases where we have
arrays of anon structs, so make sure we match using without_array().
Fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in
v2:
test lengths match as well (Ilia)
v3:
descend array lengths to check for matches as well (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 1f66a4b689)
This fixes a crash in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
If we can't find the func_name in one of these paths,
we have emitted an earlier error so just return here.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6702c15810)
GL43-CTS.compute_shader.work-group-size does
uniform uint g_uniform[gl_WorkGroupSize.z + 20] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 };
The initializer triggers the GLSL 4.30/GLES3 tests
for constant sequence subexpressions, so it doesn't
happen unless you are using those, so just return
false as this path is now reachable.
v2: update commit msg with diagnosis
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4336196b7f)
Images invalidation is a bit weird on Fermi and there is already a hack
which forces invalidating all images when launching a computer shader
to help in fixing 3D<->CP interaction.
However, we need to re-validate images for compute because
nvc0_compute_invalidate_surfaces() will destroy the previous binding.
This is not really good for performance purposes but this might be
improved later.
This fixes the following piglits:
- spec/arb_compute_shader/execution/basic-uniform-access
- spec/arb_compute_shader/execution/mutiple-texture-reading
- spec/arb_compute_shader/execution/multiple-workgroups
- spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 43d3ecfb33)
We would revalidate images when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the images.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit fd6bbc2ee2)
We would revalidate buffers when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the SSBOs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0f673db6f0)
When upscaling you can end up interpolating between the edge pixel and one
past the edge. Using CLAMP_TO_EDGE seems like the most reasonable thing to
do in this case. This fixes two of the new Vulkan CTS tests in
dEQP-VK.api.copy_and_blit.blit_image.*
Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 441194edd9)
Getting rid of the default case makes the compiler warn if we are missing
cases. While we're here, we also add the one missing case.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 526a8de22d)
glslang frequently throw bogus decorations into shaders. While we are free
to assert-fail, it's a bit nicer to the application to just warn.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 62c6e94bd6)
Previously we supported a subset of capabilities and just left a default
case for the others. It's time to stop being lazy and actually audit the
capabilities. This should bring them up-to-date with reality.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5a1e56f344)
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3fb289f957)
This reverts commit c1107cec44.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode. Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):
es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_float_frag_xvary
es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 7244dc1e06)
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction. This prevents
cmod propagation from (mis)optimizing:
cmp.z.f0 tmp, ...
mov.z.f0 null, tmp
into:
cmp.z.f0 tmp, ...
which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit a2135c6fd9)
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.
v2: Don't handle partial source-destination overlap for simplicity
(Jason). No instruction count regressions with respect to v1 in
either shader-db or the few FP64 shader_runner test-cases with
partial overlap I've checked manually.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 7aa76d66a1)
The specs says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.
This fixes:
GL45-CTS.copy_image.non_existent_mipmap
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit af7bf610cf)
This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.
This fixes:
GL45-CTS.copy_image.invalid_target
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c0856eacf1)
These two lines have been here since the file was created.
I'm guessing the second one was just for testing during dev, so it's the
one that's going away.
CoverityID: 1296205
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 17f4c723eb)
In the case of out-of-tree (OOT) builds, in particular when building
from tarball, we'll end up with the file in both srcdir and builddir.
We want the former to remain intact (since we need it on rebuild) while
the latter should be removed otherwise `make distclean' gets angry at
us.
Ideally there'll be a solution that feels a bit less of a hack. Until
then this does the job exactly as expected.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b7f7ec7843)
... when copied from git_sha1.h.
As the latter file can we lacking the write attribute, one should set it
explicitly. Otherwise we'll get a warning/failure at cleanup stage.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 2c424e00c3)
Otherwise the build will assume that we've talking about builddir, which
is not the case in the else statement.
Here the file is already generated and is part of the tarball.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 359d9dfec3)
With earlier commit we introduced support for render_node devices, which
was couples with the use of the image loader extension.
As the work was inspired by egl/wayland we (erroneously) added the
extension for the !render_node path as well.
That works for wayland, as the implementations of the DRI2 and IMAGE
loader extensions converge behind the scenes. As that is not yet
the case for Android we shouldn't expose the extension.
Fixes: 34ddef39ce ("egl: android: add dma-buf fd support")
Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 1816c837c1)
The generated sources should follow the example set by the vulkan
headers and our non-generated code. Namely: the code for all supported
platforms should be available, each one guarded by its respective
VK_USE_PLATFORM_*_KHR macro.
v2: Reword commit message.
Cc: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96285
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1 over IRC)
(cherry picked from commit b8e1f59d62)
isl library is needed to build i965, libmesa_isl static library is added
to fix related Android building errors.
Any attempt to build libmesa_genxml as phony package module failed to deliver
gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9}
Due to constraints in Android Build System, libmesa_genxml is built as static,
at least one source is needed, so dummy.c is autogenerated for this scope,
libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES,
to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 278c2212ac)
Fixes the following building error:
target C++: libmesa_glsl <= external/mesa/src/compiler/glsl/glsl_to_nir.cpp
In file included from external/mesa/src/compiler/glsl/glsl_to_nir.h:28:0,
from external/mesa/src/compiler/glsl/glsl_to_nir.cpp:28:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
compilation terminated.
build/core/binary.mk:432: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 4143245c23)
Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
the actual dependency list less obvious.
v2: Drop unrelated vulkan hunk (Jason).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit af1a0ae8ce)
With earlier commit 3689ef32af ("automake: rework the git_sha1.h rule,
include in tarball") we/I erroneously removed the PHONY rule and the
temporary file.
The former is used to ensure that the header is regenerated when on each
make invocation, while the latter helps us avoid the unneeded rebuild(s)
when the SHA1 hasn't changed.
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit af2637aa32)
The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.
The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 0a3acff5b5)
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.
Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.
To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.
v4:
* Minimize size of patch that switches from the old local ID layout
to the new layout (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit b1f22c6317)
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 3ba9594f32)
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 30685392e0)
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.
When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.
The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.
For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.
v4:
* Support older local ID push constant layout as well. (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit d437798ace)
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.
We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.
We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.
v2:
* Create variable during lowering pass. (Ken)
v3:
* Don't create a variable, but instead just insert an intrisic call
to load a uniform from the allocated location. (Jason)
v4:
* Don't run this pass if thread_local_id_index < 0
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 3ef0957dac)
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.
It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.
The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)
v2:
* Add variable in intrinsics lowering pass
* Make sure the ID is pushed last in assign_constant_locations, and
that we save a spot for the ID in the push constants
v3:
* Simplify code based with Jason's suggestions.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 04fc72501a)
v4:
* Force thread_local_id_index to -1 for now, and have
fs_visitor::setup_cs_payload look at thread_local_id_index. This
enables us to more easily cut over from the old local ID layout to
the new layout, as suggested by Jason.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit fa279dfbf0)
Isolines aren't reversed. commit 5b2d8c2273 fixed this for the vec4
TES backend, but not the scalar one.
Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 25e1b8d366)
The string "[0]\0" is the same as "[0]" as far as the C string datatype
is concerned. That string has length 3. strncmp(s, length_3_string, 4)
is the same as strcmp(s, length_3_string), so make it be strcmp.
v2: Not the same as strncmp(..., 3). Noticed by Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 049bb94d2e)
With the introduction of fp64 and fp16 to nir, there are now a bunch of
float types running around. A F1 2015 shader ends up with an i2f.sat
operation, which has a nir_type_float32 destination. Allow sat on all
the float destination types.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit ca135a2612)
The initial ARB_sampler_objects spec had GL_INVALID_VALUE in it,
however version 8 of it fixed this, and the GL specs also have
the fixed value in them.
Fixes:
GL45-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6400144041)
Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.
Fortunately none of tracked benchmarks showed a regression with
this.
v2 (Matt): Add missing space
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 30e9e6bd07)
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants. We at least
need to initialize them to zero. We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).
Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.
(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit a3dc99f3d4)
The driver was adding the skip components but always for buffer 0.
This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ebb81cd683)
e2791b38b4
mesa/program_interface_query: fix transform feedback varyings.
caused a regression in
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_multiple_streams
on radeonsi.
The problem was it was using the skip components varying to set
the stream id, when it should wait until a varying was written,
this just adds the varying checks in the right place.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 1fe7bbb911)
According to GL4.5 spec:
An INVALID_OPERATION error is generated if any part of the speci-
fied buffer range is mapped with MapBufferRange or MapBuffer (see sec-
tion 6.3), unless it was mapped with MAP_PERSISTENT_BIT set in the Map-
BufferRange access flags.
So we should use the if range is mapped path.
This fixes:
GL45-CTS.buffer_storage.map_persistent_buffer_sub_data
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0, 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e891f7cf55)
This really only hits for bitsets with a size of a multiple of 32. We
can end up with pos = -1 as a result of the ffs, which we in turn decide
is a valid position (since we fall through the loop and i == 1, we end
up adding 32 to it, so end up returning 31 again).
Up until recently this was largely unreachable, as the register file
sizes were all 63 or 255. However with the advent of compute shaders
which can restrict the number of registers, this can now happen.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 18d11c9989)
This reverts commit aac90ba292.
The commit caused a regression in:
piglit.spec.glsl-1_50.compiler.gs-input-nonarray-named-block.geom
Also the CTS test it was meant to fix seems like it may be bogus.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 98d40b4d11)
I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:
"The supported regioning modes for math instructions are align16,
align1 with the following restrictions:
- Scalar source is supported.
[...]
- Source and destination offset must be the same, except the case of
scalar source."
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit c1107cec44)
This is the case for SNB math instructions so we need to be careful
and insert the literal value of the immediate into the table (rather
than its absolute value) if the instruction is unable to invert the
sign of the constant on the fly.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 06d8765bc0)
Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already. The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.
Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa5 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 4fe4f6e8a7)
This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 1898673f58)
This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction. In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 485fbaff03)
Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ. Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 4b0ec9f475)
Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit bb61e24787)
Now there are not files that require python 3, so for now just remove
the python 3 dependency and use python 2. I think the right plan is to
just get all of the python ready for python 3, and then use whatever
python is available.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 604010a7ed)
The CTS test:
GL45-CTS.multi_bind.dispatch_bind_image_textures
binds 192 image uniforms, we reject this later,
but not until after we trash the contents of the
struct gl_shader.
Error now reads:
Too many compute shader image uniforms (192 > 16)
instead of
Too many compute shader image uniforms (2745344416 > 16)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f87352d769)
# Bring artifacts back from the NFS dir to the build dir where gitlab-runner
# will look for them.
cp -Rp /nfs/results/. results/
if[ -f "${STRUCTURED_LOG_FILE}"];then
cp -p ${STRUCTURED_LOG_FILE} results/
echo"Structured log file is available at https://${CI_PROJECT_ROOT_NAMESPACE}.pages.freedesktop.org/-/${CI_PROJECT_NAME}/-/jobs/${CI_JOB_ID}/artifacts/results/${STRUCTURED_LOG_FILE}"
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.