extra: The commit references a previous commit in which the changes
should have been included but, as clarified by the developer, it is
not needed for stable.
Signed-off-by: Andres Gomez <agomez@igalia.com>
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.
That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).
If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.
Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
(cherry picked from commit 8620f7ebbc)
This function is only used in two places:
1. VMware driver, but only for HUD reporting
2. st/nine state tracker, used for texture memory accounting
Fixes: a69efa9482 ("util: add new util_resource_size() function in
u_resource.[ch]")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit dde8309cde)
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045fee), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).
v2: set it in all shader stages.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3835009796)
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit aab0bfc648)
VCE processing IBs starts from session and task info at first level,
other commands processed subsequently. The task info for destroy is
embedded to destroy command, resulting that feedback command is not
properly procoessed. This is causing kernel spin VM fault messages on
Polaris and Vega10 card when running ends at encode application.
The fix is also verified on VCE physical mode card.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 6d74cb2570)
(cherry picked from commit 6a353479a7)
Squashed with:
util: Use preprocessor correctly
Fixes: 6a353479a7 ("util: Assume little endian in the absence of
platform-specific handling")
(cherry picked from commit b8cbad624b)
Squashed with:
util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC
MSVC doesn't support #warning?! Getting really tired of this.
(cherry picked from commit 676761252b)
Squashed with:
util: Also include endian.h on cygwin
If u_endian.h can't determine the endianess, the default behaviour in sha1.c
is to build for big-endian
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 2c62ccb10a)
>From GLSL 4.5 spec, section "7.1 Built-In Language Variables", page 130 of
the PDF states:
"If multiple shaders using members of a built-in block belonging to
the same interface are linked together in the same program, they must
all redeclare the built-in block in the same way, as described in
section 4.3.9 “Interface Blocks” for interface-block matching, or a
link-time error will result."
Fixes:
* GL45-CTS.CommonBugs.CommonBug_PerVertexValidation
v2 (Neil Roberts):
Explicitly look for gl_PerVertex in the symbol tables instead of
waiting to find a variable in the interface.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102677
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit f9de7f5596)
This effectively factorizes a couple of similar routines.
v2 (Neil Roberts): Non-trivial rebase on master
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit f5fe99ac85)
Some symbols gathered in the symbols table during parsing are needed
later for the compile and link stages, so they are moved along the
process. Currently, only functions and non-temporary variables are
copied between symbol tables. However, the built-in gl_PerVertex
interface blocks are also needed during the linking stage (the last
step), to match re-declared blocks of inter-stage shaders.
This patch adds a new utility function that will factorize current code
that copies functions and variables between two symbol tables, and in
addition will copy explicitly declared gl_PerVertex blocks too.
The function will be used in a subsequent patch.
v2 (Neil Roberts):
Allow the src symbol table to be NULL and explicitly copy the
gl_PerVertex symbols in case they are not referenced in the exec_list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit 4c62a270a9)
The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.
So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit ca63a5ed3e)
The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):
"For all of the interpolation functions, interpolant must be an l-value
from an in declaration; this can include a variable, a block or
structure member, an array element, or some combination of these.
Component selection operators (e.g., .xy) may be used when specifying
interpolant."
For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.
v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
(the ES WG decided that the relaxed rules may apply in a future version
but not retroactively; see also
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 4f42450b86)
GLSL shaders can access the normal scale factor with the built-in
gl_NormalScale. Mesa's modelspace lighting optimization uses a different
normal scale factor than defined in the spec. We have to take care not
to use this factor for gl_NormalScale.
Mesa already defines two seperate states: state.normalScale and
state.internal.normalScale. The first is used by the glsl compiler
while the later is used by the fixed function T&L pipeline. Previously
the only difference was some component swizzling. With this commit
state.normalScale always uses the normal scale factor for eyespace
lighting.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit c3ee464d7a)
spotExponent and spotCosCutoff were swapped in the
gl_builtin_uniform_element struct.
Now the order matches across gl_builtin_uniform_element,
glsl_struct_field and the spec.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 9bdb5457f4)
We failed to take the start into account for how many vertices to draw in
this round, so we would end up decrementing count below 0, which as an
unsigned number meant we would loop until the CLs soon ran out of space.
When I wrote the code I was thinking about how to use the previously
emitted shader state (no index bias baked into the elements) by emitting
up to 65535 and then only re-emitting with bias for the second wround, but
that doesn't work if the start is over 65535. Instead, just delay
emitting shader state until we get into the drawarrays GFXH-515 loop and
always bake the bias in when we're doing the workaround.
(cherry picked from commit 84ab48c15c)
The picture_id was assumed to be a frame number so in 0-31.
But the vaapi client gstreamer-vaapi uses the surfaces handles
as identifier which are unsigned int.
This bug can happen when using a lot of vaapi surfaces within
the same process. Indeed Mesa/st/va increments a counter for the
surface ID: mesa/util/u_handle_table.c::handle_table_add which
starts from 0 and incremented by 1 at each call.
So creating more than 32 surfaces was a problem.
The following bug contains a test that reproduces the problem
by running a couple of vaapih264enc in the same process. The
above also explains why there was no pb when running them in
separated processes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102006
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Tested-by: Tomas Rataj <rataj28@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-and-tested-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
(cherry picked from commit 91d93aa621)
64-bit operations on Atom parts have additional restrictions over their
big-core counterparts (validated by later patches).
Specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword" is violated by most shift
operations since NIR uses a 32-bit value as the shift count argument,
and this causes instructions like
shl(8) g19<1>Q g5<4,4,1>Q g23<4,4,1>UD
where src1 has a 32-bit stride, but the dest and src0 have a 64-bit
stride.
This caused ~4 pixels in the ARB_shader_ballot piglit test
fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no
ARB_gpu_shader_int64 test hit this case because they operate on
uniforms, and their scalar regions are an exception to the restriction.
We work around this by effectively unpacking the shift count, so that we
can read it with a 64-bit stride in the shift instruction. Unfortunately
the unpack (a MOV with a dst stride of 2) is a partial write, and cannot
be copy-propagated or CSE'd.
Bugzilla: https://bugs.freedesktop.org/101984
(cherry picked from commit b541945c20)
Swr caches fb contents in tiles. Those tiles are stored on a per-context
basis.
When switching contexts that share resources we need to make sure that
the tiles of the old context are being stored and the tiles of the new
context are being invalidated (marked as invalid, hence contents need
to be reloaded).
The context does not get any dirty bits to identify this case. This has
to be, then, coordinated by the resources that are being shared between
the contexts.
Add a "curr_pipe" hook in swr_resource that will allow us to identify a
MakeCurrent of the above form during swr_update_derived(). At that time,
we invalidate the tiles of the new context. The old context, will need to
have already store its tiles by that time, which happens during glFlush().
glFlush() is being called at the beginning of MakeCurrent.
So, the sequence of operations is:
- At the beginning of glXMakeCurrent(), glFlush() will store the tiles
of all bound surfaces of the old context.
- After the store, a fence will guarantee that the all tile store make
it to the surface
- During swr_update_derived(), when we validate the new context, we check
all resources to see what changed, and if so, we invalidate the
current tiles.
Fixes rendering problems with CEI/Ensight.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit b9aa0fa7d6)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103732
This fixes a crash in:
KHR-GL45.enhanced_layouts.xfb_block_stride
Fixes: 0822517936 "glsl: add helper to process xfb qualifiers during linking"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d53ccccb2)
The flag is on the pipe_resource, not the r600_resource.
I don't see an obvious bug related to this, but it could potentially lead
to suboptimal placement of some resources.
Fixes: a41587433c ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 5e2962c949)
deviceName string is declared, assigned and freed but actually
never used in dri3_create_screen() function.
Fixes: 2d94601582 ("Add DRI3+Present loader")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit d555929239)
Fix incomplete check of input params in blorp_surf_convert_to_uncompressed()
which can lead to NULL pointer dereferencing.
Fixes: 5ae8043fed ("intel/blorp: Add an entrypoint for doing
bit-for-bit copies")
Fixes: f395d0abc8 ("intel/blorp: Internally expose
surf_convert_to_uncompressed")
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit cdb3eb7174)
[Emil Velikov: drop non-applicable x/y hunk]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/intel/blorp/blorp_blit.c
We want to program the 3DSTATE_RASTER field to the gl_context value,
not the other way around.
Fixes: 13ac46557a (i965: Port Gen8+ 3DSTATE_RASTER state to genxml.)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 760e0156df)
nir_validate.c's #endif already had the correct NDEBUG comment
Fixes: dcb1acdea0 "nir/validate: Only build in debug mode"
Fixes: 9ff71b649b "i965/nir: Validate that NIR passes call nir_metadata_preserve()"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 7b85b9b877)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/compiler/nir/nir.h
The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.
The following program
unsigned char array[4] = {1, 2, 3, 4};
int *ptr = (int *)array;
for (int i = 0; i < 4; i++) {
printf("%02x", array[i]);
}
printf("\n");
printf("%08x\n", *ptr);
prints
01020304
04030201
on little endian, and
01020304
01020304
on big endian.
On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.
To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.
Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab ("glsl: Add initial functions to implement an
on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c690a7a8cd)
The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are no defined, then the comparison becomes 0 == 0 and it evaluates as
true.
Fixes: d1efa09d34 ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 532674303a)
This helps avoid compiler warningss in the next commit - everything
was initialized, but it wasn't obvious to static analysis.
Suggested-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit d6d16c0218)
If an array is accessed within an if block, then currently it is not known
whether the value in the address register is involved in the evaluation of the
if condition, and converting the if condition may actually result in
out-of-bounds array access. Consequently, if blocks that contain indirect array
access should not be converted.
Fixes piglits on r600/BARTS:
spec/glsl-1.10/execution/variable-indexing/
vs-output-array-float-index-wr
vs-output-array-vec3-index-wr
vs-output-array-vec4-index-wr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6c268ea79a)
This fixes hangs on cayman with
tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test
This has a single if/else in it, and when this peephole activated,
it would set the jump target to NULL if there was no instruction
after the final POP. This adds a NOP if we get a jump in this case,
and seems to fix the hangs, so we have a valid target for the ELSE
instruction to go to, instead of 0 (which causes infinite loops).
v2: update last_cf correctly. (I had some other patches hide this)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 579ec9c311)
When floating point textures are created on OpenGL ES 2.0, driver
is free to choose used internal format. Mesa makes this decision in
adjust_for_oes_float_texture. Error checking for glTexImage2D properly
checks that sized formats are not used. We use same error checking
path for glTexSubImage2D (since there is lot of overlap), however since
those checks include internalFormat checks, we need to pass original
internalFormat passed by the client. Patch adds oes_float_internal_format
that does reverse adjust_for_oes_float_texture to get that format.
Fixes following test failure:
ES2-CTS.gtf.GL2ExtensionTests.texture_float.texture_float
(when running test with MESA_GLES_VERSION_OVERRIDE=2.0)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103227
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 1e508e10d9)
So far on pre-cayman chipsets the CF instructions CF_OP_LOOP_END,
CF_OP_CALL_FS, CF_OP_POP, and CF_OP_GDS an extra CF_NOP instruction
was added to add the EOP flag, even though this is not actually
needed, because all these instrutions support the EOP flag.
This patch removes the fixup code, adds setting the EOP flag for the
according instructions as well as others like CF_OP_TEX and CF_OP_VTX,
and adds writing out EOP for this type of instruction in the disassembler.
This also fixes a bug where shaders were created that didn't actually have
the EOP flag set in the last CF instruction, which might have resulted
in GPU lockups.
[airlied: cleaned up a little]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 1d076aafbc)
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) happening in the same shader as the main
function, and secondly it doesn't work for variable group sizes.
In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.
While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.
Also the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with main function. As a result, the lowering
code has to insert its own copies of the system values if needed.
Reported-by: Stephane Chevigny <stephane.chevigny@polymtl.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4d24a7cb97)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/compiler/glsl/meson.build
generate_array_index fails to check whether the target of a subroutine
call exists in the AST, potentially passing around null ir_rvalue
pointers eventuating in abort/segfault.
Fixes: fd01840c0b ("glsl: add AoA support to subroutines")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438
(cherry picked from commit f09c2cefdd)
We want to emit invariant state at the start of a render batch. In the
past, this more or less happened: a new batch flagged BRW_NEW_CONTEXT
(because we don't have hardware contexts), which triggered the
brw_invariant_state atom. So, it would be emitted before any 3D
drawing. (Technically, there might be some BLT commands in the batch
because Gen4-5 have a single combined render/BLT ring, but that should
be harmless).
With the advent of BLORP, this broke. The first item in a batch might
be a BLORP operation, which bypasses the normal draw upload path. So,
we need to ensure invariant state happens first. To do that, we just
upload it when creating a new batch. On Gen6+ we'd need to worry about
whether it's a RENDER or BLT batch, but because we have a combined ring,
this approach should work fine on Gen4-5.
Seems to fix GPU hangs when playing hardware accelerated video with
mpv -hwdec=vaapi on Ironlake.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103529
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 8f91aa35a5)
...and provide a better citation for the existing one.
v2:
- Apply the workaround to Gen8 too, as intended (caught by Topi).
- Restructure to add bits instead of an extra flush (based on a similar
patch by Rafael Antognolli).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 8d48671492)
[Andres Gomez: brw->gen not yet dropped in favor of devinfo->gen]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_pipe_control.c
Squashed with:
i965: Revert Gen8 aspect of VF PIPE_CONTROL workaround.
This apparently causes hangs on Broadwell, so let's back it out for now.
I think there are other PIPE_CONTROL workarounds that we're missing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103787
(cherry picked from commit a01ba366e0)
[Andres Gomez: brw->gen not yet dropped in favor of devinfo->gen]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_pipe_control.c
Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
(cherry picked from commit cfcfa0b9cd)
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.
For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.
Fixes the following test on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 6ac2d16901)
Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1dc45d75bb)
[Andres Gomez: brw->gen not yet dropped in favor of devinfo->gen]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_pipe_control.c
src/mesa/drivers/dri/i965/intel_blit.c
Otherwise we leak it.
Fixes: eaa56eab6d "radv: initial support for shared semaphores (v2)"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 7c25578863)
As per radeonsi, the tess factor components for isolines
are reversed.
Fixes: tests/spec/arb_tessellation_shader/execution/isoline.shader_test
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f3f8615d76)
We renamed "Function Enable" to "Enable", which broke our detection
of whether shaders are enabled or not. So, we'd see a bunch of HS/DS
packets with program offsets of 0, and think that was a valid TCS/TES.
Fixes: c032cae9ff (genxml: Rename "Function Enable" to "Enable".)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9a0465b3a3)
The L3 configuration code already considers the TCS and TES programs,
but failed to listen for TCS/TES program changes.
This was somehow missing.
Fixes: e9644cb1f9 ("i965: Consider tessellation in get_pipeline_state_l3_weights.")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit b8d42cccd0)
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.
Fixes: 700bebb958 ("i965: Move the back-end compiler to src/intel/compiler")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 854455498c)
Gather operations in both GLSL and SPIR-V require a sampler. Fixes
gathers returning garbage when using separate texture/samplers (on AMD,
was using an invalid sampler descriptor).
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 4122d00846)
We should use the result type of the OpSampledImage opcode, rather than
the type of the underlying image/samplers.
This resolves an issue when using separate images and shadow samplers
with glslang. Example:
layout (...) uniform samplerShadow s0;
layout (...) uniform texture2D res0;
...
float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
For this, for the combined OpSampledImage, the type of the base image
was being used (which does not have the Depth flag set, whereas the
result type does), therefore it was not being recognised as a shadow
sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit e9eb3c4753)
Targets such as omx and va can work w/o anything X related. Mandate the
xcb* dependencies only when the X11 platform is selected.
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: 63e11ac2b5 ("configure: error out if building VA w/o supported
platform")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
(cherry picked from commit 85a017230c)
Currently we error out when building GLVND w/o GLX.
That was the original premice before we had EGL. As the commit says,
that error should be reworked to honour both - do so.
v2: Drop noop *);; (Eric)
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
(cherry picked from commit b4967561c0)
The GL spec will soon be revised to clarify that a buffer binding for
a transform feedback buffer is only required if a variable is actually
defined to use the buffer binding point. Previously a declaration for
the default transform buffer would make it require a binding even if
nothing was declared to use the default buffer.
Affects:
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list_and_api
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4dc8458cd1)
When I introduced gl_shader_program_data one of the intentions was to
fix a bug where a failed linking attempt freed data required by a
currently active program. However I seem to have failed to finish
hooking up the final steps required to have the data hang around.
Here we create a fresh instance of gl_shader_program_data every
time we link. gl_program has a reference to gl_shader_program_data
so it will be freed once the program is no longer active.
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102177
(cherry picked from commit 6a72eba755)
This has a bit of a surprising effect:
For the render pipeline, the upload_sampler_state_table atom emits
3DSTATE_BINDING_TABLE_POINTERS_XS. It tries to avoid this for compute:
if (GEN_GEN >= 7 && stage_state->stage != MESA_SHADER_COMPUTE) {
/* Emit a 3DSTATE_SAMPLER_STATE_POINTERS_XS packet. */
genX(emit_sampler_state_pointers_xs)(brw, stage_state);
} ...
However, we were failing to initialize brw->cs.base.stage, so it was
left as 0 (MESA_SHADER_VERTEX), causing this condition to break. We
then emitted 3DSTATE_SAMPLER_STATE_POINTERS_VS in GPGPU mode, when
trying to upload CS samplers. Nothing good can come of this.
Found by inspection while debugging a GPU hang. Jordan believes this
helps the Deus Ex: Mankind Divided benchmark mode's stability when
running with shader cache.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a16dc04ad5)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_context.c
Use $(sysconfdir) instead of hardcoding /etc.
While the OpenCL spec expects the file in /etc, people building their
stack can override that, esp. !Linux users.
Furthermore this removes a fundamental violation, which results in the
system file being overwritten even as one explicitly sets --prefix
and/or DESTDIR.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-By: Aaron Watry <awatry@gmail.com>
(cherry picked from commit 0cd0958544)
Originally we tried to handle this case based on slots_valid. However,
there are a number of ways that this can go wrong. For one, we throw
away any trailing slots which either aren't written or are set to
VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not
actually write anything there. Between the lot of these, it was
possible to end up in a case where we tried to do a regular URB write
but ended up with a length of 1 which is invalid. This commit moves it
to the end and makes it based on a new boolean flag urb_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7a82ad54bb)
This isn't often a problem , when we're in a compute shader, we must
push the thread local ID so we decrement the amount of available push
space by 1 and it's no longer even and 64-bit data can, in theory, span
it. By marking those uniforms contiguous, we ensure that they never get
split in half between push and pull constants.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 25f7453c9e)
The same workaround we need for 64-bit values on little core also takes
care of the Ivy Bridge problem and does so a bit more efficiently so we
can drop that code while we're here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fd1bcccc2d)
For some reason, the any/all predicates don't work properly with SIMD32.
In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read
the correct subset of the flag register and you end up getting garbage
in the second half. Work around this by using a pair of 1-wide MOVs and
scattering the result. This fixes the any/all instructions for SIMD32.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1b8ef49f48)
The any/all intrinsics return a boolean value so D or UD is the correct
type. Unfortunately, get_nir_dest has the annoying behavior of
returnning a float type by default. This causes format conversion which
gives us -1.0f or 0.0f in the register. If the consumer of the result
does an integer comparison to zero, it will give you the right boolean
value but if we do something more clever based on the 0/~0 assumption
for booleans, this will give the wrong value.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1f41663007)
We have ANY/ALL32 predicates and, for the most part, they work just
fine. (See the next commit for more details.) Also, due to the way
that flag registers are handled in hardware, instruction splitting is
able to split the CMP correctly. Specifically, that hardware looks at
the execution group and knows to shift it's flag usage up correctly so a
2H instruction will write to f0.1 instead of f0.0.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit def013a863)
Commit 259fc50545 added linker error for
mismatching uniform precision, as required by GLES 3.0 specification and
conformance test-suite.
Several Android applications, including Forge of Empires, have shaders
which violate this rule, on a dead varying that will be eliminated.
The problem affects a big number of applications using Cocos2D engine
and other GLES implementations accept this, this poses a serious
application compatibility issue.
Starting from GLSL ES 3.0, declarations with conflicting precision
qualifiers are explicitly prohibited. However GLSL ES 1.00 does not
clearly specify the behavior, except that
"Uniforms are defined to behave as if they are using the same storage in
the vertex and fragment processors and may be implemented this way.
If uniforms are used in both the vertex and fragment shaders, developers
should be warned if the precisions are different. Conversion of
precision should never be implicit."
The word "used" is not clear in this context and might refer to
1) declared (same as GLES 3.x)
2) referred after post-processing, or
3) linked after all optimizations are done.
Looking at existing applications, 2) or 3) seems to be widely adopted.
To avoid compatibility issues, turn the error into a warning if GLSL ES
version is lower than 3.0 and the data is dead in at least one of the
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97532
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0886be093f)
Since it also uses the output vector before writing to memory.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit c07d719e8b)
[Bas Nieuwenhuizen: resolve conflicts]
Conflicts:
src/amd/vulkan/radv_shader.c
Due to LLVM bugs. Fixes a bunch of dEQP-VK.glsl.indexing.*
tests.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6ce550453f)
[Bas Nieuwenhuizen: resolve conflicts]
Conflicts:
src/amd/vulkan/radv_shader.c
It looks the original indirect mask was probably copied from
ANV.
Sascha Willems demo results:
tessellation ~4000 -> ~4200 fps
V2: continue lowering local indirects due to llvm deficiencies.
(cherry picked from commit 087e010b2b)
[Bas Nieuwenhuizen: patch is a backport for 17.2 of the cherry-pick above]
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.
Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.
Fixes a memory leak with multithreading demo.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f0ae06a13c)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/amd/vulkan/radv_cmd_buffer.c
We need to validate some structs exist before we dirty the states, and
avoid the problem in some other places.
Fixes: e027935a7 ("st/mesa: don't update unrelated states in non-draw calls such as Clear")
(cherry picked from commit cc69f2385e)
It confuses CTS. This pregenerates the heap info into the
physical device, so we can use it for translating contiguous
indices into our "standard" ones.
This also makes the WSI a bit smarter in case the first preferred
heap does not exist.
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 806721429a)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/amd/vulkan/radv_device.c
src/amd/vulkan/radv_private.h
src/amd/vulkan/radv_wsi.c
Mesa's DEBUG and assert's NDEBUG are not tied to each other, so we need
to explicitly compile this code out.
Fixes: 3df7892878 "vc4: Drop reloc_count tracking for debug
asserts on non-debug builds."
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 5d44e35a8f)
It is possible that the optimizer ends up in an infinite loop in
post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group()
does not find a proper scheduling. This can be deducted from
pending.count() being larger than zero and not getting smaller.
This patch works around this problem by signalling this failure so that the
optimizers bails out and the un-optimized shader is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 69eee511c6)
Previously the values were calculated by just shifting ~0 by the
invocation ID. This would end up including bits that are higher than
gl_SubGroupSizeARB. The corresponding CTS test effectively requires that
these high bits be zero so it was failing. There is a Piglit test as
well but this appears to checking the wrong values so it passes.
For the two greater-than bitmasks, this patch adds an extra mask with
(~0>>(64-gl_SubGroupSizeARB)) to force these bits to zero.
Fixes: KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680#c3
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit b697ece10a)
Only use CCS_E to render to a texture that is CCS_E-compatible with the
original texture's miptree (linear) format. This prevents render
operations from writing data that can't be decoded with the original
miptree format.
On Gen10, with the new CCS_E-enabled formats handled, this enables the
driver to pass the arb_texture_view-rendering-formats piglit test.
v2. Add a TODO for texturing. (Jason)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 9e849eb8bb)
Fixes intermittent GPU hangs on Broxton with an Intel internal
test case.
There are plenty of similar fragment shaders in piglit that do
not use any varyings and any uniforms. According to the
documentation special timing is needed between pipeline stages.
Apparently we just don't hit that with piglit. Even with the
failing test case one doesn't always get the hang.
Moreover, according to the error states the hang happens
significantly later than the execution of the problematic shader.
There are multiple render cycles (primitive submissions) in between.
I've also seen error states where the ACTHD points outside the
batch. Almost as if the hardware writes somewhere that gets used
later on. That would also explain why piglit doesn't suffer from
this - most tests kick off one render cycle and any corruption
is left unseen.
v2 (Ken): Instead of enabling push constants, enable one of the
inputs (PSIZ).
v3 (Ken, Jason): Use LAYER instead making vulkan emit_3dstate_sbe()
happy.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 97e01adfd5)
According to the ARB_ES3_1_compatibility specification,
glGetFramebufferAttachmentParameteriv is supposed to accept BACK,
and it behaves exactly like BACK_LEFT.
Fixes a GL error in GFXBench 5 Aztec Ruins.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4f538c3f99)
Valgrind shows that leak is caused by gen6_upload_push_constant, add
unref push_const_bo per stage to destructor to fix this (like done for
scratch_bo).
==10952== 144 bytes in 1 blocks are definitely lost in loss record 44 of 66
==10952== at 0x4C30A1E: calloc (vg_replace_malloc.c:711)
==10952== by 0x8C02847: bo_alloc_internal.constprop.10 (brw_bufmgr.c:344)
==10952== by 0x8C425C4: intel_upload_space (intel_upload.c:101)
==10952== by 0x8C22ED0: gen6_upload_push_constants (gen6_constant_state.c:154)
v2: remove if conditions, brw_bo_unreference handles NULL (Ken, Emil)
Fixes: 24891d7c05 ("i965: Store per-stage push constant BO pointers.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0b131ca427)
Not all rendering matches the miptree format. We allow rendering to
texture views so there are cases where it may not match. In those
cases, our current scheme of just passing the value of ctx->sRGBEnabled
isn't viable. Instead, just do what we do for texturing and pass the
view format in directly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 39c5c12f8f)
[Andres Gomez: remove code which was trivially modified previously]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_draw.c
src/mesa/drivers/dri/i965/brw_wm_surface_state.c
It's rather surprising that we've never actually hit this before.
Aparently, Ian's SPIR-V generator currently claims the Simple when you
don't do anything complex. We really shouldn't assert-fail on it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8ab9820d34)
This fixes garbage there if we don't flush TC L2 after rendering.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 759526813b)
This unbreaks waffle/gbm (piglit/gbm) which fails initialization.
v2: also don't set queryDmaBufFormats
Reviewed-by: Daniel Stone <daniel@fooishbar.org>
(cherry picked from commit a65db0ad1c)
The PRM says "The execution size must be 1." In 73137997e2, the
execution size was set to 1 when it should have been BRW_EXECUTE_1
(which maps to 0). Later, in dc2d3a7f5c, JMPI was used for
line AA on gen6 and earlier and we started manually stomping the
exeution size to BRW_EXECUTE_1 in the generator. This commit fixes the
original bug and makes brw_JMPI just do the right thing.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 73137997e2
(cherry picked from commit 562b8d458c)
We currently have a bug where nir_lower_system_values gets called before
nir_lower_var_copies so it will miss any system value uses which come
from a copy_var intrinsic. Moving it to after brw_preprocess_nir fixes
this problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 279f8fb69c)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/intel/vulkan/anv_pipeline.c
In order to implement the ballot intrinsic, we do a MOV from flag
register to some GRF. If that GRF is used in a SEL, cmod propagation
helpfully changes it into a MOV from the flag register with a cmod.
This is perfectly valid but when lower_simd_width comes along, it simply
splits into two instructions which both have conditional modifiers.
This is a problem since we're reading the flag register. This commit
makes us check whether or not flags_written() overlaps with the flag
values that we are reading via the instruction source and, if we have
any interference, will force us to emit a copy of the source.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fa6e74e33e)
This was causing Android clang version 3.8.256229 to miscompile,
presumably due to strict aliasing.
Fixes: 14dc281c13 ("vc4: Enforce one-uniform-per-instruction after optimization.")
(cherry picked from commit e5fea0d621)
The kernel doesn't initialize the value of the INSTPM or CS_DEBUG_MODE2
registers at context initialization time. Instead, they're inherited
from whatever happened to be running on the GPU prior to first run of a
new context. So, when we started setting these, other contexts in the
system started inheriting our values. Since this controls whether
3DSTATE_CONSTANT_* takes a pointer or an offset, getting the wrong
setting is fatal for almost any process which isn't expecting this.
Unfortunately, VA-API and Beignet don't initialize this (nor does older
Mesa), so they will die horribly if we start doing this. UXA and SNA
don't use any push constants, so they are unaffected.
Until we have some kind of solution to this problem, I'm going to revert
this patch and abandon using the feature for now. It will lead to fewer
pushed UBO ranges on Broadwell+, which may lead to lower performance,
though I don't have any data on the impact.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102774
(cherry picked from commit 013d331220)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_state_upload.c
src/mesa/drivers/dri/i965/intel_screen.c
And just reference pipe_resources to it in the validate callbacks.
Avoids pipe_resource leaks when st_framebuffer_validate ends up calling
the validate callback multiple times, e.g. when a window is resized.
v2:
* Use generic stable tag instead of Fixes: tag, since the problem could
already happen before the commit referenced in v1 (Thomas Hellstrom)
* Use memset to initialize the array on the stack instead of allocating
the array with os_calloc.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
(cherry picked from commit 7561da367b)
Squashed with:
st/osmesa: include u_inlines.h for pipe_resource_reference
Fixes build failure due to unresolved symbol.
Fixes: 7561da367b "st/mesa: Initialize textures array in
st_framebuffer_validate"
Trivial.
(cherry picked from commit 8c9e7c9638)
So one of the CTS tests tries to allocate a 16384x1 2048 array
texture. This overflows a bunch of calculations when we want it
tiled as the heights goes to 128.
addrlib returns us the correct size (16GB or so), but we mangle
it in the htile calcs due to the 32-bit offset fields, then
userspace gives us the reduced number and we try to allocate
it on a heap and things blow up.
We really need to give the app back the correct size for the
image so we can blow up properly in memory allocation later.
This should fix hangs in
dEQP-VK.pipeline.render_to_image.core.1d_array.huge.width_layers.r8g8b8a8_unorm_d32_sfloat_s8_uint
since
Fixes: ad3d98da9f (radv: enable tc compatible htile for d32s8 also.)
Now there's an open question if we should be enabling tc-compat
htile at all for shallow textures like the above.
This might cause some other wierd side effects in CTS even
without the tc compat so:
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 35c66f3e40)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>
Conflicts:
src/amd/vulkan/radv_private.h
The user does not need to know the specifics of the struct, as only a
pointer to it is used.
Just forward declare the struct making the header self-contained.
v2: Remove deprecation warning text/bugzilla - patch does no help there.
Cc: Greg V <greg@unrelenting.technology>
Fixes: 5cddb1ce3c ("wayland: Add an extension to create wl_buffers from
EGLImages")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
(cherry picked from commit 66ebdfbd44)
Otherwise we could have a failure followed by a smaller write that
succeeds and get a corrupted blob. If we ever OOM, we should stop.
v2 (Jason Ekstrand):
- Initialize the new boolean member in create_blob
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit e03717efbd)
Older kernels fail the va_op with this flag set. If the kernel
supports GFX9 usefully, it will also support this flag.
Fixes: e8d57802fe "radv/gfx9: allocate events from uncached VA space"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 96f80c8d4d)
The dmabuf interface requires a valid modifier to be sent. If we don't
explicitly get a modifier from the driver, we can't know what to send;
it must be inferred from legacy side-channels (or assumed to linear, if
none exists).
If we have no modifier, then we can only have a single-plane format
anyway, so fall back to the old wl_drm buffer import path.
Fixes: a65db0ad1c ("st/dri: don't expose modifiers in EGL if the driver doesn't implement them")
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b65d6dafd6)
When creating a wl_buffer from a DRIImage, we extract all the DRIImage
information via queryImage. Check whether or not it actually succeeds,
either bailing out if the query was critical, or providing sensible
fallbacks for information which was not available in older DRIImage
versions.
Fixes: a65db0ad1c ("st/dri: don't expose modifiers in EGL if the driver doesn't implement them")
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6273d2f269)
The current convenience function GetEnv feeds the results of getenv
directly into std::string(). That is a bad idea, since the variable
may be unset, thus we feed NULL into the C++ construct.
The latter of which is not allowed and leads to a crash.
v2: Better variable name, implicit char* -> std::string conversion (Eric)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101832
Fixes: a25093de71 ("swr/rast: Implement JIT shader caching to disk")
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Cc: Bernhard Rosenkraenzer <bero@lindev.ch>
[Emil Velikov: make an actual commit from the misc diff]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Laurent Carlier <lordheavym@gmail.com> (v1)
(cherry picked from commit 21e271024d)
The hardware does this automatically for unorm formats, but we need to
do it manually for unorm depth formats that have been upgraded to
Z32_FLOAT.
Fixes dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
and others.
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 6eb9483912)
The hardware usually does this automatically. However, we upgrade
depth to Z32_FLOAT to enable TC-compatible HTILE, which means the
hardware no longer clamps the comparison value for us.
The only way to tell in the shader whether a clamp is required
seems to be to communicate an additional bit in the descriptor
table. While VI has some unused bits in the resource descriptor,
those bits have unfortunately all been used in gfx9. So we use
an unused bit in the sampler state instead.
Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f
and many other tests in dEQP-GLES3.functional.texture.shadow.*
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 4c56e07029)
[Emil Velikov: handle lack of dirty_mask in original patch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeonsi/si_descriptors.c
Found by address sanitizer.
The loop here tries to be safe, but in doing so, it ends up doing
exactly the wrong thing: the safe foreach is for when the loop
variable (inst) could be deleted and nothing else. However, this
particular can delete inst's successor, but not inst itself.
Fixes: 8c6a0ebaad ("st/mesa: add st fp64 support (v7.1)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 2703fa613b)
Make sure we actually allocate two adjacent TGSI temporaries. The
current code fails e.g. when an arithmetic operation has two
operands with indirect accesses.
I will send out a new piglit test
(arb_gpu_shader_int64/execution/indirect-array-two-accesses.shader_test)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 541208cf13)
Previously buffer offsets were passed in explicitly as an offset, which
had to be added to the resource address. Now they are passed in via an
increased 'start' parameter. As a result, we were double-adding the
start offset in this kind of situation.
This condition was triggered by piglit's draw-elements test which has a
requisite glMultiDrawElements in combination with a small enough number
of vertices to go through the immediate push path.
Fixes: 330d0607ed ("gallium: remove pipe_index_buffer and set_index_buffer")
Reported-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit b20bccbcac)
So it appears the Vulkan SPIR-V fma opcode can be equivalent to a
mad operation, and the fma hw opcode on AMD hw is issued like a double
opcode so is slower. Also the radeonsi stack does this.
This appears to improve performance on a number of games from Feral,
and thanks to Feral for noticing the problem.
I'm reposting this one as Marek indicated he thinks this is what
we should be doing on AMD hw.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2c61594d84)
[Emil Velikov: use correct file radv_shader.c -> radv_pipeline.c]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/amd/vulkan/radv_shader.c
Applications might pass in a buffer that is sized too large and rely
on the extra space of the buffer not being overwritten.
Fixes dEQP-GLES31.functional.state_query.internal_format.partial_query.num_sample_counts
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 9a8f13a33b)
TGSI was adjusted to always pass in 64-bit integers but nouveau was left
with the old semantics. Update to the new thing.
Fixes: d10fbe5159 (st/glsl_to_tgsi: fix 64-bit integer bit shifts)
Reported-by: Karol Herbst <karolherbst@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ce6da2a026)
This ensures that everything gets cleaned up properly. In particular,
it fixes a memory leak where we were leaking the push constants
structs.
Valgrind stats on
dEQP-VK.pipeline.push_constant.graphics_pipeline.range_size_128 :
Before:
HEAP SUMMARY:
in use at exit: 2,467,513 bytes in 1,305 blocks
total heap usage: 697,853 allocs, 696,530 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 1,068 bytes in 11 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
After:
HEAP SUMMARY:
in use at exit: 2,467,381 bytes in 1,304 blocks
total heap usage: 697,853 allocs, 696,531 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 936 bytes in 10 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0763f814d7)
When writing to set > 0, we were just wrongly writing to set 0. This
commit fixes this by lazily allocating each set as we write to them.
We didn't go for having them directly into the command buffer as this
would require an additional ~45Kb per command buffer.
v2: Allocate push descriptors from system memory rather than in BO
streams. (Lionel)
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Fixes: 9f60ed98e5 ("anv: add VK_KHR_push_descriptor support")
Reported-by: Daniel Ribeiro Maciel <daniel.maciel@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit d296dea54e)
In check_os_altivec_support(), allow control of Altivec (first PPC vector
instruction set) code generation via a new environmental control,
GALLIVM_ALTIVEC, which is expected to take on a value of 1 or 0.
The default is to enable Altivec code generation.
This environmental control of Altivec code generation is initially
available only #ifdef DEBUG.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 1359af930e)
In init_native_targets, allow the passing of additional options to
the LLC compiler via new GALLIVM_LLC_OPTIONS environmental control.
This option is available only #ifdef DEBUG, initially.
At top, add #include <llvm-c/Support.h> for LLVMParseCommandLineOptions()
declaration.
v2: Fix compile error with old llvm versions (sroland)
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 5c75f0c8bb)
For Vulkan SPIR-V the spec states
fma() Inherited from OpFMul followed by OpFAdd.
Matt says the backend will do the right thing depending on the
hardware being compiled for, if you use the fmuladd intrinsic.
Using the Mad Max pts test, on high settings at 4K:
CHP: 55->60
HGDD: 46->50
LM: 55->60
No change on Stronghold.
Thanks to Feral for spending the time to track this down.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4e93d6baae)
[Emil Velikov: s/ac_to_float_type/to_float_type/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/amd/common/ac_nir_to_llvm.c
Use st_egl_image instead. radeonsi doesn't like when we create
a pipe_surface with PIPE_FORMAT_NV12.
This fixes NV12 texturing on radeonsi using kmscube.
Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 2d62817da9)
Per the SPIR-V spec 2.11 Structured Control Flow:
"The only blocks in a construct that can branch outside the construct are
...
- a break block for the innermost loop it is inside of.
..."
With
"Break block: A block containing a branch to the Merge Block of a loop header's merge instruction."
Note that it puts no restriction on not being in an if or switch within the innermost loop.
This passes the loop_break block to the switch body so it can properly detect loop breaks.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit ef61d09d5b)
Similar to the way width/pitch alignment works, it seems like we need to
do similar for height. Otherwise the BLIT from system memory to GMEM
can over-fetch beyond the end of the buffer, triggering a fault.
I'm not sure if there is a better solution yet. Possibly we could fall
back to pre-a5xx style DRAW packets for cases where BLIT might over-
fetch. (We in theory have that problem already with rendering to higher
mipmap levels, although fortunately those tend to use GMEM bypass.)
This fixes issues reported with glamor.
Reported-by: don.harbin@linaro.org
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 16ac70bdcf)
The hardware registers store the half-size/width in 12.4 fixed point
format, so 8192 is the maximum.
Fixes dEQP-GLES3.functional.rasterization.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 30e37289ea)
This is a bit conservative, but a more precise solution requires access
to the rasterizer state. This is something to tackle after the fork between
r600 and radeonsi.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 4d74432dd3)
We'll use it in the scissors / clip / guardband state.
v2: avoid a performance regression on r600 when applied to
(pre-fork) stable branches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f86a112b07)
GLSL ES requires both, and while GLSL explicitly doesn't require correct
overflow handling, it does appear to require handling input inf/denorms
correctly.
Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.*
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 93bf9c114b)
[Emil Velikov: init_num_operands() does not exist on branch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/compiler/glsl/lower_instructions.cpp
A tempting alternative fix would be adding a lock/unlock pair in
util_queue_fence_is_signalled. However, that wouldn't actually
improve anything in the semantics of util_queue_fence_is_signalled,
while making that test much more heavy-weight. So this lock/unlock
pair in util_queue_fence_destroy for "flushing out" other threads
that may still be in util_queue_fence_signal looks like the better
fix.
v2: rephrase the comment
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
(cherry picked from commit a208cd7ae4)
Fix the custom cube coord selection sequence to be identical to
the hardware v_cubesc/tc and OpenGL spec. Affects texture sampling
with user-provided derivatives.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 5be5c1e0fa)
Overriding the default (no-op) swizzle is clearly counter-productive,
since the whole point is putting the destination register as one of
the source operands so that it remains unmodified when the assignment
condition is false.
Fragment depth and stencil outputs are a special case due to how their
source swizzles are manipulated in translate_src when compiling to
TGSI.
Fixes dEQP-GLES2.functional.shaders.conditionals.if.*_vertex
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 8ea7d3a5c8)
Only on GFX9 we implement them as 2D images.
This fixes:
dEQP-VK.image.image_size.1d_array.readonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_7x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_writeonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_writeonly_7x1
dEQP-VK.image.image_size.1d_array.writeonly_12x34
dEQP-VK.image.image_size.1d_array.writeonly_1x1
dEQP-VK.image.image_size.1d_array.writeonly_32x32
dEQP-VK.image.image_size.1d_array.writeonly_7x1
Fixes: 1bcb953e16 "radv: handle GFX9 1D textures"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 979978ee06)
A recent commit fixed the case of 8888 integer cube maps, which need the
workaround of replacing the data format with USCALED/SSCALED. However,
this broke the case of non-8888 integer cube maps; those still need the
fix of shifting the texture coordinates.
Fixes KHR-GL45.texture_gather.plain-gather-int-cube-array and similar.
Fixes: 6fb0c1013b ("radeonsi: workaround for gather4 on integer cube maps")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6d23f7c65d)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 052b974fed)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Conflicts:
src/amd/common/ac_llvm_build.c
src/amd/common/ac_llvm_build.h
We originally implemented caching to avoid unneeded round-trips to the
compositor when querying surface capabilities etc. to set up the
swapchain. Unfortunately, this doesn't work if vkDestroyInstance is
called after the Wayland connection has been dropped. In this case, we
end up trying to clean up already destroyed wl_proxy objects which leads
to crashes. In particular most of dEQP-VK.wsi.wayland is crashing
thanks to this problem.
This commit gets rid of the cache and simply embeds the wsi_wl_display
struct in the swapchain. While we're at it, we can get rid of the
wl_event_queue that we were storing in the swapchain because we can just
use the one in the embedded wsi_wl_display.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Bugzilla: https://bugs.freedesktop.org/102578
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4369102498)
v2: Force T and R wrap modes to GL_CLAMP_TO_EDGE for 1D textures.
This fixes a regression in tex1d-2dborder. The test uses a 1D texture
but it provides S and T texture coordinates. Since the T wrap mode
would (correctly) be set to GL_CLAMP, the texture would gradually
blend (incorrectly) with the border color.
I also tried setting NV20_3D_TEX_FORMAT_DIMS_1D instead of
NV20_3D_TEX_FORMAT_DIMS_2D for 1D textures, but that did not help.
It is possible that the same problem exists for 2D textures with the
R-wrap mode, but I don't think there are any piglit tests for that.
No test changes on NV20 (10de:0201).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 953a3cf0fd)
dri2_fallback_swap_interval() currently used to stub out swap interval
support in Android backend does nothing besides returning EGL_FALSE.
This causes at least one known application (Android Snapchat) to fail
due to an unexpected error and my loose interpretation of the EGL 1.5
specification justifies it. Relevant quote below:
The function
EGLBoolean eglSwapInterval(EGLDisplay dpy, EGLint interval);
specifies the minimum number of video frame periods per buffer swap
for the draw surface of the current context, for the current rendering
API. [...]
The parameter interval specifies the minimum number of video frames
that are displayed before a buffer swap will occur. The interval
specified by the function applies to the draw surface bound to the
context that is current on the calling thread. [...] interval is
silently clamped to minimum and maximum implementation dependent
values before being stored; these values are defined by EGLConfig
attributes EGL_MIN_SWAP_INTERVAL and EGL_MAX_SWAP_INTERVAL
respectively.
The default swap interval is 1.
Even though it does not specify the exact behavior if the platform does
not support changing the swap interval, the default assumed state is the
swap interval of 1, which I interpret as a value that eglSwapInterval()
should succeed if called with, even if there is no ability to change the
interval (but there is no change requested). Moreover, since the
behavior is defined to clamp the requested value to minimum and maximum
and at least the default value of 1 must be present in the range, the
implementation might be expected to have a valid range, which in case of
the feature being unsupported, would correspond to {1} and any request
might be expected to be clamped to this value.
This is further confirmed by the code in _eglSwapInterval() in
src/egl/main/eglsurface.c, which is the default fallback implementation
for EGL drivers not implementing its own. The problem with it is that
the DRI2 EGL driver provides its own implementation that calls into
platform backends, so we cannot just simply fall back to it.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
cleared_and_retried is always reset to false when jumping to the retry
label, thus leading to an infinite retry loop.
Fix that by moving the cleared_and_retried variable definitions at the
beginning of the function. While we're at it, move the create variable
with the other local variables and explicitly reset its content in the
retry path.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: 78087676c9 "vc4: Restructure the simulator mode."
(cherry picked from commit ef578906d8)
I was overwriting view->texture with the shadow resource when we need to
do shadow copies (retiling or baselevel rebase), but that tripped up some
critical new sanity checking in state_tracker (making sure that stObj->pt
hasn't changed from view->texture through TexImage-related paths).
To avoid that, move the shadow resource to the vc4_sampler_view struct.
Fixes: f0ecd36ef8 ("st/mesa: add an entirely separate codepath for setting up buffer views")
(cherry picked from commit 68c91a87d7)
Atomic operation sources are scalar values, but we were failing to
select the .x component of the second operand. For example,
atomicCounterCompSwapARB(counter, 5u, 10u)
would generate
mov(8) vgrf4.x:D, 5D
mov(8) vgrf5.x:D, 10D
mov(8) vgrf9.x:UD, vgrf4.xyzw:D
mov(8) vgrf9.y:UD, vgrf5.xyzw:D
which wrongly selects the .y component of vgrf5, so the actual 10u value
would get dead code eliminated. The swizzle works for the other source,
but both of them ought to be .xxxx.
Fixes the compare and swap CTS tests in:
KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase
Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 66342c997f)
Embarassingly, someone enabled the ARB_shader_atomic_counter_ops
extension for Gen7+ but never added the intrinsics to the switch
statement in the vec4 backend, so they just hit an unreachable()
call and died.
Fixes: 40dd45d0c6 (i965: Enable ARB_shader_atomic_counter_ops)
Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit a62fe34098)
On some platforms, gcc generates library calls when __atomic_* functions
are used, but does not link the required library (libatomic) automatically
(supposedly to allow the app to use some other atomics implementation?).
Detect this at configure time and add the library when needed. Tested
on armel (library was added) and on x86_64 (was not, as expected).
Some documentation on this is provided in GCC wiki:
https://gcc.gnu.org/wiki/Atomic/GCCMM
Fixes: 8915f0c0 "util: use GCC atomic intrinsics with explicit memory model"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102573
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 2ef7f23820)
There's no reason to use va_copy here.
CID: 1418113
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: e7fc664b91 ("winsys/amdgpu: add addrlib - texture
addressing and alignment calculator")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 34126ed248)
fixes: This commit addressed earlier commits dcf46e99 and 60878dd0 which
did not land in branch.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This fixes a bug with nearest ("point") mip selection when the fractional
part of max_lod is in (0.5,1). In this case, the spec mandates that
we still select the mip level ceil(max_lod) in the clamping case. However,
MIP_POINT_PRECLAMP will clamp before the mip selection, which is wrong.
Supposedly this setting was originally copied from the closed Vulkan
driver, but as far as I can tell, closed Vulkan was actually changed back
recently :)
Fixes dEQP-GLES3.functional.texture.mipmap.2d.max_lod.{nearest,linear}_nearest
Fixes: f7420ef5b4 ("radeonsi: enable some sampler fields to match the closed driver")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 704ddbcdf6)
By leaving the compiled shader in the context's stage state, the next
compile of a new FS would look in the old compiled FS for figuring out
whether to set various dirty flags for the VS compile. Clear out the
pointer when deleting the program, and make sure that we always mark the
state as dirty if the previous program had been lost. Fixes valgrind
warnings on glsl-max-varyings.
Fixes: 2350569a78 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
(cherry picked from commit 3752ad28f2)
The blitter will bind just the depth buffer, which flushes the current job
if we had both a color and depth/stencil. If the clear was doing partial
depth/stencil (quad-based) and color (tile-based), we'd go on to try to
set up the rest of the tile clear in the now flushed job.
Instead, move the partial clear up before we start setting up the job for
the current FBO state, and re-fetch the job if we're continuing on to a
tile-based clear. Fixes valgrind failures in fbo-depthtex.
Fixes: 9421a6065c ("vc4: Fix fallback to quad clears of depth in GLX.")
(cherry picked from commit 9940fb4205)
I was trying to continue the hash table loop, not the inner loop. This
tended to work out, because we would have *just* freed the job struct.
Fixes some valgrind failures in fbo-depthtex.
Fixes: f597ac3966 ("vc4: Implement job shuffling")
(cherry picked from commit d88a75182d)
Otherwise we end up using a 32-bit comparison which didn't end well.
Timothy caught this while playing around with some opt passes.
Fixes: 278580729a (st/glsl_to_tgsi: add support for 64-bit integers)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit a7a7bf21bd)
fixes: This commit addressed earlier commits 61ad2f13 and 6dcc54b4 which
did not land in branch
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
In Vulkan, for 'z' (depth) component, the scale and translate values
for the viewport transformation are:
pz = maxDepth - minDepth
oz = minDepth
zf = pz × zd + oz
Being zd, the third component in vertex's normalized device coordinates.
Fixes: dEQP-VK.draw.inverted_depth_ranges.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d2cd9deeb8)
util_pack_color may leave undefined values in the upper half of the packed
integer. As our hardware needs the upper 16 bits to mirror the lower 16bits,
this breaks clears of those formats if the undefined values aren't masked off.
I've only observed the issue with R5G6B5_UNORM surfaces, other 16bpp
formats seem to work fine.
Fixes: d6aa2ba2b2 (etnaviv: replace translate_clear_color with util_pack_color)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit e9d37d68cf)
Like for cube map (array) gather, we need to round to nearest on <= VI.
Fixes tests in dEQP-GLES3.functional.shaders.texture_functions.texture.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 87f7c7bd65)
Prevent an overflow caused by too many output variables. To limit the
scope of the issue, write to the assigned array only for the non-ES
fragment shader path, which is the only place where it's needed.
Since the function will bail with an error when output variables with
overlapping components are found, (max # of FS outputs) * 4 is an upper
limit to the space we need.
Found by address sanitizer.
Fixes dEQP-GLES3.functional.attribute_location.bind_aliasing.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 15cae12804)
Squashed with commit:
glsl/linker: properly fix output variable overlap check
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102904
Fixes: 15cae12804 ("glsl/linker: fix output variable overlap check")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit df8767a14e)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
If we don't have a depth piece, we don't get a correct
swizzle mode and we hit an assert in addrlib.
In case of no depth get the preferrred swizzle mode for
stencil alone.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c4ac522511)
Squashed with commit:
ac/surface: handle error when choosing preferred swizzle mode
CID: 1418140
Fixes: c4ac522511 ("ac/surface: handle S8 on gfx9")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit eb71394ff3)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.
Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e0af3bed2c)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Conflicts:
src/amd/common/ac_nir_to_llvm.c
This is the same workaround that radv already applied in commit
3ece76f03d ("radv/ac: gather4 cube workaround integer").
Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6fb0c1013b)
This fixes a crash on Haswell when we try to upload a stencil texture
with blorp. It would also be a problem if someone tried to texture from
stencil after glBlitFramebuffers.
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit a43d379000)
Platforms without particular atomic operations require the
implementations in u_atomic.c
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where
they're missing")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d075a4089e)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Conflicts:
src/util/Makefile.am
libunwind is a optional dependency used by the gallium aux module
(libgallium) and consequently the final binaries must be linked against
it. To test whether the library is properly specified in the link pass
add it to the travis-ci build environment and force its use.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 39fe51c1e3)
In Ubuntu Trusty the default version of llvm is 3.4 and the build was
actually randomly picking 3.5 or 3.9. Adding libunwind would then result
is build success or failure depending of what version was picked.
Install the llvm-3.3-dev package and force its use: On one hand it is
the minimum required version we want to the build test against, and on
the other hand forcing the version stabilizes the build.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d3675812b5)
With the shaders in the ssao demo, the nir_opt_if wasn't
working properly without this, after this the if gets optimised
so that loop unrolling gets called.
(loop unrolling fails due to instruction count, but at least
it gets to do that.)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 64d9bd149a)
[Juan A. Suarez: apply patch over src/amd/vulkan/radv_pipeline.c]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Conflicts:
src/amd/vulkan/radv_shader.c
We can't use it anyway in fast clears, and on GFX9 it seems to
actually hange the card if we specify it.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
(cherry picked from commit 1a172fb113)
The current DCC init routine doesn't account for initializing a
single layer or level. Multilayer seems hard for small textures on
pre-GFX9 as tre metadata for the layers can be interleaved. For
GFX9 multilevel textures are a problem for similar reasons.
So just disable this for now, until we handle the texture modes
correctly.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
(cherry picked from commit bee83b2661)
Fixes: e8d57802f (radv/gfx9: allocate events from uncached VA space)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a5add6fb30)
Otherwise, the simultaneous uage bit doesn't get set from the begin
info, which we need for batchchaining.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit dec7b38fe6)
Currently if table_size is 0, it's falling through to:
unreachable("hash table should never be full");
But table_size can be 0 when RADV_DEBUG=nocache is set, or when the
table allocation fails (which is not considered an error).
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit b8dd69e1b4)
This fixes a rendering issue with Hitman when bindless textures
are enabled.
Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 59101e771d)
Found by address sanitizer:
==22621==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61400000cbd8 at pc 0x7f561610a4ff bp 0x7ffca85f9d50 sp 0x7ffca85f94f8
READ of size 344 at 0x61400000cbd8 thread T0
#0 0x7f561610a4fe (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x5f4fe)
#1 0x7f560bb305a5 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 0x7f560bb305a5 in blob_write_bytes ../../../mesa-src/src/compiler/glsl/blob.c:136
#3 0x7f560be7d7ff in encode_type_to_blob ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:153
#4 0x7f560be81222 in write_program_resource_data ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:950
#5 0x7f560be81222 in write_program_resource_list ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1118
#6 0x7f560be81222 in shader_cache_write_program_metadata(gl_context*, gl_shader_program*) ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1407
#7 0x7f560b825fdb in link_program ../../../mesa-src/src/mesa/main/shaderapi.c:1163
Fixes: 073a84ff60 ("glsl: stop adding pointers from glsl_struct_field to the cache")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 4da6cf6c98)
gl_SampleMaskIn is supposed to contain set bits only for the samples that
are covered by the current fragment shader invocation, but the VGPR
initialization hardware loads the set of all bits that are covered at the
current pixel.
Fixes various tests in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 92c4277990)
[Emil Velikov: attribute for the lack of add_arg*checked() API]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeonsi/si_shader.c
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4009370232)
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a6618afd27)
This will allow us to easily skip them when writting the struct
to disk cache.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3ea3f75723)
In the following patch we will stop writing the pointer to cache.
Unfortunately adding empty strings to that cache seems to be the
only thing we can do here once we no longer have the pointers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 44918a1979)
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 22154823d2)
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 073a84ff60)
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 37d453b55a)
As of 4.11, the kernel isn't bothering to set the subslice hashing mode
on Apollolake, leaving it at the default of 8x8. (It initializes it to
16x4 on most platforms.)
Performance data for GPUTest Triangle on Apollolake at 1024x640:
X-tiled RT:
-----------
8x8 -> 16x4: 2.4325% +/- 0.383683% (n=107)
8x8 -> 8x4: -3.75105% +/- 0.592491% (n=40)
8x8 -> 16x16: 6.17238% +/- 0.67157% (n=30)
Y-tiled RT:
-----------
8x8 -> 16x4: 1.30307% +/- 0.297292% (n=205)
8x8 -> 8x4: -0.769282% +/- 0.729557% (n=35)
8x8 -> 16x16: 3.00254% +/- 0.715503% (n=40)
8x MSAA RT (INTEL_FORCE_MSAA=8):
--------------------------------
8x8 -> 16x4: 1.38889% +/- 0.93729% (n=7)
8x8 -> 8x4: -2.10643% +/- 1.15153% (n=3)
8x8 -> 16x16: 3.87183% +/- 1.08851% (n=5)
Based on this, we choose 16x16 for Apollolake.
Skylake GT2 with X-tiled buffers appears to be a toss-up between 16x4
and 16x16, and with Y-tiled buffers it doesn't seem to really matter.
So we'll leave Skylake alone for now.
The hashing mode doesn't seem to make a measurable impact on more
complex benchmarks.
Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit ebd2fd6ef3)
This field covers the whole resource.
Fixes:
dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.3d.format.*
dEQP-VK.texture.filtering.3d.combinations.*
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ebd2a5354d)
As GFX9 can't handle 1D depth textures, radeonsi and
apparantly pro just update all 1D textures to 2D,
and work around it.
This ports the workarounds from radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 1bcb953e16)
Conflicts:
src/amd/common/ac_nir_to_llvm.c
Work out the width/height from the level manually, as on GFX9
we won't minify the iview width/height.
This fixes:
dEQP-VK.api.image_clearing.core.clear_color_image* on gfx9
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2f5b4490b5)
Don't get distracted by record dereferences between array references.
Fixes dEQP-GLES31.functional.tessellation.user_defined_io.per_vertex_block.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 03203b7448)
This copies what amdgpu-pro does, and allocates the memory
for an event with an uncached mtype.
This fixes hangs with:
dEQP-VK.api.command_buffers.record_simul_use_primary
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e8d57802fe)
This is a precursor to the gfx9 fix to use uncached for the event
memory. Move to the interface which allows setting the flags,
but wrap it to avoid having to copy it around the place.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 76ac8fafad)
This reverts commit 611076a41a.
With the two previous commits, vega shouldn't be unstable,
doesn't pass CTS, but can do a complete run, and games shouldn't
hang anymore, so bring it back online.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e38685cc62)
This is required on GFX9, fixes a bug in Talos where all the
mipmaps overlay each other.
Just pushing this as well as it fixes Talos.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6d929d3f85)
This causes hangs in some of the CTS tests with a 2d
1536x2 texture.
This fixes hangs with:
dEQP-VK.pipeline.image.suballocation.sampling_type.combined.iew_type.1d_aray.format.r4g4b4a4_unorm_pack16.count_1.size.512x1_array_of_3
if we reenable it, make sure these don't regress.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d118ff8765)
cs_invocations are currently unsupported, but leaving the field uninitialized
is even worse.
fixes on nvc0:
* KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
* KHR-GL45.pipeline_statistics_query_tests_ARB.functional_non_rendering_commands_do_not_affect_queries
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit b672c3833b)
Fix loading of a 3x16 vector as a single 48-bit load
on big-endian systems (PPC64, S390).
Roland Scheidegger's commit e827d91756
plus Ray Strode's patch reduce pre-Roland Piglit failures from ~4000 to ~2000. This patch fixes
three of the four regressions observed by Ray:
- draw-vertices
- draw-vertices-half-float
- draw-vertices-half-float_gles2
One regression remains:
- draw-vertices-2101010
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 57c8ead0cd)
lp_build_fetch_rgba_soa fetches a texel from a texture.
Part of that process involves first gathering the element
together from memory into a packed format, and then breaking
out the individual color channels into separate, parallel
arrays.
The code fails to account for endianess when reading the packed
values.
This commit attempts to correct the problem by reversing the order
the packed values are read on big endian systems.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ray Strode <rstrode@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 75cb6e3617)
This fixes some crashes in the dEQP-VK.memory.requirements.core.* tests.
I'm not sure whether or not passing out-of-bound formats into the query
is supposed to be allowed but there's no harm in protecting ourselves
from it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/101956
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 242211933a)
Squashed with:
anv: fix off by one in array check
`anv_formats[ARRAY_SIZE(anv_formats)]` is already one too far.
Spotted by Coverity.
CovID: 1417259
Fixes: 242211933a "anv/formats: Nicely handle unknown VkFormat enums"
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
(cherry picked from commit 0c7272a66c)
Instead of saving primitive offset in the minmax cache key,
save the actual buffer offset which is used in the cache lookup.
Fixes rendering artifact seen with GoogleEarth when run with
VMware driver.
v2: Per Brian's comment, initialize offset to avoid compiler warning.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 2d93b462b4)
Squashed with:
vbo: fix build errors on android
incompatible pointer to integer conversion assigning to 'GLintptr' (aka 'int')
from 'const char *' [-Werror,-Wint-conversion]
offset = indices;
^ ~~~~~~~
Fixes: 2d93b462b4 ("vbo: fix offset in minmax cache key")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 0986f68632)
If llvmpipe_set_scissor_states() is never called, we still need to be sure
that derived scissor/clip state is updated. As of commit 743ad599a9
that function might not be called.
Fixes regressed Piglit gl-1.0-scissor-offscreen -fbo -auto test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101709
Fixes: 743ad599a9 ("st/mesa: don't set 16 scissors and 16 viewports
if they're unused")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 88cdf16871)
Adding extra infra is not acceptable for stable. Offending code has been
wrapped in a Android compile guard with an earlier commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise eglCreateWaylandBufferFromImageWL will fail, since we
have no "supported" format.
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 9e07005e87)
For most/all cases today, we have wl_drm available alongside wl_dmabuf.
Yet in the long run, we want to make sure the latter can operate without
any traces of the former.
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit da100fe697)
The wl_drm wrapper is created before the wl display/surface ones.
Thus make sure we destroy it after them. In reality it should not make
any difference either way.
Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 1a8015e753)
(commit split out by Bas Nieuwenhuizen)
Fixes: 65477bae9c "radv: enable GFX9 on radv"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9573bd70e1)
One could easily introduce version 3 of the DRI2fenceExtension,
extending the struct, while not implementing the above function.
Thus we'll end up with NULL pointer, and dereferencing it won't fare
too well.
Fixes: 0201f01dc4 ("egl: add EGL_ANDROID_native_fence_sync")
Cc: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit f0d053cb6d)
If we merge a mapping with the mapping before it, we also need
to not only change the offset, but also the bo offset.
Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 9b7e663da1)
We don't use the render path so totally unneeded.
Fixes: 19be95f71e "radv: add subpass resolve compute path"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit bd81cb3206)
Recording secondaries with no framebuffer attachment may
make this happen, though this might not be the complete solution.
(esp if someone does meta stuff in there, would we have to
save things, not sure).
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4a091b0788)
When I added gfx9 I did it wrong, this fixes it.
Fixes: 5247b311e9 "radv/gfx9: fix set predication packet."
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 12fd0f8dc1)
This code was separated from the validation code so it could
use used with KHR_no_error paths. The return values were inverted
to reflect the name of the helper, but here the condtion was
mistakenly inverted rather than the return value.
Fixes: 4df2931a87 (mesa/vbo: move some Draw checks out of validation)
Reported-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a4635c84dc)
In get_back_bo, we use wl_display_dispatch_queue() to block and wait for
a buffer release event. However, not all Wayland compositors flush the
client socket on posting a buffer-release event, so by only blocking
client-side, we may block indefinitely, or at least need to wait for an
input event / frame completion to arrive for the compositor to flush.
We now use dispatch_queue as a first pass, but if our entire buffer pool
is exhausted, use a roundtrip (an immediately-triggered wl_callback) to
ensure that the compositor flushes out our release event immediately.
[daniels: Modified comment and commit message.]
Signed-off-by: Kai Chen <kai.chen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 151188d1e3)
The int32->float semantic conversion got dropped in a testcase,
because the src was already float. On closer inspection I decided
to add a few more casts for integer op operands to be safe too.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 43595db302)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/amd/common/ac_nir_to_llvm.c
All of the coordinates and LOD args are integers for TXF. This mostly
doesn't matter, except for converting into a levelZero=true operation by
removing an explicit zero LOD. For the comparison against zero to work
properly, the sType of the instruction has to be set correctly.
Fixes: KHR-GL45.robust_buffer_access_behavior.texel_fetch
Reported-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 96be442b77)
For gfx9 the addressing for images has changed, so we need to
provide the hw with the level0, however we still need to scale
for format block differences (so our compressed upload paths still
work).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit bae7723e13)
Avoid passing the vulkan image creation into the image view descriptor
setup. This cleans up the usage of range inside the init, instead
using the properly inited values in the image view.
This is just a cleanup but some future vega changes will depend on it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 5378b5d071)
GFX9 needs the SX MRT blend registers programmed, port over
the code from radeonsi to workout the values from the blend
state, and program the registers on rbplus systems.
This fixes lots of:
dEQP-VK.pipeline.blend.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 9c080100d3)
When changing fast clear colors, we need to emit new SURFACE_STATE
with the updated color at the next draw call.
Most things work today because the atoms that handle SURFACE_STATE
for images (mutable images, textures, render targets) also listen to
BRW_NEW_BLORP, causing us to re-emit these on every BLORP operation.
However, this is overkill - most BLORP operations don't require us
to re-emit SURFACE_STATE.
One case where this is broken today is a fast clear to a different
color followed by a non-coherent framebuffer fetch. The renderbuffer
read atom doesn't listen to BRW_NEW_BLORP, and would not get the new
fast clear color.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 54c41af0aa)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_tcs_surface_state.c
In f9fd976e8a we changed the clear value to be stored as an
isl_color_value. This had the side-effect same clear value check is now
happening directly between the f32[0] field of the isl_color_value and
ctx->Depth.Clear. This isn't what we want for two reasons. One is that
the comparison happens in floating point even for Z16 and Z24 formats.
Worse than that, ctx->Depth.Clear is a double so, even for 32-bit float
formats, we were comparing as doubles and not floats. This means that
the test basically always fails for anything other than 0.0f and 1.0f.
This caused a slight performance regression in Lightsmark 2008 because
it was using a depth clear value of 0.999 which can't be stored in a
32-bit float so we were doing unneeded resolves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/101678
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0ae9ce0f29)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_clear.c
Both statically linking libLLVMCore and dynamically linking libLLVM causes
duplicated symbols in gallium_dri.so and it fails to dlopen. We don't
really need to link libLLVMCore, but just need generated headers to be
built first. Dynamically linking to libLLVM instead is enough to do
that. Thanks to Qiang Yu for finding the root cause.
With this change, we can align all versions and just have libLLVM as a
shared lib dependency.
This also requires changes in the M and N versions of LLVM to export the
include paths for libLLVM. AOSP master is okay.
Fixes: 26aee6f4d5 ("Android: rework LLVM build support")
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Qiang Yu <Qiang.Yu@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
(cherry picked from commit 4734bfc02a)
In 76e2f390f9, when Topi switched num_samples from 0 to 1 for
single-sampled, he accidentally switched the last parameter in the call
to miptree_create_for_teximage from 0 to 1 thinking it was num_samples
when it was actually layout_flags. Switching from 0 to 1 added the
MIPTREE_LAYOUT_ACCELERATED_UPLOAD flag which causes us to allocate a
busy BO instead of an idle one. This caused the subsequent CPU upload
to consistently stall. The end result was a 15% performance drop in the
SynMark v7 DrvRes microbenchmark. This restores the old behavior and
fixes the performance regression.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes: 76e2f390f9
Bugzilla: https://bugs.freedesktop.org/102260
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit f24cf82d6d)
This little optimization improves the performance of SynMark v7
TexFilterTri by almost 10% on Sky Lake GT4 among other improvements.
We've been doing it for some time but somehow it got dropped during
the miptree refactoring.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/102258
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 61d2f3f1c2)
Looking at NewDriverState is not safe in general. The state atom system
is set up to ensure that new bits that get added to NewDriverState get
accumulated into the set of bits used when emitting atoms but it doesn't
go the other way. If we read NewDriverState, we may not get the full
picture because the per-pipeline state (3D or compute) does not get
added to NewDriverState before state emit is done. It's especially
dangerous to do this from BLORP (either explicitly or implicitly when
BLORP calls gen7_upload_urb) because that does not happen during one of
the normal state upload paths.
This commit solves the problem by whacking all of the per-shader-stage
URB sizes to zero whenever we change the total URB size. We still have
to flag BRW_NEW_URB_SIZE to ensure that the gen7_urb atom triggers but
the actual decision in gen7_upload_urb can now be based entirely on URB
sizes rather than on state atoms. This also makes BLORP correct because
it just asks for a new URB config whenever the vsize is too small and so
any change to the total URB size will trigger blorp to re-emit as well
because 0 < vs_entry_size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/102289
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d5e217dbfd)
EGLimages are shared with external users, and we don't know what they're
going to do with them. They might scan them out. They might access
them in a way that doesn't work with our explicit clflushing.
It's safest to simply mark them non-coherent.
Chris Wilson caught this problem and wrote a similar (though less
aggressive) patch to solve it; the miptree code has since undergone
a lot of refactoring so I had to rewrite it.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit bc56dfbf3f)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/intel_mipmap_tree.c
The only one of the three remaining flags that has anything whatsoever
to do with layout is TILING_NONE. This commit renames them to
MIPTREE_CREATE_*, documents the meaning of each flag, and makes the
create functions take an actual enum type so GDB will print them nicely.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 24a0da338f)
The only force tiling flag we really care about is LAYOUT_TILING_NONE.
The others don't actually do anything but add confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 55116839d9)
The flag hasn't affected actual surface layout for some time. The only
purpose it served was to set bo->cache_coherent = false on the BO used
to create the miptree. This is fairly silly because we can just set
that directly from the caller where it makes much more sense.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a5a673dfa7)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/intel_mipmap_tree.c
The one caller of is_mcs_supported passes 0 in as the layout_flags
unconditionally.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 0e4d9a4b37)
For stable the regressions were addressed by reverting the offending
commits - see previous commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This reverts commit fcbb93e860 and
also commit 7c5b204e38 to avoid
compilation errors.
Basically, the parameter indexes look wrong when a non-bindless
sampler is declared inside a nested struct (because it is skipped).
I think it's safer to just restore the previous behaviour which is
here since ages and also because the initial attempt is only a
little performance improvement.
This fixes a regression with
ES2-CTS.functional.shaders.struct.uniform.sampler_nested*.
Note: this patch is only for 17.2 since the fixes in master are rather
invasive. Namely:
365d34540f mesa: correctly calculate the storage offset for i915
de0e62e106 st/mesa: correctly calculate the storage offset
Fixes: fcbb93e860 ("mesa: stop assigning unused storage for non-bindless
opaque types")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101983
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The make_shareable function deletes the aux buffer and then whacks
aux_usage to ISL_AUX_USAGE_NONE but not unsetting supports_fast_clear.
Since we only look at supports_fast_clear to decide whether or not to do
fast clears, this was causing assertion failures.
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101925
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit e7a52cc381)
If you don't pass this, the compiler refuses to compile the assembly for
pre-v7 CPUs. This also keeps us from building identical, non-NEON code on
aarch64 and x86.
Fixes: a373f77662 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.")
v2: Fix Android build by just appending NEON_C_SOURCES when
ARCH_ARM_HAVE_NEON.
Tested-by: Rob Herring <robh@kernel.org>
(cherry picked from commit bd5efbd70b)
[Emil Velikov: address libvc4_la_LIBADD conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/vc4/Makefile.am
I've been trying to get away without these conditionals in vc4's NEON
code, but it meant compiling extra unused code on x86, and build failing
on ARMv6.
v2: Use the _arm/_arm64 flags to simplify detection (suggested by Rob),
but hide the _arm version under ARCH_ARM_HAVE_NEON to keep from trying
to build this stuff for armv5te.
Tested-by: Rob Herring <robh@kernel.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit ba8533b6ea)
We need to link librt for u_thread.h's clock_gettime() call.
Fixes: b822d9dd67 ("gallium/util: move u_queue.{c,h} to src/util")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b94ddc181b)
BLEND_STATE packing was modified to be variable-length in:
9670124e31 genxml: Make BLEND_STATE command support variable length array.
The initial gen10.xml still had the old, fixed-length style
definition for BLEND_STATE. So gen10_upload_blend_state would
overwrite the packed BLEND_STATE_ENTRYs with its own fixed array
of all-zero entries when packing BLEND_STATE. This caused
BLEND_STATE upload to not work at all.
Fixes: aa416f515a ("i965/genxml: Add gen10.xml")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit d6539608a4)
LLC platforms are magic in that reads from the CPU are always cache
coherent, or rather GPU writes that bypass LLC do still invalidate the
appropriate cache line.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 49eda75df6)
This affects which inputs are marked as used. In a situation where only
the texture instruction uses an input, it might have been ignored as
unused due to input masks.
Affects subtests of KHR-GL45.texture_cube_map_array.sampling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 054c54d1be)
We continue in the code to do some more things with the rhs, including
setting a constant initializer. If the type is wrong, this causes some
confusion down the line, leading to assertions. This makes sure that the
rhs processing continues to flow as-if the type was correct to start
with (even though the state has been marked as an error state).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101766
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 978c4c597a)
intel_miptree_texture_aux_usage() takes an isl_format, but we are
passing a mesa_format. clang warns:
brw_blorp.c:305:52: warning: implicit conversion from enumeration
type 'mesa_format' to different enumeration type
'enum isl_format' [-Wenum-conversion]
intel_miptree_texture_aux_usage(brw, src_mt, src_format);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~
Fixes: fc1639e46d ("i965/blorp: Use texture/render_aux_usage for blits")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit f7dfc44c61)
Since we don't iterate to a fixed point, we can end up in situations
where we have a SAT instruction + a long immediate. This is not legal.
However since it's immediately computable, just run unary straight away
to handle the situation.
Fixes: 24a799ad35 ("nv50/ir: fix ConstantFolding with saturation")
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 165e18dd21)
This seems like a workaround, but we don't see the bug on CIK/VI.
On SI with the dEQP-VK.memory.pipeline_barrier.host_read_transfer_dst.*
tests, when one tests complete, the first flush at the start of the next
test causes a VM fault as we've destroyed the VM, but we end up flushing
the compute shader then, and it must still be in the process of doing
something.
Could also be a kernel difference between SI and CIK.
v2: hit this with a bigger hammer. This fixes a bunch of hangs
in the vk cts with the robustness tests.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101334
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 82ba384c10)
This ports the workaround from radeonsi, that was missing in radv.
This fixes Talos rendering when MSAA is enabled on my Tahiti card.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8bf3930751)
This just copies the code from the -pro shaders,
and fixes the tests on CIK.
With this CIK passes the same set of conformance
tests as VI.
Fixes: 83e58b03 (radv: flush f32->f16 conversion denormals to zero. (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3f389f75b6)
The argument here is a bitmask, so the old code selected .xy, which
got silently truncated to .x when constructing the vec4 from components,
instead of using .w.
Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit acba3a3151)
It justs works with the fragment shader resolve, so no need to do
a custom conversion. In fact with SRGB dest, it actually gives
wrong results.
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 15e5a7a683)
These seem to store very bogus results. Luckily there is some code
that converts srgb->linear already, so just making the descriptor
format UNORM should work.
Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8286c3a49f)
This fixes corrupted shadows in Unigine Valley.
The corruption disappeared when I stopped setting IMG_DATA_FORMAT_24_8
for depth.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 27fef5d52d)
Also, silence an obnoxious finishme that started occurring for all
GL applications which use stencil after the i965 ISL conversion.
v2: Check against 3DSTATE_STENCIL_BUFFER's pitch bits when using
separate stencil, and 3DSTATE_DEPTH_BUFFER's bits when using
combined depth-stencil.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 5563872dbf)
If we have an invalid display fed into the functions, the display lookup
will return NULL. Thus as we attempt to get the platform type, we'll
deref. it leading to a crash.
Keep in mind that this will not happen if Mesa is built without X11 or
when the legacy eglCreate*Surface codepaths are used.
A similar check was added with earlier commit 5e97b8f5ce ("egl: Fix
crashes in eglCreate*Surface), although it was only applicable when the
surfaceless platform is built.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 26fbb9eacd)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/egl/main/eglapi.c
UE4Editor has this issue.
This commit prevents hangs (release build) or assertion failures (debug
build). It doesn't fix the editor, but catastrophic scenarios are
prevented.
Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4630ede102)
For mul(a, +-1) codegen can generate OP_MOV with a saturation flag
set which is ignored at emission. The same can happen with add(a, 0),
and others.
Adding an assert for detecting more of such issues.
Fixes wrongly rendered water in Hitman Absolution running under wine.
Also a few shaders in Mad Max and Alien Isolation produce such MOVs.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: generalize the fix for other cases]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 24a799ad35)
Originally, I had moved it to the caller to make some things easier when
adding the CCS modifier. However, this broke DRI2 because
intel_process_dri2_buffer calls intel_miptree_create_for_bo but never
calls intel_miptree_alloc_aux. Also, in hindsight, it should be pretty
easy to make the CCS modifier stuff work even if create_for_bo allocates
the CCS when DISABLE_AUX is not set.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8e5808fc0c)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/intel_mipmap_tree.c
We were calculating the total height of 2D surfaces by multiplying the
row pitch by the number of slices. This means that we actually request
slightly more space than actually needed since the padding on the last
slice is unnecessary. For tiled surfaces this is not likely to make a
difference. For linear surfaces, on the other hand, this means we may
require additional memory. In particular, this makes the i965 driver
reject EGL imports of buffers which do not have this extra padding.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4d27c6095e)
The docs contain a bunch of commentary about the need to pad various
surfaces out to multiples of something or other. However, all of those
requirements are about avoiding GTT errors due to missing pages when the
data port or sampler accesses slightly out-of-bounds. However, because
the kernel already fills all the empty space in our GTT with the scratch
page, we never have to worry about faulting due to OOB reads. There are
two caveats to this:
1) There is some potential for issues with caches here if extra data
ends up in a cache we don't expect due to OOB reads. However,
because we always trash the entire cache whenever we need to move
anything between cache domains, this shouldn't be an issue.
2) There is a potential issue if a surface gets placed at the very top
of the GTT by the kernel. In this case, the hardware could
potentially end up trying to read past the top of the GTT. If it
nicely wraps around at the 48-bit (or 32-bit) boundary, then this
shouldn't be an issue thanks to the scratch page. If it doesn't,
then we need to come up with something to handle it.
Up until some of the GL move to ISL, having the padding code in there
just caused us to harmlessly use a bit more memory in Vulkan. However,
now that we're using ISL sizes to validate external dma-buf images,
these padding requirements are causing us to reject otherwise valid
images due to the size of the BO being too small.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c15b92ce11)
This is a bug in the app, but I'd rather avoid hanging the GPU,
esp if someone is running in validation and it takes out their
development environment.
v2: get it right, reverse the polarity.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 36a1b61321)
X/GLX can't handle them. This removes almost 500 GLX visuals that were
incorrectly exposed.
This is a less invasive version of Marek's .getCapability series.
Note: the patch is not applicable for master, but only for the 17.2
branch.
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f33d8af7aa "st/dri: add 32-bit RGBX/RGBA formats"
[Emil Velikov: commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported by valgrind at:
glsl_to_tgsi_visitor::visit(ir_expression*) (st_glsl_to_tgsi.cpp:1560)
When compiling the Deus Ex shaders.
Fixes: 28a5e7104 ("st/glsl_to_tgsi: handle precise modifier")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 06237fc9e1)
GLES/gl.h has historically provided some typedefs that are not
used in the API itself. Restore these typedefs that were lost to
avoid breaking applications.
These seem to be the only typedefs removed in the update.
Fixes: 7fd0817 "Update Khronos-supplied headers"
[Eric: added a big warning to revert this patch when pulling the updated header]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 3db05ed1d1)
This reverts commit 3008161d28,
which caused a regression for VMWare.
The initial code had some recursion in it, that I removed by accident
trying to add back the recursion broke lots of things, take the high
road and revert for now.
Fixes: 3008161d (st_glsl_to_tgsi: rewrite rename registers to use array fully.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b8bea9a050)
This fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*
for a2r10g10b10 formats as destination on SI/CIK hardware.
This adds support to the meta program for emitting 10-bit
outputs, and adds 10-bit support to the fragment shader key.
It also only does the int8/10 on SI/CIK.
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit df61a05019)
In some APU situations the reported visible size can be larger than
VRAM size. This properly clamps the value.
Surprisingly both CTS and spec seem to allow a heap type with size 0,
so this seemed like the easiest option to me.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 4ae84efbc5 "radv: Use enum for memory heaps."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 8229706ad8)
If dual object compile fails (as seems to happen with virgl a
fair bit, and does piglit even have any tests for it?), we end up
not restarting the pull params, so we call
vec4_visitor::move_uniform_array_access_to_pull_constant
a second time and it runs over the ends of the alloc.
Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
running inside virgl on ivybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 271fa3a684)
The cacheline alignment restriction is on the base address; the pitch
can be anything.
Fixes assertion failures when using primus (say, on glxgears, which
creates a 300x300 linear BGRX surface with a pitch of 1200):
intel_blit.c:190: get_blit_intratile_offset_el: Assertion `mt->surf.row_pitch % 64 == 0' failed.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 595a47b829)
This implements a wait for glXWaitGL, glXCopySubBuffer, dri flush_front and
creation of fake front until all pending SwapBuffers have been committed to
hardware. Among other things this fixes piglit glx-copy-sub-buffers on dri3.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 185ef06fd2)
The issue here is that the immediate is treated as a 64-bit value,
and fetching it does not work reliably with swizzles that are different
from xy and zw.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit da83687c4b)
This looks like it's supported since llvm 3.9 at least,
so switch over radeonsi and radv to using it, -pro also
uses this. We can now drop creating lds for these operations
as the ds_swizzle operation doesn't actually write to lds at all.
Acked-by: Marek Olšák <marek.olsak@amd.com>
(stable requested due to fixing radv CIK conformance tests)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cb6f16dce9)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/amd/common/ac_nir_to_llvm.c
This makes it match radeonsi. The LLVM backend itself will emit the
correct instruction, but LLVM might do incorrect optimizations since it
thinks the output is undefined when the input is 0, even though it's not
supposed to be. We really need a new intrinsic, or for the backend to
become smarter and recognize this pattern.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
(cherry picked from commit 6d731c5651)
Otherwise, code generation fails. This has become necessary since some
shaders are wrapped in control flow.
Fixes: 081ac6e5c6 ("radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2879a602dd)
The slower convert-and-copy process performs a bad conversion
because it converts the value to signed 64-bit integer, but
bindless uniform handles are considered unsigned 64-bit.
This fixes "Check glUniform*() with mixed texture units/handles"
from arb_bindless_texture-uniform piglit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b38c9c57f2)
Since array splitting for AoA is disabled, we have to retrieve
the type of the first non-array type when an array of images is
declared inside a structure. Otherwise, it will hit an assert
in glsl_type::sampler_index() because it expects either a sampler
or an image type.
This fixes a regression in the following piglit test:
arb_bindless_texture/compiler/images/arrays-of-struct.frag
Fixes: 57165f2ef8 ("glsl: disable array splitting for AoA")
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f99e9335e2)
On SI this was causing a hang in
dEQP-VK.pipeline.render_to_image.core.2d_array.mipmap.r16g16_sint_s8_uint
This was due to not handling the tile mode index for depth like
I fixed previously for new GPUs.
Fixes: 01d0c5a9 (radv: fix stencil regression since new addrlib import)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 800d162209)
The host doesn't understand this yet, so drop it for now.
Fixes: virgl regressions.
Fixes: af22adee4f (tgsi: add precise flag to tgsi_instruction)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 554aa09440)
This fixes corruption with bindless textures in Dawn Of War 3.
The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit f4d095cc65)
With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.
This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.
Fixes GL45-CTS.geometry_shader.rendering.rendering.*
v2: ensure that the TCS epilog doesn't run for non-existing patches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 081ac6e5c6)
The shader that is used to copy vertex data out of the vs/gs shaders to
the user-specified buffer (streamout or SO shader) was not using the
correct offsets.
Adjust the offsets that are used just for the SO shader:
- Make sure that position is handled in the same special way
as in the vs/gs shaders
- Use the correct offset to be passed in the core
- consolidate register slot mapping logic into one function, since it's
been calculated in 2 different places (one for calcuating the slot mask,
and one for the register offsets themselves
Also make room for all attibutes in the backend vertex area.
Fixes:
- all vtk GL2PS tests
- 18 piglit tests (16 ext_transform_feedback tests,
arb-quads-follow-provoking-vertex and primitive-type gl_points
v2:
- take care of more SGV slots in slot mapping logic
- trim feState.vsVertexSize
- fix GS interface and incorporate GS while calculating vsVertexSize
Note that vsVertexSize is used in the core as the one parameter that
controls vertex size between all stages, so it has to be adjusted appropriately
for the whole vs/gs/fs pipeline.
Also note that GS and SO is not fully implemented. This will be addressed
later.
fixes:
- fixes total of 20 piglit tests
CC: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit 194ff5eed1)
This ports 72e46c988 to radv.
radeonsi: apply a TC L1 write corruption workaround for SI
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e77ff11ffe)
This ports: da7453666a
radeonsi: don't apply the Z export bug workaround to Hainan
to radv.
Just noticed in passing.
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a81e99f50a)
Until we support sync fd, don't report the info.
Fixes CTS dEQP-VK.api.external.semaphore.sync_fd.* from crashing.
Fixes: eaa56eab6 (radv: initial support for shared semaphores (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6cbc8cf178)
Always initialise whandle.modifier for DRIImage modifier queries, so if
the driver doesn't support it then we return false for the query.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: d33fe8b84e ("st/dri: enable DRIimage modifier queries")
(cherry picked from commit 45383d32d4)
In the DRIImage queryImage hook, check if resource_get_handle() failed
and return FALSE if so.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b4a18f13ce)
If the underlying driver does not support modifiers, dmabuf will still
advertise formats through the 'modifier' event, but send them with an
invalid modifier. Ignore them if this is the case, rather than passing
them through to the driver.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
(cherry picked from commit dd072cf4b1)
With commit 5124bf9823, a framebuffer interface hash table is
created in st_gl_api_create(), which is called in
dri_init_screen_helper() for each screen. When the hash table is
overwritten with multiple calls to st_gl_api_create(), it can cause
race condition. This patch fixes the problem by creating a
framebuffer interface hash table per state tracker manager.
Fixes crash with steam.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
Fixes: 5124bf9823 ("st/mesa: add destroy_drawable interface")
Tested-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit bbc29393d3)
Squashed with:
st/mesa: fix unconditional return in st_framebuffer_iface_remove
Noticed by James Legg @ Feral.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 914f11e75b)
Squashed with:
st/mesa: Fix inversed test in st_api_destroy_drawable
Fixes a drawable leak.
Fixes: bbc29393d3 ("st/mesa: create framebuffer iface hash table per
st manager")
Bugzilla: https://bugs.freedesktop.org/101930
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 57132d126f)
We'll fail to flag an error if the context flags appear after the
no-error attribute in the context attribute list.
Delay the check to after attribute parsing to fix this.
Fixes: 4909519a66 ("egl: Add EGL_KHR_create_context_no_error support")
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: add fixes/stable tags, commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 39bf7756b9)
The number of supported waves per thread group has been reduced to 16
with gfx9. Trying to use 32 waves causes hangs, and barriers might
not work correctly with > 16 waves.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a0e6b9a2db)
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer by the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).
While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 65fbaab0b7)
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 31f1863ace)
Increase the value, not the pointer to the stack variable.
Caught by Coverity (CID 1415574). Not shipped in a real release.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit f6e674fa51)
Fixes: 63a43f4161 ("i965: Refactor miptree to isl converter and adjustment")
I don't know how I managed to leave this here for so long. Found when
working on a 1:1 overlapping blit extension for X11.
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 93fec49a75)
As Chris commented, it makes more sense to have batch buffer flushes
before the query. Usually applications like frame_retrace do a series
of queries and in that case, with flushes at the end of the queries,
we might still have the first query contained in 2 different batchs.
More generally it would be quite usual to have the query contained in
2 batch buffers because we never now what's the fill rate of the
current batch buffer.
If we move the flushing at the beginning of the queries, it's pretty
much guaranteed that queries will be contained in a single batch
buffer (unless the amount of commands is huge, but then it's only fair
to include reloading request times in the measurements).
Fixes: adafe4b733 ("i965: perf: minimize the chances to spread queries across batchbuffers")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9f439ae120)
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77240">Bug 77240</a> - khrplatform.h not installed if EGL is disabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95530">Bug 95530</a> - Stellaris - colored overlay of sectors doesn't render on i965</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96449">Bug 96449</a> - Dying Light reports OpenGL version 3.0 with mesa-git</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96958">Bug 96958</a> - [SKL] Improper rendering in Europa Universalis IV</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97957">Bug 97957</a> - Awful screen tearing in a separate X server with DRI3</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98238">Bug 98238</a> - Witcher 2: objects are black when changing lod on Radeon Pitcairn</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98428">Bug 98428</a> - Undefined non-weak-symbol in dri-drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99467">Bug 99467</a> - [radv] DOOM 2016 + wine. Green screen everywhere (but can be started)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100937">Bug 100937</a> - Mesa fails to build with GCC 4.8</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100945">Bug 100945</a> - Build failure in GNOME Continuous</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100988">Bug 100988</a> - glXGetCurrentDisplay() no longer works for FakeGLX contexts?</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101071">Bug 101071</a> - compiling glsl fails with undefined reference to `pthread_create'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101088">Bug 101088</a> - `gallium: remove pipe_index_buffer and set_index_buffer` causes glitches and crash in gallium nine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101110">Bug 101110</a> - Build failure in GNOME Continuous</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101189">Bug 101189</a> - Latest git fails to compile with radeon</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101252">Bug 101252</a> - eglGetDisplay() is not thread safe</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101254">Bug 101254</a> - VDPAU videos don't start playing with r600 gallium driver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101340">Bug 101340</a> - i915_surface.c:108:4: error: too few arguments to function ‘util_blitter_default_src_texture’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101360">Bug 101360</a> - Assertion failure comparing result of ballotARB</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101401">Bug 101401</a> - [REGRESSION][BISECTED] GDM fails to start after 8ec4975cd83365c791a1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101418">Bug 101418</a> - Build failure in GNOME Continuous</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101538">Bug 101538</a> - From "Use isl for hiz layouts" commit onwards, everything crashes with Mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101539">Bug 101539</a> - [Regresion] [IVB] Segment fault in recent commit in intel_miptree_level_has_hiz under Ivy bridge</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101558">Bug 101558</a> - [regression][bisected] MPV playing video via opengl "randomly" results in only part of the window / screen being rendered with Mesa GIT.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101596">Bug 101596</a> - Blender renders black UI elements</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101607">Bug 101607</a> - Regression in anisotropic filtering from "i965: Convert fs sampler state to use genxml"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101657">Bug 101657</a> - strtod.c:32:10: fatal error: xlocale.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101666">Bug 101666</a> - bitfieldExtract is marked as a built-in function on OpenGL ES 3.0, but was added in OpenGL ES 3.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101683">Bug 101683</a> - Some games hang while loading when compositing is shut off or absent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101703">Bug 101703</a> - No stencil buffer allocated when requested by GLUT</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101704">Bug 101704</a> - [regression][bisected] glReadPixels() from pbuffer failing in Android CTS camera tests</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101766">Bug 101766</a> - Assertion `!"invalid type"' failed when constant expression involves literal of different type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101774">Bug 101774</a> - gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101775">Bug 101775</a> - Xorg segfault since 147d7fb "st/mesa: add a winsys buffers list in st_context"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101829">Bug 101829</a> - read-after-free in st_framebuffer_validate</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101831">Bug 101831</a> - Build failure in GNOME Continuous</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101851">Bug 101851</a> - [regression] libEGL_common.a undefined reference to '__gxx_personality_v0'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101867">Bug 101867</a> - Launch options window renders black in Feral Games in current Mesa trunk</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101876">Bug 101876</a> - SIGSEGV when launching Steam</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102024">Bug 102024</a> - FORMAT_FEATURE_SAMPLED_IMAGE_BIT not supported for D16_UNORM and D32_SFLOAT</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102148">Bug 102148</a> - Crash when running qopenglwidget example on mesa llvmpipe win32</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102573">Bug 102573</a> - fails to build on armel</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102844">Bug 102844</a> - memory leak with glDeleteProgram for shader program type GL_COMPUTE_SHADER</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102847">Bug 102847</a> - swr fail to build with llvm-5.0.0</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102904">Bug 102904</a> - piglit and gl45 cts linker tests regressed</li>
</ul>
<h2>Changes</h2>
<p>Alexandru-Liviu Prodea (1):</p>
<ul>
<li>Scons: Add LLVM 5.0 support</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: Check for GFX9 for 1D arrays in image_size intrinsic.</li>
</ul>
<p>Boris Brezillon (1):</p>
<ul>
<li>broadcom/vc4: Fix infinite retry in vc4_bo_alloc()</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv/nir: call opt_remove_phis after trivial continues.</li>
<li>ac/surface: handle S8 on gfx9</li>
<li>st/glsl->tgsi: fix u64 to bool comparisons.</li>
</ul>
<p>David Airlie (1):</p>
<ul>
<li>radv: add gfx9 scissor workaround</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.1</li>
<li>automake: enable libunwind in `make distcheck'</li>
</ul>
<p>Eric Anholt (4):</p>
<ul>
<li>broadcom/vc4: Fix use-after-free for flushing when writing to a texture.</li>
<li>broadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.</li>
<li>broadcom/vc4: Fix use-after-free when deleting a program.</li>
<li>broadcom/vc4: Keep pipe_sampler_view->texture matching the original texture.</li>
</ul>
<p>Gert Wollny (2):</p>
<ul>
<li>travis: force llvm-3.3 for "make Gallium ST Other"</li>
<li>travis: Add libunwind-dev to gallium/make builds</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>configure: check if -latomic is needed for __atomic_*</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>nv20: Fix GL_CLAMP</li>
</ul>
<p>Jason Ekstrand (6):</p>
<ul>
<li>i965/blorp: Set r8stencil_needs_update when writing stencil</li>
<li>vulkan/wsi/wayland: Stop printing out the DRM device</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101832">Bug 101832</a> - [PATCH][regression][bisect] Xorg fails to start after f50aa21456d82c8cb6fbaa565835f1acc1720a5d</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102852">Bug 102852</a> - Scons: Support the new Scons 3.0.0</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102940">Bug 102940</a> - Regression: Vulkan KMS rendering crashes since 17.2</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Add R16G16B16A16_SNORM fast clear support</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>nir/spirv: Allow loop breaks in a switch body.</li>
<li>radv: Only set the MTYPE flags on GFX9+.</li>
</ul>
<p>Ben Crocker (4):</p>
<ul>
<li>gallivm: fix typo in debug_printf message</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103388">Bug 103388</a> - Linking libcltgsi.la (llvm/codegen/libclllvm_la-common.lo) fails with "error: no match for 'operator-'" with GCC-7, Mesa from Git and current LLVM revisions</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (8):</p>
<ul>
<li>cherry-ignore: configure.ac: rework llvm detection and handling</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97532">Bug 97532</a> - Regression: GLB 2.7 & Glmark-2 GLES versions segfault due to linker precision error (259fc505) on dead variable</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103787">Bug 103787</a> - [BDW,BSW] gpu hang on spec.arb_pipeline_statistics_query.arb_pipeline_statistics_query-comp</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94739">Bug 94739</a> - Mesa 11.1.2 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101378">Bug 101378</a> - interpolateAtSample check for input parameter is too strict</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102435">Bug 102435</a> - [skl,kbl] [drm] GPU HANG: ecode 9:0:0x86df7cf9, in csgo_linux64 [4947], reason: Hang on rcs, action: reset</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102552">Bug 102552</a> - Null dereference due to not checking return value of util_format_description</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103412">Bug 103412</a> - gallium/wgl: Another fix to context creation without prior SetPixelFormat()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103616">Bug 103616</a> - Increased difference from reference image in shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103966">Bug 103966</a> - Mesa 17.2.5 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104119">Bug 104119</a> - radv: OpBitFieldInsert produces 0 with a loop counter for Insert</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.