extra: The commit references a previous commit in which the changes
should have been included but, as clarified by the developer, it is
not needed for stable.
Signed-off-by: Andres Gomez <agomez@igalia.com>
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.
That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).
If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.
Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
(cherry picked from commit 8620f7ebbc)
This function is only used in two places:
1. VMware driver, but only for HUD reporting
2. st/nine state tracker, used for texture memory accounting
Fixes: a69efa9482 ("util: add new util_resource_size() function in
u_resource.[ch]")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit dde8309cde)
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045fee), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).
v2: set it in all shader stages.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3835009796)
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit aab0bfc648)
VCE processing IBs starts from session and task info at first level,
other commands processed subsequently. The task info for destroy is
embedded to destroy command, resulting that feedback command is not
properly procoessed. This is causing kernel spin VM fault messages on
Polaris and Vega10 card when running ends at encode application.
The fix is also verified on VCE physical mode card.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 6d74cb2570)
(cherry picked from commit 6a353479a7)
Squashed with:
util: Use preprocessor correctly
Fixes: 6a353479a7 ("util: Assume little endian in the absence of
platform-specific handling")
(cherry picked from commit b8cbad624b)
Squashed with:
util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC
MSVC doesn't support #warning?! Getting really tired of this.
(cherry picked from commit 676761252b)
Squashed with:
util: Also include endian.h on cygwin
If u_endian.h can't determine the endianess, the default behaviour in sha1.c
is to build for big-endian
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 2c62ccb10a)
>From GLSL 4.5 spec, section "7.1 Built-In Language Variables", page 130 of
the PDF states:
"If multiple shaders using members of a built-in block belonging to
the same interface are linked together in the same program, they must
all redeclare the built-in block in the same way, as described in
section 4.3.9 “Interface Blocks” for interface-block matching, or a
link-time error will result."
Fixes:
* GL45-CTS.CommonBugs.CommonBug_PerVertexValidation
v2 (Neil Roberts):
Explicitly look for gl_PerVertex in the symbol tables instead of
waiting to find a variable in the interface.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102677
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit f9de7f5596)
This effectively factorizes a couple of similar routines.
v2 (Neil Roberts): Non-trivial rebase on master
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit f5fe99ac85)
Some symbols gathered in the symbols table during parsing are needed
later for the compile and link stages, so they are moved along the
process. Currently, only functions and non-temporary variables are
copied between symbol tables. However, the built-in gl_PerVertex
interface blocks are also needed during the linking stage (the last
step), to match re-declared blocks of inter-stage shaders.
This patch adds a new utility function that will factorize current code
that copies functions and variables between two symbol tables, and in
addition will copy explicitly declared gl_PerVertex blocks too.
The function will be used in a subsequent patch.
v2 (Neil Roberts):
Allow the src symbol table to be NULL and explicitly copy the
gl_PerVertex symbols in case they are not referenced in the exec_list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit 4c62a270a9)
The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.
So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit ca63a5ed3e)
The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):
"For all of the interpolation functions, interpolant must be an l-value
from an in declaration; this can include a variable, a block or
structure member, an array element, or some combination of these.
Component selection operators (e.g., .xy) may be used when specifying
interpolant."
For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.
v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
(the ES WG decided that the relaxed rules may apply in a future version
but not retroactively; see also
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 4f42450b86)
GLSL shaders can access the normal scale factor with the built-in
gl_NormalScale. Mesa's modelspace lighting optimization uses a different
normal scale factor than defined in the spec. We have to take care not
to use this factor for gl_NormalScale.
Mesa already defines two seperate states: state.normalScale and
state.internal.normalScale. The first is used by the glsl compiler
while the later is used by the fixed function T&L pipeline. Previously
the only difference was some component swizzling. With this commit
state.normalScale always uses the normal scale factor for eyespace
lighting.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit c3ee464d7a)
spotExponent and spotCosCutoff were swapped in the
gl_builtin_uniform_element struct.
Now the order matches across gl_builtin_uniform_element,
glsl_struct_field and the spec.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 9bdb5457f4)
We failed to take the start into account for how many vertices to draw in
this round, so we would end up decrementing count below 0, which as an
unsigned number meant we would loop until the CLs soon ran out of space.
When I wrote the code I was thinking about how to use the previously
emitted shader state (no index bias baked into the elements) by emitting
up to 65535 and then only re-emitting with bias for the second wround, but
that doesn't work if the start is over 65535. Instead, just delay
emitting shader state until we get into the drawarrays GFXH-515 loop and
always bake the bias in when we're doing the workaround.
(cherry picked from commit 84ab48c15c)
The picture_id was assumed to be a frame number so in 0-31.
But the vaapi client gstreamer-vaapi uses the surfaces handles
as identifier which are unsigned int.
This bug can happen when using a lot of vaapi surfaces within
the same process. Indeed Mesa/st/va increments a counter for the
surface ID: mesa/util/u_handle_table.c::handle_table_add which
starts from 0 and incremented by 1 at each call.
So creating more than 32 surfaces was a problem.
The following bug contains a test that reproduces the problem
by running a couple of vaapih264enc in the same process. The
above also explains why there was no pb when running them in
separated processes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102006
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Tested-by: Tomas Rataj <rataj28@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-and-tested-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
(cherry picked from commit 91d93aa621)
64-bit operations on Atom parts have additional restrictions over their
big-core counterparts (validated by later patches).
Specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword" is violated by most shift
operations since NIR uses a 32-bit value as the shift count argument,
and this causes instructions like
shl(8) g19<1>Q g5<4,4,1>Q g23<4,4,1>UD
where src1 has a 32-bit stride, but the dest and src0 have a 64-bit
stride.
This caused ~4 pixels in the ARB_shader_ballot piglit test
fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no
ARB_gpu_shader_int64 test hit this case because they operate on
uniforms, and their scalar regions are an exception to the restriction.
We work around this by effectively unpacking the shift count, so that we
can read it with a 64-bit stride in the shift instruction. Unfortunately
the unpack (a MOV with a dst stride of 2) is a partial write, and cannot
be copy-propagated or CSE'd.
Bugzilla: https://bugs.freedesktop.org/101984
(cherry picked from commit b541945c20)
Swr caches fb contents in tiles. Those tiles are stored on a per-context
basis.
When switching contexts that share resources we need to make sure that
the tiles of the old context are being stored and the tiles of the new
context are being invalidated (marked as invalid, hence contents need
to be reloaded).
The context does not get any dirty bits to identify this case. This has
to be, then, coordinated by the resources that are being shared between
the contexts.
Add a "curr_pipe" hook in swr_resource that will allow us to identify a
MakeCurrent of the above form during swr_update_derived(). At that time,
we invalidate the tiles of the new context. The old context, will need to
have already store its tiles by that time, which happens during glFlush().
glFlush() is being called at the beginning of MakeCurrent.
So, the sequence of operations is:
- At the beginning of glXMakeCurrent(), glFlush() will store the tiles
of all bound surfaces of the old context.
- After the store, a fence will guarantee that the all tile store make
it to the surface
- During swr_update_derived(), when we validate the new context, we check
all resources to see what changed, and if so, we invalidate the
current tiles.
Fixes rendering problems with CEI/Ensight.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit b9aa0fa7d6)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103732
This fixes a crash in:
KHR-GL45.enhanced_layouts.xfb_block_stride
Fixes: 0822517936 "glsl: add helper to process xfb qualifiers during linking"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d53ccccb2)
The flag is on the pipe_resource, not the r600_resource.
I don't see an obvious bug related to this, but it could potentially lead
to suboptimal placement of some resources.
Fixes: a41587433c ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 5e2962c949)
deviceName string is declared, assigned and freed but actually
never used in dri3_create_screen() function.
Fixes: 2d94601582 ("Add DRI3+Present loader")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit d555929239)
Fix incomplete check of input params in blorp_surf_convert_to_uncompressed()
which can lead to NULL pointer dereferencing.
Fixes: 5ae8043fed ("intel/blorp: Add an entrypoint for doing
bit-for-bit copies")
Fixes: f395d0abc8 ("intel/blorp: Internally expose
surf_convert_to_uncompressed")
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit cdb3eb7174)
[Emil Velikov: drop non-applicable x/y hunk]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/intel/blorp/blorp_blit.c
We want to program the 3DSTATE_RASTER field to the gl_context value,
not the other way around.
Fixes: 13ac46557a (i965: Port Gen8+ 3DSTATE_RASTER state to genxml.)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 760e0156df)
nir_validate.c's #endif already had the correct NDEBUG comment
Fixes: dcb1acdea0 "nir/validate: Only build in debug mode"
Fixes: 9ff71b649b "i965/nir: Validate that NIR passes call nir_metadata_preserve()"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 7b85b9b877)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/compiler/nir/nir.h
The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.
The following program
unsigned char array[4] = {1, 2, 3, 4};
int *ptr = (int *)array;
for (int i = 0; i < 4; i++) {
printf("%02x", array[i]);
}
printf("\n");
printf("%08x\n", *ptr);
prints
01020304
04030201
on little endian, and
01020304
01020304
on big endian.
On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.
To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.
Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab ("glsl: Add initial functions to implement an
on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c690a7a8cd)
The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are no defined, then the comparison becomes 0 == 0 and it evaluates as
true.
Fixes: d1efa09d34 ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 532674303a)
This helps avoid compiler warningss in the next commit - everything
was initialized, but it wasn't obvious to static analysis.
Suggested-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit d6d16c0218)