Compare commits

...

498 Commits

Author SHA1 Message Date
Andres Gomez
3281775ab9 docs: add sha256 checksums for 17.2.8
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-23 00:54:11 +02:00
Andres Gomez
3482790712 docs: add release notes for 17.2.8
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-22 22:39:47 +02:00
Andres Gomez
fb618171b6 Update version to 17.2.8
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-22 22:34:12 +02:00
Andres Gomez
d9dd945d5c cherry-ignore: r600: set DX10_CLAMP for compute shader too
extra: The commit references a previous commit in which the changes
should have been included but, as clarified by the developer, it is
not needed for stable.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-20 19:40:38 +02:00
Andres Gomez
03538c210f cherry-ignore: main: Clear shader program data whenever ProgramBinary is called
extra: The commit just references a fix for an additional change in
its v2.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-20 19:40:38 +02:00
Andres Gomez
dd0b6dfd91 cherry-ignore: radv: port merge tess info from anv
fixes: This commit addressed earlier commit d1c9f30d7f which did not
land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-20 19:40:38 +02:00
Andres Gomez
fb342ac0c8 cherry-ignore: added 17.3 nominations.
stable: 17.3 nominations only.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-20 19:40:38 +02:00
Andres Gomez
aa8742864a cherry-ignore: swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.
stable: This commit addressed earlier commit ead0dfe31e which did
not land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-12-20 19:40:38 +02:00
Iago Toral Quiroga
5226e37717 i965/vec4: use a temp register to compute offsets for pull loads
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.

That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).

If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.

Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components

Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
(cherry picked from commit 8620f7ebbc)
2017-12-20 19:40:38 +02:00
Bas Nieuwenhuizen
a8ee722262 radv: Fix multi-layer blits.
We did not set the layer correctly for the dst, as we would keep
using the base layer. Same for the source image.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102710
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b42e106d4d)
2017-12-20 19:40:38 +02:00
Brian Paul
334ae3b0d2 gallium/aux: include nr_samples in util_resource_size() computation
This function is only used in two places:
1. VMware driver, but only for HUD reporting
2. st/nine state tracker, used for texture memory accounting

Fixes: a69efa9482 ("util: add new util_resource_size() function in
u_resource.[ch]")

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit dde8309cde)
2017-12-20 19:40:38 +02:00
Roland Scheidegger
3730b04e81 r600: use DX10_CLAMP bit in shader setup
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045fee), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).

v2: set it in all shader stages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3835009796)
2017-12-20 19:40:38 +02:00
Roland Scheidegger
5ffdc3a2b2 r600: use min_dx10/max_dx10 instead of min/max
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit aab0bfc648)
2017-12-20 19:40:38 +02:00
Brian Paul
9f94d6aa65 xlib: call _mesa_warning() instead of fprintf()
We use _mesa_warning() everywhere else in this code.  Change requested
by Rick Irons of Mathworks.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 7a46063803)
2017-12-20 19:40:38 +02:00
Bas Nieuwenhuizen
8f8b5c0ad3 spirv: Fix loading an entire block at once.
There is no chain, so  checking the length ends with a SEGFAULT.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103579
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit b926da241a)
[Andres Gomez: 16 bits types and vtn_assert/fail not yet in 17.2]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/compiler/spirv/vtn_variables.c
2017-12-20 19:40:38 +02:00
Leo Liu
2e6593806c radeon/vce: move destroy command before feedback command
VCE processing IBs starts from session and task info at first level,
other commands processed subsequently. The task info for destroy is
embedded to destroy command, resulting that feedback command is not
properly procoessed. This is causing kernel spin VM fault messages on
Polaris and Vega10 card when running ends at encode application.

The fix is also verified on VCE physical mode card.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 6d74cb2570)
2017-12-20 19:40:38 +02:00
Matt Turner
e0160dd368 util: Add a SHA1 unit test program
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 513d7ffa23)
(cherry picked from commit 60ed1a07f2)

Squashed with:

util: scons: wire up the sha1 test

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 5d03a68640)
(cherry picked from commit 455ff75892)
2017-12-20 19:40:22 +02:00
Matt Turner
2186a9ee6b util: Assume little endian in the absence of platform-specific handling
(cherry picked from commit 6a353479a7)

Squashed with:

util: Use preprocessor correctly

Fixes: 6a353479a7 ("util: Assume little endian in the absence of
                      platform-specific handling")
(cherry picked from commit b8cbad624b)

Squashed with:

util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC

MSVC doesn't support #warning?! Getting really tired of this.

(cherry picked from commit 676761252b)

Squashed with:

util: Also include endian.h on cygwin

If u_endian.h can't determine the endianess, the default behaviour in sha1.c
is to build for big-endian

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 2c62ccb10a)
2017-12-20 16:16:47 +02:00
Emil Velikov
9ca5f55786 docs: add sha256 checksums for 17.2.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-12-14 13:49:09 +00:00
Emil Velikov
834c36fa65 docs: add release notes for 17.2.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-12-14 13:27:23 +00:00
Emil Velikov
3200305737 Update version to 17.2.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-12-14 13:17:30 +00:00
Eduardo Lima Mitev
62eb86dd4c glsl/linker: Check that re-declared, inter-shader built-in blocks match
>From GLSL 4.5 spec, section "7.1 Built-In Language Variables", page 130 of
the PDF states:

    "If multiple shaders using members of a built-in block belonging to
     the same interface are linked together in the same program, they must
     all redeclare the built-in block in the same way, as described in
     section 4.3.9 “Interface Blocks” for interface-block matching, or a
     link-time error will result."

Fixes:
* GL45-CTS.CommonBugs.CommonBug_PerVertexValidation

v2 (Neil Roberts):
Explicitly look for gl_PerVertex in the symbol tables instead of
waiting to find a variable in the interface.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102677
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit f9de7f5596)
2017-12-08 18:27:01 +00:00
Eduardo Lima Mitev
2dc404f73f glsl: Use the utility function to copy symbols between symbol tables
This effectively factorizes a couple of similar routines.

v2 (Neil Roberts): Non-trivial rebase on master

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit f5fe99ac85)
2017-12-08 18:27:01 +00:00
Eduardo Lima Mitev
f76bea8abc glsl_parser_extra: Add utility to copy symbols between symbol tables
Some symbols gathered in the symbols table during parsing are needed
later for the compile and link stages, so they are moved along the
process. Currently, only functions and non-temporary variables are
copied between symbol tables. However, the built-in gl_PerVertex
interface blocks are also needed during the linking stage (the last
step), to match re-declared blocks of inter-stage shaders.

This patch adds a new utility function that will factorize current code
that copies functions and variables between two symbol tables, and in
addition will copy explicitly declared gl_PerVertex blocks too.

The function will be used in a subsequent patch.

v2 (Neil Roberts):
Allow the src symbol table to be NULL and explicitly copy the
gl_PerVertex symbols in case they are not referenced in the exec_list.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit 4c62a270a9)
2017-12-08 18:27:01 +00:00
Nicolai Hähnle
32b0691249 glsl: fix interpolateAtXxx(some_vec[idx], ...) with dynamic idx
The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.

So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit ca63a5ed3e)
2017-12-08 18:27:01 +00:00
Nicolai Hähnle
267fe4f1f9 glsl: allow any l-value of an input variable as interpolant in interpolateAt*
The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):

   "For all of the interpolation functions, interpolant must be an l-value
    from an in declaration; this can include a variable, a block or
    structure member, an array element, or some combination of these.
    Component selection operators (e.g., .xy) may be used when specifying
    interpolant."

For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.

v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
    (the ES WG decided that the relaxed rules may apply in a future version
     but not retroactively; see also
     dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 4f42450b86)
2017-12-08 18:27:01 +00:00
Fabian Bieler
07da9fd947 glsl: Fix gl_NormalScale.
GLSL shaders can access the normal scale factor with the built-in
gl_NormalScale.  Mesa's modelspace lighting optimization uses a different
normal scale factor than defined in the spec.  We have to take care not
to use this factor for gl_NormalScale.

Mesa already defines two seperate states: state.normalScale and
state.internal.normalScale.  The first is used by the glsl compiler
while the later is used by the fixed function T&L pipeline.  Previously
the only difference was some component swizzling.  With this commit
state.normalScale always uses the normal scale factor for eyespace
lighting.

Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit c3ee464d7a)
2017-12-08 18:27:01 +00:00
Fabian Bieler
fe15da8245 glsl: Match order of gl_LightSourceParameters elements.
spotExponent and spotCosCutoff were swapped in the
gl_builtin_uniform_element struct.
Now the order matches across gl_builtin_uniform_element,
glsl_struct_field and the spec.

Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 9bdb5457f4)
2017-12-08 18:27:01 +00:00
Eric Anholt
511cfe1d70 broadcom/vc4: Fix handling of GFXH-515 workaround with a start vertex count.
We failed to take the start into account for how many vertices to draw in
this round, so we would end up decrementing count below 0, which as an
unsigned number meant we would loop until the CLs soon ran out of space.

When I wrote the code I was thinking about how to use the previously
emitted shader state (no index bias baked into the elements) by emitting
up to 65535 and then only re-emitting with bias for the second wround, but
that doesn't work if the start is over 65535.  Instead, just delay
emitting shader state until we get into the drawarrays GFXH-515 loop and
always bake the bias in when we're doing the workaround.

(cherry picked from commit 84ab48c15c)
2017-12-08 18:27:01 +00:00
Julien Isorce
d95c2edcde st/va: change frame_idx from array to hash table
The picture_id was assumed to be a frame number so in 0-31.
But the vaapi client gstreamer-vaapi uses the surfaces handles
as identifier which are unsigned int.

This bug can happen when using a lot of vaapi surfaces within
the same process. Indeed Mesa/st/va increments a counter for the
surface ID: mesa/util/u_handle_table.c::handle_table_add which
starts from 0 and incremented by 1 at each call.
So creating more than 32 surfaces was a problem.

The following bug contains a test that reproduces the problem
by running a couple of vaapih264enc in the same process. The
above also explains why there was no pb when running them in
separated processes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102006
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Tested-by: Tomas Rataj <rataj28@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-and-tested-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
(cherry picked from commit 91d93aa621)
2017-12-08 18:27:01 +00:00
Denis Pauk
3f1c8733d2 gallium/{r600, radeonsi}: Fix segfault with color format (v2)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102552

v2: Patch cleanup proposed by Nicolai Hähnle.
    * deleted changes in si_translate_texformat.

Cc: Nicolai Hähnle <nhaehnle@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 74d2456491)
2017-12-08 18:27:01 +00:00
Vinson Lee
ac78f7c76a anv: Check if memfd_create is already defined.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103909
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 8c1e4b1afc)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	configure.ac
	meson.build
	src/intel/vulkan/anv_allocator.c
2017-12-08 18:27:01 +00:00
Matt Turner
2a266cb675 i965/fs: Unpack count argument to 64-bit shift ops on Atom
64-bit operations on Atom parts have additional restrictions over their
big-core counterparts (validated by later patches).

Specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword" is violated by most shift
operations since NIR uses a 32-bit value as the shift count argument,
and this causes instructions like

   shl(8)          g19<1>Q         g5<4,4,1>Q      g23<4,4,1>UD

where src1 has a 32-bit stride, but the dest and src0 have a 64-bit
stride.

This caused ~4 pixels in the ARB_shader_ballot piglit test
fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no
ARB_gpu_shader_int64 test hit this case because they operate on
uniforms, and their scalar regions are an exception to the restriction.

We work around this by effectively unpacking the shift count, so that we
can read it with a 64-bit stride in the shift instruction. Unfortunately
the unpack (a MOV with a dst stride of 2) is a partial write, and cannot
be copy-propagated or CSE'd.

Bugzilla: https://bugs.freedesktop.org/101984
(cherry picked from commit b541945c20)
2017-12-08 18:27:01 +00:00
Frank Richter
e53135725f gallium/wgl: fix default pixel format issue
When creating a context without SetPixelFormat() don't blindly take the
pixel format reported by GDI. Instead, look for our own closest pixel
format.

Minor clean-ups added by Brian Paul.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103412
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit bf41b2b262)
2017-12-08 18:27:01 +00:00
Juan A. Suarez Romero
7d19ff0d67 glsl: add varying resources for arrays of complex types
This patch is mostly a patch done by Ilia Mirkin.

It fixes KHR-GL45.enhanced_layouts.varying_structure_locations.

v2: fix locations for TCS/TES/GS inputs and outputs (Ilia)

CC: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103098
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit d5a641106b)
2017-12-08 18:27:01 +00:00
George Kyriazis
df137dff8f swr: Handle resource across context changes
Swr caches fb contents in tiles.  Those tiles are stored on a per-context
basis.

When switching contexts that share resources we need to make sure that
the tiles of the old context are being stored and the tiles of the new
context are being invalidated (marked as invalid, hence contents need
to be reloaded).

The context does not get any dirty bits to identify this case.  This has
to be, then, coordinated by the resources that are being shared between
the contexts.

Add a "curr_pipe" hook in swr_resource that will allow us to identify a
MakeCurrent of the above form during swr_update_derived().  At that time,
we invalidate the tiles of the new context.  The old context, will need to
have already store its tiles by that time, which happens during glFlush().
glFlush() is being called at the beginning of MakeCurrent.

So, the sequence of operations is:
- At the beginning of glXMakeCurrent(), glFlush() will store the tiles
  of all bound surfaces of the old context.
- After the store, a fence will guarantee that the all tile store make
  it to the surface
- During swr_update_derived(), when we validate the new context, we check
  all resources to see what changed, and if so, we invalidate the
  current tiles.

Fixes rendering problems with CEI/Ensight.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit b9aa0fa7d6)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103732
2017-12-08 18:27:01 +00:00
Timothy Arceri
edd1f519f4 glsl: get correct member type when processing xfb ifc arrays
This fixes a crash in:

KHR-GL45.enhanced_layouts.xfb_block_stride

Fixes: 0822517936 "glsl: add helper to process xfb qualifiers during linking"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d53ccccb2)
2017-12-08 18:27:00 +00:00
Nicolai Hähnle
f51950c413 radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check
The flag is on the pipe_resource, not the r600_resource.

I don't see an obvious bug related to this, but it could potentially lead
to suboptimal placement of some resources.

Fixes: a41587433c ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 5e2962c949)
2017-12-08 18:27:00 +00:00
Pierre Moreau
9c2f665863 nvc0/ir: Properly lower 64-bit shifts when the shift value is >32
Fixes: 61d7676df7 "nvc0/ir: add support for 64-bit shift lowering on SM20/SM30"

Fixes fs-shift-scalar-by-scalar.shader_test from piglit for the current
set-up:

uniform int64_t ival -0x7dfcfefbdf6536ff # bit pattern: 0x82030104209ac901
uniform uint64_t uval 0x1400000085010203
uniform int shl 36
uniform int shr 36
uniform int64_t iexpected_shl 0x09ac901000000000
uniform int64_t iexpected_shr -0x7dfcff0 # bit pattern: 0xfffffffff8203010
uniform uint64_t uexpected_shl 0x5010203000000000
uniform uint64_t uexpected_shr 0x0000000001400000
draw rect ortho 12 0 4 4

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 9bee12160b)
2017-12-08 18:27:00 +00:00
Vadym Shovkoplias
a552480249 glx/dri3: Remove unused deviceName variable
deviceName string is declared, assigned and freed but actually
never used in dri3_create_screen() function.

Fixes: 2d94601582 ("Add DRI3+Present loader")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit d555929239)
2017-12-08 18:27:00 +00:00
Vadym Shovkoplias
ecf8e4a64c intel/blorp: Fix possible NULL pointer dereferencing
Fix incomplete check of input params in blorp_surf_convert_to_uncompressed()
which can lead to NULL pointer dereferencing.

Fixes: 5ae8043fed ("intel/blorp: Add an entrypoint for doing
bit-for-bit copies")
Fixes: f395d0abc8 ("intel/blorp: Internally expose
surf_convert_to_uncompressed")
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit cdb3eb7174)
[Emil Velikov: drop non-applicable x/y hunk]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/intel/blorp/blorp_blit.c
2017-12-08 18:27:00 +00:00
Kenneth Graunke
14df2f2d27 i965: Fix Smooth Point Enables.
We want to program the 3DSTATE_RASTER field to the gl_context value,
not the other way around.

Fixes: 13ac46557a (i965: Port Gen8+ 3DSTATE_RASTER state to genxml.)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 760e0156df)
2017-12-08 18:27:00 +00:00
Eric Engestrom
83c85b28a4 compiler: use NDEBUG to guard asserts
nir_validate.c's #endif already had the correct NDEBUG comment

Fixes: dcb1acdea0 "nir/validate: Only build in debug mode"
Fixes: 9ff71b649b "i965/nir: Validate that NIR passes call nir_metadata_preserve()"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 7b85b9b877)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/compiler/nir/nir.h
2017-12-08 18:27:00 +00:00
Matt Turner
6a698f7f92 util: Fix disk_cache index calculation on big endian
The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.

The following program

        unsigned char array[4] = {1, 2, 3, 4};
        int *ptr = (int *)array;

        for (int i = 0; i < 4; i++) {
            printf("%02x", array[i]);
        }
        printf("\n");

        printf("%08x\n", *ptr);

prints

   01020304
   04030201

on little endian, and

   01020304
   01020304

on big endian.

On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.

To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.

Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab ("glsl: Add initial functions to implement an
                      on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c690a7a8cd)
2017-12-08 18:27:00 +00:00
Matt Turner
1cb18fc98b util: Fix SHA1 implementation on big endian
The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are no defined, then the comparison becomes 0 == 0 and it evaluates as
true.

Fixes: d1efa09d34 ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 532674303a)
2017-12-08 18:27:00 +00:00
Matt Turner
4473097d92 i965/fs: Handle negating immediates on MADs when propagating saturates
MADs don't take immediate sources, but we allow them in the IR since it
simplifies a lot of things. I neglected to consider that case.

Fixes: 4009a9ead4 ("i965/fs: Allow saturate propagation to propagate
                      negations into MADs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616
Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit a05af1f7b8)
2017-12-08 18:27:00 +00:00
Ben Crocker
63f44f2e0f docs/llvmpipe: document ppc64le as alternative architecture to x86.
Power8, Power8NV, and Power9 are supported on an equal footing
with X86.

Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

[Eric: changed formatting, reworded a bit (with Ben's ack)]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 060eb314eb)
2017-12-08 18:27:00 +00:00
James Legg
773282c1f8 nir/opcodes: Fix constant-folding of bitfield_insert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
CC: <mesa-stable@lists.freedesktop.org>
CC: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 947470d10b)
2017-12-08 18:27:00 +00:00
Alex Smith
6c9c07b0f9 radv: Add LLVM version to the device name string
Allows apps to determine the LLVM version so that they can decide
whether or not to enable workarounds for LLVM issues.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 8fda98c4f1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/vulkan/radv_device.c
	src/amd/vulkan/radv_private.h
2017-12-08 18:27:00 +00:00
Kenneth Graunke
85724a9a69 meta: Fix ClearTexture with GL_DEPTH_COMPONENT.
We only handled unpacking for GL_DEPTH_STENCIL formats.

Cemu was hitting _mesa_problem() for an unsupported format in
_mesa_unpack_float_32_uint_24_8_depth_stencil_row(), because the
format was depth-only, rather than depth-stencil.

Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94739
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103966
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 8705ed13e3)
2017-12-08 18:27:00 +00:00
Kenneth Graunke
47482090ea meta: Initialize depth/clear values on declaration.
This helps avoid compiler warningss in the next commit - everything
was initialized, but it wasn't obvious to static analysis.

Suggested-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit d6d16c0218)
2017-12-08 18:27:00 +00:00
Gert Wollny
b3b08d534c r600/sb: do not convert if-blocks that contain indirect array access
If an array is accessed within an if block, then currently it is not known
whether the value in the address register is involved in the evaluation of the
if condition, and converting the if condition may actually result in
out-of-bounds array access. Consequently, if blocks that contain indirect array
access should not be converted.

Fixes piglits on r600/BARTS:
spec/glsl-1.10/execution/variable-indexing/
  vs-output-array-float-index-wr
  vs-output-array-vec3-index-wr
  vs-output-array-vec4-index-wr

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6c268ea79a)
2017-12-08 18:27:00 +00:00
Emil Velikov
76b54923c0 cherry-ignore: radeonsi: allow DMABUF exports for local buffers
Commit depends on at least 8b3a257851, which did not land in branch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-12-08 18:27:00 +00:00
Marek Olšák
4d0ec672cb radeonsi: flush the context after resource_copy_region for buffer exports
Cc: 17.2 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 5e805cc74b)
[Emil Velikov: s/si_texture_disable_dcc/r600_texture_disable_dcc/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/radeon/r600_texture.c
2017-12-08 18:27:00 +00:00
Jason Ekstrand
b50154eff7 i965: Disable regular fast-clears (CCS_D) on gen9+
This partially reverts commit 3e57e9494c
which caused a bunch of GPU hangs on several Source titles.  To date, we
have no clue why these hangs are actually happening.  This undoes the
final effect of 3e57e9494c and gets us back to not hanging.  Tested
with Team Fortress 2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102435
Fixes: 3e57e9494c
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ee57b15ec7)
2017-12-08 18:27:00 +00:00
Marek Olšák
a995d6bb78 radeonsi/gfx9: fix importing shared textures with DCC
VI has 11 dwords at least. GFX9 has 10 dwords.

Cc: 17.2 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit ed4780383c)
[Emil Velikov: s|radeon/r600_texture.c|radeonsi/si_state.c|]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/radeon/r600_texture.c
2017-12-08 17:30:34 +00:00
Marek Olšák
e893ae4d9d radeonsi: fix layered DCC fast clear
Cc: 17.2 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 6863651bbd)
2017-12-08 17:30:34 +00:00
Dave Airlie
de438f1569 r600/sb: handle jump after target to end of program. (v2)
This fixes hangs on cayman with
tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test

This has a single if/else in it, and when this peephole activated,
it would set the jump target to NULL if there was no instruction
after the final POP. This adds a NOP if we get a jump in this case,
and seems to fix the hangs, so we have a valid target for the ELSE
instruction to go to, instead of 0 (which causes infinite loops).

v2: update last_cf correctly. (I had some other patches hide this)

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 579ec9c311)
2017-12-08 17:30:34 +00:00
Ben Crocker
ab5585ff9f docs/llvmpipe.html: Minor edits
Language and spelling fixups in three places.

Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

[Eric: move two fixes from the other patch to this one.]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit b43daf7bf6)
2017-12-08 17:30:34 +00:00
Kai Wasserbäch
b25baea79d docs: Point to apt.llvm.org for development snapshot packages
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit d25123e23a)
2017-12-08 17:30:34 +00:00
Tapani Pälli
4f97100a97 mesa/gles: adjust internal format in glTexSubImage2D error checks
When floating point textures are created on OpenGL ES 2.0, driver
is free to choose used internal format. Mesa makes this decision in
adjust_for_oes_float_texture. Error checking for glTexImage2D properly
checks that sized formats are not used. We use same error checking
path for glTexSubImage2D (since there is lot of overlap), however since
those checks include internalFormat checks, we need to pass original
internalFormat passed by the client. Patch adds oes_float_internal_format
that does reverse adjust_for_oes_float_texture to get that format.

Fixes following test failure:
   ES2-CTS.gtf.GL2ExtensionTests.texture_float.texture_float

(when running test with MESA_GLES_VERSION_OVERRIDE=2.0)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103227
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 1e508e10d9)
2017-12-08 17:30:34 +00:00
Emil Velikov
779a028fbc gl_table.py: add extern C guard for the generated glapitable.h
The header can be included from C++, hence contents should have
appropriate notation.

Cc: mesa-stable@lists.freedesktop.org
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit c7616ac069)
2017-12-08 17:30:34 +00:00
Gert Wollny
c2ff8e3e5d r600: Emit EOP for more CF instruction types
So far on pre-cayman chipsets the CF instructions CF_OP_LOOP_END,
CF_OP_CALL_FS, CF_OP_POP, and CF_OP_GDS an extra CF_NOP instruction
was added to add the EOP flag, even though this is not actually
needed, because all these instrutions support the EOP flag.

This patch removes the fixup code, adds setting the EOP flag for the
according instructions as well as others like CF_OP_TEX and CF_OP_VTX,
and adds writing out EOP for this type of instruction in the disassembler.

This also fixes a bug where shaders were created that didn't actually have
the EOP flag set in the last CF instruction, which might have resulted
in GPU lockups.

[airlied: cleaned up a little]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 1d076aafbc)
2017-12-08 17:30:34 +00:00
Ilia Mirkin
a8a5a97f46 glsl: fix derived cs variables
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) happening in the same shader as the main
function, and secondly it doesn't work for variable group sizes.

In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.

While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.

Also the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with main function. As a result, the lowering
code has to insert its own copies of the system values if needed.

Reported-by: Stephane Chevigny <stephane.chevigny@polymtl.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4d24a7cb97)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/compiler/glsl/meson.build
2017-12-08 17:30:34 +00:00
Andres Gomez
b9b60dbf55 docs: remove bug 103626 from fix list as per 17.2.6
Bug https://bugs.freedesktop.org/show_bug.cgi?id=103626 was
incorrectly listed as fixed.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-26 02:15:43 +02:00
Andres Gomez
93c2beafc0 docs: add sha256 checksums for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-26 01:40:36 +02:00
Andres Gomez
00b52f8e99 docs: add release notes for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-26 01:32:53 +02:00
Andres Gomez
6f34ba40f3 Update version to 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-26 01:26:34 +02:00
Andres Gomez
9888913a57 cherry-ignore: Revert "intel/fs: Use a pure vertical stride for large register strides"
extra: The commit just references a proper fix that has already
landed.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:24 +02:00
Andres Gomez
a231bbd639 cherry-ignore: egl: pass the dri2_dpy to the $plat_teardown functions
fixes: This commit makes reference to 2 other commits but none have
made it to the 17.2 queue.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:24 +02:00
Andres Gomez
f6d3eed513 cherry-ignore: glsl: Fix typo fragement -> fragment
fixes: This commit is only a typo correction on an error message.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:24 +02:00
Andres Gomez
ba0f965771 cherry-ignore: added 17.3 nominations.
stable: 17.3 nominations only.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:24 +02:00
Andres Gomez
8cd5bd2a8a cherry-ignore: i965: Mark BOs as external when we export their handle
stable: These commits addressed earlier commit 2c4097aff1 which did
not land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:24 +02:00
Andres Gomez
807da6e270 cherry-ignore: anv/cmd_buffer: Take bo_offset into account in fast clear state addresses
stable: This commit addressed earlier commit a62a979335 which did
not land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:24 +02:00
Andres Gomez
c9325693f1 cherry-ignore: anv/cmd_buffer: Advance the address when initializing clear colors
stable: This commit depends on earlier commit 3735af0415 which did
not land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:24 +02:00
Andres Gomez
1519933ce2 cherry-ignore: r600/shader: reserve first register of vertex shader.
stable: This commit addressed earlier commit ea1b97714d which did
not land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:23 +02:00
Andres Gomez
d255b77237 cherry-ignore: intel/fs: refactors
stable: These commits are refactorings rather than fixes.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:23 +02:00
Andres Gomez
96be105d41 cherry-ignore: intel/fs: Use the original destination region for int MUL lowering
stable: These commits resulted in a CTS regression being addressed at
https://bugs.freedesktop.org/show_bug.cgi?id=103626 .

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:23 +02:00
Andres Gomez
010751e95d cherry-ignore: intel/nir: Use the correct indirect lowering masks in link_shaders
stable: These commits addressed earlier commit 379b24a40d which did
not land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:23 +02:00
Andres Gomez
f562d93049 cherry-ignore: intel/fs: Use a pure vertical stride for large register strides
stable: This commit is not really needed after 6ac2d16901.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-22 18:41:23 +02:00
Nicolai Hähnle
49277c6e36 ddebug: fix use-after-free of streamout targets
Fixes: b47727a83a ("ddebug: implement pipelined hang detection mode")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 16f8da2997)
2017-11-22 18:41:23 +02:00
George Barrett
ae629ca805 glsl: Catch subscripted calls to undeclared subroutines
generate_array_index fails to check whether the target of a subroutine
call exists in the AST, potentially passing around null ir_rvalue
pointers eventuating in abort/segfault.

Fixes: fd01840c0b ("glsl: add AoA support to subroutines")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438
(cherry picked from commit f09c2cefdd)
2017-11-22 18:41:23 +02:00
Kenneth Graunke
a445658178 i965: Upload invariant state once at the start of the batch on Gen4-5.
We want to emit invariant state at the start of a render batch.  In the
past, this more or less happened: a new batch flagged BRW_NEW_CONTEXT
(because we don't have hardware contexts), which triggered the
brw_invariant_state atom.  So, it would be emitted before any 3D
drawing.  (Technically, there might be some BLT commands in the batch
because Gen4-5 have a single combined render/BLT ring, but that should
be harmless).

With the advent of BLORP, this broke.  The first item in a batch might
be a BLORP operation, which bypasses the normal draw upload path.  So,
we need to ensure invariant state happens first.  To do that, we just
upload it when creating a new batch.  On Gen6+ we'd need to worry about
whether it's a RENDER or BLT batch, but because we have a combined ring,
this approach should work fine on Gen4-5.

Seems to fix GPU hangs when playing hardware accelerated video with
mpv -hwdec=vaapi on Ironlake.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103529
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 8f91aa35a5)
2017-11-22 18:41:23 +02:00
Derek Foreman
0d02e91c2c egl/wayland: Add a fallback when fourcc query isn't supported
When queryImage doesn't support __DRI_IMAGE_ATTRIB_FOURCC wayland clients
will die with a NULL derefence in wl_proxy_add_listener.

Attempt to provide a simple fallback to keep ancient systems working.

Fixes: 6595c69951 ("egl/wayland: Remove more surface specifics from
create_wl_buffer")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103519
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 0db36caa19)

Squashed with:

egl: fix var type

queryImage() takes an `int*`; compiler is warning about the
signed<->unsigned pointer mismatch.

Fixes: 0db36caa19 "egl/wayland: Add a fallback when fourcc
       query isn't supported"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Derek Foreman <derekf@osg.samsung.com>
(cherry picked from commit ca95d7ad4e)
2017-11-22 18:41:23 +02:00
Kenneth Graunke
f288607eb7 i965: Implement another VF cache invalidate workaround on Gen8+.
...and provide a better citation for the existing one.

v2:
- Apply the workaround to Gen8 too, as intended (caught by Topi).
- Restructure to add bits instead of an extra flush (based on a similar
  patch by Rafael Antognolli).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 8d48671492)
[Andres Gomez: brw->gen not yet dropped in favor of devinfo->gen]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_pipe_control.c

Squashed with:

i965: Revert Gen8 aspect of VF PIPE_CONTROL workaround.

This apparently causes hangs on Broadwell, so let's back it out for now.
I think there are other PIPE_CONTROL workarounds that we're missing.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103787
(cherry picked from commit a01ba366e0)
[Andres Gomez: brw->gen not yet dropped in favor of devinfo->gen]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_pipe_control.c
2017-11-22 18:41:18 +02:00
Matt Turner
b3410696e0 i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK
Fixes the following tests on CHV, BXT, and GLK:
    KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
    dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115

(cherry picked from commit cfcfa0b9cd)
2017-11-21 18:16:46 +02:00
Matt Turner
70b1c115b8 i965/fs: Fix extract_i8/u8 to a 64-bit destination
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.

For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.

Fixes the following test on CHV, BXT, and GLK:
   KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

(cherry picked from commit 6ac2d16901)
2017-11-21 18:16:46 +02:00
Anuj Phogat
82876e24c4 i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DW
Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1dc45d75bb)
[Andres Gomez: brw->gen not yet dropped in favor of devinfo->gen]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_pipe_control.c
	src/mesa/drivers/dri/i965/intel_blit.c
2017-11-21 18:16:46 +02:00
Anuj Phogat
653a203937 i965: Program DWord Length in MI_FLUSH_DW
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6165fda59b)

Squashed with:

i965: Remove DWord length from MI_FLUSH_DW definition

Fixes: 6165fda59b ("i965: Program DWord Length in MI_FLUSH_DW")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 822fd2341d)
2017-11-21 18:16:46 +02:00
Tim Rowley
8edbc8f109 swr/rast: Faster emulated simd16 permute
Speed up simd16 frontend (default) on avx/avx2 platforms;
fixes performance regression caused by switch to simdlib.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d8489517a5)
2017-11-21 18:16:46 +02:00
Tim Rowley
0f4dfee254 swr/rast: Use gather instruction for i32gather_ps on simd16/avx512
Speed up avx512 platforms; fixes performance regression caused
by swithc to simdlib.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 439904847e)
2017-11-21 18:16:46 +02:00
Bas Nieuwenhuizen
256733683b radv: Free temporary syncobj after waiting on it.
Otherwise we leak it.

Fixes: eaa56eab6d "radv: initial support for shared semaphores (v2)"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 7c25578863)
2017-11-21 18:16:46 +02:00
Bas Nieuwenhuizen
75efb540e2 radv: Free syncobj with multiple imports.
Otherwise we can leak the old syncobj.

Fixes: eaa56eab6d "radv: initial support for shared semaphores (v2)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 917d3b43f2)
2017-11-21 18:16:46 +02:00
Jason Ekstrand
179acf6579 i965: Add stencil buffers to cache set regardless of stencil texturing
We may access them as a texture using blorp regardless of whether or not
stencil texturing is enabled.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 6830ba0d3b)
2017-11-21 18:16:46 +02:00
Dave Airlie
84f765ce25 r600: fix isoline tess factor component swapping.
As per radeonsi, the tess factor components for isolines
are reversed.

Fixes: tests/spec/arb_tessellation_shader/execution/isoline.shader_test
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f3f8615d76)
2017-11-21 18:16:46 +02:00
Kenneth Graunke
0465cfe0b0 intel/tools: Fix detection of enabled shader stages.
We renamed "Function Enable" to "Enable", which broke our detection
of whether shaders are enabled or not.  So, we'd see a bunch of HS/DS
packets with program offsets of 0, and think that was a valid TCS/TES.

Fixes: c032cae9ff (genxml: Rename "Function Enable" to "Enable".)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9a0465b3a3)
2017-11-21 18:16:46 +02:00
Kenneth Graunke
9229774b86 i965: Make L3 configuration atom listen for TCS/TES program updates.
The L3 configuration code already considers the TCS and TES programs,
but failed to listen for TCS/TES program changes.

This was somehow missing.

Fixes: e9644cb1f9 ("i965: Consider tessellation in get_pipeline_state_l3_weights.")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit b8d42cccd0)
2017-11-21 18:16:45 +02:00
Dylan Baker
f5e82045ca autotools: Set C++ visibility flags on Intel
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.

Fixes: 700bebb958 ("i965: Move the back-end compiler to src/intel/compiler")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 854455498c)
2017-11-21 18:16:45 +02:00
Adam Jackson
167f50ae68 glx/dri3: Fix passing renderType into glXCreateContext
Without this, trying to create a GLX_RGBA_FLOAT_TYPE_ARB context would
fail, because GLX_RGBA_TYPE would be a mismatch with the fbconfig.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 257edb5b9a)
2017-11-21 18:16:45 +02:00
Adam Jackson
d53558fd67 glx/drisw: Fix glXMakeCurrent(dpy, None, ctx)
This is perfectly legal in GL 3.0+.

Fixes piglit/glx-create-context-current-no-framebuffer.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 033cfb17db)
2017-11-21 18:16:45 +02:00
Alex Smith
2c9d72046f nir/spirv: tg4 requires a sampler
Gather operations in both GLSL and SPIR-V require a sampler. Fixes
gathers returning garbage when using separate texture/samplers (on AMD,
was using an invalid sampler descriptor).

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 4122d00846)
2017-11-21 18:16:45 +02:00
Alex Smith
9d49fd747c spirv: Use correct type for sampled images
We should use the result type of the OpSampledImage opcode, rather than
the type of the underlying image/samplers.

This resolves an issue when using separate images and shadow samplers
with glslang. Example:

    layout (...) uniform samplerShadow s0;
    layout (...) uniform texture2D res0;
    ...
    float result = textureLod(sampler2DShadow(res0, s0), uv, 0);

For this, for the combined OpSampledImage, the type of the base image
was being used (which does not have the Depth flag set, whereas the
result type does), therefore it was not being recognised as a shadow
sampler. This led to the wrong LLVM intrinsics being emitted by RADV.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit e9eb3c4753)
2017-11-21 18:16:45 +02:00
Emil Velikov
a3adb73893 configure.ac: require xcb* for the omx/va/... when using x11 platform
Targets such as omx and va can work w/o anything X related. Mandate the
xcb* dependencies only when the X11 platform is selected.

Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: 63e11ac2b5 ("configure: error out if building VA w/o supported
platform")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
(cherry picked from commit 85a017230c)
2017-11-21 18:16:45 +02:00
Emil Velikov
c7e12a3afc configure.ac: loosen --enable-glvnd check to honour egl
Currently we error out when building GLVND w/o GLX.

That was the original premice before we had EGL. As the commit says,
that error should be reworked to honour both - do so.

v2: Drop noop *);; (Eric)

Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
(cherry picked from commit b4967561c0)
2017-11-21 18:16:45 +02:00
Neil Roberts
3ca2b00799 glsl: Transform fb buffers are only active if a variable uses them
The GL spec will soon be revised to clarify that a buffer binding for
a transform feedback buffer is only required if a variable is actually
defined to use the buffer binding point. Previously a declaration for
the default transform buffer would make it require a binding even if
nothing was declared to use the default buffer.

Affects:
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list_and_api

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4dc8458cd1)
2017-11-21 18:16:45 +02:00
Timothy Arceri
731076f0fd mesa: rework how we free gl_shader_program_data
When I introduced gl_shader_program_data one of the intentions was to
fix a bug where a failed linking attempt freed data required by a
currently active program. However I seem to have failed to finish
hooking up the final steps required to have the data hang around.

Here we create a fresh instance of gl_shader_program_data every
time we link. gl_program has a reference to gl_shader_program_data
so it will be freed once the program is no longer active.

Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102177
(cherry picked from commit 6a72eba755)
2017-11-21 18:16:45 +02:00
Timothy Arceri
96fe2da88b glsl: use the correct parent when allocating program data members
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9c33533586)
2017-11-21 18:16:45 +02:00
Timothy Arceri
386cc0c6a8 glsl: drop cache_fallback
This turned out to be a dead end, it is much easier and less error
prone to just cache the IR used by the drivers backend e.g. TGSI or
NIR.

Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit cf05bb506a)
2017-11-21 18:16:45 +02:00
Kenneth Graunke
dfa57d01e4 i965: properly initialize brw->cs.base.stage to MESA_SHADER_COMPUTE
This has a bit of a surprising effect:

For the render pipeline, the upload_sampler_state_table atom emits
3DSTATE_BINDING_TABLE_POINTERS_XS.  It tries to avoid this for compute:

   if (GEN_GEN >= 7 && stage_state->stage != MESA_SHADER_COMPUTE) {
      /* Emit a 3DSTATE_SAMPLER_STATE_POINTERS_XS packet. */
      genX(emit_sampler_state_pointers_xs)(brw, stage_state);
   } ...

However, we were failing to initialize brw->cs.base.stage, so it was
left as 0 (MESA_SHADER_VERTEX), causing this condition to break.  We
then emitted 3DSTATE_SAMPLER_STATE_POINTERS_VS in GPGPU mode, when
trying to upload CS samplers.  Nothing good can come of this.

Found by inspection while debugging a GPU hang.  Jordan believes this
helps the Deus Ex: Mankind Divided benchmark mode's stability when
running with shader cache.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a16dc04ad5)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_context.c
2017-11-21 18:16:45 +02:00
Emil Velikov
7d61bf5085 targets/opencl: don't hardcode the icd file install to /etc/...
Use $(sysconfdir) instead of hardcoding /etc.

While the OpenCL spec expects the file in /etc, people building their
stack can override that, esp. !Linux users.

Furthermore this removes a fundamental violation, which results in the
system file being overwritten even as one explicitly sets --prefix
and/or DESTDIR.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-By: Aaron Watry <awatry@gmail.com>
(cherry picked from commit 0cd0958544)
2017-11-21 18:16:45 +02:00
Jason Ekstrand
61401e971a intel/fs: Rework zero-length URB write handling
Originally we tried to handle this case based on slots_valid.  However,
there are a number of ways that this can go wrong.  For one, we throw
away any trailing slots which either aren't written or are set to
VARYING_SLOT_PAD.  Second, even if PSIZ is a valid slot, we may not
actually write anything there.  Between the lot of these, it was
possible to end up in a case where we tried to do a regular URB write
but ended up with a length of 1 which is invalid.  This commit moves it
to the end and makes it based on a new boolean flag urb_written.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7a82ad54bb)
2017-11-21 18:16:44 +02:00
Jason Ekstrand
a2230871bf intel/fs: Mark 64-bit values as being contiguous
This isn't often a problem , when we're in a compute shader, we must
push the thread local ID so we decrement the amount of available push
space by 1 and it's no longer even and 64-bit data can, in theory, span
it.  By marking those uniforms contiguous, we ensure that they never get
split in half between push and pull constants.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 25f7453c9e)
2017-11-21 18:16:44 +02:00
Jason Ekstrand
a868bd7a7c intel/fs: Fix integer multiplication lowering for src/dst hazards
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d54f8ec744)
2017-11-21 18:16:44 +02:00
Jason Ekstrand
eb14fb2719 intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core
The same workaround we need for 64-bit values on little core also takes
care of the Ivy Bridge problem and does so a bit more efficiently so we
can drop that code while we're here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fd1bcccc2d)
2017-11-21 18:16:44 +02:00
Jason Ekstrand
73342c304d intel/eu/reg: Add a subscript() helper
This is similar to the identically named fs_reg helper.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 10e4feed39)
2017-11-21 18:16:44 +02:00
Jason Ekstrand
00ccd7856e intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all
For some reason, the any/all predicates don't work properly with SIMD32.
In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read
the correct subset of the flag register and you end up getting garbage
in the second half.  Work around this by using a pair of 1-wide MOVs and
scattering the result.  This fixes the any/all instructions for SIMD32.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1b8ef49f48)
2017-11-21 18:16:44 +02:00
Jason Ekstrand
952225ae37 intel/fs: Use an explicit D type for vote any/all/eq intrinsics
The any/all intrinsics return a boolean value so D or UD is the correct
type.  Unfortunately, get_nir_dest has the annoying behavior of
returnning a float type by default.  This causes format conversion which
gives us -1.0f or 0.0f in the register.  If the consumer of the result
does an integer comparison to zero, it will give you the right boolean
value but if we do something more clever based on the 0/~0 assumption
for booleans, this will give the wrong value.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1f41663007)
2017-11-21 18:16:44 +02:00
Jason Ekstrand
977cd76f10 intel/fs: Use ANY/ALL32 predicates in SIMD32
We have ANY/ALL32 predicates and, for the most part, they work just
fine.  (See the next commit for more details.)  Also, due to the way
that flag registers are handled in hardware, instruction splitting is
able to split the CMP correctly.  Specifically, that hardware looks at
the execution group and knows to shift it's flag usage up correctly so a
2H instruction will write to f0.1 instead of f0.0.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit def013a863)
2017-11-21 18:16:44 +02:00
Andres Gomez
96ad27f8fc docs: add sha256 checksums for 17.2.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-11 01:23:24 +02:00
Andres Gomez
ae52410bf0 docs: add release notes for 17.2.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-10 15:33:58 +02:00
Andres Gomez
2c582b4cf7 Update version to 17.2.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-10 15:24:51 +02:00
Andres Gomez
0195b78aa7 cherry-ignore: automake: include git_sha1.h.in in release tarball
fixes: This commit has more than one Fixes tag but the commit it
addresses didn't land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-07 18:45:28 +02:00
Andres Gomez
a23bd4ea87 cherry-ignore: added 17.3 nominations.
stable: 17.3 nominations only.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-07 18:44:48 +02:00
Andres Gomez
27cd0abe8e cherry-ignore: intel/fs: Alloc pull constants off mem_ctx
stable: This commit addressed earlier commit 8d90e28839 which did not
land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-07 18:44:44 +02:00
Andres Gomez
b5dc8c43aa cherry-ignore: etnaviv: don't do resolve-in-place without valid TS
stable: This commit addressed earlier commit 78ade65956 which did not
land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-07 18:44:38 +02:00
Andres Gomez
526ecbe417 cherry-ignore: i965: fix blorp stage_prog_data->param leak
stable: This commit addressed earlier commit 8d90e28839 which did not
land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-07 18:42:55 +02:00
Andres Gomez
0d91d13564 cherry-ignore: radv: copy indirect lowering settings from radeonsi
fixes: remove 6ce550453f and 059434e1763, which were depending in now
backported 087e010b2b.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-07 18:35:54 +02:00
Tomasz Figa
fd0cf2c946 glsl: Allow precision mismatch on dead data with GLSL ES 1.00
Commit 259fc50545 added linker error for
mismatching uniform precision, as required by GLES 3.0 specification and
conformance test-suite.

Several Android applications, including Forge of Empires, have shaders
which violate this rule, on a dead varying that will be eliminated.
The problem affects a big number of applications using Cocos2D engine
and other GLES implementations accept this, this poses a serious
application compatibility issue.

Starting from GLSL ES 3.0, declarations with conflicting precision
qualifiers are explicitly prohibited. However GLSL ES 1.00 does not
clearly specify the behavior, except that

  "Uniforms are defined to behave as if they are using the same storage in
  the vertex and fragment processors and may be implemented this way.
  If uniforms are used in both the vertex and fragment shaders, developers
  should be warned if the precisions are different. Conversion of
  precision should never be implicit."

The word "used" is not clear in this context and might refer to
 1) declared (same as GLES 3.x)
 2) referred after post-processing, or
 3) linked after all optimizations are done.

Looking at existing applications, 2) or 3) seems to be widely adopted.
To avoid compatibility issues, turn the error into a warning if GLSL ES
version is lower than 3.0 and the data is dead in at least one of the
shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97532
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0886be093f)
2017-11-07 18:35:54 +02:00
Bas Nieuwenhuizen
6a73458510 radv: Disallow indirect outputs for GS on GFX9 as well.
Since it also uses the output vector before writing to memory.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit c07d719e8b)
[Bas Nieuwenhuizen: resolve conflicts]

Conflicts:
        src/amd/vulkan/radv_shader.c
2017-11-07 18:35:54 +02:00
Bas Nieuwenhuizen
9ba45e7d33 radv: Don't use vgpr indexing for outputs on GFX9.
Due to LLVM bugs. Fixes a bunch of dEQP-VK.glsl.indexing.*
tests.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6ce550453f)
[Bas Nieuwenhuizen: resolve conflicts]

Conflicts:
        src/amd/vulkan/radv_shader.c
2017-11-07 18:35:54 +02:00
Timothy Arceri
662cff8fe4 radv: copy indirect lowering settings from radeonsi
It looks the original indirect mask was probably copied from
ANV.

Sascha Willems demo results:

tessellation ~4000 -> ~4200 fps

V2: continue lowering local indirects due to llvm deficiencies.

(cherry picked from commit 087e010b2b)
[Bas Nieuwenhuizen: patch is a backport for 17.2 of the cherry-pick above]
2017-11-07 18:35:54 +02:00
Dave Airlie
0b0b7f1833 radv: add initial copy descriptor support. (v2)
It appears the latest dota2 vulkan uses this,
and we get a hang in VR mode without it.

v2: remove finishme I left in after finishing.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4bcb48b831)
2017-11-07 18:35:54 +02:00
Dave Airlie
23eaeeb88a radv: free attachments on end command buffer.
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.

Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.

Fixes a memory leak with multithreading demo.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f0ae06a13c)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/amd/vulkan/radv_cmd_buffer.c
2017-11-07 18:35:54 +02:00
Dave Airlie
d8e0f66b55 i915g: make gears run again.
We need to validate some structs exist before we dirty the states, and
avoid the problem in some other places.

Fixes: e027935a7 ("st/mesa: don't update unrelated states in non-draw calls such as Clear")
(cherry picked from commit cc69f2385e)
2017-11-07 18:35:54 +02:00
Bas Nieuwenhuizen
bd2037da82 radv: Don't expose heaps with 0 memory.
It confuses CTS. This pregenerates the heap info into the
physical device, so we can use it for translating contiguous
indices into our "standard" ones.

This also makes the WSI a bit smarter in case the first preferred
heap does not exist.

Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 806721429a)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/amd/vulkan/radv_device.c
	src/amd/vulkan/radv_private.h
	src/amd/vulkan/radv_wsi.c
2017-11-07 18:35:54 +02:00
Eric Engestrom
388b68a61e vc4: fix release build
Mesa's DEBUG and assert's NDEBUG are not tied to each other, so we need
to explicitly compile this code out.

Fixes: 3df7892878 "vc4: Drop reloc_count tracking for debug
       asserts on non-debug builds."
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 5d44e35a8f)
2017-11-07 18:35:54 +02:00
Gert Wollny
71a33028ba r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling
It is possible that the optimizer ends up in an infinite loop in
post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group()
does not find a proper scheduling. This can be deducted from
pending.count() being larger than zero and not getting smaller.

This patch works around this problem by signalling this failure so that the
optimizers bails out and the un-optimized shader is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 69eee511c6)
2017-11-07 18:35:53 +02:00
Neil Roberts
f33f5e9ad6 nir/opt_intrinsics: Fix values for gl_SubGroupG{e,t}MaskARB
Previously the values were calculated by just shifting ~0 by the
invocation ID. This would end up including bits that are higher than
gl_SubGroupSizeARB. The corresponding CTS test effectively requires that
these high bits be zero so it was failing. There is a Piglit test as
well but this appears to checking the wrong values so it passes.

For the two greater-than bitmasks, this patch adds an extra mask with
(~0>>(64-gl_SubGroupSizeARB)) to force these bits to zero.

Fixes: KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680#c3
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Neil Roberts <nroberts@igalia.com>
(cherry picked from commit b697ece10a)
2017-11-07 18:35:53 +02:00
Nanley Chery
af1dc1cf87 i965: Check CCS_E compatibility for texture view rendering
Only use CCS_E to render to a texture that is CCS_E-compatible with the
original texture's miptree (linear) format. This prevents render
operations from writing data that can't be decoded with the original
miptree format.

On Gen10, with the new CCS_E-enabled formats handled, this enables the
driver to pass the arb_texture_view-rendering-formats piglit test.

v2. Add a TODO for texturing. (Jason)

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 9e849eb8bb)
2017-11-07 18:35:53 +02:00
Topi Pohjolainen
3d7546a312 intel/compiler/gen9: Pixel shader header only workaround
Fixes intermittent GPU hangs on Broxton with an Intel internal
test case.

There are plenty of similar fragment shaders in piglit that do
not use any varyings and any uniforms. According to the
documentation special timing is needed between pipeline stages.
Apparently we just don't hit that with piglit. Even with the
failing test case one doesn't always get the hang.

Moreover, according to the error states the hang happens
significantly later than the execution of the problematic shader.
There are multiple render cycles (primitive submissions) in between.
I've also seen error states where the ACTHD points outside the
batch. Almost as if the hardware writes somewhere that gets used
later on. That would also explain why piglit doesn't suffer from
this - most tests kick off one render cycle and any corruption
is left unseen.

v2 (Ken): Instead of enabling push constants, enable one of the
          inputs (PSIZ).
v3 (Ken, Jason): Use LAYER instead making vulkan emit_3dstate_sbe()
                 happy.

Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 97e01adfd5)
2017-11-07 18:35:53 +02:00
Kenneth Graunke
67fc6d43bd mesa: Accept GL_BACK in get_fb0_attachment with ARB_ES3_1_compatibility.
According to the ARB_ES3_1_compatibility specification,
glGetFramebufferAttachmentParameteriv is supposed to accept BACK,
and it behaves exactly like BACK_LEFT.

Fixes a GL error in GFXBench 5 Aztec Ruins.

Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4f538c3f99)
2017-11-07 18:35:53 +02:00
Tapani Pälli
71ea414782 i965: unref push_const_bo in intelDestroyContext
Valgrind shows that leak is caused by gen6_upload_push_constant, add
unref push_const_bo per stage to destructor to fix this (like done for
scratch_bo).

   ==10952== 144 bytes in 1 blocks are definitely lost in loss record 44 of 66
   ==10952==    at 0x4C30A1E: calloc (vg_replace_malloc.c:711)
   ==10952==    by 0x8C02847: bo_alloc_internal.constprop.10 (brw_bufmgr.c:344)
   ==10952==    by 0x8C425C4: intel_upload_space (intel_upload.c:101)
   ==10952==    by 0x8C22ED0: gen6_upload_push_constants (gen6_constant_state.c:154)

v2: remove if conditions, brw_bo_unreference handles NULL (Ken, Emil)

Fixes: 24891d7c05 ("i965: Store per-stage push constant BO pointers.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0b131ca427)
2017-11-07 18:35:53 +02:00
Jason Ekstrand
e7ddddc989 i965/miptree: Take an isl_format in render_aux_usage
Not all rendering matches the miptree format.  We allow rendering to
texture views so there are cases where it may not match.  In those
cases, our current scheme of just passing the value of ctx->sRGBEnabled
isn't viable.  Instead, just do what we do for texturing and pass the
view format in directly.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 39c5c12f8f)
[Andres Gomez: remove code which was trivially modified previously]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_draw.c
	src/mesa/drivers/dri/i965/brw_wm_surface_state.c
2017-11-07 18:35:53 +02:00
Jason Ekstrand
6dc8e69da6 i965/blorp: Use more temporary isl_format variables
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 78e50185d6)
2017-11-07 18:35:53 +02:00
Jason Ekstrand
78ff359d04 i965/blorp: Use blorp_to_isl_format for src_isl_format in blit_miptrees
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 94389943b6)
2017-11-07 18:35:53 +02:00
Jason Ekstrand
9e27cdc771 spirv: Claim support for the simple memory model
It's rather surprising that we've never actually hit this before.
Aparently, Ian's SPIR-V generator currently claims the Simple when you
don't do anything complex.  We really shouldn't assert-fail on it.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8ab9820d34)
2017-11-07 18:35:53 +02:00
Leo Liu
1317bdc3a1 radeon/video: add gfx9 offsets when rejoin the video surface
For CPU access.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit ea3dc75d72)
2017-11-07 18:35:53 +02:00
Nicolai Hähnle
e95c4af12d amd/common/gfx9: workaround DCC corruption more conservatively
Fixes KHR-GL45.texture_swizzle.smoke and others on Vega.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102809
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f9ccfda9bc)
2017-11-07 18:35:53 +02:00
Marek Olšák
d132c0d792 ac/surface/gfx9: don't allow DCC for the smallest mipmap levels
This fixes garbage there if we don't flush TC L2 after rendering.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 759526813b)
2017-11-07 18:35:53 +02:00
Marek Olšák
e28870b9fe st/dri: don't expose modifiers in EGL if the driver doesn't implement them
This unbreaks waffle/gbm (piglit/gbm) which fails initialization.

v2: also don't set queryDmaBufFormats

Reviewed-by: Daniel Stone <daniel@fooishbar.org>
(cherry picked from commit a65db0ad1c)
2017-11-07 18:35:53 +02:00
Andres Gomez
1b10d0851a docs: add sha256 checksums for 17.2.4
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-30 16:53:44 +02:00
Andres Gomez
a4b72e2643 docs: add release notes for 17.2.4
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-30 16:46:20 +02:00
Andres Gomez
fe9dc0fad6 Update version to 17.2.4
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-30 16:37:54 +02:00
Andres Gomez
162fa27b12 cherry-ignore: broadcom/vc5: Propagate vc4 aliasing fix to vc5.
extra: Commit is not applicable when ade416d023 is missing.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-27 18:07:38 +03:00
Andres Gomez
b3207d9ff7 cherry-ignore: mesa/bufferobj: don't double negate the range
fixes: This commit addressed earlier commit 35ac13ed3 which did not
land in branch.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-27 18:07:38 +03:00
Andres Gomez
6cab29e973 cherry-ignore: radv: Disallow indirect outputs for GS on GFX9 as well.
fixes: Commit is not applicable when 6ce550453f is missing.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-27 18:07:38 +03:00
Andres Gomez
1c2f79066c cherry-ignore: radv: Don't use vgpr indexing for outputs on GFX9.
fixes: Commit is not applicable when 087e010b2b is missing.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-27 18:07:38 +03:00
Andres Gomez
0c63d53765 cherry-ignore: added 17.3 nominations.
stable: 17.3 nominations only.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-27 18:07:38 +03:00
Andres Gomez
52019bad1d cherry-ignore: glsl: fix derived cs variables
stable: Commit is too big for stable at this point.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-27 18:07:38 +03:00
Andres Gomez
9bdd943ff2 cherry-ignore: configure.ac: rework llvm detection and handling
stable: Commits are too invasive for 17.2.

Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-10-27 18:07:38 +03:00
Jason Ekstrand
1393d37d7b intel/eu: Use EXECUTE_1 for JMPI
The PRM says "The execution size must be 1."  In 73137997e2, the
execution size was set to 1 when it should have been BRW_EXECUTE_1
(which maps to 0).  Later, in dc2d3a7f5c, JMPI was used for
line AA on gen6 and earlier and we started manually stomping the
exeution size to BRW_EXECUTE_1 in the generator.  This commit fixes the
original bug and makes brw_JMPI just do the right thing.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 73137997e2
(cherry picked from commit 562b8d458c)
2017-10-27 18:07:38 +03:00
Jason Ekstrand
74f1903234 anv/pipeline: Call nir_lower_system_valaues after brw_preprocess_nir
We currently have a bug where nir_lower_system_values gets called before
nir_lower_var_copies so it will miss any system value uses which come
from a copy_var intrinsic.  Moving it to after brw_preprocess_nir fixes
this problem.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 279f8fb69c)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/intel/vulkan/anv_pipeline.c
2017-10-27 18:07:37 +03:00
Jason Ekstrand
aac0807f48 intel/fs: Handle flag read/write aliasing in needs_src_copy
In order to implement the ballot intrinsic, we do a MOV from flag
register to some GRF.  If that GRF is used in a SEL, cmod propagation
helpfully changes it into a MOV from the flag register with a cmod.
This is perfectly valid but when lower_simd_width comes along, it simply
splits into two instructions which both have conditional modifiers.
This is a problem since we're reading the flag register.  This commit
makes us check whether or not flags_written() overlaps with the flag
values that we are reading via the instruction source and, if we have
any interference, will force us to emit a copy of the source.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fa6e74e33e)
2017-10-27 18:07:37 +03:00
Jan Vesely
03f3899f99 clover: Fix compilation after clang r315871
v2: use a more generic compat function
v3: rename and formatting cleanup

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103388
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a6d38f476b)
2017-10-27 18:07:37 +03:00
Jason Ekstrand
87a9a989ee nir/intrinsics: Set the correct num_indices for load_output
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit c1b84256cc)
2017-10-27 18:07:37 +03:00
Matthew Nicholls
ce725baa7c ac/nir: generate correct instruction for atomic min/max on unsigned images
v2: fix silly typo

Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 27a0b24bf2)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/amd/common/ac_nir_to_llvm.c
2017-10-27 18:07:37 +03:00
Bas Nieuwenhuizen
f46ba9ee35 ac/nir: Fix nir_texop_lod on GFX for 1D arrays.
Fixes: 1bcb953e16 'radv: handle GFX9 1D textures'
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2c5b43c87f)
2017-10-27 18:07:37 +03:00
Stefan Schake
2d51d41865 broadcom/vc4: Fix aliasing issue
This was causing Android clang version 3.8.256229 to miscompile,
presumably due to strict aliasing.

Fixes: 14dc281c13 ("vc4: Enforce one-uniform-per-instruction after optimization.")
(cherry picked from commit e5fea0d621)
2017-10-27 18:07:37 +03:00
Kenneth Graunke
cbc081b871 i965: Revert absolute mode for constant buffer pointers.
The kernel doesn't initialize the value of the INSTPM or CS_DEBUG_MODE2
registers at context initialization time.  Instead, they're inherited
from whatever happened to be running on the GPU prior to first run of a
new context.  So, when we started setting these, other contexts in the
system started inheriting our values.  Since this controls whether
3DSTATE_CONSTANT_* takes a pointer or an offset, getting the wrong
setting is fatal for almost any process which isn't expecting this.

Unfortunately, VA-API and Beignet don't initialize this (nor does older
Mesa), so they will die horribly if we start doing this.  UXA and SNA
don't use any push constants, so they are unaffected.

Until we have some kind of solution to this problem, I'm going to revert
this patch and abandon using the feature for now.  It will lead to fewer
pushed UBO ranges on Broadwell+, which may lead to lower performance,
though I don't have any data on the impact.

Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102774
(cherry picked from commit 013d331220)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_state_upload.c
	src/mesa/drivers/dri/i965/intel_screen.c
2017-10-27 18:07:24 +03:00
Michel Dänzer
546b4d455a st/mesa: Initialize textures array in st_framebuffer_validate
And just reference pipe_resources to it in the validate callbacks.

Avoids pipe_resource leaks when st_framebuffer_validate ends up calling
the validate callback multiple times, e.g. when a window is resized.

v2:
* Use generic stable tag instead of Fixes: tag, since the problem could
  already happen before the commit referenced in v1 (Thomas Hellstrom)
* Use memset to initialize the array on the stack instead of allocating
  the array with os_calloc.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
(cherry picked from commit 7561da367b)

Squashed with:

st/osmesa: include u_inlines.h for pipe_resource_reference

Fixes build failure due to unresolved symbol.

Fixes: 7561da367b "st/mesa: Initialize textures array in
                     st_framebuffer_validate"

Trivial.

(cherry picked from commit 8c9e7c9638)
2017-10-27 18:04:59 +03:00
Henri Verbeet
9cbf8c910e vulkan/wsi: Free the event in x11_manage_fifo_queues().
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Fixes: e73d136a02 ("vulkan/wsi/x11: Implement FIFO mode.")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com
(cherry picked from commit 3de87f7cd7)
2017-10-27 18:04:59 +03:00
Dave Airlie
1eb4cbc934 radv/image: bump all the offset to uint64_t.
So one of the CTS tests tries to allocate a 16384x1 2048 array
texture. This overflows a bunch of calculations when we want it
tiled as the heights goes to 128.

addrlib returns us the correct size (16GB or so), but we mangle
it in the htile calcs due to the 32-bit offset fields, then
userspace gives us the reduced number and we try to allocate
it on a heap and things blow up.

We really need to give the app back the correct size for the
image so we can blow up properly in memory allocation later.

This should fix hangs in
dEQP-VK.pipeline.render_to_image.core.1d_array.huge.width_layers.r8g8b8a8_unorm_d32_sfloat_s8_uint
since
Fixes: ad3d98da9f (radv: enable tc compatible htile for d32s8 also.)

Now there's an open question if we should be enabling tc-compat
htile at all for shallow textures like the above.

This might cause some other wierd side effects in CTS even
without the tc compat so:
Cc: "17.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 35c66f3e40)
[Andres Gomez: resolve trivial conflicts]
Signed-off-by: Andres Gomez <agomez@igalia.com>

Conflicts:
	src/amd/vulkan/radv_private.h
2017-10-27 18:04:59 +03:00
Marek Olšák
fba44d91d0 Revert "mesa: fix texture updates for ATI_fragment_shader"
This reverts commit 9d54025cd1.

It breaks KOTOR.

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5d071bf04b)
2017-10-27 18:04:59 +03:00
Samuel Pitoiset
9cba4d491c radv: add the draw count buffer to the list of buffers
My guess is that the GPU is going to report VM faults if
vkCmdDrawIndirectCountAMD() (and friends) are used.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 3e5f27faf3)
2017-10-27 18:04:59 +03:00
Emil Velikov
facc851818 docs: add sha256 checksums for 17.2.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-10-19 13:28:13 +01:00
Emil Velikov
28dc4b64f2 docs: add release notes for 17.2.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-10-19 13:10:20 +01:00
Emil Velikov
ea38f4c33a Update version to 17.2.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-10-19 13:02:58 +01:00
Emil Velikov
23c08dabc3 eglmesaext: add forward declaration for struct wl_buffers
The user does not need to know the specifics of the struct, as only a
pointer to it is used.

Just forward declare the struct making the header self-contained.

v2: Remove deprecation warning text/bugzilla - patch does no help there.

Cc: Greg V <greg@unrelenting.technology>
Fixes: 5cddb1ce3c ("wayland: Add an extension to create wl_buffers from
EGLImages")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
(cherry picked from commit 66ebdfbd44)
2017-10-17 16:59:31 +01:00
Emil Velikov
dc9bd1dade wayland-drm: use a copy of the wayland_drm_callbacks struct
The callbacks may be called even when they are no longer valid.
Say, the user is dlclose(ing) libEGL while the buffers are being
destroyed.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Derek Foreman <derekf@osg.samsung.com>
(cherry picked from commit 0cfd6f6cfc)
2017-10-17 16:59:31 +01:00
Jason Ekstrand
d001ff1267 nir: Get rid of the variable on vote intrinsics
This looks like a copy+paste error.  They don't actually write into that
variable as would be implied by putting the return there.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 3442c9fc3e)
2017-10-17 16:59:31 +01:00
Jason Ekstrand
88a16c895b nir/opcodes: Fix constant-folding of ufind_msb
We didn't fold correctly in the case of 0x1 because we never let the
loop counter hit 0.  Switching it to bit >= 0 solves this problem.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit a0947921eb)
2017-10-17 16:59:31 +01:00
Jason Ekstrand
b640bf38ca glsl/blob: Return false from grow_to_fit if we've ever failed
Otherwise we could have a failure followed by a smaller write that
succeeds and get a corrupted blob.  If we ever OOM, we should stop.

v2 (Jason Ekstrand):
 - Initialize the new boolean member in create_blob

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit e03717efbd)
2017-10-17 16:59:31 +01:00
Jason Ekstrand
4d1ae3283c glsl/blob: Return false from ensure_can_read on overrun
Otherwise, if you have a large read fail and then try to do a small
read, the small read may succeed even though it's at the wrong offset.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7118851374)
2017-10-17 16:59:31 +01:00
Eric Engestrom
d56aa9fe43 scons: use python3-compatible print()
These changes were generated using python's `2to3` tool.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102852
Reported-by: Alex Granni <liviuprodea@yahoo.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 7d48219b3a)
2017-10-17 16:59:31 +01:00
Bas Nieuwenhuizen
3b657e4ff5 radv: Only set the MTYPE flags on GFX9+.
Older kernels fail the va_op with this flag set. If the kernel
supports GFX9 usefully, it will also support this flag.

Fixes: e8d57802fe "radv/gfx9: allocate events from uncached VA space"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 96f80c8d4d)
2017-10-17 16:59:31 +01:00
Daniel Stone
99d3661bce egl/wayland: Don't use dmabuf with no modifiers
The dmabuf interface requires a valid modifier to be sent. If we don't
explicitly get a modifier from the driver, we can't know what to send;
it must be inferred from legacy side-channels (or assumed to linear, if
none exists).

If we have no modifier, then we can only have a single-plane format
anyway, so fall back to the old wl_drm buffer import path.

Fixes: a65db0ad1c ("st/dri: don't expose modifiers in EGL if the driver doesn't implement them")
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b65d6dafd6)
2017-10-17 16:59:31 +01:00
Daniel Stone
0f6e89dfe0 egl/wayland: Check queryImage return for wl_buffer
When creating a wl_buffer from a DRIImage, we extract all the DRIImage
information via queryImage. Check whether or not it actually succeeds,
either bailing out if the query was critical, or providing sensible
fallbacks for information which was not available in older DRIImage
versions.

Fixes: a65db0ad1c ("st/dri: don't expose modifiers in EGL if the driver doesn't implement them")
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6273d2f269)
2017-10-17 16:59:31 +01:00
Emil Velikov
bee97ec32e swr/rast: do not crash on NULL strings returned by getenv
The current convenience function GetEnv feeds the results of getenv
directly into std::string(). That is a bad idea, since the variable
may be unset, thus we feed NULL into the C++ construct.

The latter of which is not allowed and leads to a crash.

v2: Better variable name, implicit char* -> std::string conversion (Eric)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101832
Fixes: a25093de71 ("swr/rast: Implement JIT shader caching to disk")
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Cc: Bernhard Rosenkraenzer <bero@lindev.ch>
[Emil Velikov: make an actual commit from the misc diff]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Laurent Carlier <lordheavym@gmail.com> (v1)
(cherry picked from commit 21e271024d)
2017-10-17 16:59:31 +01:00
Nicolai Hähnle
9f4b0a336c radeonsi: clamp border colors for upgraded depth textures
The hardware does this automatically for unorm formats, but we need to
do it manually for unorm depth formats that have been upgraded to
Z32_FLOAT.

Fixes dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
and others.

Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 6eb9483912)
2017-10-17 16:59:31 +01:00
Nicolai Hähnle
74a28d85de radeonsi: clamp depth comparison value only for fixed point formats
The hardware usually does this automatically. However, we upgrade
depth to Z32_FLOAT to enable TC-compatible HTILE, which means the
hardware no longer clamps the comparison value for us.

The only way to tell in the shader whether a clamp is required
seems to be to communicate an additional bit in the descriptor
table. While VI has some unused bits in the resource descriptor,
those bits have unfortunately all been used in gfx9. So we use
an unused bit in the sampler state instead.

Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f
and many other tests in dEQP-GLES3.functional.texture.shadow.*

Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 4c56e07029)
[Emil Velikov: handle lack of dirty_mask in original patch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/radeonsi/si_descriptors.c
2017-10-17 16:59:31 +01:00
Nicolai Hähnle
f805a61e04 st/glsl_to_tgsi: fix a use-after-free in merge_two_dsts
Found by address sanitizer.

The loop here tries to be safe, but in doing so, it ends up doing
exactly the wrong thing: the safe foreach is for when the loop
variable (inst) could be deleted and nothing else. However, this
particular can delete inst's successor, but not inst itself.

Fixes: 8c6a0ebaad ("st/mesa: add st fp64 support (v7.1)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 2703fa613b)
2017-10-17 16:59:31 +01:00
Lionel Landwerlin
6957dfb0d8 anv: bo_cache: allow importing a BO larger than needed
It's not a problem if a BO has been allocated larger than we need it
to be.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102940
Fixes: 818b857914 ("anv: Use the BO cache for DeviceMemory allocations")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c0a4f56fb9)
2017-10-17 16:59:31 +01:00
Nicolai Hähnle
410f4dbcb1 st/glsl_to_tgsi: fix indirect access to 64-bit integer
Make sure we actually allocate two adjacent TGSI temporaries. The
current code fails e.g. when an arithmetic operation has two
operands with indirect accesses.

I will send out a new piglit test
(arb_gpu_shader_int64/execution/indirect-array-two-accesses.shader_test)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 541208cf13)
2017-10-17 16:59:31 +01:00
Ilia Mirkin
d22e779d6a nv50,nvc0: fix push hint logic in presence of a start offset
Previously buffer offsets were passed in explicitly as an offset, which
had to be added to the resource address. Now they are passed in via an
increased 'start' parameter. As a result, we were double-adding the
start offset in this kind of situation.

This condition was triggered by piglit's draw-elements test which has a
requisite glMultiDrawElements in combination with a small enough number
of vertices to go through the immediate push path.

Fixes: 330d0607ed ("gallium: remove pipe_index_buffer and set_index_buffer")
Reported-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit b20bccbcac)
2017-10-17 16:59:31 +01:00
Dave Airlie
41ec2af2a8 radv: lower ffma in nir.
So it appears the Vulkan SPIR-V fma opcode can be equivalent to a
mad operation, and the fma hw opcode on AMD hw is issued like a double
opcode so is slower. Also the radeonsi stack does this.

This appears to improve performance on a number of games from Feral,
and thanks to Feral for noticing the problem.

I'm reposting this one as Marek indicated he thinks this is what
we should be doing on AMD hw.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2c61594d84)
[Emil Velikov: use correct file radv_shader.c -> radv_pipeline.c]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/vulkan/radv_shader.c
2017-10-17 16:59:31 +01:00
Alex Smith
0bd7be0142 radv: Add R16G16B16A16_SNORM fast clear support
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 25d76fd658)
2017-10-17 16:59:30 +01:00
Nicolai Hähnle
4bcadb533f st/mesa: don't clobber glGetInternalformat* buffer for GL_NUM_SAMPLE_COUNTS
Applications might pass in a buffer that is sized too large and rely
on the extra space of the buffer not being overwritten.

Fixes dEQP-GLES31.functional.state_query.internal_format.partial_query.num_sample_counts

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 9a8f13a33b)
2017-10-17 16:59:30 +01:00
Ilia Mirkin
2eae2a6f0e nv50/ir: fix 64-bit integer shifts
TGSI was adjusted to always pass in 64-bit integers but nouveau was left
with the old semantics. Update to the new thing.

Fixes: d10fbe5159 (st/glsl_to_tgsi: fix 64-bit integer bit shifts)
Reported-by: Karol Herbst <karolherbst@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ce6da2a026)
2017-10-17 16:59:30 +01:00
Józef Kucia
077f925473 anv: Do not assert() on VK_ATTACHMENT_UNUSED
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 91ba331ef4)
2017-10-17 16:59:30 +01:00
Józef Kucia
2e92d16f9d spirv: Fix SpvOpAtomicISub
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit e0acb630a5)
2017-10-17 16:59:30 +01:00
Emil Velikov
83dcf9dc33 cherry-ignore: add "anv/wsi: Allocate enough memory for the entire image"
Addresses bug introduced with a feature patch, which is not in branch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-10-17 16:59:28 +01:00
Lionel Landwerlin
406e7e0e17 anv/cmd_buffer: Reset state in cmd_buffer_destroy
This ensures that everything gets cleaned up properly. In particular,
it fixes a memory leak where we were leaking the push constants
structs.

Valgrind stats on
dEQP-VK.pipeline.push_constant.graphics_pipeline.range_size_128 :

Before:
HEAP SUMMARY:
    in use at exit: 2,467,513 bytes in 1,305 blocks
  total heap usage: 697,853 allocs, 696,530 frees, 138,466,600 bytes allocated

LEAK SUMMARY:
   definitely lost: 1,068 bytes in 11 blocks
   indirectly lost: 24,669 bytes in 412 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 2,441,776 bytes in 882 blocks
        suppressed: 0 bytes in 0 blocks

After:
HEAP SUMMARY:
    in use at exit: 2,467,381 bytes in 1,304 blocks
  total heap usage: 697,853 allocs, 696,531 frees, 138,466,600 bytes allocated

LEAK SUMMARY:
   definitely lost: 936 bytes in 10 blocks
   indirectly lost: 24,669 bytes in 412 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 2,441,776 bytes in 882 blocks
        suppressed: 0 bytes in 0 blocks

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0763f814d7)
2017-10-17 16:59:02 +01:00
Lionel Landwerlin
60466859fa anv/cmd_buffer: fix push descriptors with set > 0
When writing to set > 0, we were just wrongly writing to set 0. This
commit fixes this by lazily allocating each set as we write to them.

We didn't go for having them directly into the command buffer as this
would require an additional ~45Kb per command buffer.

v2: Allocate push descriptors from system memory rather than in BO
    streams. (Lionel)

Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Fixes: 9f60ed98e5 ("anv: add VK_KHR_push_descriptor support")
Reported-by: Daniel Ribeiro Maciel <daniel.maciel@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit d296dea54e)
2017-10-17 16:59:02 +01:00
Marek Olšák
014b5a7209 glsl_to_tgsi: fix instruction order for bindless textures
We emitted instructions loading the bindless handle after the memory
instruction.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 985338e2cb)
2017-10-17 16:59:02 +01:00
Jason Ekstrand
f9c4f22f5a intel/compiler: Don't propagate cmod into integer multiplies
No shader-db change on Sky Lake.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7463d50580)
2017-10-17 16:59:02 +01:00
Jason Ekstrand
4ae1a62b26 intel/compiler: Don't cmod propagate into a saturated operation
Shader-db results on Sky Lake:

    total instructions in shared programs: 12954445 -> 12955125 (0.01%)
    instructions in affected programs: 141862 -> 142542 (0.48%)
    helped: 0
    HURT: 626

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit b91ecee04a)
2017-10-17 16:59:02 +01:00
Ben Crocker
ea3ad52ad3 gallivm/ppc64le: allow environmental control of Altivec code generation
In check_os_altivec_support(), allow control of Altivec (first PPC vector
instruction set) code generation via a new environmental control,
GALLIVM_ALTIVEC, which is expected to take on a value of 1 or 0.
The default is to enable Altivec code generation.

This environmental control of Altivec code generation is initially
available only #ifdef DEBUG.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 1359af930e)
2017-10-17 16:59:02 +01:00
Ben Crocker
3f5d2b768c gallivm/ppc64le: adjust VSX code generation control.
In lp_build_create_jit_compiler_for_module(), advance the minimum
version of LLVM for VSX code generation to 4.0; this is the minimum
revision at which several known VSX code generation bugs are fixed:

  https://llvm.org/bugs/show_bug.cgi?id=25503 (fixed in 3.8.1)
  https://llvm.org/bugs/show_bug.cgi?id=26775 (fixed in 3.8.1)
  https://llvm.org/bugs/show_bug.cgi?id=33531 (fixed in 4.0)

An llc performance bug introduced in LLVM 4.0,

  https://llvm.org/bugs/show_bug.cgi?id=34647

is still pending as of LLVM 5.0, but only has a pronounced effect on
one of the Piglit tests: ext_transform_feedback-max-varyings.

All changes tested via Piglit.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit e93f056a4e)
2017-10-17 16:59:02 +01:00
Ben Crocker
070d2dcfac gallivm: allow additional llc options
In init_native_targets, allow the passing of additional options to
the LLC compiler via new GALLIVM_LLC_OPTIONS environmental control.
This option is available only #ifdef DEBUG, initially.
At top, add #include <llvm-c/Support.h> for LLVMParseCommandLineOptions()
declaration.

v2: Fix compile error with old llvm versions (sroland)

Cc: "17.2" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 5c75f0c8bb)
2017-10-17 16:59:02 +01:00
Ben Crocker
1a8ccdc6e9 gallivm: fix typo in debug_printf message
In gallivm_compile_module, fix a typo in the
debug_printf("Invoke as \"llc ..." message.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 3a9feb4db8)
2017-10-17 16:59:02 +01:00
Leo Liu
608bea62ca st/va: don't re-allocate interlaced buffer with pakced format
It caused corruption, when vlVaPutImage putting raw data to the fields

v2: add RGB formats since it got uploaded here as well

Cc: mesa-stable@lists.freedesktop.org
Cc: Andy Furniss <adf.lists@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 0fa950ecd3)
2017-10-17 16:59:02 +01:00
Leo Liu
dc47b179ed st/vdpau: don't re-allocate interlaced buffer with packed YUV format
It caused corruption, when vlVdpVideoSurfacePutBitsYCbCr putting YUV to the fields

Cc: mesa-stable@lists.freedesktop.org
Cc: Andy Furniss <adf.lists@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 327480d10f)
2017-10-17 16:59:02 +01:00
Dave Airlie
1146591d4b radv: emit fmuladd instead of fma to llvm.
For Vulkan SPIR-V the spec states
fma() Inherited from OpFMul followed by OpFAdd.

Matt says the backend will do the right thing depending on the
hardware being compiled for, if you use the fmuladd intrinsic.

Using the Mad Max pts test, on high settings at 4K:
CHP: 55->60
HGDD: 46->50
LM: 55->60
No change on Stronghold.

Thanks to Feral for spending the time to track this down.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4e93d6baae)
[Emil Velikov: s/ac_to_float_type/to_float_type/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/common/ac_nir_to_llvm.c
2017-10-17 16:59:02 +01:00
Emil Velikov
76c6bbca7c cherry-ignore: add "anv: Remove unreachable cases from isl_format_for_size"
The commit causes a number of regressions with dEQP

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-10-17 16:57:45 +01:00
Lionel Landwerlin
3315bb4f08 intel: compiler: vec4: add missing default 0 lod
We set a similar default value for LOD in the fs backend for TXS/TXL.
Without this we end up generating invalid MOV with a null src.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit d3acc240d0)
2017-10-17 16:53:25 +01:00
Józef Kucia
cb72969d8b anv: Fix vkCmdFillBuffer()
The vkCmdFillBuffer() command fills a buffer with an uint32_t value.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.1 17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 15fdbf9c39)
2017-10-11 17:44:36 +01:00
Marek Olšák
1093207463 st/mesa: don't use pipe_surface for passing information about EGLImage
Use st_egl_image instead. radeonsi doesn't like when we create
a pipe_surface with PIPE_FORMAT_NV12.

This fixes NV12 texturing on radeonsi using kmscube.

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 2d62817da9)
2017-10-11 17:44:36 +01:00
Bas Nieuwenhuizen
f5f515a023 nir/spirv: Allow loop breaks in a switch body.
Per the SPIR-V spec 2.11 Structured Control Flow:

"The only blocks in a construct that can branch outside the construct are

...
- a break block for the innermost loop it is inside of.
..."

With

"Break block: A block containing a branch to the Merge Block of a loop header's merge instruction."

Note that it puts no restriction on not being in an if or switch within the innermost loop.

This passes the loop_break block to the switch body so it can properly detect loop breaks.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit ef61d09d5b)
2017-10-11 17:44:36 +01:00
Rob Clark
18db986354 freedreno/a5xx: fix missing restore state
RB_CLEAR_CNTL seems to be in a funny state after boot (at least on
8x96/a530).

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 7f3eab03fe)
2017-10-11 17:44:36 +01:00
Rob Clark
c386538036 freedreno/a5xx: align height to GMEM
Similar to the way width/pitch alignment works, it seems like we need to
do similar for height.  Otherwise the BLIT from system memory to GMEM
can over-fetch beyond the end of the buffer, triggering a fault.

I'm not sure if there is a better solution yet.  Possibly we could fall
back to pre-a5xx style DRAW packets for cases where BLIT might over-
fetch.  (We in theory have that problem already with rendering to higher
mipmap levels, although fortunately those tend to use GMEM bypass.)

This fixes issues reported with glamor.

Reported-by: don.harbin@linaro.org
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 16ac70bdcf)
2017-10-11 17:44:36 +01:00
Nicolai Hähnle
d0b52003d0 radeonsi: fix maximum advertised point size / line width
The hardware registers store the half-size/width in 12.4 fixed point
format, so 8192 is the maximum.

Fixes dEQP-GLES3.functional.rasterization.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 30e37289ea)
2017-10-11 17:44:36 +01:00
Nicolai Hähnle
6e903ae7d5 radeonsi: deduce rast_prim correctly for tessellation point mode
Together with the previous patches, this fixes
dEQP-GLES31.functional.primitive_bounding_box.wide_points.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a3fa3b2e02)
2017-10-11 17:44:36 +01:00
Nicolai Hähnle
6e3788e3b1 radeonsi: don't discard points and lines
This is a bit conservative, but a more precise solution requires access
to the rasterizer state. This is something to tackle after the fork between
r600 and radeonsi.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 4d74432dd3)
2017-10-11 17:44:36 +01:00
Nicolai Hähnle
05e2ed7889 radeonsi: move current_rast_prim to r600_common_context
We'll use it in the scissors / clip / guardband state.

v2: avoid a performance regression on r600 when applied to
    (pre-fork) stable branches

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f86a112b07)
2017-10-11 17:44:36 +01:00
Nicolai Hähnle
f2fc168517 glsl/lower_instruction: handle denorms and overflow in ldexp correctly
GLSL ES requires both, and while GLSL explicitly doesn't require correct
overflow handling, it does appear to require handling input inf/denorms
correctly.

Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.*

Cc: mesa-stable@lists.freedesktop.org
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 93bf9c114b)
[Emil Velikov: init_num_operands() does not exist on branch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/compiler/glsl/lower_instructions.cpp
2017-10-11 17:44:06 +01:00
Nicolai Hähnle
eac6932c66 util/queue: fix a race condition in the fence code
A tempting alternative fix would be adding a lock/unlock pair in
util_queue_fence_is_signalled. However, that wouldn't actually
improve anything in the semantics of util_queue_fence_is_signalled,
while making that test much more heavy-weight. So this lock/unlock
pair in util_queue_fence_destroy for "flushing out" other threads
that may still be in util_queue_fence_signal looks like the better
fix.

v2: rephrase the comment

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
(cherry picked from commit a208cd7ae4)
2017-10-11 17:22:55 +01:00
Nicolai Hähnle
7bedb1fd12 radeonsi/gfx9: fix geometry shaders without output vertices
Not that those are super common or useful, but hey! Fun corner cases
of the API...

Fixes dEQP-GLES31.functional.geometry_shading.emit.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 7dfa891f32)
2017-10-11 17:22:55 +01:00
Nicolai Hähnle
332f6e9b3b amd/common: fix build_cube_select
Fix the custom cube coord selection sequence to be identical to
the hardware v_cubesc/tc and OpenGL spec. Affects texture sampling
with user-provided derivatives.

Fixes dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 5be5c1e0fa)
2017-10-11 17:22:55 +01:00
Nicolai Hähnle
f5cbe1f9b8 st/glsl_to_tgsi: fix conditional assignments to packed shader outputs
Overriding the default (no-op) swizzle is clearly counter-productive,
since the whole point is putting the destination register as one of
the source operands so that it remains unmodified when the assignment
condition is false.

Fragment depth and stencil outputs are a special case due to how their
source swizzles are manipulated in translate_src when compiling to
TGSI.

Fixes dEQP-GLES2.functional.shaders.conditionals.if.*_vertex
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

(cherry picked from commit 8ea7d3a5c8)
2017-10-11 17:22:55 +01:00
Marek Olšák
f6d64c7dc1 mesa: fix texture updates for ATI_fragment_shader
Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d54025cd1)
2017-10-11 17:22:55 +01:00
Leo Liu
80db5b3420 st/va: use pipe transfer_map to map upload buffer
The function pipe_buffer_map() is only for linear pipe buffer,
with height as 0, and it's not for any 2D textures.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6ed61b8d3f)
2017-10-11 17:22:55 +01:00
Juan A. Suarez Romero
5a71ed6fa5 docs: add sha256 checksums for 17.2.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-10-02 18:10:02 +02:00
Juan A. Suarez Romero
bc12538a8e docs: add release notes for 17.2.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-10-02 17:26:10 +02:00
Juan A. Suarez Romero
98dce315c8 Update version to 17.2.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-10-02 17:15:13 +02:00
Bas Nieuwenhuizen
1535e8a5d4 radv: Check for GFX9 for 1D arrays in image_size intrinsic.
Only on GFX9 we implement them as 2D images.

This fixes:
dEQP-VK.image.image_size.1d_array.readonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_7x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_writeonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_writeonly_7x1
dEQP-VK.image.image_size.1d_array.writeonly_12x34
dEQP-VK.image.image_size.1d_array.writeonly_1x1
dEQP-VK.image.image_size.1d_array.writeonly_32x32
dEQP-VK.image.image_size.1d_array.writeonly_7x1

Fixes: 1bcb953e16 "radv: handle GFX9 1D textures"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 979978ee06)
2017-10-02 17:12:17 +02:00
Nicolai Hähnle
a66a70480f radeonsi: fix a regression in integer cube map handling
A recent commit fixed the case of 8888 integer cube maps, which need the
workaround of replacing the data format with USCALED/SSCALED. However,
this broke the case of non-8888 integer cube maps; those still need the
fix of shifting the texture coordinates.

Fixes KHR-GL45.texture_gather.plain-gather-int-cube-array and similar.

Fixes: 6fb0c1013b ("radeonsi: workaround for gather4 on integer cube maps")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6d23f7c65d)
2017-10-02 17:12:17 +02:00
Nicolai Hähnle
bd522741c4 amd/common: move ac_build_phi from radeonsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 052b974fed)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/amd/common/ac_llvm_build.c
	src/amd/common/ac_llvm_build.h
2017-10-02 17:12:17 +02:00
Jason Ekstrand
b78c664115 vulkan/wsi/wayland: Return better error messages
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4fe3913b96)
2017-09-28 13:54:58 +00:00
Jason Ekstrand
2467b50e45 vulkan/wsi/wayland: Copy wl_proxy objects from oldSwapchain if available
This should save us some round trips while resizing.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 537b9bc3e4)
2017-09-28 13:54:58 +00:00
Jason Ekstrand
b5a70210af vulkan/wsi/wayland: Stop caching Wayland displays
We originally implemented caching to avoid unneeded round-trips to the
compositor when querying surface capabilities etc. to set up the
swapchain.  Unfortunately, this doesn't work if vkDestroyInstance is
called after the Wayland connection has been dropped.  In this case, we
end up trying to clean up already destroyed wl_proxy objects which leads
to crashes.  In particular most of dEQP-VK.wsi.wayland is crashing
thanks to this problem.

This commit gets rid of the cache and simply embeds the wsi_wl_display
struct in the swapchain.  While we're at it, we can get rid of the
wl_event_queue that we were storing in the swapchain because we can just
use the one in the embedded wsi_wl_display.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Bugzilla: https://bugs.freedesktop.org/102578
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4369102498)
2017-09-28 13:54:58 +00:00
Jason Ekstrand
5edc4dc080 vulkan/wsi/wayland: Refactor wsi_wl_display code
We convert it over to an inti/finish model and make create/destroy
wrappers for the former.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 77181d9580)
2017-09-28 13:54:57 +00:00
Ian Romanick
6587cbfca2 nv20: Fix GL_CLAMP
v2: Force T and R wrap modes to GL_CLAMP_TO_EDGE for 1D textures.
This fixes a regression in tex1d-2dborder.  The test uses a 1D texture
but it provides S and T texture coordinates.  Since the T wrap mode
would (correctly) be set to GL_CLAMP, the texture would gradually
blend (incorrectly) with the border color.

I also tried setting NV20_3D_TEX_FORMAT_DIMS_1D instead of
NV20_3D_TEX_FORMAT_DIMS_2D for 1D textures, but that did not help.

It is possible that the same problem exists for 2D textures with the
R-wrap mode, but I don't think there are any piglit tests for that.

No test changes on NV20 (10de:0201).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 953a3cf0fd)
2017-09-28 13:54:57 +00:00
Tomasz Figa
6b0ded06b1 egl/dri2: Implement swapInterval fallback in a conformant way
dri2_fallback_swap_interval() currently used to stub out swap interval
support in Android backend does nothing besides returning EGL_FALSE.
This causes at least one known application (Android Snapchat) to fail
due to an unexpected error and my loose interpretation of the EGL 1.5
specification justifies it. Relevant quote below:

    The function

        EGLBoolean eglSwapInterval(EGLDisplay dpy, EGLint interval);

    specifies the minimum number of video frame periods per buffer swap
    for the draw surface of the current context, for the current rendering
    API. [...]

    The parameter interval specifies the minimum number of video frames
    that are displayed before a buffer swap will occur. The interval
    specified by the function applies to the draw surface bound to the
    context that is current on the calling thread. [...] interval is
    silently clamped to minimum and maximum implementation dependent
    values before being stored; these values are defined by EGLConfig
    attributes EGL_MIN_SWAP_INTERVAL and EGL_MAX_SWAP_INTERVAL
    respectively.

    The default swap interval is 1.

Even though it does not specify the exact behavior if the platform does
not support changing the swap interval, the default assumed state is the
swap interval of 1, which I interpret as a value that eglSwapInterval()
should succeed if called with, even if there is no ability to change the
interval (but there is no change requested). Moreover, since the
behavior is defined to clamp the requested value to minimum and maximum
and at least the default value of 1 must be present in the range, the
implementation might be expected to have a valid range, which in case of
the feature being unsupported, would correspond to {1} and any request
might be expected to be clamped to this value.

This is further confirmed by the code in _eglSwapInterval() in
src/egl/main/eglsurface.c, which is the default fallback implementation
for EGL drivers not implementing its own. The problem with it is that
the DRI2 EGL driver provides its own implementation that calls into
platform backends, so we cannot just simply fall back to it.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-09-28 13:54:57 +00:00
Boris Brezillon
d9aba007c0 broadcom/vc4: Fix infinite retry in vc4_bo_alloc()
cleared_and_retried is always reset to false when jumping to the retry
label, thus leading to an infinite retry loop.

Fix that by moving the cleared_and_retried variable definitions at the
beginning of the function.  While we're at it, move the create variable
with the other local variables and explicitly reset its content in the
retry path.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: 78087676c9 "vc4: Restructure the simulator mode."
(cherry picked from commit ef578906d8)
2017-09-28 13:54:57 +00:00
Eric Anholt
e22ab89f9f broadcom/vc4: Keep pipe_sampler_view->texture matching the original texture.
I was overwriting view->texture with the shadow resource when we need to
do shadow copies (retiling or baselevel rebase), but that tripped up some
critical new sanity checking in state_tracker (making sure that stObj->pt
hasn't changed from view->texture through TexImage-related paths).

To avoid that, move the shadow resource to the vc4_sampler_view struct.

Fixes: f0ecd36ef8 ("st/mesa: add an entirely separate codepath for setting up buffer views")
(cherry picked from commit 68c91a87d7)
2017-09-28 13:54:57 +00:00
Jason Ekstrand
1b100aae7b vulkan/wsi/wayland: Stop printing out the DRM device
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 016de7e155)
2017-09-28 13:54:57 +00:00
Kenneth Graunke
ec89ab30c9 i965/vec4: Fix swizzles on atomic sources.
Atomic operation sources are scalar values, but we were failing to
select the .x component of the second operand.  For example,

   atomicCounterCompSwapARB(counter, 5u, 10u)

would generate

   mov(8) vgrf4.x:D, 5D
   mov(8) vgrf5.x:D, 10D

   mov(8) vgrf9.x:UD, vgrf4.xyzw:D
   mov(8) vgrf9.y:UD, vgrf5.xyzw:D

which wrongly selects the .y component of vgrf5, so the actual 10u value
would get dead code eliminated.  The swizzle works for the other source,
but both of them ought to be .xxxx.

Fixes the compare and swap CTS tests in:
KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase

Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 66342c997f)
2017-09-28 13:54:57 +00:00
Kenneth Graunke
8619a6b1cc i965/vec4: Actually handle atomic op intrinsics.
Embarassingly, someone enabled the ARB_shader_atomic_counter_ops
extension for Gen7+ but never added the intrinsics to the switch
statement in the vec4 backend, so they just hit an unreachable()
call and died.

Fixes: 40dd45d0c6 (i965: Enable ARB_shader_atomic_counter_ops)
Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit a62fe34098)
2017-09-28 13:54:57 +00:00
Samuel Pitoiset
47c8ffe719 radv: fix saved compute state when doing statistics/occlusion queries
We are pushing 16-bytes of constants, so we have to save/restore
the same amount of data to avoid data corruption.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4b407a62c7)
2017-09-28 13:54:57 +00:00
Grazvydas Ignotas
84f3374f1c configure: check if -latomic is needed for __atomic_*
On some platforms, gcc generates library calls when __atomic_* functions
are used, but does not link the required library (libatomic) automatically
(supposedly to allow the app to use some other atomics implementation?).

Detect this at configure time and add the library when needed. Tested
on armel (library was added) and on x86_64 (was not, as expected).

Some documentation on this is provided in GCC wiki:
https://gcc.gnu.org/wiki/Atomic/GCCMM

Fixes: 8915f0c0 "util: use GCC atomic intrinsics with explicit memory model"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102573
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 2ef7f23820)
2017-09-28 13:54:57 +00:00
Nicolai Hähnle
5e5882a7aa amd/addrlib: fix missing va_end() after va_copy()
There's no reason to use va_copy here.

CID: 1418113
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: e7fc664b91 ("winsys/amdgpu: add addrlib - texture
                              addressing and alignment calculator")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 34126ed248)
2017-09-28 13:54:57 +00:00
Juan A. Suarez Romero
89db1532fc cherry-ignore: add "radv: copy the number of viewports/scissors at pipeline bind time"
fixes: This commit addressed earlier commits dcf46e99 and 60878dd0 which
did not land in branch.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-09-28 13:54:57 +00:00
Nicolai Hähnle
5e8ddba33f radeonsi: set MIP_POINT_PRECLAMP to 0
This fixes a bug with nearest ("point") mip selection when the fractional
part of max_lod is in (0.5,1). In this case, the spec mandates that
we still select the mip level ceil(max_lod) in the clamping case. However,
MIP_POINT_PRECLAMP will clamp before the mip selection, which is wrong.

Supposedly this setting was originally copied from the closed Vulkan
driver, but as far as I can tell, closed Vulkan was actually changed back
recently :)

Fixes dEQP-GLES3.functional.texture.mipmap.2d.max_lod.{nearest,linear}_nearest

Fixes: f7420ef5b4 ("radeonsi: enable some sampler fields to match the closed driver")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 704ddbcdf6)
2017-09-28 13:54:57 +00:00
Eric Anholt
fc2b01adc9 broadcom/vc4: Fix use-after-free when deleting a program.
By leaving the compiled shader in the context's stage state, the next
compile of a new FS would look in the old compiled FS for figuring out
whether to set various dirty flags for the VS compile.  Clear out the
pointer when deleting the program, and make sure that we always mark the
state as dirty if the previous program had been lost.  Fixes valgrind
warnings on glsl-max-varyings.

Fixes: 2350569a78 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
(cherry picked from commit 3752ad28f2)
2017-09-28 13:54:57 +00:00
Eric Anholt
fe53c8bc21 broadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.
The blitter will bind just the depth buffer, which flushes the current job
if we had both a color and depth/stencil.  If the clear was doing partial
depth/stencil (quad-based) and color (tile-based), we'd go on to try to
set up the rest of the tile clear in the now flushed job.

Instead, move the partial clear up before we start setting up the job for
the current FBO state, and re-fetch the job if we're continuing on to a
tile-based clear.  Fixes valgrind failures in fbo-depthtex.

Fixes: 9421a6065c ("vc4: Fix fallback to quad clears of depth in GLX.")
(cherry picked from commit 9940fb4205)
2017-09-28 13:54:57 +00:00
Eric Anholt
60efffcec0 broadcom/vc4: Fix use-after-free for flushing when writing to a texture.
I was trying to continue the hash table loop, not the inner loop.  This
tended to work out, because we would have *just* freed the job struct.
Fixes some valgrind failures in fbo-depthtex.

Fixes: f597ac3966 ("vc4: Implement job shuffling")
(cherry picked from commit d88a75182d)
2017-09-28 13:54:56 +00:00
Dave Airlie
5d1deeeab3 st/glsl->tgsi: fix u64 to bool comparisons.
Otherwise we end up using a 32-bit comparison which didn't end well.

Timothy caught this while playing around with some opt passes.

Fixes: 278580729a (st/glsl_to_tgsi: add support for 64-bit integers)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit a7a7bf21bd)
2017-09-28 13:54:56 +00:00
Juan A. Suarez Romero
dbccd8465d cherry-ignore: add "radv: Check for GFX9 for 1D arrays in image_size intrinsic."
fixes: This commit addressed earlier commits 61ad2f13 and 6dcc54b4 which
did not land in branch

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-09-28 13:54:56 +00:00
Juan A. Suarez Romero
de44476572 cherry-ignore: add "radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug"
fixes: This commit addressed an earlier commit c7e9ebb3ab which did not
land in branch

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-09-28 13:54:56 +00:00
Leo Liu
2712e59538 st/va/postproc: use video original size for postprocessing
Otherwise the aligned size will make video scaled

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit eb51838771)
2017-09-28 13:54:56 +00:00
Samuel Iglesias Gonsálvez
e16c31f3fe anv: fix viewport transformation for z component
In Vulkan, for 'z' (depth) component, the scale and translate values
for the viewport transformation are:

pz = maxDepth - minDepth
oz = minDepth

zf = pz × zd + oz

Being zd, the third component in vertex's normalized device coordinates.

Fixes: dEQP-VK.draw.inverted_depth_ranges.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d2cd9deeb8)
2017-09-28 13:54:56 +00:00
David Airlie
dbf33a12e1 radv: add gfx9 scissor workaround
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3e54493265)
2017-09-28 13:54:56 +00:00
Lucas Stach
cd9d26c3ef etnaviv: fix 16bpp clears
util_pack_color may leave undefined values in the upper half of the packed
integer. As our hardware needs the upper 16 bits to mirror the lower 16bits,
this breaks clears of those formats if the undefined values aren't masked off.

I've only observed the issue with R5G6B5_UNORM surfaces, other 16bpp
formats seem to work fine.

Fixes: d6aa2ba2b2 (etnaviv: replace translate_clear_color with util_pack_color)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit e9d37d68cf)
2017-09-28 13:54:56 +00:00
Tim Rowley
b378fd5cf3 swr/rast: remove llvm fence/atomics from generated files
We currently don't use these instructions, and since their API
changed in llvm-5.0 having them in the autogen files broke the mesa
release tarballs which ship with generated autogen files.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102847
CC: mesa-stable@lists.freedesktop.org
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit 066d1dc951)
2017-09-28 13:54:56 +00:00
Tapani Pälli
caff607bcb mesa: free current ComputeProgram state in _mesa_free_context_data
This is already done for other programs stages, fixes a leak when using
compute programs.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102844
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 589457d97f)
2017-09-28 13:54:56 +00:00
Nicolai Hähnle
737e2245ce radeonsi: fix array textures layer coordinate
Like for cube map (array) gather, we need to round to nearest on <= VI.

Fixes tests in dEQP-GLES3.functional.shaders.texture_functions.texture.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 87f7c7bd65)
2017-09-28 13:54:56 +00:00
Nicolai Hähnle
46cfefc5be glsl/linker: fix output variable overlap check
Prevent an overflow caused by too many output variables. To limit the
scope of the issue, write to the assigned array only for the non-ES
fragment shader path, which is the only place where it's needed.

Since the function will bail with an error when output variables with
overlapping components are found, (max # of FS outputs) * 4 is an upper
limit to the space we need.

Found by address sanitizer.

Fixes dEQP-GLES3.functional.attribute_location.bind_aliasing.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 15cae12804)

Squashed with commit:

glsl/linker: properly fix output variable overlap check

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102904
Fixes: 15cae12804 ("glsl/linker: fix output variable overlap check")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit df8767a14e)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-09-28 13:54:56 +00:00
Józef Kucia
6121bc3150 anv: Fix descriptors copying
Trivial.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 65a09f98ad)
2017-09-28 13:54:56 +00:00
Dave Airlie
488460b1ef ac/surface: handle S8 on gfx9
If we don't have a depth piece, we don't get a correct
swizzle mode and we hit an assert in addrlib.

In case of no depth get the preferrred swizzle mode for
stencil alone.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c4ac522511)

Squashed with commit:

ac/surface: handle error when choosing preferred swizzle mode

CID: 1418140
Fixes: c4ac522511 ("ac/surface: handle S8 on gfx9")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit eb71394ff3)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-09-28 13:54:56 +00:00
Alexandru-Liviu Prodea
3c54a77d08 Scons: Add LLVM 5.0 support
1 new required library - LLVMBinaryFormat

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org=/show_bug.cgi?id=3D102318
Signed-off-by: Alexandru-Liviu Prodea <liviuprodea@yahoo.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c1b0137048)
2017-09-28 13:54:56 +00:00
Nicolai Hähnle
891627ec0f amd/common: add workaround for cube map array layer clamping
Fixes dEQP-GLES31.functional.texture.filtering.cube_array.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 94736d31c3)

Squashed with commit:

amd/common: add chip_class to ac_llvm_context

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 3db86d86ed)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-09-28 13:54:56 +00:00
Nicolai Hähnle
688f8415f7 amd/common: round cube array slice in ac_prepare_cube_coords
The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.

Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e0af3bed2c)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/amd/common/ac_nir_to_llvm.c
2017-09-28 13:54:55 +00:00
Nicolai Hähnle
54915ee52f radeonsi: workaround for gather4 on integer cube maps
This is the same workaround that radv already applied in commit
3ece76f03d ("radv/ac: gather4 cube workaround integer").

Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6fb0c1013b)
2017-09-28 13:54:55 +00:00
Jason Ekstrand
da517431f6 i965/blorp: Set r8stencil_needs_update when writing stencil
This fixes a crash on Haswell when we try to upload a stencil texture
with blorp.  It would also be a problem if someone tried to texture from
stencil after glBlitFramebuffers.

Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit a43d379000)
2017-09-28 13:54:55 +00:00
Matt Turner
3b4a6dbe93 util/u_atomic: Add implementation of __sync_val_compare_and_swap_8
Needed for 32-bit PowerPC.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where
they're missing")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

(cherry picked from commit 1bbe180873)
2017-09-28 13:54:55 +00:00
Matt Turner
d2dca92a6c util: Link libmesautil into u_atomic_test
Platforms without particular atomic operations require the
implementations in u_atomic.c

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where
they're missing")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d075a4089e)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/util/Makefile.am
2017-09-28 13:54:55 +00:00
Emil Velikov
d41d54b061 automake: enable libunwind in `make distcheck'
Enable the toggle to catch when the library is missing from the link
path. Better to test, fail and address before releasing Mesa ;-)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 9aba643e3c)
2017-09-28 13:54:55 +00:00
Gert Wollny
5d16640ea1 travis: Add libunwind-dev to gallium/make builds
libunwind is a optional dependency used by the gallium aux module
(libgallium) and consequently the final binaries must be linked against
it. To test whether the library is properly specified in the link pass
add it to the travis-ci build environment and force its use.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 39fe51c1e3)
2017-09-26 12:17:33 +02:00
Gert Wollny
6024a95198 travis: force llvm-3.3 for "make Gallium ST Other"
In Ubuntu Trusty the default version of llvm is 3.4 and the build was
actually randomly picking 3.5 or 3.9. Adding libunwind would then result
is build success or failure depending of what version was picked.

Install the llvm-3.3-dev package and force its use: On one hand it is
the minimum required version we want to the build test against, and on
the other hand forcing the version stabilizes the build.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d3675812b5)
2017-09-26 12:17:33 +02:00
Dave Airlie
996bf9c1cd radv/nir: call opt_remove_phis after trivial continues.
With the shaders in the ssao demo, the nir_opt_if wasn't
working properly without this, after this the if gets optimised
so that loop unrolling gets called.

(loop unrolling fails due to instruction count, but at least
it gets to do that.)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 64d9bd149a)
[Juan A. Suarez: apply patch over src/amd/vulkan/radv_pipeline.c]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/amd/vulkan/radv_shader.c
2017-09-26 12:17:33 +02:00
Emil Velikov
bd903d4ee1 docs: add sha256 checksums for 17.2.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-18 00:12:36 +01:00
Emil Velikov
d6d2b6b5ec docs: add release notes for 17.2.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-17 23:57:32 +01:00
Emil Velikov
cb778d563f Update version to 17.2.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-17 23:53:08 +01:00
Bas Nieuwenhuizen
7c3bd519e7 radv: Don't allocate CMASK for linear images.
We can't use it anyway in fast clears, and on GFX9 it seems to
actually hange the card if we specify it.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
(cherry picked from commit 1a172fb113)
2017-09-13 21:52:42 +01:00
Bas Nieuwenhuizen
c6b3732967 radv: Disable multilayer & multilevel DCC.
The current DCC init routine doesn't account for initializing a
single layer or level. Multilayer seems hard for small textures on
pre-GFX9 as tre metadata for the layers can be interleaved. For
GFX9 multilevel textures are a problem for similar reasons.

So just disable this for now, until we handle the texture modes
correctly.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
(cherry picked from commit bee83b2661)
2017-09-13 21:52:42 +01:00
Eric Engestrom
e012ade1cf docs/egl: remove reference to EGL_DRIVERS_PATH
Support for external egl drivers was dropped a few years ago.

Fixes: 209360bbb9 "egl/main: drop support for external egl drivers"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 85b66d2096)
2017-09-13 21:52:42 +01:00
Dave Airlie
41e691d605 radv/winsys: fix flags vs va_flags thinko.
Fixes: e8d57802f (radv/gfx9: allocate events from uncached VA space)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a5add6fb30)
2017-09-13 21:52:42 +01:00
Emil Velikov
b6ae4400fc egl/x11/dri3: adding missing __DRI_BACKGROUND_CALLABLE extension
Fixes: 3b7b6adf3a ("egl: Implement __DRI_BACKGROUND_CALLABLE")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f24bc18162)
2017-09-13 21:52:42 +01:00
Bas Nieuwenhuizen
c47914276e radv: Fix vkCopyImage with both depth and stencil aspects.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ff23e03d60)
2017-09-13 21:52:42 +01:00
Bas Nieuwenhuizen
536f852d42 radv: Actually set the cmd_buffer usage_flags.
Otherwise, the simultaneous uage bit doesn't get set from the begin
info, which we need for batchchaining.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit dec7b38fe6)
2017-09-13 21:52:42 +01:00
Grazvydas Ignotas
fdc4f6e684 radv: don't assert on empty hash table
Currently if table_size is 0, it's falling through to:

unreachable("hash table should never be full");

But table_size can be 0 when RADV_DEBUG=nocache is set, or when the
table allocation fails (which is not considered an error).

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit b8dd69e1b4)
2017-09-13 21:52:42 +01:00
Eric Engestrom
8cf9e5ab56 mesa/st: remove unwanted backup file
Fixes: 0ac78dc925 "util: move string_to_uint_map to glsl"
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ac0d8dc3fa)
2017-09-13 21:52:42 +01:00
Samuel Pitoiset
d5c71e44c4 radeonsi: update dirty_level_mask before dispatching
This fixes a rendering issue with Hitman when bindless textures
are enabled.

Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 59101e771d)
2017-09-13 21:52:42 +01:00
Nicolai Hähnle
53190187cc glsl: fix glsl_struct_field size calculations for shader cache
Found by address sanitizer:

==22621==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61400000cbd8 at pc 0x7f561610a4ff bp 0x7ffca85f9d50 sp 0x7ffca85f94f8
READ of size 344 at 0x61400000cbd8 thread T0
    #0 0x7f561610a4fe  (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x5f4fe)
    #1 0x7f560bb305a5 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #2 0x7f560bb305a5 in blob_write_bytes ../../../mesa-src/src/compiler/glsl/blob.c:136
    #3 0x7f560be7d7ff in encode_type_to_blob ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:153
    #4 0x7f560be81222 in write_program_resource_data ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:950
    #5 0x7f560be81222 in write_program_resource_list ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1118
    #6 0x7f560be81222 in shader_cache_write_program_metadata(gl_context*, gl_shader_program*) ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1407
    #7 0x7f560b825fdb in link_program ../../../mesa-src/src/mesa/main/shaderapi.c:1163

Fixes: 073a84ff60 ("glsl: stop adding pointers from glsl_struct_field to the cache")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 4da6cf6c98)
2017-09-13 21:52:42 +01:00
Emil Velikov
366e02f992 cherry-ignore: add EGL+gbm swast patches
They depend on an earlier commit which did not land in branch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-13 21:52:41 +01:00
Nicolai Hähnle
5cec093773 ac/surface: match Z and stencil tile config
Fixes various piglit tests on Stoney, see the comment.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit cffc0ae0d9)
[Emil Velikov: attribute for the different gfx6_surface_settings()]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/common/ac_surface.c
2017-09-13 21:52:02 +01:00
Nicolai Hähnle
69394e3517 radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog
gl_SampleMaskIn is supposed to contain set bits only for the samples that
are covered by the current fragment shader invocation, but the VGPR
initialization hardware loads the set of all bits that are covered at the
current pixel.

Fixes various tests in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 92c4277990)
[Emil Velikov: attribute for the lack of add_arg*checked() API]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/radeonsi/si_shader.c
2017-09-13 21:52:02 +01:00
Timothy Arceri
c0c1e05fb6 glsl: stop adding pointers from bindless structs to the cache
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4009370232)
2017-09-13 21:52:02 +01:00
Timothy Arceri
94b656cb5e glsl: stop adding pointers from shader_info to the cache
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a6618afd27)
2017-09-13 21:52:02 +01:00
Timothy Arceri
5e0b556abc compiler: move pointers to the start of shader_info
This will allow us to easily skip them when writting the struct
to disk cache.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3ea3f75723)
2017-09-13 21:52:02 +01:00
Timothy Arceri
f087dad40d glsl: always write a name/label string to the cache
In the following patch we will stop writing the pointer to cache.

Unfortunately adding empty strings to that cache seems to be the
only thing we can do here once we no longer have the pointers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 44918a1979)
2017-09-13 21:52:02 +01:00
Timothy Arceri
83bbfaddf1 glsl: don't write uniform storage offset if there isn't one
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 22154823d2)
2017-09-13 21:52:02 +01:00
Timothy Arceri
a8eea040cd glsl: add has_uniform_storage() helper to shader cache
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2662269ad7)
2017-09-13 21:52:02 +01:00
Timothy Arceri
15ecbb2c07 glsl: stop adding pointers from glsl_struct_field to the cache
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 073a84ff60)
2017-09-13 21:52:02 +01:00
Timothy Arceri
7fef8d436a glsl: stop adding pointers from gl_shader_variable to the cache
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 37d453b55a)
2017-09-13 21:52:02 +01:00
Timothy Arceri
ea1bcfc063 glsl: allow NULL to be passed to encode_type_to_blob()
This will be used by the following commit.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 37eb67714e)
2017-09-13 21:52:01 +01:00
Eric Engestrom
6e2b9fba21 util: improve compiler guard
Glibc 2.26 has dropped xlocale.h, but the functions needed (strtod_l()
and strdof_l()) can be found in stdlib.h.
Improve the detection method to allow newer builds to still make use of
the locale-setting.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102454
Cc: Laurent Carlier <lordheavym@gmail.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 49b428470e)
2017-09-13 21:52:01 +01:00
Kenneth Graunke
dcb634df92 i965: Set "Subslice Hashing Mode" to 16x16 on Apollolake.
As of 4.11, the kernel isn't bothering to set the subslice hashing mode
on Apollolake, leaving it at the default of 8x8.  (It initializes it to
16x4 on most platforms.)

Performance data for GPUTest Triangle on Apollolake at 1024x640:

   X-tiled RT:
   -----------
   8x8 -> 16x4:   2.4325%  +/- 0.383683% (n=107)
   8x8 -> 8x4:   -3.75105% +/- 0.592491% (n=40)
   8x8 -> 16x16:  6.17238% +/- 0.67157%  (n=30)

   Y-tiled RT:
   -----------
   8x8 -> 16x4:   1.30307%  +/- 0.297292% (n=205)
   8x8 -> 8x4:   -0.769282% +/- 0.729557% (n=35)
   8x8 -> 16x16:  3.00254%  +/- 0.715503% (n=40)

   8x MSAA RT (INTEL_FORCE_MSAA=8):
   --------------------------------
   8x8 -> 16x4:   1.38889% +/- 0.93729%  (n=7)
   8x8 -> 8x4:   -2.10643% +/- 1.15153%  (n=3)
   8x8 -> 16x16:  3.87183% +/- 1.08851%  (n=5)

Based on this, we choose 16x16 for Apollolake.

Skylake GT2 with X-tiled buffers appears to be a toss-up between 16x4
and 16x16, and with Y-tiled buffers it doesn't seem to really matter.
So we'll leave Skylake alone for now.

The hashing mode doesn't seem to make a measurable impact on more
complex benchmarks.

Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit ebd2fd6ef3)
2017-09-13 21:52:01 +01:00
Dave Airlie
20be71ba7c radv/gfx9: fix image resource handling.
GFX9 changes how images are layed out, so this needs updating.

Fixes: dEQP-VK.query_pool.statistics_query.*

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3633bae36b)
2017-09-13 21:52:01 +01:00
Dave Airlie
a3d1ea347a radv/ac: bump params array for image atomic comp swap
For the comp_swap case this was overflowing and crashing
sometimes.

Fixes:
dEQP-VK.image.atomic_operations.compare_exchange.*

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit aba441be44)
2017-09-13 21:52:01 +01:00
Dave Airlie
dc640aab63 radv/gfx9: set mip0-depth correctly for 2d arrays/3d images
This field covers the whole resource.

Fixes:
dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.3d.format.*
dEQP-VK.texture.filtering.3d.combinations.*

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ebd2a5354d)
2017-09-13 21:52:01 +01:00
Dave Airlie
c8076c8ea1 radv: handle GFX9 1D textures
As GFX9 can't handle 1D depth textures, radeonsi and
apparantly pro just update all 1D textures to 2D,
and work around it.

This ports the workarounds from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 1bcb953e16)

Conflicts:
	src/amd/common/ac_nir_to_llvm.c
2017-09-13 21:52:01 +01:00
Dave Airlie
62f0bb2a87 radv: don't use iview for meta image width/height.
Work out the width/height from the level manually, as on GFX9
we won't minify the iview width/height.

This fixes:
dEQP-VK.api.image_clearing.core.clear_color_image* on gfx9

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2f5b4490b5)
2017-09-13 21:52:01 +01:00
Emil Velikov
40f06286f9 cherry-ignore: add execution_type() fix to the list
As mentioned by Matt:
 "Without commit 4fab67a441, this patch isn't  needed at all."

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-13 21:50:49 +01:00
Nicolai Hähnle
0f79eb7abe st/glsl_to_tgsi: only the first (inner-most) array reference can be a 2D index
Don't get distracted by record dereferences between array references.

Fixes dEQP-GLES31.functional.tessellation.user_defined_io.per_vertex_block.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 03203b7448)
2017-09-13 21:34:03 +01:00
Dave Airlie
54c5568fa9 radv: use simpler indirect packet 3 if possible.
This fixes some observed hangs on CIK GPUs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 219d29e4d8)
[Emil Velikov: add the hunk in radv_emit_indirect_draw]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/vulkan/radv_cmd_buffer.c
2017-09-13 21:34:03 +01:00
Dave Airlie
c2f314d721 radv/gfx9: allocate events from uncached VA space
This copies what amdgpu-pro does, and allocates the memory
for an event with an uncached mtype.

This fixes hangs with:
dEQP-VK.api.command_buffers.record_simul_use_primary

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e8d57802fe)
2017-09-13 21:34:02 +01:00
Dave Airlie
fe04abc6e9 radv/winsys: use amdgpu_bo_va_op_raw.
This is a precursor to the gfx9 fix to use uncached for the event
memory. Move to the interface which allows setting the flags,
but wrap it to avoid having to copy it around the place.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 76ac8fafad)
2017-09-13 21:34:02 +01:00
Marek Olšák
0e305f0518 st/mesa: skip draw calls with pipe_draw_info::count == 0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102502

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit e4018fdd85)
2017-09-13 21:34:02 +01:00
Nicolai Hähnle
cb9dae484a radeonsi/gfx9: always flush DB metadata on framebuffer changes
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 34124e412f)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/radeonsi/si_pipe.h
	src/gallium/drivers/radeonsi/si_state.c
2017-09-13 21:34:02 +01:00
Dave Airlie
57ecf28668 Revert "radv: disable support for VEGA for now."
This reverts commit 611076a41a.

With the two previous commits, vega shouldn't be unstable,
doesn't pass CTS, but can do a complete run, and games shouldn't
hang anymore, so bring it back online.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e38685cc62)
2017-09-13 21:34:02 +01:00
Dave Airlie
78e2b539a4 radv/gfx9: set descriptor up for base_mip to level range.
This is required on GFX9, fixes a bug in Talos where all the
mipmaps overlay each other.

Just pushing this as well as it fixes Talos.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6d929d3f85)
2017-09-13 21:34:02 +01:00
Dave Airlie
86d4c203bd radv: disable 1d/2d linear optimisation on gfx9.
This causes hangs in some of the CTS tests with a 2d
1536x2 texture.

This fixes hangs with:
dEQP-VK.pipeline.image.suballocation.sampling_type.combined.iew_type.1d_aray.format.r4g4b4a4_unorm_pack16.count_1.size.512x1_array_of_3
if we reenable it, make sure these don't regress.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d118ff8765)
2017-09-13 21:34:02 +01:00
Jason Ekstrand
bda2975eef spirv: Add support for the HelperInvocation builtin
I have no idea how this got missed but it's been missing since forever.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit e439908af9)
2017-09-13 21:34:02 +01:00
Roland Scheidegger
a4dc18efca st/mesa: fix view template initialization in try_pbo_readpixels
I think this is what the code was meant to do, albeit as far as I can tell
the redundant initialization some analyzers complain about should work as
well just fine (only the first layer will be used, if the view contains one
or more layers doesn't really matter).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102467
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 2b2c61f0df)
2017-09-13 21:34:02 +01:00
Kenneth Graunke
8f77409e2b i965: Fix crash in fallback GTT mapping.
We can't perf_debug without a context.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 52b65dfda8)
2017-09-13 21:34:02 +01:00
Rob Clark
458f52618a freedreno: skip batch-cache for compute shaders
It is kind of pointless for compute, and avoids issues with apps kicking
off more than 32 compute shaders at once.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit dc9e08b0c3)
2017-09-13 21:34:02 +01:00
Karol Herbst
402b8073ad nvc0: write 0 to pipeline_statistics.cs_invocations
cs_invocations are currently unsupported, but leaving the field uninitialized
is even worse.

fixes on nvc0:
 * KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
 * KHR-GL45.pipeline_statistics_query_tests_ARB.functional_non_rendering_commands_do_not_affect_queries

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit b672c3833b)
2017-09-13 21:34:02 +01:00
Ben Crocker
6fe4852f59 llvmpipe: lp_build_gather_elem_vec BE fix for 3x16 load
Fix loading of a 3x16 vector as a single 48-bit load
on big-endian systems (PPC64, S390).

Roland Scheidegger's commit e827d91756
plus Ray Strode's patch reduce pre-Roland Piglit failures from ~4000 to ~2000.  This patch fixes
three of the four regressions observed by Ray:

- draw-vertices
- draw-vertices-half-float
- draw-vertices-half-float_gles2

One regression remains:
- draw-vertices-2101010

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 57c8ead0cd)
2017-09-13 21:34:02 +01:00
Ray Strode
d080f10fdf gallivm: correct channel shift logic on big endian
lp_build_fetch_rgba_soa fetches a texel from a texture.
Part of that process involves first gathering the element
together from memory into a packed format, and then breaking
out the individual color channels into separate, parallel
arrays.

The code fails to account for endianess when reading the packed
values.

This commit attempts to correct the problem by reversing the order
the packed values are read on big endian systems.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ray Strode <rstrode@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 75cb6e3617)
2017-09-13 21:34:02 +01:00
Jason Ekstrand
93b794f094 anv/formats: Nicely handle unknown VkFormat enums
This fixes some crashes in the dEQP-VK.memory.requirements.core.* tests.
I'm not sure whether or not passing out-of-bound formats into the query
is supposed to be allowed but there's no harm in protecting ourselves
from it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/101956
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 242211933a)

Squashed with:

anv: fix off by one in array check

`anv_formats[ARRAY_SIZE(anv_formats)]` is already one too far.
Spotted by Coverity.

CovID: 1417259
Fixes: 242211933a "anv/formats: Nicely handle unknown VkFormat enums"
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
(cherry picked from commit 0c7272a66c)
2017-09-13 21:33:55 +01:00
Charmaine Lee
520786586d vbo: fix offset in minmax cache key
Instead of saving primitive offset in the minmax cache key,
save the actual buffer offset which is used in the cache lookup.

Fixes rendering artifact seen with GoogleEarth when run with
VMware driver.

v2: Per Brian's comment, initialize offset to avoid compiler warning.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 2d93b462b4)

Squashed with:

vbo: fix build errors on android

incompatible pointer to integer conversion assigning to 'GLintptr' (aka 'int')
from 'const char *' [-Werror,-Wint-conversion]

      offset = indices;
             ^ ~~~~~~~

Fixes: 2d93b462b4 ("vbo: fix offset in minmax cache key")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 0986f68632)
2017-09-13 21:32:19 +01:00
Michael Olbrich
9ffe4cc6c0 egl/dri2: only destroy created objects
dri2_display_destroy may be called by dri2_initialize_wayland_drm() if
initialization fails. In this case, these objects may not be initialized.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Michael Olbrich <m.olbrich@pengutronix.de>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 81d5c31631)
2017-09-13 21:05:57 +01:00
Brian Paul
2b0e70e2ae llvmpipe: initialize llvmpipe->dirty with LP_NEW_SCISSOR
If llvmpipe_set_scissor_states() is never called, we still need to be sure
that derived scissor/clip state is updated.  As of commit 743ad599a9
that function might not be called.

Fixes regressed Piglit gl-1.0-scissor-offscreen -fbo -auto test.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101709
Fixes: 743ad599a9 ("st/mesa: don't set 16 scissors and 16 viewports
if they're unused")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>

(cherry picked from commit 88cdf16871)
2017-09-13 21:05:57 +01:00
Emil Velikov
3a9b2e0e7d cherry-ignore: ignore gfx9 tile swizzle fix
As mentioned by Dave and Bas, the patch is not applicable for 17.2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-13 21:05:56 +01:00
Emil Velikov
188010c68d cherry-ignore: add getCapability patches
Adding extra infra is not acceptable for stable. Offending code has been
wrapped in a Android compile guard with an earlier commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-13 21:05:33 +01:00
Emil Velikov
b4473dd519 docs: add sha256 checksums for 17.2.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-04 18:24:29 +01:00
Emil Velikov
f5925b2897 docs: Update 17.2.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-04 18:16:16 +01:00
Emil Velikov
702950d1ad Update version to 17.2.0(final)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-04 15:16:21 +01:00
Emil Velikov
909f2b6aa2 Update version to 17.2.0-rc6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-31 10:06:29 +01:00
Emil Velikov
3666d9ab99 egl/wayland: make sure HAS_$FORMAT is set for wl_dmabuf
Otherwise eglCreateWaylandBufferFromImageWL will fail, since we
have no "supported" format.

Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 9e07005e87)
2017-08-29 19:14:14 +01:00
Emil Velikov
5303f0c246 egl/wayland: set correct format with wl_dmabuf as wl_drm is missing
For most/all cases today, we have wl_drm available alongside wl_dmabuf.
Yet in the long run, we want to make sure the latter can operate without
any traces of the former.

Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit da100fe697)
2017-08-29 19:14:13 +01:00
Emil Velikov
8d51747dc0 egl/wayland: polish object teardown in dri2_wl_destroy_surface
The wl_drm wrapper is created before the wl display/surface ones.
Thus make sure we destroy it after them. In reality it should not make
any difference either way.

Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 1a8015e753)
2017-08-29 19:14:13 +01:00
Emil Velikov
3b41bef9a4 egl/wayland: plug leaks in dri2_wl_create_window_surface() error path
We forgot to teardown the wl display/surface wrappers.

Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 83442112d7)
2017-08-29 19:14:13 +01:00
Grazvydas Ignotas
e040bdbc8a radv: clear dynamic_shader_stages on create
Valgrind reports it's being used uninitialized.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 7780374833)
2017-08-29 19:14:13 +01:00
Dave Airlie
e4c4b0dbcb radv/wsi: Compute correct row_pitch for GFX9.
(commit split out by Bas Nieuwenhuizen)

Fixes: 65477bae9c "radv: enable GFX9 on radv"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9573bd70e1)
2017-08-29 19:14:13 +01:00
Emil Velikov
ca9817a6e8 egl: don't NULL deref the .get_capabilities function pointer
One could easily introduce version 3 of the DRI2fenceExtension,
extending the struct, while not implementing the above function.

Thus we'll end up with NULL pointer, and dereferencing it won't fare
too well.

Fixes: 0201f01dc4 ("egl: add EGL_ANDROID_native_fence_sync")
Cc: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit f0d053cb6d)
2017-08-29 19:14:13 +01:00
Bas Nieuwenhuizen
df8e65feb7 radv: Fix sparse BO mapping merging.
If we merge a mapping with the mapping before it, we also need
to not only change the offset, but also the bo offset.

Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 9b7e663da1)
2017-08-29 19:14:13 +01:00
Bas Nieuwenhuizen
ec94b874c4 radv: Fix off by one in MAX_VBS assert.
e.g. 0 + 32 <= 32 should be valid.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit fba0e07869)
2017-08-29 19:14:13 +01:00
Bas Nieuwenhuizen
17e5b0f1db radv: Don't set a new subpass on compute resolve.
We don't use the render path so totally unneeded.

Fixes: 19be95f71e "radv: add subpass resolve compute path"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit bd81cb3206)
2017-08-29 19:14:13 +01:00
Bas Nieuwenhuizen
69c793c0ce radv: Remove some intel comments from the resolve code.
These are clearly not applicable to radv.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e5c4e10769)
2017-08-29 19:14:13 +01:00
Dave Airlie
b29987ace4 radv: don't crash if we have no framebuffer
Recording secondaries with no framebuffer attachment may
make this happen, though this might not be the complete solution.

(esp if someone does meta stuff in there, would we have to
save things, not sure).

Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4a091b0788)
2017-08-29 19:14:13 +01:00
Dave Airlie
22c1e2e966 radv: fix predication on gfx9
When I added gfx9 I did it wrong, this fixes it.

Fixes: 5247b311e9 "radv/gfx9: fix set predication packet."
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 12fd0f8dc1)
2017-08-29 19:14:13 +01:00
Timothy Arceri
f01cd4feeb mesa: fix ES only draw if we have vertex positions
This code was separated from the validation code so it could
use used with KHR_no_error paths. The return values were inverted
to reflect the name of the helper, but here the condtion was
mistakenly inverted rather than the return value.

Fixes: 4df2931a87 (mesa/vbo: move some Draw checks out of validation)

Reported-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a4635c84dc)
2017-08-29 19:14:13 +01:00
Daniel Stone
c77ae9744e egl: Add dma_buf_import_modifiers for glvnd
Make sure we advertise the new entrypoints to libglvnd's EGL dispatch.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reported-by: Emmanuel Gil Peyrot <emmanuel.peyrot@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101982
Fixes: 4c412293d0 ("egl: advertise EGL_EXT_image_dma_buf_import_modifiers")
(cherry picked from commit 85ef0215dd)
2017-08-29 19:14:13 +01:00
Daniel Stone
5fdf832235 egl: Update headers from Khronos
Taken from egl-registry 7d68647c4dab.

Signed-off-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 2eee03b7a1)
2017-08-29 19:14:13 +01:00
Emil Velikov
46573f1028 util: move string_to_uint_map to glsl
The functionality is used by glsl and mesa. With the latter already
depending on the former.

With this in place the src/util/ static library libmesautil.la no longer
has a C++ dependency. Thus objects which use it (like libEGL) don't need
the C++ link.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101851
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: James Harvey <lothmordor@gmail.com>
(cherry picked from commit 0ac78dc925)
2017-08-29 19:14:13 +01:00
Jason Ekstrand
1dfbfad0b9 nir: Fix system_value_from_intrinsic for subgroups
A couple of the cases were backwards

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 63e79a8a77)
2017-08-29 19:14:13 +01:00
Ilia Mirkin
289b2a8788 st/mesa: fix handling of vertex array double inputs
The is_double_vertex_input needs to be set for arrays of doubles as
well.

Fixes KHR-GL45.enhanced_layouts.varying_array_locations

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ae53bff8b1)
2017-08-29 19:14:13 +01:00
Ilia Mirkin
1960d8f0dd glsl: fix counting of vertex shader output slots used by explicit vars
The argument to count_attribute_slots should only be set to true for
vertex inputs, not for all vertex shader varyings.

Fixes KHR-GL45.enhanced_layouts.varying_locations

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit eefeff09a7)
2017-08-29 19:14:13 +01:00
Christian Gmeiner
2c964f68a8 etnaviv: use correct param for etna_compatible_rs_format(..)
Found by code inspection.

Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 67fc3e37a7)
2017-08-29 19:14:13 +01:00
Kai Chen
5d5a13e375 egl/wayland: Use roundtrips when awaiting buffer release
In get_back_bo, we use wl_display_dispatch_queue() to block and wait for
a buffer release event. However, not all Wayland compositors flush the
client socket on posting a buffer-release event, so by only blocking
client-side, we may block indefinitely, or at least need to wait for an
input event / frame completion to arrive for the compositor to flush.

We now use dispatch_queue as a first pass, but if our entire buffer pool
is exhausted, use a roundtrip (an immediately-triggered wl_callback) to
ensure that the compositor flushes out our release event immediately.

[daniels: Modified comment and commit message.]

Signed-off-by: Kai Chen <kai.chen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 151188d1e3)
2017-08-29 19:14:13 +01:00
Dave Airlie
9d6d567046 radv/gfx9: gfx9 has buffer sizing rules like pre-VI.
This fixes:
dEQP-VK.robustness.buffer_access.* on GFX9.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 19f6906c1e)
2017-08-29 19:14:13 +01:00
Bas Nieuwenhuizen
6f0d910617 ac/nir: Cast sources of integer ops to int.
The int32->float semantic conversion got dropped in a testcase,
because the src was already float. On closer inspection I decided
to add a few more casts for integer op operands to be safe too.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 43595db302)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/common/ac_nir_to_llvm.c
2017-08-29 19:14:13 +01:00
Ilia Mirkin
a0b3095bb6 nv50/ir: properly set sType for TXF ops to U32
All of the coordinates and LOD args are integers for TXF. This mostly
doesn't matter, except for converting into a levelZero=true operation by
removing an explicit zero LOD. For the comparison against zero to work
properly, the sType of the instruction has to be set correctly.

Fixes: KHR-GL45.robust_buffer_access_behavior.texel_fetch
Reported-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 96be442b77)
2017-08-29 19:14:13 +01:00
Dave Airlie
c25e896810 radv/gfx9: don't expose linear depth on vega.
This just zeros out the linear flags for gfx9 + depth formats.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8985ad494b)
2017-08-29 19:14:13 +01:00
Dave Airlie
9172c7744a radv: don't degrade tiling mode for small compressed or depth texture.
This is what radeonsi does, so we should do the same, also vega
doesn't support linear depth textures anyways.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 5d26e0baf2)
2017-08-29 19:14:12 +01:00
Dave Airlie
adce678550 radv/gfx9: only minify image view width/height/depth before gfx9.
For gfx9 the addressing for images has changed, so we need to
provide the hw with the level0, however we still need to scale
for format block differences (so our compressed upload paths still
work).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit bae7723e13)
2017-08-29 19:14:12 +01:00
Dave Airlie
7f3555951e radv/image: don't rescale width/height if the format isn't changing
If the image view has the same format, we don't need to rescale
the w/h.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a74d987431)
2017-08-29 19:14:12 +01:00
Dave Airlie
bc39869414 radv: cleanup some image view descriptor setup.
Avoid passing the vulkan image creation into the image view descriptor
setup. This cleans up the usage of range inside the init, instead
using the properly inited values in the image view.

This is just a cleanup but some future vega changes will depend on it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 5378b5d071)
2017-08-29 19:14:12 +01:00
Dave Airlie
93aee55a7f radv/gfx9: emit sx_mrt_blend registers
GFX9 needs the SX MRT blend registers programmed, port over
the code from radeonsi to workout the values from the blend
state, and program the registers on rbplus systems.

This fixes lots of:
dEQP-VK.pipeline.blend.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 9c080100d3)
2017-08-29 19:14:12 +01:00
Dave Airlie
996e3239c9 radv: bump space check for indexed draw.
For the GFX9 packet we need one more dword.

Fixes an assert in:
dEQP-VK.draw.shader_draw_parameters.base_vertex.draw_indexed

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 864eb18527)
2017-08-29 19:14:12 +01:00
Dave Airlie
9b856ea323 radv/gfx9: fixup db/stencil disable.
This fixes disabled Z/stencil.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d987b4ab9e)
2017-08-29 19:14:12 +01:00
Dave Airlie
d8f8689101 radv/gfx9: fix level count in color register setup.
There was an off by one here.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 11834195e9)
2017-08-29 19:14:12 +01:00
Dave Airlie
dbca342014 radv/gfx9: use total levels in texture descriptor
We need to use all the levels when filling out the gfx9
descriptor.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit df09f1f3cd)
2017-08-29 19:14:12 +01:00
Kenneth Graunke
c0c03d0003 i965: Make a BRW_NEW_FAST_CLEAR_COLOR dirty bit.
When changing fast clear colors, we need to emit new SURFACE_STATE
with the updated color at the next draw call.

Most things work today because the atoms that handle SURFACE_STATE
for images (mutable images, textures, render targets) also listen to
BRW_NEW_BLORP, causing us to re-emit these on every BLORP operation.
However, this is overkill - most BLORP operations don't require us
to re-emit SURFACE_STATE.

One case where this is broken today is a fast clear to a different
color followed by a non-coherent framebuffer fetch.  The renderbuffer
read atom doesn't listen to BRW_NEW_BLORP, and would not get the new
fast clear color.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 54c41af0aa)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_tcs_surface_state.c
2017-08-29 19:14:12 +01:00
Marek Olšák
9013f80f67 radeonsi: emit VGT_REUSE_OFF in the right place
clip_regs aren't marked dirty when writes_viewport_index is changed.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 8dadb07790)
2017-08-29 19:14:12 +01:00
Marek Olšák
d93e6a0409 radeonsi/gfx9: properly handle imported textures with unexpected swizzle mode
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit fdef2f0fd1)
[Emil Velikov: r600_surface_import_metadata is not separate function,
tile_swizzle isn't in branch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/radeon/r600_texture.c
2017-08-29 19:14:12 +01:00
Marek Olšák
cf2f05ff91 radeonsi/gfx9: add a temporary workaround for a tessellation driver bug
The workaround will do for now. The root cause is still unknown.

This fixes new piglit: 16in-1out

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 166823bfd2)
2017-08-29 18:37:57 +01:00
Jason Ekstrand
91715a3681 i965/clear: Quantize the depth clear value based on the format
In f9fd976e8a we changed the clear value to be stored as an
isl_color_value.  This had the side-effect same clear value check is now
happening directly between the f32[0] field of the isl_color_value and
ctx->Depth.Clear.  This isn't what we want for two reasons.  One is that
the comparison happens in floating point even for Z16 and Z24 formats.
Worse than that, ctx->Depth.Clear is a double so, even for 32-bit float
formats, we were comparing as doubles and not floats.  This means that
the test basically always fails for anything other than 0.0f and 1.0f.
This caused a slight performance regression in Lightsmark 2008 because
it was using a depth clear value of 0.999 which can't be stored in a
32-bit float so we were doing unneeded resolves.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/101678
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0ae9ce0f29)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/mesa/drivers/dri/i965/brw_clear.c
2017-08-29 18:37:57 +01:00
Rob Herring
9ba7693a9a Android: Fix LLVM duplicated symbols linking for N and M
Both statically linking libLLVMCore and dynamically linking libLLVM causes
duplicated symbols in gallium_dri.so and it fails to dlopen. We don't
really need to link libLLVMCore, but just need generated headers to be
built first. Dynamically linking to libLLVM instead is enough to do
that. Thanks to Qiang Yu for finding the root cause.

With this change, we can align all versions and just have libLLVM as a
shared lib dependency.

This also requires changes in the M and N versions of LLVM to export the
include paths for libLLVM. AOSP master is okay.

Fixes: 26aee6f4d5 ("Android: rework LLVM build support")
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Qiang Yu <Qiang.Yu@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
(cherry picked from commit 4734bfc02a)
2017-08-29 18:37:57 +01:00
Samuel Pitoiset
1f8cb4c243 radeonsi: update non-resident bindless descriptors if needed
Only resident bindless descriptors are currently updated and
re-uploaded, this makes sure that the non-resident ones are
also updated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2843c5d15c)
2017-08-29 18:37:57 +01:00
Topi Pohjolainen
da4d1fd084 intel/blorp: Adjust intra-tile x when faking rgb with red-only
v2 (Jason): Adjust directly in surf_fake_rgb_with_red()

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101910

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 393ec1a507)
2017-08-29 18:37:57 +01:00
Dave Airlie
7789126929 ac/nir: fixup layer/viewport export for GFX9.
GFX9 moved where the viewport index export goes.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b040f51b61)
2017-08-29 18:37:57 +01:00
Christoph Haag
6389526ce5 mesa: only copy requested compressed teximage cubemap faces
This is analogous to commit 2259b11 which only fixed the regular case

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102308
Signed-off-by: Christoph Haag <haagch+mesadev@frickel.club>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 87556a650a)
2017-08-29 18:37:57 +01:00
Jason Ekstrand
2813e83f6c i965/tex: Don't pass samples to miptree_create_for_teximage
In 76e2f390f9, when Topi switched num_samples from 0 to 1 for
single-sampled, he accidentally switched the last parameter in the call
to miptree_create_for_teximage from 0 to 1 thinking it was num_samples
when it was actually layout_flags.  Switching from 0 to 1 added the
MIPTREE_LAYOUT_ACCELERATED_UPLOAD flag which causes us to allocate a
busy BO instead of an idle one.  This caused the subsequent CPU upload
to consistently stall.  The end result was a 15% performance drop in the
SynMark v7 DrvRes microbenchmark.  This restores the old behavior and
fixes the performance regression.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes: 76e2f390f9
Bugzilla: https://bugs.freedesktop.org/102260
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit f24cf82d6d)
2017-08-29 18:37:57 +01:00
Jason Ekstrand
ea5c466c91 i965/miptree: Return NONE from texture_aux_usage when fully resolved
This little optimization improves the performance of SynMark v7
TexFilterTri by almost 10% on Sky Lake GT4 among other improvements.
We've been doing it for some time but somehow it got dropped during
the miptree refactoring.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/102258
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 61d2f3f1c2)
2017-08-29 18:37:57 +01:00
Jason Ekstrand
d174214218 i965: Stop looking at NewDriverState when emitting 3DSTATE_URB
Looking at NewDriverState is not safe in general.  The state atom system
is set up to ensure that new bits that get added to NewDriverState get
accumulated into the set of bits used when emitting atoms but it doesn't
go the other way.  If we read NewDriverState, we may not get the full
picture because the per-pipeline state (3D or compute) does not get
added to NewDriverState before state emit is done.  It's especially
dangerous to do this from BLORP (either explicitly or implicitly when
BLORP calls gen7_upload_urb) because that does not happen during one of
the normal state upload paths.

This commit solves the problem by whacking all of the per-shader-stage
URB sizes to zero whenever we change the total URB size.  We still have
to flag BRW_NEW_URB_SIZE to ensure that the gen7_urb atom triggers but
the actual decision in gen7_upload_urb can now be based entirely on URB
sizes rather than on state atoms.  This also makes BLORP correct because
it just asks for a new URB config whenever the vsize is too small and so
any change to the total URB size will trigger blorp to re-emit as well
because 0 < vs_entry_size.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/102289
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d5e217dbfd)
2017-08-29 18:37:57 +01:00
Kenneth Graunke
a14dd9fff6 i965: Mark all EGLimages as non-coherent.
EGLimages are shared with external users, and we don't know what they're
going to do with them.  They might scan them out.  They might access
them in a way that doesn't work with our explicit clflushing.

It's safest to simply mark them non-coherent.

Chris Wilson caught this problem and wrote a similar (though less
aggressive) patch to solve it; the miptree code has since undergone
a lot of refactoring so I had to rewrite it.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit bc56dfbf3f)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/mesa/drivers/dri/i965/intel_mipmap_tree.c
2017-08-29 18:37:57 +01:00
Jason Ekstrand
3e62ba0907 i965/miptree: Rework create flags
The only one of the three remaining flags that has anything whatsoever
to do with layout is TILING_NONE.  This commit renames them to
MIPTREE_CREATE_*, documents the meaning of each flag, and makes the
create functions take an actual enum type so GDB will print them nicely.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 24a0da338f)
2017-08-29 18:37:57 +01:00
Jason Ekstrand
1011fd22ae i965/miptree: Delete MIPTREE_LAYOUT_TILING_(Y|ANY)
The only force tiling flag we really care about is LAYOUT_TILING_NONE.
The others don't actually do anything but add confusion.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 55116839d9)
2017-08-29 18:37:57 +01:00
Jason Ekstrand
dd03cb5c5e i965/miptree: Delete MIPTREE_LAYOUT_FOR_SCANOUT
The flag hasn't affected actual surface layout for some time.  The only
purpose it served was to set bo->cache_coherent = false on the BO used
to create the miptree.  This is fairly silly because we can just set
that directly from the caller where it makes much more sense.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a5a673dfa7)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/mesa/drivers/dri/i965/intel_mipmap_tree.c
2017-08-29 18:37:57 +01:00
Jason Ekstrand
046730b8a3 i965/miptree: Delete some unused layout flags
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 2bca18be44)
2017-08-29 18:37:57 +01:00
Jason Ekstrand
f83d979759 i965/miptree Remove layout_flags parameter form is_mcs_supported
The one caller of is_mcs_supported passes 0 in as the layout_flags
unconditionally.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 0e4d9a4b37)
2017-08-29 18:36:14 +01:00
Dave Airlie
2cdb51e76d radv: disable texture gather workaround on gfx9.
Not required anymore.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4c02e2bd95)
[Emil Velikov: chip_class is in ctx::options, not ctx::abi]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-29 16:18:49 +01:00
Emil Velikov
15f23fb855 Update version to 17.2.0-rc5
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-08-21 11:49:19 +01:00
Emil Velikov
dd4fafcfda cherry-ignore: ignore storage offset fixes
For stable the regressions were addressed by reverting the offending
commits - see previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-19 02:57:19 +01:00
Samuel Pitoiset
17a3e4891b Revert "mesa: stop assigning unused storage for non-bindless opaque types"
This reverts commit fcbb93e860 and
also  commit 7c5b204e38 to avoid
compilation errors.

Basically, the parameter indexes look wrong when a non-bindless
sampler is declared inside a nested struct (because it is skipped).
I think it's safer to just restore the previous behaviour which is
here since ages and also because the initial attempt is only a
little performance improvement.

This fixes a regression with
ES2-CTS.functional.shaders.struct.uniform.sampler_nested*.

Note: this patch is only for 17.2 since the fixes in master are rather
invasive. Namely:

365d34540f mesa: correctly calculate the storage offset for i915
de0e62e106 st/mesa: correctly calculate the storage offset

Fixes: fcbb93e860 ("mesa: stop assigning unused storage for non-bindless
opaque types")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101983
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-08-19 02:57:19 +01:00
Jason Ekstrand
85cc3533fa i965/miptree: Set supports_fast_clear = false in make_shareable
The make_shareable function deletes the aux buffer and then whacks
aux_usage to ISL_AUX_USAGE_NONE but not unsetting supports_fast_clear.
Since we only look at supports_fast_clear to decide whether or not to do
fast clears, this was causing assertion failures.

Reported-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101925
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit e7a52cc381)
2017-08-19 02:57:19 +01:00
Eric Anholt
93bd5fbfe1 broadcom/vc4: Build the vc4_tiling_lt_neon.c with -mfpu=neon on ARM.
If you don't pass this, the compiler refuses to compile the assembly for
pre-v7 CPUs.  This also keeps us from building identical, non-NEON code on
aarch64 and x86.

Fixes: a373f77662 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.")

v2: Fix Android build by just appending NEON_C_SOURCES when
    ARCH_ARM_HAVE_NEON.

Tested-by: Rob Herring <robh@kernel.org>
(cherry picked from commit bd5efbd70b)
[Emil Velikov: address libvc4_la_LIBADD conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/vc4/Makefile.am
2017-08-19 02:57:19 +01:00
Eric Anholt
833f12abdf configure.ac: Introduce HAVE_ARM_ASM/HAVE_AARCH64_ASM and the -D flags.
I've been trying to get away without these conditionals in vc4's NEON
code, but it meant compiling extra unused code on x86, and build failing
on ARMv6.

v2: Use the _arm/_arm64 flags to simplify detection (suggested by Rob),
    but hide the _arm version under ARCH_ARM_HAVE_NEON to keep from trying
    to build this stuff for armv5te.

Tested-by: Rob Herring <robh@kernel.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit ba8533b6ea)
2017-08-19 02:57:19 +01:00
Eric Anholt
1fc44a3ba0 util: Fix build on old glibc.
We need to link librt for u_thread.h's clock_gettime() call.

Fixes: b822d9dd67 ("gallium/util: move u_queue.{c,h} to src/util")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b94ddc181b)
2017-08-19 02:57:19 +01:00
Scott D Phillips
6ccb75e869 intel/genxml: Fix gen10 BLEND_STATE variable length packing
BLEND_STATE packing was modified to be variable-length in:

 9670124e31 genxml: Make BLEND_STATE command support variable length array.

The initial gen10.xml still had the old, fixed-length style
definition for BLEND_STATE. So gen10_upload_blend_state would
overwrite the packed BLEND_STATE_ENTRYs with its own fixed array
of all-zero entries when packing BLEND_STATE. This caused
BLEND_STATE upload to not work at all.

Fixes: aa416f515a ("i965/genxml: Add gen10.xml")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit d6539608a4)
2017-08-19 02:57:19 +01:00
Chris Wilson
b4fec4271b i965: Always allow CPU readback of the scanout on LLC platforms
LLC platforms are magic in that reads from the CPU are always cache
coherent, or rather GPU writes that bypass LLC do still invalidate the
appropriate cache line.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 49eda75df6)
2017-08-19 02:57:19 +01:00
Ilia Mirkin
efb9b5740e glsl: add a few missing int64 constant propagation cases
Fixes KHR-GL45.shader_ballot_tests.ShaderBallotAvailability, which
causes some silly swizzles to appear, triggering this optimization to
get hit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 9c8f017f77)
2017-08-19 02:57:19 +01:00
Dave Airlie
e44393cb23 radv: disable support for VEGA for now.
I'm working on this, but I'm not sure I'll make 17.2 at this stage,
maybe 17.2.1.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 611076a41a)
2017-08-19 02:57:19 +01:00
Ilia Mirkin
81b5a6a85b nv50/ir: fix TXQ srcMask
src0.x is always read for the LOD, irrespective of which outputs are
read.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 934511d1f3)
2017-08-19 02:57:19 +01:00
Ilia Mirkin
5795e42116 nv50/ir: fix srcMask computation for TG4 and TXF
This affects which inputs are marked as used. In a situation where only
the texture instruction uses an input, it might have been ignored as
unused due to input masks.

Affects subtests of KHR-GL45.texture_cube_map_array.sampling

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 054c54d1be)
2017-08-19 02:57:19 +01:00
Frank Richter
42e8f3ccdf gallium/os: fix os_time_get_nano() to roll over less
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102241
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 7fb7287ce7)
2017-08-19 01:46:08 +01:00
Frank Richter
e452ed26ff st/wgl: check for negative delta in wait_swap_interval()
This can happen because of rollover.  See bug report for details.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102241
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit d90e05ad48)
2017-08-19 01:46:08 +01:00
Frank Richter
c7c6ba44ca st/mesa: fix a null pointer access
Fixes crash with llvmpipe on Windows.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102148
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 496a691e35)
2017-08-19 01:46:08 +01:00
Tim Rowley
bdab7c69f5 swr/rast: Fix invalid casting for calls to Interlocked* functions
CID: 1416243, 1416244, 1416255
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit b333bc753e)
[Emil Velikov: resolve trivial conflicts - LONG -> long]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/swr/rasterizer/core/api.cpp
	src/gallium/drivers/swr/rasterizer/core/threads.cpp
2017-08-19 01:45:47 +01:00
Ilia Mirkin
47507ec1fd glsl/ast: update rhs in addition to the var's constant_value
We continue in the code to do some more things with the rhs, including
setting a constant initializer. If the type is wrong, this causes some
confusion down the line, leading to assertions. This makes sure that the
rhs processing continues to flow as-if the type was correct to start
with (even though the state has been marked as an error state).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101766
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 978c4c597a)
2017-08-19 01:42:08 +01:00
Dave Airlie
3fb5cc2637 radv/gfx9: for fast clear use is_linear flag.
The legacy test won't work on gfx9.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 694d59fbaf)
2017-08-19 01:42:08 +01:00
David Airlie
42d6e7eba2 radv/gfx9: handle GFX9 opaque metadata
port the opaque metadata changes from radeonsi for gfx9.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e43cc3e3af)
2017-08-19 01:36:25 +01:00
David Airlie
f60ff71b6f radv: emit db_htile_surface reg on gfx9 as well
This is also a GFX9 register.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 674ecbfef2)
2017-08-19 01:36:25 +01:00
Dave Airlie
f70f216564 radv/gfx9: remove some leftover gfx6 descriptor setup.
We set this later in the non-gfx9 path, just remove these
bits from here.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit fc600eb98d)
2017-08-19 01:36:25 +01:00
Dave Airlie
827bf79052 radv/gfx9: fix set predication packet.
The predication packet changed format on GFX9, update the driver.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 5247b311e9)
2017-08-19 01:36:25 +01:00
Marek Olšák
2b10b3a417 radeonsi: disable CE by default
It makes performance worse by a very small (hard to measure) amount.
We've done extensive profiling of this feature internally.

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 1ab7fed707)
2017-08-19 01:36:25 +01:00
Scott D Phillips
27e7a3a5ef i965/blorp: Correct type of src_format in call to intel_miptree_texture_aux_usage
intel_miptree_texture_aux_usage() takes an isl_format, but we are
passing a mesa_format. clang warns:

 brw_blorp.c:305:52: warning: implicit conversion from enumeration
    type 'mesa_format' to different enumeration type
    'enum isl_format' [-Wenum-conversion]
       intel_miptree_texture_aux_usage(brw, src_mt, src_format);
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~              ^~~~~~~~~~

Fixes: fc1639e46d ("i965/blorp: Use texture/render_aux_usage for blits")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit f7dfc44c61)
2017-08-19 01:36:25 +01:00
Ilia Mirkin
da005f5566 nv50/ir: clean up saturated values immediately
Since we don't iterate to a fixed point, we can end up in situations
where we have a SAT instruction + a long immediate. This is not legal.
However since it's immediately computable, just run unary straight away
to handle the situation.

Fixes: 24a799ad35 ("nv50/ir: fix ConstantFolding with saturation")
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 165e18dd21)
2017-08-19 01:36:25 +01:00
Jeremy Huddleston Sequoia
cc8ae8842b glxcmds: Fix a typo in the __APPLE__ codepath
s/DummyContext/dummyContext/

Regressed-in: 5d9b50e596
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
(cherry picked from commit c1c4c18a80)
2017-08-17 15:48:20 -07:00
Emil Velikov
3165f9877e Update version to 17.2.0-rc4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-12 17:04:27 +01:00
Dave Airlie
1e11687029 radv: force cs/ps/l2 flush at end of command stream. (v2)
This seems like a workaround, but we don't see the bug on CIK/VI.

On SI with the dEQP-VK.memory.pipeline_barrier.host_read_transfer_dst.*
tests, when one tests complete, the first flush at the start of the next
test causes a VM fault as we've destroyed the VM, but we end up flushing
the compute shader then, and it must still be in the process of doing
something.

Could also be a kernel difference between SI and CIK.

v2: hit this with a bigger hammer. This fixes a bunch of hangs
in the vk cts with the robustness tests.

Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101334
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 82ba384c10)
2017-08-11 21:00:49 +01:00
Dave Airlie
ea595756f8 radv: fix MSAA on SI gpus.
This ports the workaround from radeonsi, that was missing in radv.

This fixes Talos rendering when MSAA is enabled on my Tahiti card.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8bf3930751)
2017-08-11 21:00:46 +01:00
Dave Airlie
ffd8120284 radv: fix f16->f32 denorm handling for SI/CIK. (v2)
This just copies the code from the -pro shaders,
and fixes the tests on CIK.

With this CIK passes the same set of conformance
tests as VI.

Fixes: 83e58b03 (radv: flush f32->f16 conversion denormals to zero. (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3f389f75b6)
2017-08-11 21:00:43 +01:00
Bas Nieuwenhuizen
9d65214f3d radv: Use the correct channel for alpha in resolve srgb conversion.
The argument here is a bitmask, so the old code selected .xy, which
got silently truncated to .x when constructing the vec4 from components,
instead of using .w.

Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit acba3a3151)
2017-08-11 21:00:40 +01:00
Bas Nieuwenhuizen
a57390cee0 radv: Only convert linear->srgb in compute resolves.
It justs works with the fragment shader resolve, so no need to do
a custom conversion. In fact with SRGB dest, it actually gives
wrong results.

Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 15e5a7a683)
2017-08-11 21:00:37 +01:00
Bas Nieuwenhuizen
8b706102eb radv: Don't use SRGB format for image stores during resolve.
These seem to store very bogus results. Luckily there is some code
that converts srgb->linear already, so just making the descriptor
format UNORM should work.

Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8286c3a49f)
2017-08-11 21:00:34 +01:00
Marek Olšák
7f5d86ebaa radeonsi/gfx9: use the VI codepath for clamping Z
This fixes corrupted shadows in Unigine Valley.
The corruption disappeared when I stopped setting IMG_DATA_FORMAT_24_8
for depth.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 27fef5d52d)
2017-08-11 20:59:31 +01:00
Kenneth Graunke
1c1653d7b0 isl: Validate row pitch of stencil surfaces.
Also, silence an obnoxious finishme that started occurring for all
GL applications which use stencil after the i965 ISL conversion.

v2: Check against 3DSTATE_STENCIL_BUFFER's pitch bits when using
    separate stencil, and 3DSTATE_DEPTH_BUFFER's bits when using
    combined depth-stencil.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 5563872dbf)
2017-08-11 20:59:28 +01:00
Emil Velikov
d4100b0d09 egl: avoid eglCreatePlatform*Surface{EXT,} crash with invalid dpy
If we have an invalid display fed into the functions, the display lookup
will return NULL. Thus as we attempt to get the platform type, we'll
deref. it leading to a crash.

Keep in mind that this will not happen if Mesa is built without X11 or
when the legacy eglCreate*Surface codepaths are used.

A similar check was added with earlier commit 5e97b8f5ce ("egl: Fix
crashes in eglCreate*Surface), although it was only applicable when the
surfaceless platform is built.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 26fbb9eacd)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/egl/main/eglapi.c
2017-08-11 20:59:02 +01:00
Tim Rowley
bb6e5e5476 configure: remove trailing "-a" in swr architecture test
Fixes "configure: line 27326: test: argument expected"

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 4d9b0dcccb)
2017-08-11 20:58:19 +01:00
Marek Olšák
08d49e074d ac: fail shader compilation if libelf is replaced by an incompatible version
UE4Editor has this issue.

This commit prevents hangs (release build) or assertion failures (debug
build). It doesn't fix the editor, but catastrophic scenarios are
prevented.

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4630ede102)
2017-08-11 20:58:14 +01:00
Karol Herbst
75f5abb82f nv50/ir: fix ConstantFolding with saturation
For mul(a, +-1) codegen can generate OP_MOV with a saturation flag
set which is ignored at emission. The same can happen with add(a, 0),
and others.

Adding an assert for detecting more of such issues.

Fixes wrongly rendered water in Hitman Absolution running under wine.
Also a few shaders in Mad Max and Alien Isolation produce such MOVs.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: generalize the fix for other cases]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 24a799ad35)
2017-08-11 20:57:54 +01:00
Alex Smith
f0b6298c05 radv: Fix decompression on multisampled depth buffers
Need to take the sample count into account in the depth decompress and
resummarize pipelines and render pass.

Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2e9a13bf22)
2017-08-11 20:57:50 +01:00
Jason Ekstrand
ac04187e33 i965/miptree: Call alloc_aux in create_for_bo
Originally, I had moved it to the caller to make some things easier when
adding the CCS modifier.  However, this broke DRI2 because
intel_process_dri2_buffer calls intel_miptree_create_for_bo but never
calls intel_miptree_alloc_aux.  Also, in hindsight, it should be pretty
easy to make the CCS modifier stuff work even if create_for_bo allocates
the CCS when DISABLE_AUX is not set.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8e5808fc0c)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/mesa/drivers/dri/i965/intel_mipmap_tree.c
2017-08-11 20:57:36 +01:00
Jason Ekstrand
b1514579c2 intel/isl: Don't align the height of the last array slice
We were calculating the total height of 2D surfaces by multiplying the
row pitch by the number of slices.  This means that we actually request
slightly more space than actually needed since the padding on the last
slice is unnecessary.  For tiled surfaces this is not likely to make a
difference.  For linear surfaces, on the other hand, this means we may
require additional memory.  In particular, this makes the i965 driver
reject EGL imports of buffers which do not have this extra padding.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4d27c6095e)
2017-08-11 20:52:22 +01:00
Jason Ekstrand
dc63c715cb intel/isl: Stop padding surfaces
The docs contain a bunch of commentary about the need to pad various
surfaces out to multiples of something or other.  However, all of those
requirements are about avoiding GTT errors due to missing pages when the
data port or sampler accesses slightly out-of-bounds.  However, because
the kernel already fills all the empty space in our GTT with the scratch
page, we never have to worry about faulting due to OOB reads.  There are
two caveats to this:

 1) There is some potential for issues with caches here if extra data
    ends up in a cache we don't expect due to OOB reads.  However,
    because we always trash the entire cache whenever we need to move
    anything between cache domains, this shouldn't be an issue.

 2) There is a potential issue if a surface gets placed at the very top
    of the GTT by the kernel.  In this case, the hardware could
    potentially end up trying to read past the top of the GTT.  If it
    nicely wraps around at the 48-bit (or 32-bit) boundary, then this
    shouldn't be an issue thanks to the scratch page.  If it doesn't,
    then we need to come up with something to handle it.

Up until some of the GL move to ISL, having the padding code in there
just caused us to harmlessly use a bit more memory in Vulkan.  However,
now that we're using ISL sizes to validate external dma-buf images,
these padding requirements are causing us to reject otherwise valid
images due to the size of the BO being too small.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c15b92ce11)
2017-08-11 20:52:19 +01:00
Jason Ekstrand
9d9ea2c5a4 anv/formats: Allow sampling on depth-only formats on gen7
We can't sample from depth-stencil formats but on gen7 but we can sample
from depth-only formats.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102024
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 06d3115bb9)
2017-08-11 20:52:16 +01:00
Dave Airlie
07b5c78836 radv: avoid GPU hangs if someone does a resolve with non-multisample src (v2)
This is a bug in the app, but I'd rather avoid hanging the GPU,
esp if someone is running in validation and it takes out their
development environment.

v2: get it right, reverse the polarity.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 36a1b61321)
2017-08-11 20:52:13 +01:00
Emil Velikov
59f7fdb85e egl/x11: don't leak xfixes_query in the error path
If we get a xfixes v1.x we'll error out, without freeing the
xfixes_query reply.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit c961b679fe)
2017-08-11 20:52:08 +01:00
Emil Velikov
29df4deef2 Update version to 17.2.0-rc3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-07 12:45:40 +01:00
Jason Ekstrand
e4371d14f1 anv: Stop advertising VK_KHX_multiview
We don't want to advertise experimental extensions in actual releases.
However, there's no harm in leaving the code lying around in the tree.
2017-08-05 00:09:26 +01:00
Tomasz Figa
0b2c034f64 st/dri: enable 32-bit RGBX/RGBA formats only on Android
X/GLX can't handle them. This removes almost 500 GLX visuals that were
incorrectly exposed.

This is a less invasive version of Marek's .getCapability series.
Note: the patch is not applicable for master, but only for the 17.2
branch.

Suggested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f33d8af7aa "st/dri: add 32-bit RGBX/RGBA formats"
[Emil Velikov: commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-05 00:09:26 +01:00
Tim Rowley
f4e163094d swr/rast: fix scons gen_knobs.h dependency
Copy/paste error was duplicating a gen_knobs.cpp rule.

Fixes: 5079c277b5 ("swr: [scons] Fix windows build")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit e4a6ae06cf)
2017-08-05 00:09:26 +01:00
Timothy Arceri
4a181e6244 mesa/st: fix conditional jump depends on uninitialised value
Reported by valgrind at:
glsl_to_tgsi_visitor::visit(ir_expression*) (st_glsl_to_tgsi.cpp:1560)

When compiling the Deus Ex shaders.

Fixes: 28a5e7104 ("st/glsl_to_tgsi: handle precise modifier")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 06237fc9e1)
2017-08-05 00:09:26 +01:00
Scott D Phillips
8ef9fe7229 gles: Restore some lost typedefs
GLES/gl.h has historically provided some typedefs that are not
used in the API itself. Restore these typedefs that were lost to
avoid breaking applications.

These seem to be the only typedefs removed in the update.

Fixes: 7fd0817 "Update Khronos-supplied headers"

[Eric: added a big warning to revert this patch when pulling the updated header]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 3db05ed1d1)
2017-08-05 00:09:26 +01:00
Dave Airlie
4f872e62c2 Revert "st_glsl_to_tgsi: rewrite rename registers to use array fully."
This reverts commit 3008161d28,
which caused a regression for VMWare.

The initial code had some recursion in it, that I removed by accident
trying to add back the recursion broke lots of things, take the high
road and revert for now.

Fixes: 3008161d (st_glsl_to_tgsi: rewrite rename registers to use array fully.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b8bea9a050)
2017-08-05 00:09:26 +01:00
Dave Airlie
c5fac38ced radv: handle 10-bit format clamping workaround.
This fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*
for a2r10g10b10 formats as destination on SI/CIK hardware.

This adds support to the meta program for emitting 10-bit
outputs, and adds 10-bit support to the fragment shader key.

It also only does the int8/10 on SI/CIK.

Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit df61a05019)
2017-08-05 00:09:26 +01:00
Bas Nieuwenhuizen
579ecfd91e radv: Don't underflow non-visible VRAM size.
In some APU situations the reported visible size can be larger than
VRAM size. This properly clamps the value.

Surprisingly both CTS and spec seem to allow a heap type with size 0,
so this seemed like the easiest option to me.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 4ae84efbc5 "radv: Use enum for memory heaps."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 8229706ad8)
2017-08-05 00:09:26 +01:00
Bruce Cherniak
6b279b3271 st/osmesa: add osmesa framebuffer iface hash table per st manager
Commit bbc29393d3 didn't include osmesa state_tracker.  This patch adds
necessary initialization.

Fixes crash in OSMesa initialization.

Created-by: Charmaine Lee <charmainel@vmware.com>
Tested-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9966c85e01)
2017-08-05 00:09:26 +01:00
Dave Airlie
6efb8d79a9 intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed.
If dual object compile fails (as seems to happen with virgl a
fair bit, and does piglit even have any tests for it?), we end up
not restarting the pull params, so we call
vec4_visitor::move_uniform_array_access_to_pull_constant
a second time and it runs over the ends of the alloc.

Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
running inside virgl on ivybridge.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 271fa3a684)
2017-08-05 00:09:25 +01:00
Chris Wilson
795b712bd7 i965/blit: Remember to include miptree buffer offset in relocs
Remember to add the offset to the start of the buffer in the relocation
or else we write 0xff into random bytes elsewhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fb63c43fd1)
2017-08-05 00:09:25 +01:00
Kenneth Graunke
43a2b178c2 i965: Delete pitch alignment assertion in get_blit_intratile_offset_el.
The cacheline alignment restriction is on the base address; the pitch
can be anything.

Fixes assertion failures when using primus (say, on glxgears, which
creates a 300x300 linear BGRX surface with a pitch of 1200):

intel_blit.c:190: get_blit_intratile_offset_el: Assertion `mt->surf.row_pitch % 64 == 0' failed.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 595a47b829)
2017-08-05 00:09:25 +01:00
Jason Ekstrand
d5def4f5a9 spirv: Fix SpvImageFormatR16ui
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.1 17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 95c6a97464)
2017-08-05 00:09:25 +01:00
Thomas Hellstrom
f2a60ff20a dri3: Wait for all pending swapbuffers to be scheduled before touching the front
This implements a wait for glXWaitGL, glXCopySubBuffer, dri flush_front and
creation of fake front until all pending SwapBuffers have been committed to
hardware. Among other things this fixes piglit glx-copy-sub-buffers on dri3.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 185ef06fd2)
2017-08-05 00:09:25 +01:00
Nicolai Hähnle
3c8673d420 gallium/radeon: fix ARB_query_buffer_object conversion to boolean
The issue here is that the immediate is treated as a 64-bit value,
and fetching it does not work reliably with swizzles that are different
from xy and zw.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit da83687c4b)
2017-08-05 00:09:25 +01:00
Dave Airlie
381ccaa1cb radeon/ac: use ds_swizzle for derivs on si/cik.
This looks like it's supported since llvm 3.9 at least,
so switch over radeonsi and radv to using it, -pro also
uses this. We can now drop creating lds for these operations
as the ds_swizzle operation doesn't actually write to lds at all.

Acked-by: Marek Olšák <marek.olsak@amd.com>
(stable requested due to fixing radv CIK conformance tests)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cb6f16dce9)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/common/ac_nir_to_llvm.c
2017-08-05 00:09:25 +01:00
Connor Abbott
2dd6030fbb ac/nir: fix lsb emission
This makes it match radeonsi. The LLVM backend itself will emit the
correct instruction, but LLVM might do incorrect optimizations since it
thinks the output is undefined when the input is 0, even though it's not
supposed to be. We really need a new intrinsic, or for the backend to
become smarter and recognize this pattern.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
(cherry picked from commit 6d731c5651)
2017-08-05 00:09:25 +01:00
Connor Abbott
a90d99f7a5 nir: fix algebraic optimizations
The optimizations are only valid for 32-bit integers. They were
mistakenly firing for 64-bit integers as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit de91461575)
2017-08-05 00:09:25 +01:00
Marek Olšák
6806773905 Revert "st/mesa: release sampler views when redefining a texture in st_context_teximage"
This reverts commit 5c1241268b.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101961

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d85802e501)
2017-08-05 00:09:25 +01:00
Nicolai Hähnle
3e872f4b53 radeonsi: ensure that temp array allocas are in the entry block
Otherwise, code generation fails. This has become necessary since some
shaders are wrapped in control flow.

Fixes: 081ac6e5c6 ("radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2879a602dd)
2017-08-05 00:09:25 +01:00
Samuel Pitoiset
21ea75b3e9 mesa: fix mismatch when returning 64-bit bindless uniform handles
The slower convert-and-copy process performs a bad conversion
because it converts the value to signed 64-bit integer, but
bindless uniform handles are considered unsigned 64-bit.

This fixes "Check glUniform*() with mixed texture units/handles"
from arb_bindless_texture-uniform piglit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b38c9c57f2)
2017-08-05 00:09:25 +01:00
Emil Velikov
58fe86a6d6 Update version to 17.2.0-rc2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-31 10:52:13 +01:00
Samuel Pitoiset
d466a70532 st/glsl_to_tgsi: fix getting the image type for array of structs
Since array splitting for AoA is disabled, we have to retrieve
the type of the first non-array type when an array of images is
declared inside a structure. Otherwise, it will hit an assert
in glsl_type::sampler_index() because it expects either a sampler
or an image type.

This fixes a regression in the following piglit test:
arb_bindless_texture/compiler/images/arrays-of-struct.frag

Fixes: 57165f2ef8 ("glsl: disable array splitting for AoA")
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f99e9335e2)
2017-07-31 10:26:27 +01:00
Marek Olšák
e62eddcdbe st/mesa: release sampler views when redefining a texture in st_context_teximage
Noticed randomly.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 5c1241268b)
2017-07-31 10:25:55 +01:00
Dave Airlie
6d07e58afb radv: for stencil only set Z tile mode index to same value
On SI this was causing a hang in
dEQP-VK.pipeline.render_to_image.core.2d_array.mipmap.r16g16_sint_s8_uint

This was due to not handling the tile mode index for depth like
I fixed previously for new GPUs.

Fixes: 01d0c5a9 (radv: fix stencil regression since new addrlib import)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 800d162209)
2017-07-31 10:24:42 +01:00
Dave Airlie
f9e563597d virgl: drop precise modifier.
The host doesn't understand this yet, so drop it for now.

Fixes: virgl regressions.

Fixes: af22adee4f (tgsi: add precise flag to tgsi_instruction)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 554aa09440)
2017-07-31 10:23:09 +01:00
Marek Olšák
5b61ba4432 radeonsi: update dirty_level_mask only when flushing or unbinding framebuffer
This fixes corruption with bindless textures in Dawn Of War 3.

The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit f4d095cc65)
2017-07-31 10:21:17 +01:00
Marek Olšák
6625382b1c st/mesa: always unconditionally revalidate main framebuffer after SwapBuffers
This fixes the black Feral launcher window.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101867

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
(cherry picked from commit 7257c171e9)
2017-07-31 10:20:44 +01:00
Nicolai Hähnle
2bca74253d radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)
With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.

This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.

Fixes GL45-CTS.geometry_shader.rendering.rendering.*

v2: ensure that the TCS epilog doesn't run for non-existing patches

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 081ac6e5c6)
2017-07-31 10:20:20 +01:00
Nicolai Hähnle
b36ff2d1f2 radeonsi/gfx9: fix vertex idx in ES with multiple waves per threadgroup
Cc: mesa-stable@lists.freedesktop.org
Reviewed: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 873789002f)
2017-07-31 10:20:12 +01:00
George Kyriazis
99b2613ce1 swr: fix transform feedback logic
The shader that is used to copy vertex data out of the vs/gs shaders to
the user-specified buffer (streamout or SO shader) was not using the
correct offsets.

Adjust the offsets that are used just for the SO shader:
- Make sure that position is handled in the same special way
  as in the vs/gs shaders
- Use the correct offset to be passed in the core
- consolidate register slot mapping logic into one function, since it's
  been calculated in 2 different places (one for calcuating the slot mask,
  and one for the register offsets themselves

Also make room for all attibutes in the backend vertex area.

Fixes:
- all vtk GL2PS tests
- 18 piglit tests (16 ext_transform_feedback tests,
  arb-quads-follow-provoking-vertex and primitive-type gl_points

v2:

- take care of more SGV slots in slot mapping logic
- trim feState.vsVertexSize
- fix GS interface and incorporate GS while calculating vsVertexSize

Note that vsVertexSize is used in the core as the one parameter that
controls vertex size between all stages, so it has to be adjusted appropriately
for the whole vs/gs/fs pipeline.

Also note that GS and SO is not fully implemented.  This will be addressed
later.

fixes:
- fixes total of 20 piglit tests

CC: 17.2 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit 194ff5eed1)
2017-07-31 10:19:55 +01:00
Dave Airlie
f9c7549605 radv/ac: port SI TC L1 write corruption fix.
This ports 72e46c988 to radv.
    radeonsi: apply a TC L1 write corruption workaround for SI

Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e77ff11ffe)
2017-07-27 19:54:53 +01:00
Dave Airlie
2ce4f0afd3 radv/ac: realign SI workaround with radeonsi.
This ports: da7453666a
radeonsi: don't apply the Z export bug workaround to Hainan
to radv.

Just noticed in passing.

Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a81e99f50a)
2017-07-27 19:54:52 +01:00
Dave Airlie
546282e8bc radv: only report external semaphore info for opaque fd.
Until we support sync fd, don't report the info.

Fixes CTS dEQP-VK.api.external.semaphore.sync_fd.* from crashing.

Fixes: eaa56eab6 (radv: initial support for shared semaphores (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6cbc8cf178)
2017-07-27 19:54:52 +01:00
Dave Airlie
de55bc8f49 radv: fix buffer views on SI/CIK.
Fixes CTS dEQP-VK.memory.pipeline_barrier.host_write_uniform_texel_buffer.1024
on SI/CIK with radv.

Fixes: f4e499ec (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ca82ef5ac7)
2017-07-27 19:54:52 +01:00
Daniel Stone
f0b7563a36 st/dri2: Return invalid modifier when no driver support
Always initialise whandle.modifier for DRIImage modifier queries, so if
the driver doesn't support it then we return false for the query.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: d33fe8b84e ("st/dri: enable DRIimage modifier queries")
(cherry picked from commit 45383d32d4)
2017-07-27 19:54:52 +01:00
Daniel Stone
5bee196840 st/dri: Check get-handle return value in queryImage
In the DRIImage queryImage hook, check if resource_get_handle() failed
and return FALSE if so.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b4a18f13ce)
2017-07-27 19:54:52 +01:00
Daniel Stone
2a1792981c egl/wayland: Ignore invalid modifiers
If the underlying driver does not support modifiers, dmabuf will still
advertise formats through the 'modifier' event, but send them with an
invalid modifier. Ignore them if this is the case, rather than passing
them through to the driver.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
(cherry picked from commit dd072cf4b1)
2017-07-27 19:54:52 +01:00
Tim Rowley
a2b7477603 swr/rast: non-regex knob fallback code for gcc < 4.9
gcc prior to 4.9 didn't implement <regex>, causing a startup crash
in the swr knob parameter reading code.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit e21fc2c625)
2017-07-27 19:54:52 +01:00
Dave Airlie
3e777d5cab virgl: encode index buffer offset.
Fixes arb_vertex_buffer_object-combined-vertex-index

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c4652a0a5b)
2017-07-27 19:54:52 +01:00
Marek Olšák
9bb6aa5794 ac/surface: fix hybrid graphics where APU=GFX9, dGPU=older
v2: don't do it for compressed textures (bpp = 0)

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
(cherry picked from commit 5e81df0f10)
2017-07-27 19:54:52 +01:00
Marek Olšák
90bbcb93b1 radeonsi: decrease the number of compiler threads
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit ed2b3f5c81)
2017-07-27 19:54:52 +01:00
Marek Olšák
529c440dd3 gallium/radeon: make S_FIXED function signed and move it to shared code
This fixes a bug uncovered by:
    2412c4c81e
    util: Make CLAMP turn NaN into MIN.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 433f6f7ac9)
2017-07-27 19:54:52 +01:00
Charmaine Lee
94e0de90ee st/mesa: create framebuffer iface hash table per st manager
With commit 5124bf9823, a framebuffer interface hash table is
created in st_gl_api_create(), which is called in
dri_init_screen_helper() for each screen. When the hash table is
overwritten with multiple calls to st_gl_api_create(), it can cause
race condition. This patch fixes the problem by creating a
framebuffer interface hash table per state tracker manager.

Fixes crash with steam.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
Fixes: 5124bf9823 ("st/mesa: add destroy_drawable interface")
Tested-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit bbc29393d3)

Squashed with:

st/mesa: fix unconditional return in st_framebuffer_iface_remove

Noticed by James Legg @ Feral.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 914f11e75b)

Squashed with:

st/mesa: Fix inversed test in st_api_destroy_drawable

Fixes a drawable leak.

Fixes: bbc29393d3 ("st/mesa: create framebuffer iface hash table per
                      st manager")
Bugzilla: https://bugs.freedesktop.org/101930
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 57132d126f)
2017-07-27 19:54:24 +01:00
Grigori Goronzy
3180f0fa0d egl: move KHR_no_error vs debug/robustness check further down
We'll fail to flag an error if the context flags appear after the
no-error attribute in the context attribute list.

Delay the check to after attribute parsing to fix this.

Fixes: 4909519a66 ("egl: Add EGL_KHR_create_context_no_error support")
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: add fixes/stable tags, commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

(cherry picked from commit 39bf7756b9)
2017-07-27 18:56:45 +01:00
Nicolai Hähnle
e7f14a8b52 radeonsi/gfx9: reduce max threads per block to 1024 on gfx9+
The number of supported waves per thread group has been reduced to 16
with gfx9. Trying to use 32 waves causes hangs, and barriers might
not work correctly with > 16 waves.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a0e6b9a2db)
2017-07-27 18:56:45 +01:00
Nicolai Hähnle
7f5d9d7a6d radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer by the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).

While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.

Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 65fbaab0b7)
2017-07-27 18:56:45 +01:00
Iago Toral Quiroga
3d0960e761 anv: only expose up to 28 vertex attributes
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 31f1863ace)
2017-07-27 18:56:45 +01:00
Iago Toral Quiroga
bdbd8ab517 anv/cmd_buffer: fix off by one error in assertion
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit a848e693ef)
2017-07-27 18:56:44 +01:00
Kenneth Graunke
47bca2cfa7 i965: Fix = vs == in MCS aux usage assert.
Caught by Coverity (CID 1415680).

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 698636cc97)
Fixes: 0f9b609cf4 ("i965/blorp: Do prepare/finish manually")
2017-07-27 18:54:41 +01:00
Kenneth Graunke
04bb687f04 i965: Fix offset addition in get_isl_surf.
Increase the value, not the pointer to the stack variable.

Caught by Coverity (CID 1415574).  Not shipped in a real release.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit f6e674fa51)
Fixes: 63a43f4161 ("i965: Refactor miptree to isl converter and adjustment")
2017-07-27 18:54:13 +01:00
Eric Anholt
4e0f29ed0b broadcom/vc4: Prefer blit via rendering to the software fallback.
I don't know how I managed to leave this here for so long.  Found when
working on a 1:1 overlapping blit extension for X11.

Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 93fec49a75)
2017-07-27 18:21:27 +01:00
Lionel Landwerlin
0ccb853cc0 i965: perf: flush batchbuffers at the beginning of queries
As Chris commented, it makes more sense to have batch buffer flushes
before the query. Usually applications like frame_retrace do a series
of queries and in that case, with flushes at the end of the queries,
we might still have the first query contained in 2 different batchs.
More generally it would be quite usual to have the query contained in
2 batch buffers because we never now what's the fill rate of the
current batch buffer.

If we move the flushing at the beginning of the queries, it's pretty
much guaranteed that queries will be contained in a single batch
buffer (unless the amount of commands is huge, but then it's only fair
to include reloading request times in the measurements).

Fixes: adafe4b733 ("i965: perf: minimize the chances to spread queries across batchbuffers")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9f439ae120)
2017-07-27 18:21:22 +01:00
Emil Velikov
a455f594bb Update version to 17.2.0-rc1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 16:59:32 +01:00
Emil Velikov
a955622c1a intel/blorp: ship blorp_genX_exec.h within the tarball
Fixes: c9cb37b2a6 ("intel/blorp: Add a partial resolve pass for MCS")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 5d47dd9c2a)
2017-07-24 16:59:32 +01:00
340 changed files with 8111 additions and 2658 deletions

View File

@@ -40,6 +40,7 @@ matrix:
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS=""
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--disable-libunwind"
addons:
apt:
packages:
@@ -66,6 +67,7 @@ matrix:
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS="swr"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
sources:
@@ -81,6 +83,7 @@ matrix:
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- env:
- LABEL="make Gallium Drivers Other"
- BUILD=make
@@ -93,6 +96,7 @@ matrix:
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS="i915,nouveau,pl111,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,etnaviv,imx"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
sources:
@@ -108,6 +112,7 @@ matrix:
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- env:
# NOTE: Analogous to SWR above, building Clover is quite slow.
- LABEL="make Gallium ST Clover"
@@ -125,6 +130,7 @@ matrix:
# Regardless - we're doing a quick build test here.
- GALLIUM_DRIVERS="i915"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
sources:
@@ -144,11 +150,14 @@ matrix:
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- env:
- LABEL="make Gallium ST Other"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.3
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --enable-xa --enable-nine --enable-xvmc --enable-vdpau --enable-va --enable-omx --enable-gallium-osmesa"
@@ -157,9 +166,12 @@ matrix:
# Regardless - we're doing a quick build test here.
- GALLIUM_DRIVERS="i915,swrast"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
# We actually want to test against llvm-3.3
- llvm-3.3-dev
# Nine requires gcc 4.6... which is the one we have right ?
- libxvmc-dev
# Build locally, for now.
@@ -174,6 +186,7 @@ matrix:
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- env:
- LABEL="make Vulkan"
- BUILD=make
@@ -186,6 +199,7 @@ matrix:
- GALLIUM_ST="--enable-dri --enable-dri3 --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS=""
- VULKAN_DRIVERS="intel,radeon"
- LIBUNWIND_FLAGS="--disable-libunwind"
addons:
apt:
sources:
@@ -367,6 +381,7 @@ script:
export CC="$CC -isystem`pwd`";
./autogen.sh --enable-debug
$LIBUNWIND_FLAGS
$DRI_LOADERS
--with-dri-drivers=$DRI_DRIVERS
$GALLIUM_ST

View File

@@ -88,6 +88,10 @@ LOCAL_CFLAGS += \
endif
endif
ifeq ($(ARCH_ARM_HAVE_NEON),true)
LOCAL_CFLAGS_arm += -DUSE_ARM_ASM
endif
LOCAL_CFLAGS_arm64 += -DUSE_AARCH64_ASM
ifneq ($(LOCAL_IS_HOST_MODULE),true)
LOCAL_CFLAGS += -DHAVE_LIBDRM

View File

@@ -92,16 +92,12 @@ define mesa-build-with-llvm
$(if $(filter $(MESA_ANDROID_MAJOR_VERSION), 4 5), \
$(warning Unsupported LLVM version in Android $(MESA_ANDROID_MAJOR_VERSION)),) \
$(if $(filter 6,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_STATIC_LIBRARIES += libLLVMCore) \
$(eval LOCAL_C_INCLUDES += external/llvm/include external/llvm/device/include),) \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0)) \
$(if $(filter 7,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_STATIC_LIBRARIES += libLLVMCore) \
$(eval LOCAL_C_INCLUDES += external/llvm/include external/llvm/device/include),) \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0)) \
$(if $(filter O,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_HEADER_LIBRARIES += llvm-headers),)
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_PATCH=0)) \
$(eval LOCAL_SHARED_LIBRARIES += libLLVM)
endef
# add subdirectories

View File

@@ -41,6 +41,7 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-xa \
--enable-xvmc \
--enable-llvm-shared-libs \
--enable-libunwind \
--with-platforms=x11,wayland,drm,surfaceless \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,nouveau,r300,pl111,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr,etnaviv,imx \

View File

@@ -50,10 +50,10 @@ except KeyError:
pass
else:
targets = targets.split(',')
print 'scons: warning: targets option is deprecated; pass the targets on their own such as'
print
print ' scons %s' % ' '.join(targets)
print
print('scons: warning: targets option is deprecated; pass the targets on their own such as')
print()
print(' scons %s' % ' '.join(targets))
print()
COMMAND_LINE_TARGETS.append(targets)

View File

@@ -1 +1 @@
17.2.0-devel
17.2.8

161
bin/.cherry-ignore Normal file
View File

@@ -0,0 +1,161 @@
# fixes: The commits are too invasive for stable. Instead the offending patches
# causing regressions have been reverted.
365d34540f331df57780dddf8da87235be0a6bcb mesa: correctly calculate the storage offset for i915
de0e62e1065e2d9172acf3ab7c70bba0160125c8 st/mesa: correctly calculate the storage offset
# stable: Add loader::getCapability patches. It's rather invasive infra
# not suitable as a bugfix.
1bf703e4ea5c4f742bc7ba55d01e5afc3f4e11f9 dri_interface,egl,gallium: only expose RGBA visuals on Android
be5773fa8dfe9255d9abaf5c7d5bbbd2d922da08 Android: fix compile error for DRI2 loader getCapability
31a6750988d7dd431f72ff1ff11bfca83bde5d8c st/dri: NULL check before deref DRI loader .getCapability
# stable: The commit addresses code that did not land in the stable branch
31bb8517a194af733deefe2d821537d994d39365 radv/gfx9: fix tile swizzle handling for gfx9
# stable: Commit is not applicable when 4fab67a4415 is missing.
d496780fb2c7f2cf0e32b6a79dc528e5156dfcb3 intel/eu/validate: Look up types on demand in execution_type()
# fixes: Depend on preseding commit which adds new public GBM API
3a5e3aa5a53cff55a5e31766d713a41ffa5a93d7 egl/drm: Fix misused x and y offsets in swrast_put_image2()
fe2a6281b3b299998fe7399e7dbcc2077d773824 egl/drm: Fix misused x and y offsets in swrast_get_image()
# fixes: This commit addressed an earlier commit c7e9ebb3ab8 which did not
# land in branch
45c5c444518b7e83d9accd9f44702fa49282a3b8 radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug
# fixes: This commit addressed earlier commits 61ad2f13 and 6dcc54b4 which did
# not land in branch
979978ee06867a531b8d56cee252f5c83920a339 radv: Check for GFX9 for 1D arrays in image_size intrinsic.
# fixes: This commit addressed earlier commits dcf46e99 and 60878dd0 which did
# not land in branch
8e9e339c530c7b82b5a29d4b3183e8f5a01eae28 radv: copy the number of viewports/scissors at pipeline bind time
# stable: The commit regresses a few dEQP tests. Namely:
# dEQP-VK.api.copy_and_blit.core.buffer_to_buffer.partial
# dEQP-VK.api.copy_and_blit.dedicated_allocation.buffer_to_buffer.partial
14555d0b7a51bd3701764fd213c2459410143431 anv: Remove unreachable cases from isl_format_for_size()
# stable: The commit addresses earlier commit a62a9793357 which is no applicable
# for the stable branch
6c7720ed78db754d52f204cbb74897aa9e65ea7e anv/wsi: Allocate enough memory for the entire image
# stable: Commits are too invasive for 17.2.
98fdff7247b6877d028d33284f9cc63189ee204e configure.ac: factor out detection for old and buggy llvm
13a53c4f5cdd664fd155c9e78fb46a4387af006c configure.ac: rework llvm libs handling for 3.9+
a7ecf7b86f4eae59f3ceac2125e5d1725c403c07 Travis: add binutils 2.26 for a few more LLVM 3.9 builds
36d6d1e931936a80da327889862ba02942ac427b configure.ac: add llvm_add_optional_component helper
df3a43018020c16c1dfa88a76c9a84c9fb85be38 configure.ac: add missing LLVM components for OpenCL
# stable: Commit is too big for stable at this point.
4d24a7cb97641cacecd371d1968f6964785822e4 glsl: fix derived cs variables
# stable: 17.3 nomination only.
fee9d05e2136b2b7c5a1ad2be7180b99f733f539 radv: Update code pointer correctly if a variant is already created
# stable: 17.3 nomination only.
d8cefaa197f02944812ef535b1b303dd5bf26848 radv: use device name in cache creation like radeonsi.
# fixes: This commit addressed earlier commit 35ac13ed3 which did not
# land in branch.
11d688d9f0d2ee4d0178d1807c0075e5e8364b1d mesa/bufferobj: don't double negate the range
# extra: Commit is not applicable when ade416d0236 is missing.
07bfdb478bf844a0ac9cf3679f51f83c4abea5a1 broadcom/vc5: Propagate vc4 aliasing fix to vc5.
# stable: This commit addressed earlier commit 8d90e28839 which did
# not land in branch.
446c5726ecb968d06a6607e0df42be1cb74948c4 i965: fix blorp stage_prog_data->param leak
# stable: This commit addressed earlier commit 78ade659569 which did
# not land in branch.
8fbd82f464f26a56167f7962174b2b69756a105a etnaviv: don't do resolve-in-place without valid TS
# stable: This commit addressed earlier commit 8d90e28839 which did
# not land in branch.
7b4387519c382cffef9c62bbbbefcfe71cfde905 intel/fs: Alloc pull constants off mem_ctx
# stable: 17.3 nomination only.
3f8e3c2bd8f54ae6817f7496be47f4e1a8860d9c radeonsi: add a workaround for weird s_buffer_load_dword behavior on SI
7dae419aa7c34af820c08896acef3b65d855188e Android: move drivers' symlinks to /vendor (v2)
# fixes: This commit has more than one Fixes tag but the commit it
# addresses didn't land in branch.
e17e8934f9e4b008bdfb4f9abd8ed4faa604c7d9 automake: include git_sha1.h.in in release tarball
# stable: This commit is not really needed after 6ac2d169019.
e8c9e65185de3e821e1e482e77906d1d51efa3ec intel/fs: Use a pure vertical stride for large register strides
# stable: These commits addressed earlier commit 379b24a40d3 which did
# not land in branch.
7364f080f9a272323ed3491f278a1eed3eb9b1a7 intel/nir: Add a helper for getting the NoIndirect mask
3e63cf893f096a7263eb1856d58417dd2d170d4b intel/nir: Break the linking code into a helper in brw_nir.c
951a5dc4cc29da996b54ae63eeba1915a3a65b4a intel/nir: Use the correct indirect lowering masks in link_shaders
# stable: These commits resulted in a CTS regression being addressed
# at https://bugs.freedesktop.org/show_bug.cgi?id=103626 .
18fde36ced4279f2577097a1a7d31b55f2f5f141 intel/fs: Use the original destination region for int MUL lowering
# stable: These commits are refactorings rather than fixes.
fcd4adb9d08094520fb8d118d3448b04c6ec1fd1 intel/fs: Pass builders instead of blocks into emit_[un]zip
0d905597fe2997c89022c76cdf84dc4fba5eb055 intel/fs: Be more explicit about our placement of [un]zip
6c00240bc650805e0b66aa6e17dbe69bbe41e446 intel/fs: Don't stomp f0.1 in SIMD16 ballot
# stable: This commit addressed earlier commit ea1b97714d9b which did
# not land in branch.
50330d7115f0d5050ec3cfe6bca2b0136222e097 r600/shader: reserve first register of vertex shader.
# stable: This commit depends on earlier commit 3735af04152b which did
# not land in branch.
a6cc361e5fd2450249847d5ee8093d26ed7ff545 anv/cmd_buffer: Advance the address when initializing clear colors
# stable: This commit addressed earlier commit a62a97933578 which did
# not land in branch.
a07f7b26198ce0f5c8799481a673754968ac5daf anv/cmd_buffer: Take bo_offset into account in fast clear state addresses
# stable: These commits addressed earlier commit 2c4097aff1b which did
# not land in branch.
344252a27f8d875572bbe65641a825af8e73845d i965/bufmgr: Add a helper to mark a BO as external
0a6a137eb27129e17298cfe9dd620205588ee4f6 i965: Mark BOs as external when we export their handle
# stable: 17.3 nomination only.
6e4d65f674a70809e6df1a4f716f874828915562 broadcom/vc5: Add vc5_drm.h to the release tarball
4639cc716e89c69da41c7b54fa938457000fbd4c intel/blorp: Use mocs.tex for depth stencil
deec84fd771876b5c0755293376df11bc95b473b anv/blorp: Add a device parameter to blorp_surf_for_anv_image
bc933d0e8462871e19328f66182c35543e334013 intel/blorp: Make the MOCS setting part of blorp_address
d7a19d69ebc032ba7207fc97bc6f10d5bb35bb99 i965: Use PTE MOCS for all external buffers
# fixes: This commit is only a typo correction on an error message.
a6932faae1074445210d392a80b94fdac147b255 glsl: Fix typo fragement -> fragment
# fixes: This commit makes reference to 2 other commits but none have
# made it to the 17.2 queue.
9b0223046668593deb9c0be0b557994bb5218788 egl: pass the dri2_dpy to the $plat_teardown functions
# extra: The commit just references a proper fix that has already
# landed.
a31d0382084c8aa860ffcef9b12592c5c44e192f Revert "intel/fs: Use a pure vertical stride for large register strides"
# stable: The commit depends on at least one other that did not land in
# branch - 8b3a2578519.
010214b403de1b5e25a549372ba6192b89e05d06 radeonsi: allow DMABUF exports for local buffers
# stable: This commit addressed earlier commit ead0dfe31ec7 which did
# not land in branch.
709f5bdc4a2bf31f422f5cf60797224c0463c10a swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.
# stable: 17.3 nomination only.
bf0904e31fb7d9cd8932d582076c8d7beb02ba89 winsys/amdgpu: disable local BOs again due to worse performance
35c3cbad3c30ad3d40a6811dd6ca2286e013bfc5 radeonsi: don't call force_dcc_off for buffers
# fixes: This commit addressed earlier commit d1c9f30d7ff7 which did
# not land in branch.
1bdeac545f4ea9f7ca6947f5da7fcf4f5b3010dc radv: port merge tess info from anv
# extra: The commit just references a fix for an additional change in its v2.
c1ff99fd70cd2ceb2cac4723e4fd5efc93834746 main: Clear shader program data whenever ProgramBinary is called
# extra: The commit references a previous commit in which the changes
# should have been included but, as clarified by the
# developer, it is not needed for stable.
71e630753ebbee82e8f8709da5488296b2c070c8 r600: set DX10_CLAMP for compute shader too

View File

@@ -410,8 +410,21 @@ int main() {
}]])], GCC_ATOMIC_BUILTINS_SUPPORTED=1)
if test "x$GCC_ATOMIC_BUILTINS_SUPPORTED" = x1; then
DEFINES="$DEFINES -DUSE_GCC_ATOMIC_BUILTINS"
dnl On some platforms, new-style atomics need a helper library
AC_MSG_CHECKING(whether -latomic is needed)
AC_LINK_IFELSE([AC_LANG_SOURCE([[
#include <stdint.h>
uint64_t v;
int main() {
return (int)__atomic_load_n(&v, __ATOMIC_ACQUIRE);
}]])], GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC=no, GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC=yes)
AC_MSG_RESULT($GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC)
if test "x$GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC" = xyes; then
LIBATOMIC_LIBS="-latomic"
fi
fi
AM_CONDITIONAL([GCC_ATOMIC_BUILTINS_SUPPORTED], [test x$GCC_ATOMIC_BUILTINS_SUPPORTED = x1])
AC_SUBST([LIBATOMIC_LIBS])
dnl Check if host supports 64-bit atomics
dnl note that lack of support usually results in link (not compile) error
@@ -773,6 +786,20 @@ if test "x$enable_asm" = xyes; then
;;
esac
;;
aarch64)
case "$host_os" in
linux*)
asm_arch=aarch64
;;
esac
;;
arm)
case "$host_os" in
linux*)
asm_arch=arm
;;
esac
;;
esac
case "$asm_arch" in
@@ -792,6 +819,14 @@ if test "x$enable_asm" = xyes; then
DEFINES="$DEFINES -DUSE_PPC64LE_ASM"
AC_MSG_RESULT([yes, ppc64le])
;;
aarch64)
DEFINES="$DEFINES -DUSE_AARCH64_ASM"
AC_MSG_RESULT([yes, aarch64])
;;
arm)
DEFINES="$DEFINES -DUSE_ARM_ASM"
AC_MSG_RESULT([yes, arm])
;;
*)
AC_MSG_RESULT([no, platform not supported])
;;
@@ -803,6 +838,28 @@ AC_CHECK_HEADER([xlocale.h], [DEFINES="$DEFINES -DHAVE_XLOCALE_H"])
AC_CHECK_HEADER([sys/sysctl.h], [DEFINES="$DEFINES -DHAVE_SYS_SYSCTL_H"])
AC_CHECK_FUNC([strtof], [DEFINES="$DEFINES -DHAVE_STRTOF"])
AC_CHECK_FUNC([mkostemp], [DEFINES="$DEFINES -DHAVE_MKOSTEMP"])
AC_CHECK_FUNC([memfd_create], [DEFINES="$DEFINES -DHAVE_MEMFD_CREATE"])
AC_MSG_CHECKING([whether strtod has locale support])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
#define _GNU_SOURCE
#include <stdlib.h>
#include <locale.h>
#ifdef HAVE_XLOCALE_H
#include <xlocale.h>
#endif
int main() {
locale_t loc = newlocale(LC_CTYPE_MASK, "C", NULL);
const char *s = "1.0";
char *end;
double d = strtod_l(s, end, loc);
float f = strtof_l(s, end, loc);
freelocale(loc);
return 0;
}]])],
[DEFINES="$DEFINES -DHAVE_STRTOD_L"];
AC_MSG_RESULT([yes]),
AC_MSG_RESULT([no]))
dnl Check to see if dlopen is in default libraries (like Solaris, which
dnl has it in libc), or if libdl is needed to get it.
@@ -1360,18 +1417,10 @@ AC_ARG_ENABLE([libglvnd],
AM_CONDITIONAL(USE_LIBGLVND, test "x$enable_libglvnd" = xyes)
if test "x$enable_libglvnd" = xyes ; then
dnl XXX: update once we can handle more than libGL/glx.
dnl Namely: we should error out if neither of the glvnd enabled libraries
dnl are built
case "x$enable_glx" in
xno)
AC_MSG_ERROR([cannot build libglvnd without GLX])
;;
xxlib | xgallium-xlib )
AC_MSG_ERROR([cannot build libgvnd when Xlib-GLX or Gallium-Xlib-GLX is enabled])
;;
xdri)
;;
esac
PKG_CHECK_MODULES([GLVND], libglvnd >= 0.2.0)
@@ -1380,6 +1429,10 @@ if test "x$enable_libglvnd" = xyes ; then
DEFINES="${DEFINES} -DUSE_LIBGLVND=1"
DEFAULT_GL_LIB_NAME=GLX_mesa
if test "x$enable_glx" = xno -a "x$enable_egl" = xno; then
AC_MSG_ERROR([cannot build libglvnd without GLX or EGL])
fi
fi
AC_ARG_WITH([gl-lib-name],
@@ -2140,7 +2193,9 @@ if test "x$enable_xvmc" = xyes -o \
"x$enable_vdpau" = xyes -o \
"x$enable_omx" = xyes -o \
"x$enable_va" = xyes; then
PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
if echo $platforms | grep -q "x11"; then
PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
fi
need_gallium_vl_winsys=yes
fi
AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes)
@@ -2551,7 +2606,7 @@ if test -n "$with_gallium_drivers"; then
if test "x$HAVE_SWR_AVX" != xyes -a \
"x$HAVE_SWR_AVX2" != xyes -a \
"x$HAVE_SWR_KNL" != xyes -a \
"x$HAVE_SWR_SKX" != xyes -a; then
"x$HAVE_SWR_SKX" != xyes; then
AC_MSG_ERROR([swr enabled but no swr architectures selected])
fi
@@ -2735,6 +2790,8 @@ AM_CONDITIONAL(HAVE_X86_ASM, test "x$asm_arch" = xx86 -o "x$asm_arch" = xx86_64)
AM_CONDITIONAL(HAVE_X86_64_ASM, test "x$asm_arch" = xx86_64)
AM_CONDITIONAL(HAVE_SPARC_ASM, test "x$asm_arch" = xsparc)
AM_CONDITIONAL(HAVE_PPC64LE_ASM, test "x$asm_arch" = xppc64le)
AM_CONDITIONAL(HAVE_AARCH64_ASM, test "x$asm_arch" = xaarch64)
AM_CONDITIONAL(HAVE_ARM_ASM, test "x$asm_arch" = xarm)
AC_SUBST([NINE_MAJOR], 1)
AC_SUBST([NINE_MINOR], 0)

View File

@@ -130,27 +130,6 @@ mesa/demos repository.</p>
runtime</p>
<dl>
<dt><code>EGL_DRIVERS_PATH</code></dt>
<dd>
<p>By default, the main library will look for drivers in the directory where
the drivers are installed to. This variable specifies a list of
colon-separated directories where the main library will look for drivers, in
addition to the default directory. This variable is ignored for setuid/setgid
binaries.</p>
<p>This variable is usually set to test an uninstalled build. For example, one
may set</p>
<pre>
$ export LD_LIBRARY_PATH=$mesa/lib
$ export EGL_DRIVERS_PATH=$mesa/lib/egl
</pre>
<p>to test a build without installation</p>
</dd>
<dt><code>EGL_DRIVER</code></dt>
<dd>

View File

@@ -20,7 +20,7 @@
The Gallium llvmpipe driver is a software rasterizer that uses LLVM to
do runtime code generation.
Shaders, point/line/triangle rasterization and vertex processing are
implemented with LLVM IR which is translated to x86 or x86-64 machine
implemented with LLVM IR which is translated to x86, x86-64, or ppc64le machine
code.
Also, the driver is multithreaded to take advantage of multiple CPU cores
(up to 8 at this time).
@@ -32,24 +32,36 @@ It's the fastest software rasterizer for Mesa.
<ul>
<li>
<p>An x86 or amd64 processor; 64-bit mode recommended.</p>
<p>
For x86 or amd64 processors, 64-bit mode is recommended.
Support for SSE2 is strongly encouraged. Support for SSE3 and SSE4.1 will
yield the most efficient code. The fewer features the CPU has the more
likely is that you run into underperforming, buggy, or incomplete code.
likely it is that you will run into underperforming, buggy, or incomplete code.
</p>
<p>
For ppc64le processors, use of the Altivec feature (the Vector
Facility) is recommended if supported; use of the VSX feature (the
Vector-Scalar Facility) is recommended if supported AND Mesa is
built with LLVM version 4.0 or later.
</p>
<p>
See /proc/cpuinfo to know what your CPU supports.
</p>
</li>
<li>
<p>LLVM: version 3.4 recommended; 3.3 or later required.</p>
<p>Unless otherwise stated, LLVM version 3.4 is recommended; 3.3 or later is required.</p>
<p>
For Linux, on a recent Debian based distribution do:
</p>
<pre>
aptitude install llvm-dev
</pre>
<p>
If you want development snapshot builds of LLVM for Debian and derived
distributions like Ubuntu, you can use the APT repository at <a
href="https://apt.llvm.org/" title="Debian Development packages for LLVM"
>apt.llvm.org</a>, which are maintained by Debian's LLVM maintainer.
</p>
<p>
For a RPM-based distribution do:
</p>
@@ -228,8 +240,8 @@ build/linux-???-debug/gallium/drivers/llvmpipe:
</ul>
<p>
Some of this tests can output results and benchmarks to a tab-separated-file
for posterior analysis, e.g.:
Some of these tests can output results and benchmarks to a tab-separated file
for later analysis, e.g.:
</p>
<pre>
build/linux-x86_64-debug/gallium/drivers/llvmpipe/lp_test_blend -o blend.tsv
@@ -240,8 +252,8 @@ for posterior analysis, e.g.:
<ul>
<li>
When looking to this code by the first time start in lp_state_fs.c, and
then skim through the lp_bld_* functions called in there, and the comments
When looking at this code for the first time, start in lp_state_fs.c, and
then skim through the lp_bld_* functions called there, and the comments
at the top of the lp_bld_*.c functions.
</li>
<li>

View File

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.0 Release Notes / TBD</h1>
<h1>Mesa 17.2.0 Release Notes / September 4, 2017</h1>
<p>
Mesa 17.2.0 is a new development release.
@@ -33,7 +33,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
TBD.
9484ad96b4bb6cda5bbf1aef52dfa35183dc21aa6258a2991c245996c2fdaf85 mesa-17.2.0.tar.gz
3123448f770eae58bc73e15480e78909defb892f10ab777e9116c9b218094943 mesa-17.2.0.tar.xz
</pre>
@@ -56,9 +57,156 @@ Note: some of the new features are only available with certain drivers.
<h2>Bug fixes</h2>
<ul>
TBD
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68365">Bug 68365</a> - [SNB Bisected]Piglit spec_ARB_framebuffer_object_fbo-blit-stretch fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77240">Bug 77240</a> - khrplatform.h not installed if EGL is disabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95530">Bug 95530</a> - Stellaris - colored overlay of sectors doesn't render on i965</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96449">Bug 96449</a> - Dying Light reports OpenGL version 3.0 with mesa-git</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96958">Bug 96958</a> - [SKL] Improper rendering in Europa Universalis IV</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97957">Bug 97957</a> - Awful screen tearing in a separate X server with DRI3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98238">Bug 98238</a> - Witcher 2: objects are black when changing lod on Radeon Pitcairn</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98428">Bug 98428</a> - Undefined non-weak-symbol in dri-drivers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98833">Bug 98833</a> - [REGRESSION, bisected] Wayland revert commit breaks non-Vsync fullscreen frame updates</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99467">Bug 99467</a> - [radv] DOOM 2016 + wine. Green screen everywhere (but can be started)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100070">Bug 100070</a> - Rocket League: grass gets rendered incorrectly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100242">Bug 100242</a> - radeon buffer allocation failure during startup of Factorio</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100620">Bug 100620</a> - [SKL] 48-bit addresses break DOOM</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100690">Bug 100690</a> - [Regression, bisected] TotalWar: Warhammer corrupted graphics</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100741">Bug 100741</a> - Chromium - Memory leak</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100785">Bug 100785</a> - [regression, bisected] arb_gpu_shader5 piglit fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100854">Bug 100854</a> - YUV to RGB Color Space Conversion result is not precise</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100871">Bug 100871</a> - gles cts hangs mesa indefinitely</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100877">Bug 100877</a> - vulkan/tests/block_pool_no_free regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100892">Bug 100892</a> - Polaris 12: winsys init bad switch (missing break) initializing addrlib</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100937">Bug 100937</a> - Mesa fails to build with GCC 4.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100945">Bug 100945</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100988">Bug 100988</a> - glXGetCurrentDisplay() no longer works for FakeGLX contexts?</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101071">Bug 101071</a> - compiling glsl fails with undefined reference to `pthread_create'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101088">Bug 101088</a> - `gallium: remove pipe_index_buffer and set_index_buffer` causes glitches and crash in gallium nine</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101110">Bug 101110</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101189">Bug 101189</a> - Latest git fails to compile with radeon</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101252">Bug 101252</a> - eglGetDisplay() is not thread safe</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101254">Bug 101254</a> - VDPAU videos don't start playing with r600 gallium driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101283">Bug 101283</a> - skylake: page fault accessing address 0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101284">Bug 101284</a> - [G45] ES2-CTS.functional.texture.specification.basic_copytexsubimage2d.cube_rgba</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101294">Bug 101294</a> - radeonsi minecraft forge splash freeze since 17.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101306">Bug 101306</a> - [BXT] gles asserts in cts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101326">Bug 101326</a> - gallium/wgl: Allow context creation without prior SetPixelFormat()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101334">Bug 101334</a> - AMD SI cards: Some vulkan apps freeze the system</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101336">Bug 101336</a> - glcpp-test.sh regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101340">Bug 101340</a> - i915_surface.c:108:4: error: too few arguments to function util_blitter_default_src_texture</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101360">Bug 101360</a> - Assertion failure comparing result of ballotARB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101401">Bug 101401</a> - [REGRESSION][BISECTED] GDM fails to start after 8ec4975cd83365c791a1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101418">Bug 101418</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101451">Bug 101451</a> - [G33] ES2-CTS.functional.clipping.polygon regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101464">Bug 101464</a> - PrimitiveRestartNV inside a render list causes a crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101471">Bug 101471</a> - Mesa fails to build: unknown typename bool</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101535">Bug 101535</a> - [bisected] [Skylake] Kwin won't start and glxgears coredumps</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101538">Bug 101538</a> - From &quot;Use isl for hiz layouts&quot; commit onwards, everything crashes with Mesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101539">Bug 101539</a> - [Regresion] [IVB] Segment fault in recent commit in intel_miptree_level_has_hiz under Ivy bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101558">Bug 101558</a> - [regression][bisected] MPV playing video via opengl &quot;randomly&quot; results in only part of the window / screen being rendered with Mesa GIT.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101596">Bug 101596</a> - Blender renders black UI elements</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101607">Bug 101607</a> - Regression in anisotropic filtering from &quot;i965: Convert fs sampler state to use genxml&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101657">Bug 101657</a> - strtod.c:32:10: fatal error: xlocale.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101666">Bug 101666</a> - bitfieldExtract is marked as a built-in function on OpenGL ES 3.0, but was added in OpenGL ES 3.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101683">Bug 101683</a> - Some games hang while loading when compositing is shut off or absent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101703">Bug 101703</a> - No stencil buffer allocated when requested by GLUT</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101704">Bug 101704</a> - [regression][bisected] glReadPixels() from pbuffer failing in Android CTS camera tests</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101766">Bug 101766</a> - Assertion `!&quot;invalid type&quot;' failed when constant expression involves literal of different type</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101774">Bug 101774</a> - gen_clflush.h:37:7: error: implicit declaration of function __builtin_ia32_clflush</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101775">Bug 101775</a> - Xorg segfault since 147d7fb &quot;st/mesa: add a winsys buffers list in st_context&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101829">Bug 101829</a> - read-after-free in st_framebuffer_validate</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101831">Bug 101831</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101851">Bug 101851</a> - [regression] libEGL_common.a undefined reference to '__gxx_personality_v0'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101867">Bug 101867</a> - Launch options window renders black in Feral Games in current Mesa trunk</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101876">Bug 101876</a> - SIGSEGV when launching Steam</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101910">Bug 101910</a> - [BYT] ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101925">Bug 101925</a> - playstore/webview crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101961">Bug 101961</a> - Serious Sam Fusion hangs system completely</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101982">Bug 101982</a> - Weston crashes when running an OpenGL program on i965</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101983">Bug 101983</a> - [G33] ES2-CTS.functional.shaders.struct.uniform.sampler_nested* regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102024">Bug 102024</a> - FORMAT_FEATURE_SAMPLED_IMAGE_BIT not supported for D16_UNORM and D32_SFLOAT</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102148">Bug 102148</a> - Crash when running qopenglwidget example on mesa llvmpipe win32</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102241">Bug 102241</a> - gallium/wgl: SwapBuffers freezing regularly with swap interval enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102308">Bug 102308</a> - segfault in glCompressedTextureSubImage3D</li>
</ul>
<h2>Changes</h2>
<ul>

200
docs/relnotes/17.2.1.html Normal file
View File

@@ -0,0 +1,200 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.1 Release Notes / September 17, 2017</h1>
<p>
Mesa 17.2.1 is a bug fix release which fixes bugs found since the 17.2.0 release.
</p>
<p>
Mesa 17.2.1 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
c902d8dc2540195bc570d88af1a8fd8a1774373660a27bb1d539551f46824bc1 mesa-17.2.1.tar.gz
77385d17827cff24a3bae134342234f2efe7f7f990e778109682571dbbc9ba1e mesa-17.2.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100613">Bug 100613</a> - Regression in Mesa 17 on s390x (zSystems)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101709">Bug 101709</a> - [llvmpipe] piglit gl-1.0-scissor-offscreen regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102454">Bug 102454</a> - glibc 2.26 doesn't provide anymore xlocale.h</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102467">Bug 102467</a> - src/mesa/state_tracker/st_cb_readpixels.c:178]: (warning) Redundant assignment</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102502">Bug 102502</a> - [bisected] Kodi crashes since commit 707d2e8b - gallium: fold u_trim_pipe_prim call from st/mesa to drivers</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (4):</p>
<ul>
<li>radv: Actually set the cmd_buffer usage_flags.</li>
<li>radv: Fix vkCopyImage with both depth and stencil aspects.</li>
<li>radv: Disable multilayer &amp; multilevel DCC.</li>
<li>radv: Don't allocate CMASK for linear images.</li>
</ul>
<p>Ben Crocker (1):</p>
<ul>
<li>llvmpipe: lp_build_gather_elem_vec BE fix for 3x16 load</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>llvmpipe: initialize llvmpipe-&gt;dirty with LP_NEW_SCISSOR</li>
</ul>
<p>Charmaine Lee (1):</p>
<ul>
<li>vbo: fix offset in minmax cache key</li>
</ul>
<p>Dave Airlie (12):</p>
<ul>
<li>radv: disable 1d/2d linear optimisation on gfx9.</li>
<li>radv/gfx9: set descriptor up for base_mip to level range.</li>
<li>Revert "radv: disable support for VEGA for now."</li>
<li>radv/winsys: use amdgpu_bo_va_op_raw.</li>
<li>radv/gfx9: allocate events from uncached VA space</li>
<li>radv: use simpler indirect packet 3 if possible.</li>
<li>radv: don't use iview for meta image width/height.</li>
<li>radv: handle GFX9 1D textures</li>
<li>radv/gfx9: set mip0-depth correctly for 2d arrays/3d images</li>
<li>radv/ac: bump params array for image atomic comp swap</li>
<li>radv/gfx9: fix image resource handling.</li>
<li>radv/winsys: fix flags vs va_flags thinko.</li>
</ul>
<p>Emil Velikov (7):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.0</li>
<li>cherry-ignore: add getCapability patches</li>
<li>cherry-ignore: ignore gfx9 tile swizzle fix</li>
<li>cherry-ignore: add execution_type() fix to the list</li>
<li>cherry-ignore: add EGL+gbm swast patches</li>
<li>egl/x11/dri3: adding missing __DRI_BACKGROUND_CALLABLE extension</li>
<li>Update version to 17.2.1</li>
</ul>
<p>Eric Engestrom (3):</p>
<ul>
<li>util: improve compiler guard</li>
<li>mesa/st: remove unwanted backup file</li>
<li>docs/egl: remove reference to EGL_DRIVERS_PATH</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>radv: don't assert on empty hash table</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>anv/formats: Nicely handle unknown VkFormat enums</li>
<li>spirv: Add support for the HelperInvocation builtin</li>
</ul>
<p>Karol Herbst (1):</p>
<ul>
<li>nvc0: write 0 to pipeline_statistics.cs_invocations</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Fix crash in fallback GTT mapping.</li>
<li>i965: Set "Subslice Hashing Mode" to 16x16 on Apollolake.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: skip draw calls with pipe_draw_info::count == 0</li>
</ul>
<p>Michael Olbrich (1):</p>
<ul>
<li>egl/dri2: only destroy created objects</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog</li>
</ul>
<p>Nicolai Hähnle (4):</p>
<ul>
<li>radeonsi/gfx9: always flush DB metadata on framebuffer changes</li>
<li>st/glsl_to_tgsi: only the first (inner-most) array reference can be a 2D index</li>
<li>ac/surface: match Z and stencil tile config</li>
<li>glsl: fix glsl_struct_field size calculations for shader cache</li>
</ul>
<p>Ray Strode (1):</p>
<ul>
<li>gallivm: correct channel shift logic on big endian</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno: skip batch-cache for compute shaders</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>st/mesa: fix view template initialization in try_pbo_readpixels</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radeonsi: update dirty_level_mask before dispatching</li>
</ul>
<p>Timothy Arceri (9):</p>
<ul>
<li>glsl: allow NULL to be passed to encode_type_to_blob()</li>
<li>glsl: stop adding pointers from gl_shader_variable to the cache</li>
<li>glsl: stop adding pointers from glsl_struct_field to the cache</li>
<li>glsl: add has_uniform_storage() helper to shader cache</li>
<li>glsl: don't write uniform storage offset if there isn't one</li>
<li>glsl: always write a name/label string to the cache</li>
<li>compiler: move pointers to the start of shader_info</li>
<li>glsl: stop adding pointers from shader_info to the cache</li>
<li>glsl: stop adding pointers from bindless structs to the cache</li>
</ul>
</div>
</body>
</html>

203
docs/relnotes/17.2.2.html Normal file
View File

@@ -0,0 +1,203 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.2 Release Notes / October 2, 2017</h1>
<p>
Mesa 17.2.2 is a bug fix release which fixes bugs found since the 17.2.1 release.
</p>
<p>
Mesa 17.2.2 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
8242256f3243ed3f35184ed7bf0a9070439ccdf477a3bd9cfd2437c0b2f9bc7f mesa-17.2.2.tar.gz
cf522244d6a5a1ecde3fc00e7c96935253fe22f808f064cab98be6f3faa65782 mesa-17.2.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102573">Bug 102573</a> - fails to build on armel</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102844">Bug 102844</a> - memory leak with glDeleteProgram for shader program type GL_COMPUTE_SHADER</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102847">Bug 102847</a> - swr fail to build with llvm-5.0.0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102904">Bug 102904</a> - piglit and gl45 cts linker tests regressed</li>
</ul>
<h2>Changes</h2>
<p>Alexandru-Liviu Prodea (1):</p>
<ul>
<li>Scons: Add LLVM 5.0 support</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: Check for GFX9 for 1D arrays in image_size intrinsic.</li>
</ul>
<p>Boris Brezillon (1):</p>
<ul>
<li>broadcom/vc4: Fix infinite retry in vc4_bo_alloc()</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv/nir: call opt_remove_phis after trivial continues.</li>
<li>ac/surface: handle S8 on gfx9</li>
<li>st/glsl-&gt;tgsi: fix u64 to bool comparisons.</li>
</ul>
<p>David Airlie (1):</p>
<ul>
<li>radv: add gfx9 scissor workaround</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.1</li>
<li>automake: enable libunwind in `make distcheck'</li>
</ul>
<p>Eric Anholt (4):</p>
<ul>
<li>broadcom/vc4: Fix use-after-free for flushing when writing to a texture.</li>
<li>broadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.</li>
<li>broadcom/vc4: Fix use-after-free when deleting a program.</li>
<li>broadcom/vc4: Keep pipe_sampler_view-&gt;texture matching the original texture.</li>
</ul>
<p>Gert Wollny (2):</p>
<ul>
<li>travis: force llvm-3.3 for "make Gallium ST Other"</li>
<li>travis: Add libunwind-dev to gallium/make builds</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>configure: check if -latomic is needed for __atomic_*</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>nv20: Fix GL_CLAMP</li>
</ul>
<p>Jason Ekstrand (6):</p>
<ul>
<li>i965/blorp: Set r8stencil_needs_update when writing stencil</li>
<li>vulkan/wsi/wayland: Stop printing out the DRM device</li>
<li>vulkan/wsi/wayland: Refactor wsi_wl_display code</li>
<li>vulkan/wsi/wayland: Stop caching Wayland displays</li>
<li>vulkan/wsi/wayland: Copy wl_proxy objects from oldSwapchain if available</li>
<li>vulkan/wsi/wayland: Return better error messages</li>
</ul>
<p>Juan A. Suarez Romero (4):</p>
<ul>
<li>cherry-ignore: add "radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug"</li>
<li>cherry-ignore: add "radv: Check for GFX9 for 1D arrays in image_size intrinsic."</li>
<li>cherry-ignore: add "radv: copy the number of viewports/scissors at pipeline bind time"</li>
<li>Update version to 17.2.2</li>
</ul>
<p>Józef Kucia (1):</p>
<ul>
<li>anv: Fix descriptors copying</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965/vec4: Actually handle atomic op intrinsics.</li>
<li>i965/vec4: Fix swizzles on atomic sources.</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>st/va/postproc: use video original size for postprocessing</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>etnaviv: fix 16bpp clears</li>
</ul>
<p>Matt Turner (2):</p>
<ul>
<li>util: Link libmesautil into u_atomic_test</li>
<li>util/u_atomic: Add implementation of __sync_val_compare_and_swap_8</li>
</ul>
<p>Nicolai Hähnle (9):</p>
<ul>
<li>radeonsi: workaround for gather4 on integer cube maps</li>
<li>amd/common: round cube array slice in ac_prepare_cube_coords</li>
<li>amd/common: add workaround for cube map array layer clamping</li>
<li>glsl/linker: fix output variable overlap check</li>
<li>radeonsi: fix array textures layer coordinate</li>
<li>radeonsi: set MIP_POINT_PRECLAMP to 0</li>
<li>amd/addrlib: fix missing va_end() after va_copy()</li>
<li>amd/common: move ac_build_phi from radeonsi</li>
<li>radeonsi: fix a regression in integer cube map handling</li>
</ul>
<p>Samuel Iglesias Gonsálvez (1):</p>
<ul>
<li>anv: fix viewport transformation for z component</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: fix saved compute state when doing statistics/occlusion queries</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>mesa: free current ComputeProgram state in _mesa_free_context_data</li>
</ul>
<p>Tim Rowley (1):</p>
<ul>
<li>swr/rast: remove llvm fence/atomics from generated files</li>
</ul>
<p>Tomasz Figa (1):</p>
<ul>
<li>egl/dri2: Implement swapInterval fallback in a conformant way</li>
</ul>
</div>
</body>
</html>

181
docs/relnotes/17.2.3.html Normal file
View File

@@ -0,0 +1,181 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.3 Release Notes / October 19, 2017</h1>
<p>
Mesa 17.2.3 is a bug fix release which fixes bugs found since the 17.2.2 release.
</p>
<p>
Mesa 17.2.3 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
fb305eecfeec1fd771fdc96fff973c51871f7bd35fd2bd56cacc27b4b8823220 mesa-17.2.3.tar.gz
a0b0ec8f7b24dd044d7ab30a8c7e6d3767521e245f88d4ed5dd93315dc56f837 mesa-17.2.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101832">Bug 101832</a> - [PATCH][regression][bisect] Xorg fails to start after f50aa21456d82c8cb6fbaa565835f1acc1720a5d</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102852">Bug 102852</a> - Scons: Support the new Scons 3.0.0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102940">Bug 102940</a> - Regression: Vulkan KMS rendering crashes since 17.2</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Add R16G16B16A16_SNORM fast clear support</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>nir/spirv: Allow loop breaks in a switch body.</li>
<li>radv: Only set the MTYPE flags on GFX9+.</li>
</ul>
<p>Ben Crocker (4):</p>
<ul>
<li>gallivm: fix typo in debug_printf message</li>
<li>gallivm: allow additional llc options</li>
<li>gallivm/ppc64le: adjust VSX code generation control.</li>
<li>gallivm/ppc64le: allow environmental control of Altivec code generation</li>
</ul>
<p>Daniel Stone (2):</p>
<ul>
<li>egl/wayland: Check queryImage return for wl_buffer</li>
<li>egl/wayland: Don't use dmabuf with no modifiers</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>radv: emit fmuladd instead of fma to llvm.</li>
<li>radv: lower ffma in nir.</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>cherry-ignore: add "anv: Remove unreachable cases from isl_format_for_size"</li>
<li>cherry-ignore: add "anv/wsi: Allocate enough memory for the entire image"</li>
<li>swr/rast: do not crash on NULL strings returned by getenv</li>
<li>wayland-drm: use a copy of the wayland_drm_callbacks struct</li>
<li>eglmesaext: add forward declaration for struct wl_buffers</li>
<li>Update version to 17.2.3</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>scons: use python3-compatible print()</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nv50/ir: fix 64-bit integer shifts</li>
<li>nv50,nvc0: fix push hint logic in presence of a start offset</li>
</ul>
<p>Jason Ekstrand (6):</p>
<ul>
<li>intel/compiler: Don't cmod propagate into a saturated operation</li>
<li>intel/compiler: Don't propagate cmod into integer multiplies</li>
<li>glsl/blob: Return false from ensure_can_read on overrun</li>
<li>glsl/blob: Return false from grow_to_fit if we've ever failed</li>
<li>nir/opcodes: Fix constant-folding of ufind_msb</li>
<li>nir: Get rid of the variable on vote intrinsics</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.2</li>
</ul>
<p>Józef Kucia (3):</p>
<ul>
<li>anv: Fix vkCmdFillBuffer()</li>
<li>spirv: Fix SpvOpAtomicISub</li>
<li>anv: Do not assert() on VK_ATTACHMENT_UNUSED</li>
</ul>
<p>Leo Liu (3):</p>
<ul>
<li>st/va: use pipe transfer_map to map upload buffer</li>
<li>st/vdpau: don't re-allocate interlaced buffer with packed YUV format</li>
<li>st/va: don't re-allocate interlaced buffer with pakced format</li>
</ul>
<p>Lionel Landwerlin (4):</p>
<ul>
<li>intel: compiler: vec4: add missing default 0 lod</li>
<li>anv/cmd_buffer: fix push descriptors with set &gt; 0</li>
<li>anv/cmd_buffer: Reset state in cmd_buffer_destroy</li>
<li>anv: bo_cache: allow importing a BO larger than needed</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>mesa: fix texture updates for ATI_fragment_shader</li>
<li>st/mesa: don't use pipe_surface for passing information about EGLImage</li>
<li>glsl_to_tgsi: fix instruction order for bindless textures</li>
</ul>
<p>Nicolai Hähnle (14):</p>
<ul>
<li>st/glsl_to_tgsi: fix conditional assignments to packed shader outputs</li>
<li>amd/common: fix build_cube_select</li>
<li>radeonsi/gfx9: fix geometry shaders without output vertices</li>
<li>util/queue: fix a race condition in the fence code</li>
<li>glsl/lower_instruction: handle denorms and overflow in ldexp correctly</li>
<li>radeonsi: move current_rast_prim to r600_common_context</li>
<li>radeonsi: don't discard points and lines</li>
<li>radeonsi: deduce rast_prim correctly for tessellation point mode</li>
<li>radeonsi: fix maximum advertised point size / line width</li>
<li>st/mesa: don't clobber glGetInternalformat* buffer for GL_NUM_SAMPLE_COUNTS</li>
<li>st/glsl_to_tgsi: fix indirect access to 64-bit integer</li>
<li>st/glsl_to_tgsi: fix a use-after-free in merge_two_dsts</li>
<li>radeonsi: clamp depth comparison value only for fixed point formats</li>
<li>radeonsi: clamp border colors for upgraded depth textures</li>
</ul>
<p>Rob Clark (2):</p>
<ul>
<li>freedreno/a5xx: align height to GMEM</li>
<li>freedreno/a5xx: fix missing restore state</li>
</ul>
</div>
</body>
</html>

132
docs/relnotes/17.2.4.html Normal file
View File

@@ -0,0 +1,132 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.4 Release Notes / October 30, 2017</h1>
<p>
Mesa 17.2.4 is a bug fix release which fixes bugs found since the 17.2.3 release.
</p>
<p>
Mesa 17.2.4 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
cb266edc5cf7226219ebaf556ca2e03dff282e0324d20afd80423a5754d1272c mesa-17.2.4.tar.gz
5ba408fecd6e1132e5490eec1a2f04466214e4c65c8b89b331be844768c2e550 mesa-17.2.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102774">Bug 102774</a> - [BDW] [Bisected] Absolute constant buffers break VAAPI in mpv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103388">Bug 103388</a> - Linking libcltgsi.la (llvm/codegen/libclllvm_la-common.lo) fails with &quot;error: no match for 'operator-'&quot; with GCC-7, Mesa from Git and current LLVM revisions</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (8):</p>
<ul>
<li>cherry-ignore: configure.ac: rework llvm detection and handling</li>
<li>cherry-ignore: glsl: fix derived cs variables</li>
<li>cherry-ignore: added 17.3 nominations.</li>
<li>cherry-ignore: radv: Don't use vgpr indexing for outputs on GFX9.</li>
<li>cherry-ignore: radv: Disallow indirect outputs for GS on GFX9 as well.</li>
<li>cherry-ignore: mesa/bufferobj: don't double negate the range</li>
<li>cherry-ignore: broadcom/vc5: Propagate vc4 aliasing fix to vc5.</li>
<li>Update version to 17.2.4</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>ac/nir: Fix nir_texop_lod on GFX for 1D arrays.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>radv/image: bump all the offset to uint64_t.</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.3</li>
</ul>
<p>Henri Verbeet (1):</p>
<ul>
<li>vulkan/wsi: Free the event in x11_manage_fifo_queues().</li>
</ul>
<p>Jan Vesely (1):</p>
<ul>
<li>clover: Fix compilation after clang r315871</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>nir/intrinsics: Set the correct num_indices for load_output</li>
<li>intel/fs: Handle flag read/write aliasing in needs_src_copy</li>
<li>anv/pipeline: Call nir_lower_system_valaues after brw_preprocess_nir</li>
<li>intel/eu: Use EXECUTE_1 for JMPI</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Revert absolute mode for constant buffer pointers.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>Revert "mesa: fix texture updates for ATI_fragment_shader"</li>
</ul>
<p>Matthew Nicholls (1):</p>
<ul>
<li>ac/nir: generate correct instruction for atomic min/max on unsigned images</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>st/mesa: Initialize textures array in st_framebuffer_validate</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: add the draw count buffer to the list of buffers</li>
</ul>
<p>Stefan Schake (1):</p>
<ul>
<li>broadcom/vc4: Fix aliasing issue</li>
</ul>
</div>
</body>
</html>

156
docs/relnotes/17.2.5.html Normal file
View File

@@ -0,0 +1,156 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.5 Release Notes / November 10, 2017</h1>
<p>
Mesa 17.2.5 is a bug fix release which fixes bugs found since the 17.2.4 release.
</p>
<p>
Mesa 17.2.5 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
25b40e72fad64b096c2d8d6fe9579369954debe7970d4ad53e5033c7eec2918b mesa-17.2.5.tar.gz
7f7f914b7b9ea0b15f2d9d01a4375e311b0e90e55683b8e8a67ce8691eb1070f mesa-17.2.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97532">Bug 97532</a> - Regression: GLB 2.7 &amp; Glmark-2 GLES versions segfault due to linker precision error (259fc505) on dead variable</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102680">Bug 102680</a> - [OpenGL CTS] KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102809">Bug 102809</a> - Rust shadows(?) flash random colours</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103142">Bug 103142</a> - R600g+sb: optimizer apparently stuck in an endless loop</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (8):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.4</li>
<li>cherry-ignore: radv: copy indirect lowering settings from radeonsi</li>
<li>cherry-ignore: i965: fix blorp stage_prog_data-&gt;param leak</li>
<li>cherry-ignore: etnaviv: don't do resolve-in-place without valid TS</li>
<li>cherry-ignore: intel/fs: Alloc pull constants off mem_ctx</li>
<li>cherry-ignore: added 17.3 nominations.</li>
<li>cherry-ignore: automake: include git_sha1.h.in in release tarball</li>
<li>Update version to 17.2.5</li>
</ul>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Don't expose heaps with 0 memory.</li>
<li>radv: Don't use vgpr indexing for outputs on GFX9.</li>
<li>radv: Disallow indirect outputs for GS on GFX9 as well.</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>i915g: make gears run again.</li>
<li>radv: free attachments on end command buffer.</li>
<li>radv: add initial copy descriptor support. (v2)</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>vc4: fix release build</li>
</ul>
<p>Gert Wollny (1):</p>
<ul>
<li>r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>spirv: Claim support for the simple memory model</li>
<li>i965/blorp: Use blorp_to_isl_format for src_isl_format in blit_miptrees</li>
<li>i965/blorp: Use more temporary isl_format variables</li>
<li>i965/miptree: Take an isl_format in render_aux_usage</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>mesa: Accept GL_BACK in get_fb0_attachment with ARB_ES3_1_compatibility.</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/video: add gfx9 offsets when rejoin the video surface</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>st/dri: don't expose modifiers in EGL if the driver doesn't implement them</li>
<li>ac/surface/gfx9: don't allow DCC for the smallest mipmap levels</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>i965: Check CCS_E compatibility for texture view rendering</li>
</ul>
<p>Neil Roberts (1):</p>
<ul>
<li>nir/opt_intrinsics: Fix values for gl_SubGroupG{e,t}MaskARB</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>amd/common/gfx9: workaround DCC corruption more conservatively</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>i965: unref push_const_bo in intelDestroyContext</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>radv: copy indirect lowering settings from radeonsi</li>
</ul>
<p>Tomasz Figa (1):</p>
<ul>
<li>glsl: Allow precision mismatch on dead data with GLSL ES 1.00</li>
</ul>
<p>Topi Pohjolainen (1):</p>
<ul>
<li>intel/compiler/gen9: Pixel shader header only workaround</li>
</ul>
</div>
</body>
</html>

187
docs/relnotes/17.2.6.html Normal file
View File

@@ -0,0 +1,187 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.6 Release Notes / November 25, 2017</h1>
<p>
Mesa 17.2.6 is a bug fix release which fixes bugs found since the 17.2.5 release.
</p>
<p>
Mesa 17.2.6 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
a9ed76702ffb14ad674ad48899f5c8c7e3a0f987911878a5dfdc4117dce5b415 mesa-17.2.6.tar.gz
6ad85224620330be26ab68c8fc78381b12b38b610ade2db8716b38faaa8f30de mesa-17.2.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100438">Bug 100438</a> - glsl/ir.cpp:1376: ir_dereference_variable::ir_dereference_variable(ir_variable*): Assertion `var != NULL' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102177">Bug 102177</a> - [SKL] ES31-CTS.core.sepshaderobjs.StateInteraction fails sporadically</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103115">Bug 103115</a> - [BSW BXT GLK] dEQP-VK.spirv_assembly.instruction.compute.sconvert.int32_to_int64</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103519">Bug 103519</a> - wayland egl apps crash on start with mesa 17.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103529">Bug 103529</a> - [GM45] GPU hang with mpv fullscreen (bisected)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103628">Bug 103628</a> - [BXT, GLK, BSW] KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103787">Bug 103787</a> - [BDW,BSW] gpu hang on spec.arb_pipeline_statistics_query.arb_pipeline_statistics_query-comp</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (2):</p>
<ul>
<li>glx/drisw: Fix glXMakeCurrent(dpy, None, ctx)</li>
<li>glx/dri3: Fix passing renderType into glXCreateContext</li>
</ul>
<p>Alex Smith (2):</p>
<ul>
<li>spirv: Use correct type for sampled images</li>
<li>nir/spirv: tg4 requires a sampler</li>
</ul>
<p>Andres Gomez (14):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.5</li>
<li>cherry-ignore: intel/fs: Use a pure vertical stride for large register strides</li>
<li>cherry-ignore: intel/nir: Use the correct indirect lowering masks in link_shaders</li>
<li>cherry-ignore: intel/fs: Use the original destination region for int MUL lowering</li>
<li>cherry-ignore: intel/fs: refactors</li>
<li>cherry-ignore: r600/shader: reserve first register of vertex shader.</li>
<li>cherry-ignore: anv/cmd_buffer: Advance the address when initializing clear colors</li>
<li>cherry-ignore: anv/cmd_buffer: Take bo_offset into account in fast clear state addresses</li>
<li>cherry-ignore: i965: Mark BOs as external when we export their handle</li>
<li>cherry-ignore: added 17.3 nominations.</li>
<li>cherry-ignore: glsl: Fix typo fragement -&gt; fragment</li>
<li>cherry-ignore: egl: pass the dri2_dpy to the $plat_teardown functions</li>
<li>cherry-ignore: Revert "intel/fs: Use a pure vertical stride for large register strides"</li>
<li>Update version to 17.2.6</li>
</ul>
<p>Anuj Phogat (2):</p>
<ul>
<li>i965: Program DWord Length in MI_FLUSH_DW</li>
<li>i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DW</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>radv: Free syncobj with multiple imports.</li>
<li>radv: Free temporary syncobj after waiting on it.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>r600: fix isoline tess factor component swapping.</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/wayland: Add a fallback when fourcc query isn't supported</li>
</ul>
<p>Dylan Baker (1):</p>
<ul>
<li>autotools: Set C++ visibility flags on Intel</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>targets/opencl: don't hardcode the icd file install to /etc/...</li>
<li>configure.ac: loosen --enable-glvnd check to honour egl</li>
<li>configure.ac: require xcb* for the omx/va/... when using x11 platform</li>
</ul>
<p>George Barrett (1):</p>
<ul>
<li>glsl: Catch subscripted calls to undeclared subroutines</li>
</ul>
<p>Jason Ekstrand (9):</p>
<ul>
<li>intel/fs: Use ANY/ALL32 predicates in SIMD32</li>
<li>intel/fs: Use an explicit D type for vote any/all/eq intrinsics</li>
<li>intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all</li>
<li>intel/eu/reg: Add a subscript() helper</li>
<li>intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core</li>
<li>intel/fs: Fix integer multiplication lowering for src/dst hazards</li>
<li>intel/fs: Mark 64-bit values as being contiguous</li>
<li>intel/fs: Rework zero-length URB write handling</li>
<li>i965: Add stencil buffers to cache set regardless of stencil texturing</li>
</ul>
<p>Kenneth Graunke (5):</p>
<ul>
<li>i965: properly initialize brw-&gt;cs.base.stage to MESA_SHADER_COMPUTE</li>
<li>i965: Make L3 configuration atom listen for TCS/TES program updates.</li>
<li>intel/tools: Fix detection of enabled shader stages.</li>
<li>i965: Implement another VF cache invalidate workaround on Gen8+.</li>
<li>i965: Upload invariant state once at the start of the batch on Gen4-5.</li>
</ul>
<p>Matt Turner (2):</p>
<ul>
<li>i965/fs: Fix extract_i8/u8 to a 64-bit destination</li>
<li>i965/fs: Split all 32-&gt;64-bit MOVs on CHV, BXT, GLK</li>
</ul>
<p>Neil Roberts (1):</p>
<ul>
<li>glsl: Transform fb buffers are only active if a variable uses them</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>ddebug: fix use-after-free of streamout targets</li>
</ul>
<p>Tim Rowley (2):</p>
<ul>
<li>swr/rast: Use gather instruction for i32gather_ps on simd16/avx512</li>
<li>swr/rast: Faster emulated simd16 permute</li>
</ul>
<p>Timothy Arceri (3):</p>
<ul>
<li>glsl: drop cache_fallback</li>
<li>glsl: use the correct parent when allocating program data members</li>
<li>mesa: rework how we free gl_shader_program_data</li>
</ul>
</div>
</body>
</html>

247
docs/relnotes/17.2.7.html Normal file
View File

@@ -0,0 +1,247 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.7 Release Notes / December 14, 2017</h1>
<p>
Mesa 17.2.7 is a bug fix release which fixes bugs found since the 17.2.6 release.
</p>
<p>
Mesa 17.2.7 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
e8d837a1cd55014e636e9caf6c75cfbe1b3e4be9ab3fa125f5ef38398aa12e97 mesa-17.2.7.tar.gz
50cfdea8df55045797b4d0409591c04c784d9551c4da09b8178874dbe5a37a68 mesa-17.2.7.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94739">Bug 94739</a> - Mesa 11.1.2 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101378">Bug 101378</a> - interpolateAtSample check for input parameter is too strict</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102006">Bug 102006</a> - gstreamer vaapih264enc segfault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102435">Bug 102435</a> - [skl,kbl] [drm] GPU HANG: ecode 9:0:0x86df7cf9, in csgo_linux64 [4947], reason: Hang on rcs, action: reset</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102552">Bug 102552</a> - Null dereference due to not checking return value of util_format_description</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102677">Bug 102677</a> - [OpenGL CTS] KHR-GL45.CommonBugs.CommonBug_PerVertexValidation fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103098">Bug 103098</a> - [OpenGL CTS] KHR-GL45.enhanced_layouts.varying_structure_locations fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103227">Bug 103227</a> - [G965 G45 ILK] ES2-CTS.gtf.GL2ExtensionTests.texture_float.texture_float regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103393">Bug 103393</a> - glDispatchComputeGroupSizeARB : gl_GlobalInvocationID.x != gl_WorkGroupID.x * gl_LocalGroupSizeARB.x + gl_LocalInvocationID.x</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103412">Bug 103412</a> - gallium/wgl: Another fix to context creation without prior SetPixelFormat()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103616">Bug 103616</a> - Increased difference from reference image in shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103626">Bug 103626</a> - [SNB] ES3-CTS.functional.shaders.precision</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103732">Bug 103732</a> - [swr] often gets stuck in piglit's glx-multi-context-single-window test</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103909">Bug 103909</a> - anv_allocator.c:113:1: error: static declaration of memfd_create follows non-static declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103966">Bug 103966</a> - Mesa 17.2.5 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104119">Bug 104119</a> - radv: OpBitFieldInsert produces 0 with a loop counter for Insert</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104143">Bug 104143</a> - r600/sb: clobbers gl_Position -&gt; gl_FragCoord</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Add LLVM version to the device name string</li>
</ul>
<p>Andres Gomez (2):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.6</li>
<li>docs: remove bug 103626 from fix list as per 17.2.6</li>
</ul>
<p>Ben Crocker (2):</p>
<ul>
<li>docs/llvmpipe.html: Minor edits</li>
<li>docs/llvmpipe: document ppc64le as alternative architecture to x86.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>r600/sb: handle jump after target to end of program. (v2)</li>
</ul>
<p>Denis Pauk (1):</p>
<ul>
<li>gallium/{r600, radeonsi}: Fix segfault with color format (v2)</li>
</ul>
<p>Eduardo Lima Mitev (3):</p>
<ul>
<li>glsl_parser_extra: Add utility to copy symbols between symbol tables</li>
<li>glsl: Use the utility function to copy symbols between symbol tables</li>
<li>glsl/linker: Check that re-declared, inter-shader built-in blocks match</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>gl_table.py: add extern C guard for the generated glapitable.h</li>
<li>cherry-ignore: radeonsi: allow DMABUF exports for local buffers</li>
<li>Update version to 17.2.7</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>broadcom/vc4: Fix handling of GFXH-515 workaround with a start vertex count.</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>compiler: use NDEBUG to guard asserts</li>
</ul>
<p>Fabian Bieler (2):</p>
<ul>
<li>glsl: Match order of gl_LightSourceParameters elements.</li>
<li>glsl: Fix gl_NormalScale.</li>
</ul>
<p>Frank Richter (1):</p>
<ul>
<li>gallium/wgl: fix default pixel format issue</li>
</ul>
<p>George Kyriazis (1):</p>
<ul>
<li>swr: Handle resource across context changes</li>
</ul>
<p>Gert Wollny (2):</p>
<ul>
<li>r600: Emit EOP for more CF instruction types</li>
<li>r600/sb: do not convert if-blocks that contain indirect array access</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>glsl: fix derived cs variables</li>
</ul>
<p>James Legg (1):</p>
<ul>
<li>nir/opcodes: Fix constant-folding of bitfield_insert</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965: Disable regular fast-clears (CCS_D) on gen9+</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>glsl: add varying resources for arrays of complex types</li>
</ul>
<p>Julien Isorce (1):</p>
<ul>
<li>st/va: change frame_idx from array to hash table</li>
</ul>
<p>Kai Wasserbäch (1):</p>
<ul>
<li>docs: Point to apt.llvm.org for development snapshot packages</li>
</ul>
<p>Kenneth Graunke (3):</p>
<ul>
<li>meta: Initialize depth/clear values on declaration.</li>
<li>meta: Fix ClearTexture with GL_DEPTH_COMPONENT.</li>
<li>i965: Fix Smooth Point Enables.</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>radeonsi: fix layered DCC fast clear</li>
<li>radeonsi/gfx9: fix importing shared textures with DCC</li>
<li>radeonsi: flush the context after resource_copy_region for buffer exports</li>
</ul>
<p>Matt Turner (4):</p>
<ul>
<li>i965/fs: Handle negating immediates on MADs when propagating saturates</li>
<li>util: Fix SHA1 implementation on big endian</li>
<li>util: Fix disk_cache index calculation on big endian</li>
<li>i965/fs: Unpack count argument to 64-bit shift ops on Atom</li>
</ul>
<p>Nicolai Hähnle (3):</p>
<ul>
<li>radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check</li>
<li>glsl: allow any l-value of an input variable as interpolant in interpolateAt*</li>
<li>glsl: fix interpolateAtXxx(some_vec[idx], ...) with dynamic idx</li>
</ul>
<p>Pierre Moreau (1):</p>
<ul>
<li>nvc0/ir: Properly lower 64-bit shifts when the shift value is &gt;32</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>mesa/gles: adjust internal format in glTexSubImage2D error checks</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>glsl: get correct member type when processing xfb ifc arrays</li>
</ul>
<p>Vadym Shovkoplias (2):</p>
<ul>
<li>intel/blorp: Fix possible NULL pointer dereferencing</li>
<li>glx/dri3: Remove unused deviceName variable</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>anv: Check if memfd_create is already defined.</li>
</ul>
</div>
</body>
</html>

112
docs/relnotes/17.2.8.html Normal file
View File

@@ -0,0 +1,112 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.8 Release Notes / December 22, 2017</h1>
<p>
Mesa 17.2.8 is a bug fix release which fixes bugs found since the 17.2.7 release.
</p>
<p>
Mesa 17.2.8 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
c715c3a3d6fe26a69c096f573ec416e038a548f0405e3befedd5136517527a84 mesa-17.2.8.tar.gz
6e940345cceaadfd805d701ed2b956589fa77fe8c39991da30ed51ea6b9d095f mesa-17.2.8.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102710">Bug 102710</a> - vkCmdBlitImage with arrayLayers &gt; 1 fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103007">Bug 103007</a> - [OpenGL CTS] [HSW] KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103544">Bug 103544</a> - Graphical glitches r600 in game this war of mine linux native</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103579">Bug 103579</a> - Vertex shader causes compiler to crash in SPIRV-to-NIR</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (6):</p>
<ul>
<li>cherry-ignore: swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.</li>
<li>cherry-ignore: added 17.3 nominations.</li>
<li>cherry-ignore: radv: port merge tess info from anv</li>
<li>cherry-ignore: main: Clear shader program data whenever ProgramBinary is called</li>
<li>cherry-ignore: r600: set DX10_CLAMP for compute shader too</li>
<li>Update version to 17.2.8</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>spirv: Fix loading an entire block at once.</li>
<li>radv: Fix multi-layer blits.</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>xlib: call _mesa_warning() instead of fprintf()</li>
<li>gallium/aux: include nr_samples in util_resource_size() computation</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.7</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>i965/vec4: use a temp register to compute offsets for pull loads</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/vce: move destroy command before feedback command</li>
</ul>
<p>Matt Turner (2):</p>
<ul>
<li>util: Assume little endian in the absence of platform-specific handling</li>
<li>util: Add a SHA1 unit test program</li>
</ul>
<p>Roland Scheidegger (2):</p>
<ul>
<li>r600: use min_dx10/max_dx10 instead of min/max</li>
<li>r600: use DX10_CLAMP bit in shader setup</li>
</ul>
</div>
</body>
</html>

View File

@@ -31,14 +31,14 @@ extern "C" {
** This header is generated from the Khronos OpenGL / OpenGL ES XML
** API Registry. The current version of the Registry, generator scripts
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/egl
** http://www.khronos.org/registry/egl
**
** Khronos $Revision$ on $Date$
** Khronos $Git commit SHA1: a732b061e7 $ on $Git commit date: 2017-06-17 23:27:53 +0100 $
*/
#include <EGL/eglplatform.h>
/* Generated on date 20161230 */
/* Generated on date 20170627 */
/* Generated C header for:
* API: egl

View File

@@ -31,14 +31,14 @@ extern "C" {
** This header is generated from the Khronos OpenGL / OpenGL ES XML
** API Registry. The current version of the Registry, generator scripts
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/egl
** http://www.khronos.org/registry/egl
**
** Khronos $Revision$ on $Date$
** Khronos $Git commit SHA1: a732b061e7 $ on $Git commit date: 2017-06-17 23:27:53 +0100 $
*/
#include <EGL/eglplatform.h>
#define EGL_EGLEXT_VERSION 20161230
#define EGL_EGLEXT_VERSION 20170627
/* Generated C header for:
* API: egl
@@ -133,6 +133,15 @@ EGLAPI EGLint EGLAPIENTRY eglLabelObjectKHR (EGLDisplay display, EGLenum objectT
#endif
#endif /* EGL_KHR_debug */
#ifndef EGL_KHR_display_reference
#define EGL_KHR_display_reference 1
#define EGL_TRACK_REFERENCES_KHR 0x3352
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYDISPLAYATTRIBKHRPROC) (EGLDisplay dpy, EGLint name, EGLAttrib *value);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglQueryDisplayAttribKHR (EGLDisplay dpy, EGLint name, EGLAttrib *value);
#endif
#endif /* EGL_KHR_display_reference */
#ifndef EGL_KHR_fence_sync
#define EGL_KHR_fence_sync 1
typedef khronos_utime_nanoseconds_t EGLTimeKHR;
@@ -555,6 +564,11 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurfacePointerANGLE (EGLDisplay dpy, EGLSu
#define EGL_DISCARD_SAMPLES_ARM 0x3286
#endif /* EGL_ARM_pixmap_multisample_discard */
#ifndef EGL_EXT_bind_to_front
#define EGL_EXT_bind_to_front 1
#define EGL_FRONT_BUFFER_EXT 0x3464
#endif /* EGL_EXT_bind_to_front */
#ifndef EGL_EXT_buffer_age
#define EGL_EXT_buffer_age 1
#define EGL_BUFFER_AGE_EXT 0x313D
@@ -564,6 +578,30 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurfacePointerANGLE (EGLDisplay dpy, EGLSu
#define EGL_EXT_client_extensions 1
#endif /* EGL_EXT_client_extensions */
#ifndef EGL_EXT_compositor
#define EGL_EXT_compositor 1
#define EGL_PRIMARY_COMPOSITOR_CONTEXT_EXT 0x3460
#define EGL_EXTERNAL_REF_ID_EXT 0x3461
#define EGL_COMPOSITOR_DROP_NEWEST_FRAME_EXT 0x3462
#define EGL_COMPOSITOR_KEEP_NEWEST_FRAME_EXT 0x3463
typedef EGLBoolean (EGLAPIENTRYP PFNEGLCOMPOSITORSETCONTEXTLISTEXTPROC) (const EGLint *external_ref_ids, EGLint num_entries);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLCOMPOSITORSETCONTEXTATTRIBUTESEXTPROC) (EGLint external_ref_id, const EGLint *context_attributes, EGLint num_entries);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLCOMPOSITORSETWINDOWLISTEXTPROC) (EGLint external_ref_id, const EGLint *external_win_ids, EGLint num_entries);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLCOMPOSITORSETWINDOWATTRIBUTESEXTPROC) (EGLint external_win_id, const EGLint *window_attributes, EGLint num_entries);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLCOMPOSITORBINDTEXWINDOWEXTPROC) (EGLint external_win_id);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLCOMPOSITORSETSIZEEXTPROC) (EGLint external_win_id, EGLint width, EGLint height);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLCOMPOSITORSWAPPOLICYEXTPROC) (EGLint external_win_id, EGLint policy);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglCompositorSetContextListEXT (const EGLint *external_ref_ids, EGLint num_entries);
EGLAPI EGLBoolean EGLAPIENTRY eglCompositorSetContextAttributesEXT (EGLint external_ref_id, const EGLint *context_attributes, EGLint num_entries);
EGLAPI EGLBoolean EGLAPIENTRY eglCompositorSetWindowListEXT (EGLint external_ref_id, const EGLint *external_win_ids, EGLint num_entries);
EGLAPI EGLBoolean EGLAPIENTRY eglCompositorSetWindowAttributesEXT (EGLint external_win_id, const EGLint *window_attributes, EGLint num_entries);
EGLAPI EGLBoolean EGLAPIENTRY eglCompositorBindTexWindowEXT (EGLint external_win_id);
EGLAPI EGLBoolean EGLAPIENTRY eglCompositorSetSizeEXT (EGLint external_win_id, EGLint width, EGLint height);
EGLAPI EGLBoolean EGLAPIENTRY eglCompositorSwapPolicyEXT (EGLint external_win_id, EGLint policy);
#endif
#endif /* EGL_EXT_compositor */
#ifndef EGL_EXT_create_context_robustness
#define EGL_EXT_create_context_robustness 1
#define EGL_CONTEXT_OPENGL_ROBUST_ACCESS_EXT 0x30BF
@@ -618,6 +656,21 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQueryDisplayAttribEXT (EGLDisplay dpy, EGLint a
#define EGL_GL_COLORSPACE_BT2020_PQ_EXT 0x3340
#endif /* EGL_EXT_gl_colorspace_bt2020_pq */
#ifndef EGL_EXT_gl_colorspace_display_p3
#define EGL_EXT_gl_colorspace_display_p3 1
#define EGL_GL_COLORSPACE_DISPLAY_P3_EXT 0x3363
#endif /* EGL_EXT_gl_colorspace_display_p3 */
#ifndef EGL_EXT_gl_colorspace_display_p3_linear
#define EGL_EXT_gl_colorspace_display_p3_linear 1
#define EGL_GL_COLORSPACE_DISPLAY_P3_LINEAR_EXT 0x3362
#endif /* EGL_EXT_gl_colorspace_display_p3_linear */
#ifndef EGL_EXT_gl_colorspace_scrgb
#define EGL_EXT_gl_colorspace_scrgb 1
#define EGL_GL_COLORSPACE_SCRGB_EXT 0x3351
#endif /* EGL_EXT_gl_colorspace_scrgb */
#ifndef EGL_EXT_gl_colorspace_scrgb_linear
#define EGL_EXT_gl_colorspace_scrgb_linear 1
#define EGL_GL_COLORSPACE_SCRGB_LINEAR_EXT 0x3350
@@ -670,6 +723,13 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQueryDmaBufModifiersEXT (EGLDisplay dpy, EGLint
#endif
#endif /* EGL_EXT_image_dma_buf_import_modifiers */
#ifndef EGL_EXT_image_implicit_sync_control
#define EGL_EXT_image_implicit_sync_control 1
#define EGL_IMPORT_SYNC_TYPE_EXT 0x3470
#define EGL_IMPORT_IMPLICIT_SYNC_EXT 0x3471
#define EGL_IMPORT_EXPLICIT_SYNC_EXT 0x3472
#endif /* EGL_EXT_image_implicit_sync_control */
#ifndef EGL_EXT_multiview_window
#define EGL_EXT_multiview_window 1
#define EGL_MULTIVIEW_VIEW_COUNT_EXT 0x3134
@@ -769,6 +829,12 @@ EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerOutputEXT (EGLDisplay dpy, EGLStr
#endif
#endif /* EGL_EXT_stream_consumer_egloutput */
#ifndef EGL_EXT_surface_CTA861_3_metadata
#define EGL_EXT_surface_CTA861_3_metadata 1
#define EGL_CTA861_3_MAX_CONTENT_LIGHT_LEVEL_EXT 0x3360
#define EGL_CTA861_3_MAX_FRAME_AVERAGE_LEVEL_EXT 0x3361
#endif /* EGL_EXT_surface_CTA861_3_metadata */
#ifndef EGL_EXT_surface_SMPTE2086_metadata
#define EGL_EXT_surface_SMPTE2086_metadata 1
#define EGL_SMPTE2086_DISPLAY_PRIMARY_RX_EXT 0x3341
@@ -781,6 +847,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerOutputEXT (EGLDisplay dpy, EGLStr
#define EGL_SMPTE2086_WHITE_POINT_Y_EXT 0x3348
#define EGL_SMPTE2086_MAX_LUMINANCE_EXT 0x3349
#define EGL_SMPTE2086_MIN_LUMINANCE_EXT 0x334A
#define EGL_METADATA_SCALING_EXT 50000
#endif /* EGL_EXT_surface_SMPTE2086_metadata */
#ifndef EGL_EXT_swap_buffers_with_damage

View File

@@ -70,6 +70,7 @@ typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYWAYLANDBUFFERWL) (EGLDisplay dpy, st
#ifndef EGL_WL_create_wayland_buffer_from_image
#define EGL_WL_create_wayland_buffer_from_image 1
struct wl_buffer;
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI struct wl_buffer * EGLAPIENTRY eglCreateWaylandBufferFromImageWL(EGLDisplay dpy, EGLImageKHR image);
#endif

View File

@@ -50,9 +50,22 @@ extern "C" {
#ifndef GL_VERSION_ES_CM_1_0
#define GL_VERSION_ES_CM_1_0 1
/*
* XXX: Temporary fix; needs to be reverted as part of the next
* header update.
* For more details:
* https://github.com/KhronosGroup/OpenGL-Registry/pull/76
* https://lists.freedesktop.org/archives/mesa-dev/2017-June/161647.html
*/
#include <KHR/khrplatform.h>
typedef khronos_int8_t GLbyte;
typedef khronos_float_t GLclampf;
typedef short GLshort;
typedef unsigned short GLushort;
typedef void GLvoid;
typedef unsigned int GLenum;
#include <KHR/khrplatform.h>
typedef khronos_float_t GLfloat;
typedef khronos_int32_t GLfixed;
typedef unsigned int GLuint;

View File

@@ -104,7 +104,6 @@ GL_API void GL_APIENTRY glBlendEquationOES (GLenum mode);
#ifndef GL_OES_byte_coordinates
#define GL_OES_byte_coordinates 1
typedef khronos_int8_t GLbyte;
#endif /* GL_OES_byte_coordinates */
#ifndef GL_OES_compressed_ETC1_RGB8_sub_texture
@@ -128,7 +127,6 @@ typedef khronos_int8_t GLbyte;
#ifndef GL_OES_draw_texture
#define GL_OES_draw_texture 1
typedef short GLshort;
#define GL_TEXTURE_CROP_RECT_OES 0x8B9D
typedef void (GL_APIENTRYP PFNGLDRAWTEXSOESPROC) (GLshort x, GLshort y, GLshort z, GLshort width, GLshort height);
typedef void (GL_APIENTRYP PFNGLDRAWTEXIOESPROC) (GLint x, GLint y, GLint z, GLint width, GLint height);
@@ -409,7 +407,6 @@ GL_API GLbitfield GL_APIENTRY glQueryMatrixxOES (GLfixed *mantissa, GLint *expon
#ifndef GL_OES_single_precision
#define GL_OES_single_precision 1
typedef khronos_float_t GLclampf;
typedef void (GL_APIENTRYP PFNGLCLEARDEPTHFOESPROC) (GLclampf depth);
typedef void (GL_APIENTRYP PFNGLCLIPPLANEFOESPROC) (GLenum plane, const GLfloat *equation);
typedef void (GL_APIENTRYP PFNGLDEPTHRANGEFOESPROC) (GLclampf n, GLclampf f);

View File

@@ -26,7 +26,7 @@
/* Khronos platform-specific types and definitions.
*
* $Revision: 23298 $ on $Date: 2013-09-30 17:07:13 -0700 (Mon, 30 Sep 2013) $
* $Revision: 32517 $ on $Date: 2016-03-11 02:41:19 -0800 (Fri, 11 Mar 2016) $
*
* Adopters may modify this file to suit their platform. Adopters are
* encouraged to submit platform specific modifications to the Khronos
@@ -98,11 +98,7 @@
* This precedes the return type of the function in the function prototype.
*/
#if defined(_WIN32) && !defined(__SCITECH_SNAP__)
# if defined(KHRONOS_DLL_EXPORTS)
# define KHRONOS_APICALL __declspec(dllexport)
# else
# define KHRONOS_APICALL __declspec(dllimport)
# endif
# define KHRONOS_APICALL __declspec(dllimport)
#elif defined (__SYMBIAN32__)
# define KHRONOS_APICALL IMPORT_C
#elif (defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 303) \
@@ -231,7 +227,7 @@ typedef signed short int khronos_int16_t;
typedef unsigned short int khronos_uint16_t;
/*
* Types that differ between LLP64 and LP64 architectures - in LLP64,
* Types that differ between LLP64 and LP64 architectures - in LLP64,
* pointers are 64 bits, but 'long' is still 32 bits. Win64 appears
* to be the only LLP64 architecture in current use.
*/

View File

@@ -157,6 +157,19 @@ def check_header(env, header):
env = conf.Finish()
return have_header
def check_functions(env, functions):
'''Check if all of the functions exist'''
conf = SCons.Script.Configure(env)
have_functions = True
for function in functions:
if not conf.CheckFunc(function):
have_functions = False
env = conf.Finish()
return have_functions
def check_prog(env, prog):
"""Check whether this program exists."""
@@ -244,16 +257,16 @@ def generate(env):
# Backwards compatability with the debug= profile= options
if env['build'] == 'debug':
if not env['debug']:
print 'scons: warning: debug option is deprecated and will be removed eventually; use instead'
print
print ' scons build=release'
print
print('scons: warning: debug option is deprecated and will be removed eventually; use instead')
print('')
print(' scons build=release')
print('')
env['build'] = 'release'
if env['profile']:
print 'scons: warning: profile option is deprecated and will be removed eventually; use instead'
print
print ' scons build=profile'
print
print('scons: warning: profile option is deprecated and will be removed eventually; use instead')
print('')
print(' scons build=profile')
print('')
env['build'] = 'profile'
if False:
# Enforce SConscripts to use the new build variable
@@ -287,7 +300,7 @@ def generate(env):
env['build_dir'] = build_dir
env.SConsignFile(os.path.join(build_dir, '.sconsign'))
if 'SCONS_CACHE_DIR' in os.environ:
print 'scons: Using build cache in %s.' % (os.environ['SCONS_CACHE_DIR'],)
print('scons: Using build cache in %s.' % (os.environ['SCONS_CACHE_DIR'],))
env.CacheDir(os.environ['SCONS_CACHE_DIR'])
env['CONFIGUREDIR'] = os.path.join(build_dir, 'conf')
env['CONFIGURELOG'] = os.path.join(os.path.abspath(build_dir), 'config.log')
@@ -339,6 +352,9 @@ def generate(env):
if check_header(env, 'xlocale.h'):
cppdefines += ['HAVE_XLOCALE_H']
if check_functions(env, ['strtod_l', 'strtof_l']):
cppdefines += ['HAVE_STRTOD_L']
if platform == 'windows':
cppdefines += [
'WIN32',
@@ -369,8 +385,8 @@ def generate(env):
if env['embedded']:
cppdefines += ['PIPE_SUBSYSTEM_EMBEDDED']
if env['texture_float']:
print 'warning: Floating-point textures enabled.'
print 'warning: Please consult docs/patents.txt with your lawyer before building Mesa.'
print('warning: Floating-point textures enabled.')
print('warning: Please consult docs/patents.txt with your lawyer before building Mesa.')
cppdefines += ['TEXTURE_FLOAT_ENABLED']
env.Append(CPPDEFINES = cppdefines)

View File

@@ -68,13 +68,13 @@ def generate(env):
if env['platform'] == 'windows':
# XXX: There is no llvm-config on Windows, so assume a standard layout
if llvm_dir is None:
print 'scons: LLVM environment variable must be specified when building for windows'
print('scons: LLVM environment variable must be specified when building for windows')
return
# Try to determine the LLVM version from llvm/Config/config.h
llvm_config = os.path.join(llvm_dir, 'include/llvm/Config/llvm-config.h')
if not os.path.exists(llvm_config):
print 'scons: could not find %s' % llvm_config
print('scons: could not find %s' % llvm_config)
return
llvm_version_major_re = re.compile(r'^#define LLVM_VERSION_MAJOR ([0-9]+)')
llvm_version_minor_re = re.compile(r'^#define LLVM_VERSION_MINOR ([0-9]+)')
@@ -92,10 +92,10 @@ def generate(env):
llvm_version = distutils.version.LooseVersion('%s.%s' % (llvm_version_major, llvm_version_minor))
if llvm_version is None:
print 'scons: could not determine the LLVM version from %s' % llvm_config
print('scons: could not determine the LLVM version from %s' % llvm_config)
return
if llvm_version < distutils.version.LooseVersion(required_llvm_version):
print 'scons: LLVM version %s found, but %s is required' % (llvm_version, required_llvm_version)
print('scons: LLVM version %s found, but %s is required' % (llvm_version, required_llvm_version))
return
env.Prepend(CPPPATH = [os.path.join(llvm_dir, 'include')])
@@ -104,7 +104,26 @@ def generate(env):
])
env.Prepend(LIBPATH = [os.path.join(llvm_dir, 'lib')])
# LIBS should match the output of `llvm-config --libs engine mcjit bitwriter x86asmprinter irreader`
if llvm_version >= distutils.version.LooseVersion('4.0'):
if llvm_version >= distutils.version.LooseVersion('5.0'):
env.Prepend(LIBS = [
'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',
'LLVMDebugInfoCodeView', 'LLVMCodeGen',
'LLVMScalarOpts', 'LLVMInstCombine',
'LLVMTransformUtils',
'LLVMBitWriter', 'LLVMX86Desc',
'LLVMMCDisassembler', 'LLVMX86Info',
'LLVMX86AsmPrinter', 'LLVMX86Utils',
'LLVMMCJIT', 'LLVMExecutionEngine', 'LLVMTarget',
'LLVMAnalysis', 'LLVMProfileData',
'LLVMRuntimeDyld', 'LLVMObject', 'LLVMMCParser',
'LLVMBitReader', 'LLVMMC', 'LLVMCore',
'LLVMSupport',
'LLVMIRReader', 'LLVMAsmParser',
'LLVMDemangle', 'LLVMGlobalISel', 'LLVMDebugInfoMSF',
'LLVMBinaryFormat',
])
elif llvm_version >= distutils.version.LooseVersion('4.0'):
env.Prepend(LIBS = [
'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',
@@ -212,14 +231,14 @@ def generate(env):
else:
llvm_config = os.environ.get('LLVM_CONFIG', 'llvm-config')
if not env.Detect(llvm_config):
print 'scons: %s script not found' % llvm_config
print('scons: %s script not found' % llvm_config)
return
llvm_version = env.backtick('%s --version' % llvm_config).rstrip()
llvm_version = distutils.version.LooseVersion(llvm_version)
if llvm_version < distutils.version.LooseVersion(required_llvm_version):
print 'scons: LLVM version %s found, but %s is required' % (llvm_version, required_llvm_version)
print('scons: LLVM version %s found, but %s is required' % (llvm_version, required_llvm_version))
return
try:
@@ -245,13 +264,13 @@ def generate(env):
env.ParseConfig('%s --system-libs' % llvm_config)
env.Append(CXXFLAGS = ['-std=c++11'])
except OSError:
print 'scons: llvm-config version %s failed' % llvm_version
print('scons: llvm-config version %s failed' % llvm_version)
return
assert llvm_version is not None
env['llvm'] = True
print 'scons: Found LLVM version %s' % llvm_version
print('scons: Found LLVM version %s' % llvm_version)
env['LLVM_VERSION'] = llvm_version
# Define HAVE_LLVM macro with the major/minor version number (e.g., 0x0206 for 2.6)

View File

@@ -28,7 +28,7 @@ def write_git_sha1_h_file(filename):
try:
subprocess.Popen(args, stdout=f).wait()
except:
print "Warning: exception in write_git_sha1_h_file()"
print("Warning: exception in write_git_sha1_h_file()")
return
if not os.path.exists(filename) or not filecmp.cmp(tempfile, filename):

View File

@@ -54,9 +54,7 @@ LOCAL_C_INCLUDES := \
$(call generated-sources-dir-for,STATIC_LIBRARIES,libmesa_nir,,)/nir \
$(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary \
$(intermediates)/common \
external/llvm/include \
external/llvm/device/include
$(intermediates)/common
LOCAL_EXPORT_C_INCLUDE_DIRS := \
$(LOCAL_PATH)/common

View File

@@ -216,20 +216,16 @@ VOID Object::DebugPrint(
#if DEBUG
if (m_client.callbacks.debugPrint != NULL)
{
va_list ap;
va_start(ap, pDebugString);
ADDR_DEBUGPRINT_INPUT debugPrintInput = {0};
debugPrintInput.size = sizeof(ADDR_DEBUGPRINT_INPUT);
debugPrintInput.pDebugString = const_cast<CHAR*>(pDebugString);
debugPrintInput.hClient = m_client.handle;
va_copy(debugPrintInput.ap, ap);
va_start(debugPrintInput.ap, pDebugString);
m_client.callbacks.debugPrint(&debugPrintInput);
va_end(ap);
va_end(debugPrintInput.ap);
}
#endif
}

View File

@@ -109,7 +109,7 @@ static void parse_relocs(Elf *elf, Elf_Data *relocs, Elf_Data *symbols,
}
}
void ac_elf_read(const char *elf_data, unsigned elf_size,
bool ac_elf_read(const char *elf_data, unsigned elf_size,
struct ac_shader_binary *binary)
{
char *elf_buffer;
@@ -118,6 +118,7 @@ void ac_elf_read(const char *elf_data, unsigned elf_size,
Elf_Data *symbols = NULL, *relocs = NULL;
size_t section_str_index;
unsigned symbol_sh_link = 0;
bool success = true;
/* One of the libelf implementations
* (http://www.mr511.de/software/english.htm) requires calling
@@ -137,7 +138,8 @@ void ac_elf_read(const char *elf_data, unsigned elf_size,
GElf_Shdr section_header;
if (gelf_getshdr(section, &section_header) != &section_header) {
fprintf(stderr, "Failed to read ELF section header\n");
return;
success = false;
break;
}
name = elf_strptr(elf, section_str_index, section_header.sh_name);
if (!strcmp(name, ".text")) {
@@ -148,6 +150,11 @@ void ac_elf_read(const char *elf_data, unsigned elf_size,
} else if (!strcmp(name, ".AMDGPU.config")) {
section_data = elf_getdata(section, section_data);
binary->config_size = section_data->d_size;
if (!binary->config_size) {
fprintf(stderr, ".AMDGPU.config is empty!\n");
success = false;
break;
}
binary->config = MALLOC(binary->config_size * sizeof(unsigned char));
memcpy(binary->config, section_data->d_buf, binary->config_size);
} else if (!strcmp(name, ".AMDGPU.disasm")) {
@@ -186,6 +193,7 @@ void ac_elf_read(const char *elf_data, unsigned elf_size,
binary->global_symbol_count = 1;
binary->config_size_per_symbol = binary->config_size;
}
return success;
}
const unsigned char *ac_shader_binary_config_start(

View File

@@ -83,7 +83,7 @@ struct ac_shader_config {
* Parse the elf binary stored in \p elf_data and create a
* ac_shader_binary object.
*/
void ac_elf_read(const char *elf_data, unsigned elf_size,
bool ac_elf_read(const char *elf_data, unsigned elf_size,
struct ac_shader_binary *binary);
/**

View File

@@ -45,10 +45,13 @@
* The caller is responsible for initializing ctx::module and ctx::builder.
*/
void
ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context)
ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context,
enum chip_class chip_class)
{
LLVMValueRef args[1];
ctx->chip_class = chip_class;
ctx->context = context;
ctx->module = NULL;
ctx->builder = NULL;
@@ -176,6 +179,20 @@ void ac_build_type_name_for_intr(LLVMTypeRef type, char *buf, unsigned bufsize)
}
}
/**
* Helper function that builds an LLVM IR PHI node and immediately adds
* incoming edges.
*/
LLVMValueRef
ac_build_phi(struct ac_llvm_context *ctx, LLVMTypeRef type,
unsigned count_incoming, LLVMValueRef *values,
LLVMBasicBlockRef *blocks)
{
LLVMValueRef phi = LLVMBuildPhi(ctx->builder, type, "");
LLVMAddIncoming(phi, values, blocks, count_incoming);
return phi;
}
LLVMValueRef
ac_build_gather_values_extended(struct ac_llvm_context *ctx,
LLVMValueRef *values,
@@ -290,15 +307,15 @@ static void build_cube_select(LLVMBuilderRef builder,
is_ma_x = LLVMBuildAnd(builder, is_not_ma_z, LLVMBuildNot(builder, is_ma_y, ""), "");
/* Select sc */
tmp = LLVMBuildSelect(builder, is_ma_z, coords[2], coords[0], "");
tmp = LLVMBuildSelect(builder, is_ma_x, coords[2], coords[0], "");
sgn = LLVMBuildSelect(builder, is_ma_y, LLVMConstReal(f32, 1.0),
LLVMBuildSelect(builder, is_ma_x, sgn_ma,
LLVMBuildSelect(builder, is_ma_z, sgn_ma,
LLVMBuildFNeg(builder, sgn_ma, ""), ""), "");
out_st[0] = LLVMBuildFMul(builder, tmp, sgn, "");
/* Select tc */
tmp = LLVMBuildSelect(builder, is_ma_y, coords[2], coords[1], "");
sgn = LLVMBuildSelect(builder, is_ma_y, LLVMBuildFNeg(builder, sgn_ma, ""),
sgn = LLVMBuildSelect(builder, is_ma_y, sgn_ma,
LLVMConstReal(f32, -1.0), "");
out_st[1] = LLVMBuildFMul(builder, tmp, sgn, "");
@@ -312,7 +329,7 @@ static void build_cube_select(LLVMBuilderRef builder,
void
ac_prepare_cube_coords(struct ac_llvm_context *ctx,
bool is_deriv, bool is_array,
bool is_deriv, bool is_array, bool is_lod,
LLVMValueRef *coords_arg,
LLVMValueRef *derivs_arg)
{
@@ -322,6 +339,38 @@ ac_prepare_cube_coords(struct ac_llvm_context *ctx,
LLVMValueRef coords[3];
LLVMValueRef invma;
if (is_array && !is_lod) {
LLVMValueRef tmp = coords_arg[3];
tmp = ac_build_intrinsic(ctx, "llvm.rint.f32", ctx->f32, &tmp, 1, 0);
/* Section 8.9 (Texture Functions) of the GLSL 4.50 spec says:
*
* "For Array forms, the array layer used will be
*
* max(0, min(d1, floor(layer+0.5)))
*
* where d is the depth of the texture array and layer
* comes from the component indicated in the tables below.
* Workaroudn for an issue where the layer is taken from a
* helper invocation which happens to fall on a different
* layer due to extrapolation."
*
* VI and earlier attempt to implement this in hardware by
* clamping the value of coords[2] = (8 * layer) + face.
* Unfortunately, this means that the we end up with the wrong
* face when clamping occurs.
*
* Clamp the layer earlier to work around the issue.
*/
if (ctx->chip_class <= VI) {
LLVMValueRef ge0;
ge0 = LLVMBuildFCmp(builder, LLVMRealOGE, tmp, ctx->f32_0, "");
tmp = LLVMBuildSelect(builder, ge0, tmp, ctx->f32_0, "");
}
coords_arg[3] = tmp;
}
build_cube_intrinsic(ctx, coords_arg, &selcoords);
invma = ac_build_intrinsic(ctx, "llvm.fabs.f32",
@@ -795,21 +844,21 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
bool has_ds_bpermute,
uint32_t mask,
int idx,
LLVMValueRef lds,
LLVMValueRef val)
{
LLVMValueRef thread_id, tl, trbl, tl_tid, trbl_tid, args[2];
LLVMValueRef tl, trbl, args[2];
LLVMValueRef result;
thread_id = ac_get_thread_id(ctx);
tl_tid = LLVMBuildAnd(ctx->builder, thread_id,
LLVMConstInt(ctx->i32, mask, false), "");
trbl_tid = LLVMBuildAdd(ctx->builder, tl_tid,
LLVMConstInt(ctx->i32, idx, false), "");
if (has_ds_bpermute) {
LLVMValueRef thread_id, tl_tid, trbl_tid;
thread_id = ac_get_thread_id(ctx);
tl_tid = LLVMBuildAnd(ctx->builder, thread_id,
LLVMConstInt(ctx->i32, mask, false), "");
trbl_tid = LLVMBuildAdd(ctx->builder, tl_tid,
LLVMConstInt(ctx->i32, idx, false), "");
args[0] = LLVMBuildMul(ctx->builder, tl_tid,
LLVMConstInt(ctx->i32, 4, false), "");
args[1] = val;
@@ -827,15 +876,42 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_CONVERGENT);
} else {
LLVMValueRef store_ptr, load_ptr0, load_ptr1;
uint32_t masks[2];
store_ptr = ac_build_gep0(ctx, lds, thread_id);
load_ptr0 = ac_build_gep0(ctx, lds, tl_tid);
load_ptr1 = ac_build_gep0(ctx, lds, trbl_tid);
switch (mask) {
case AC_TID_MASK_TOP_LEFT:
masks[0] = 0x8000;
if (idx == 1)
masks[1] = 0x8055;
else
masks[1] = 0x80aa;
LLVMBuildStore(ctx->builder, val, store_ptr);
tl = LLVMBuildLoad(ctx->builder, load_ptr0, "");
trbl = LLVMBuildLoad(ctx->builder, load_ptr1, "");
break;
case AC_TID_MASK_TOP:
masks[0] = 0x8044;
masks[1] = 0x80ee;
break;
case AC_TID_MASK_LEFT:
masks[0] = 0x80a0;
masks[1] = 0x80f5;
break;
}
args[0] = val;
args[1] = LLVMConstInt(ctx->i32, masks[0], false);
tl = ac_build_intrinsic(ctx,
"llvm.amdgcn.ds.swizzle", ctx->i32,
args, 2,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_CONVERGENT);
args[1] = LLVMConstInt(ctx->i32, masks[1], false);
trbl = ac_build_intrinsic(ctx,
"llvm.amdgcn.ds.swizzle", ctx->i32,
args, 2,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_CONVERGENT);
}
tl = LLVMBuildBitCast(ctx->builder, tl, ctx->f32, "");

View File

@@ -28,6 +28,8 @@
#include <stdbool.h>
#include <llvm-c/TargetMachine.h>
#include "amd_family.h"
#ifdef __cplusplus
extern "C" {
#endif
@@ -61,10 +63,13 @@ struct ac_llvm_context {
unsigned fpmath_md_kind;
LLVMValueRef fpmath_md_2p5_ulp;
LLVMValueRef empty_md;
enum chip_class chip_class;
};
void
ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context);
ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context,
enum chip_class chip_class);
LLVMValueRef
ac_build_intrinsic(struct ac_llvm_context *ctx, const char *name,
@@ -73,6 +78,11 @@ ac_build_intrinsic(struct ac_llvm_context *ctx, const char *name,
void ac_build_type_name_for_intr(LLVMTypeRef type, char *buf, unsigned bufsize);
LLVMValueRef
ac_build_phi(struct ac_llvm_context *ctx, LLVMTypeRef type,
unsigned count_incoming, LLVMValueRef *values,
LLVMBasicBlockRef *blocks);
LLVMValueRef
ac_build_gather_values_extended(struct ac_llvm_context *ctx,
LLVMValueRef *values,
@@ -91,7 +101,7 @@ ac_build_fdiv(struct ac_llvm_context *ctx,
void
ac_prepare_cube_coords(struct ac_llvm_context *ctx,
bool is_deriv, bool is_array,
bool is_deriv, bool is_array, bool is_lod,
LLVMValueRef *coords_arg,
LLVMValueRef *derivs_arg);
@@ -173,7 +183,6 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
bool has_ds_bpermute,
uint32_t mask,
int idx,
LLVMValueRef lds,
LLVMValueRef val);
#define AC_SENDMSG_GS 2

View File

@@ -1178,7 +1178,17 @@ static LLVMValueRef emit_find_lsb(struct ac_llvm_context *ctx,
*/
LLVMConstInt(ctx->i1, 1, false),
};
return ac_build_intrinsic(ctx, "llvm.cttz.i32", ctx->i32, params, 2, AC_FUNC_ATTR_READNONE);
LLVMValueRef lsb = ac_build_intrinsic(ctx, "llvm.cttz.i32", ctx->i32,
params, 2,
AC_FUNC_ATTR_READNONE);
/* TODO: We need an intrinsic to skip this conditional. */
/* Check for zero: */
return LLVMBuildSelect(ctx->builder, LLVMBuildICmp(ctx->builder,
LLVMIntEQ, src0,
ctx->i32_0, ""),
LLVMConstInt(ctx->i32, -1, 0), lsb, "");
}
static LLVMValueRef emit_ifind_msb(struct ac_llvm_context *ctx,
@@ -1305,7 +1315,6 @@ static LLVMValueRef emit_f2f16(struct nir_to_llvm_context *ctx,
src0 = to_float(&ctx->ac, src0);
result = LLVMBuildFPTrunc(ctx->builder, src0, ctx->f16, "");
/* TODO SI/CIK options here */
if (ctx->options->chip_class >= VI) {
LLVMValueRef args[2];
/* Check if the result is a denormal - and flush to 0 if so. */
@@ -1319,7 +1328,22 @@ static LLVMValueRef emit_f2f16(struct nir_to_llvm_context *ctx,
if (ctx->options->chip_class >= VI)
result = LLVMBuildSelect(ctx->builder, cond, ctx->f32zero, result, "");
else {
/* for SI/CIK */
/* 0x38800000 is smallest half float value (2^-14) in 32-bit float,
* so compare the result and flush to 0 if it's smaller.
*/
LLVMValueRef temp, cond2;
temp = emit_intrin_1f_param(&ctx->ac, "llvm.fabs",
ctx->f32, result);
cond = LLVMBuildFCmp(ctx->builder, LLVMRealUGT,
LLVMBuildBitCast(ctx->builder, LLVMConstInt(ctx->i32, 0x38800000, false), ctx->f32, ""),
temp, "");
cond2 = LLVMBuildFCmp(ctx->builder, LLVMRealUNE,
temp, ctx->f32zero, "");
cond = LLVMBuildAnd(ctx->builder, cond, cond2, "");
result = LLVMBuildSelect(ctx->builder, cond, ctx->f32zero, result, "");
}
return result;
}
@@ -1443,11 +1467,6 @@ static LLVMValueRef emit_ddxy(struct nir_to_llvm_context *ctx,
int idx;
LLVMValueRef result;
if (!ctx->lds && !ctx->has_ds_bpermute)
ctx->lds = LLVMAddGlobalInAddressSpace(ctx->module,
LLVMArrayType(ctx->i32, 64),
"ddxy_lds", LOCAL_ADDR_SPACE);
if (op == nir_op_fddx_fine || op == nir_op_fddx)
mask = AC_TID_MASK_LEFT;
else if (op == nir_op_fddy_fine || op == nir_op_fddy)
@@ -1464,7 +1483,7 @@ static LLVMValueRef emit_ddxy(struct nir_to_llvm_context *ctx,
idx = 2;
result = ac_build_ddxy(&ctx->ac, ctx->has_ds_bpermute,
mask, idx, ctx->lds,
mask, idx,
src0);
return result;
}
@@ -1742,7 +1761,7 @@ static void visit_alu(struct nir_to_llvm_context *ctx, const nir_alu_instr *inst
result);
break;
case nir_op_ffma:
result = emit_intrin_3f_param(&ctx->ac, "llvm.fma",
result = emit_intrin_3f_param(&ctx->ac, "llvm.fmuladd",
to_float_type(&ctx->ac, def_type), src[0], src[1], src[2]);
break;
case nir_op_ibitfield_extract:
@@ -1779,10 +1798,12 @@ static void visit_alu(struct nir_to_llvm_context *ctx, const nir_alu_instr *inst
break;
case nir_op_i2f32:
case nir_op_i2f64:
src[0] = to_integer(&ctx->ac, src[0]);
result = LLVMBuildSIToFP(ctx->builder, src[0], to_float_type(&ctx->ac, def_type), "");
break;
case nir_op_u2f32:
case nir_op_u2f64:
src[0] = to_integer(&ctx->ac, src[0]);
result = LLVMBuildUIToFP(ctx->builder, src[0], to_float_type(&ctx->ac, def_type), "");
break;
case nir_op_f2f64:
@@ -1793,6 +1814,7 @@ static void visit_alu(struct nir_to_llvm_context *ctx, const nir_alu_instr *inst
break;
case nir_op_u2u32:
case nir_op_u2u64:
src[0] = to_integer(&ctx->ac, src[0]);
if (get_elem_bits(&ctx->ac, LLVMTypeOf(src[0])) < get_elem_bits(&ctx->ac, def_type))
result = LLVMBuildZExt(ctx->builder, src[0], def_type, "");
else
@@ -1800,6 +1822,7 @@ static void visit_alu(struct nir_to_llvm_context *ctx, const nir_alu_instr *inst
break;
case nir_op_i2i32:
case nir_op_i2i64:
src[0] = to_integer(&ctx->ac, src[0]);
if (get_elem_bits(&ctx->ac, LLVMTypeOf(src[0])) < get_elem_bits(&ctx->ac, def_type))
result = LLVMBuildSExt(ctx->builder, src[0], def_type, "");
else
@@ -1809,18 +1832,25 @@ static void visit_alu(struct nir_to_llvm_context *ctx, const nir_alu_instr *inst
result = emit_bcsel(&ctx->ac, src[0], src[1], src[2]);
break;
case nir_op_find_lsb:
src[0] = to_integer(&ctx->ac, src[0]);
result = emit_find_lsb(&ctx->ac, src[0]);
break;
case nir_op_ufind_msb:
src[0] = to_integer(&ctx->ac, src[0]);
result = emit_ufind_msb(&ctx->ac, src[0]);
break;
case nir_op_ifind_msb:
src[0] = to_integer(&ctx->ac, src[0]);
result = emit_ifind_msb(&ctx->ac, src[0]);
break;
case nir_op_uadd_carry:
src[0] = to_integer(&ctx->ac, src[0]);
src[1] = to_integer(&ctx->ac, src[1]);
result = emit_uint_carry(&ctx->ac, "llvm.uadd.with.overflow.i32", src[0], src[1]);
break;
case nir_op_usub_borrow:
src[0] = to_integer(&ctx->ac, src[0]);
src[1] = to_integer(&ctx->ac, src[1]);
result = emit_uint_carry(&ctx->ac, "llvm.usub.with.overflow.i32", src[0], src[1]);
break;
case nir_op_b2f:
@@ -1833,15 +1863,20 @@ static void visit_alu(struct nir_to_llvm_context *ctx, const nir_alu_instr *inst
result = emit_b2i(&ctx->ac, src[0]);
break;
case nir_op_i2b:
src[0] = to_integer(&ctx->ac, src[0]);
result = emit_i2b(&ctx->ac, src[0]);
break;
case nir_op_fquantize2f16:
result = emit_f2f16(ctx, src[0]);
break;
case nir_op_umul_high:
src[0] = to_integer(&ctx->ac, src[0]);
src[1] = to_integer(&ctx->ac, src[1]);
result = emit_umul_high(&ctx->ac, src[0], src[1]);
break;
case nir_op_imul_high:
src[0] = to_integer(&ctx->ac, src[0]);
src[1] = to_integer(&ctx->ac, src[1]);
result = emit_imul_high(&ctx->ac, src[0], src[1]);
break;
case nir_op_pack_half_2x16:
@@ -2158,7 +2193,7 @@ static LLVMValueRef build_tex_intrinsic(struct nir_to_llvm_context *ctx,
break;
}
if (instr->op == nir_texop_tg4) {
if (instr->op == nir_texop_tg4 && ctx->options->chip_class <= VI) {
enum glsl_base_type stype = glsl_get_sampler_result_type(instr->texture->var->type);
if (stype == GLSL_TYPE_UINT || stype == GLSL_TYPE_INT) {
return radv_lower_gather4_integer(ctx, args, instr);
@@ -3274,13 +3309,13 @@ static LLVMValueRef get_image_coords(struct nir_to_llvm_context *ctx,
int count;
enum glsl_sampler_dim dim = glsl_get_sampler_dim(type);
bool is_array = glsl_sampler_type_is_array(type);
bool add_frag_pos = (dim == GLSL_SAMPLER_DIM_SUBPASS ||
dim == GLSL_SAMPLER_DIM_SUBPASS_MS);
bool is_ms = (dim == GLSL_SAMPLER_DIM_MS ||
dim == GLSL_SAMPLER_DIM_SUBPASS_MS);
count = image_type_to_components_count(dim,
glsl_sampler_type_is_array(type));
bool gfx9_1d = ctx->options->chip_class >= GFX9 && dim == GLSL_SAMPLER_DIM_1D;
count = image_type_to_components_count(dim, is_array);
if (is_ms) {
LLVMValueRef fmask_load_address[3];
@@ -3288,7 +3323,7 @@ static LLVMValueRef get_image_coords(struct nir_to_llvm_context *ctx,
fmask_load_address[0] = LLVMBuildExtractElement(ctx->builder, src0, masks[0], "");
fmask_load_address[1] = LLVMBuildExtractElement(ctx->builder, src0, masks[1], "");
if (glsl_sampler_type_is_array(type))
if (is_array)
fmask_load_address[2] = LLVMBuildExtractElement(ctx->builder, src0, masks[2], "");
else
fmask_load_address[2] = NULL;
@@ -3303,7 +3338,7 @@ static LLVMValueRef get_image_coords(struct nir_to_llvm_context *ctx,
sample_index,
get_sampler_desc(ctx, instr->variables[0], DESC_FMASK));
}
if (count == 1) {
if (count == 1 && !gfx9_1d) {
if (instr->src[0].ssa->num_components)
res = LLVMBuildExtractElement(ctx->builder, src0, masks[0], "");
else
@@ -3313,13 +3348,22 @@ static LLVMValueRef get_image_coords(struct nir_to_llvm_context *ctx,
if (is_ms)
count--;
for (chan = 0; chan < count; ++chan) {
coords[chan] = LLVMBuildExtractElement(ctx->builder, src0, masks[chan], "");
coords[chan] = llvm_extract_elem(ctx, src0, chan);
}
if (add_frag_pos) {
for (chan = 0; chan < count; ++chan)
coords[chan] = LLVMBuildAdd(ctx->builder, coords[chan], LLVMBuildFPToUI(ctx->builder, ctx->frag_pos[chan], ctx->i32, ""), "");
}
if (gfx9_1d) {
if (is_array) {
coords[2] = coords[1];
coords[1] = ctx->ac.i32_0;
} else
coords[1] = ctx->ac.i32_0;
count++;
}
if (is_ms) {
coords[count] = sample_index;
count++;
@@ -3400,7 +3444,10 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
char intrinsic_name[64];
const nir_variable *var = instr->variables[0]->var;
const struct glsl_type *type = glsl_without_array(var->type);
LLVMValueRef glc = ctx->i1false;
bool force_glc = ctx->options->chip_class == SI;
if (force_glc)
glc = ctx->i1true;
if (ctx->stage == MESA_SHADER_FRAGMENT)
ctx->shader_info->fs.writes_memory = true;
@@ -3410,7 +3457,7 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
params[2] = LLVMBuildExtractElement(ctx->builder, get_src(ctx, instr->src[0]),
LLVMConstInt(ctx->i32, 0, false), ""); /* vindex */
params[3] = LLVMConstInt(ctx->i32, 0, false); /* voffset */
params[4] = ctx->i1false; /* glc */
params[4] = glc; /* glc */
params[5] = ctx->i1false; /* slc */
ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.buffer.store.format.v4f32", ctx->voidt,
params, 6, 0);
@@ -3418,7 +3465,6 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
bool is_da = glsl_sampler_type_is_array(type) ||
glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
LLVMValueRef da = is_da ? ctx->i1true : ctx->i1false;
LLVMValueRef glc = ctx->i1false;
LLVMValueRef slc = ctx->i1false;
params[0] = to_float(&ctx->ac, get_src(ctx, instr->src[2]));
@@ -3453,7 +3499,7 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
static LLVMValueRef visit_image_atomic(struct nir_to_llvm_context *ctx,
const nir_intrinsic_instr *instr)
{
LLVMValueRef params[6];
LLVMValueRef params[7];
int param_count = 0;
const nir_variable *var = instr->variables[0]->var;
@@ -3465,15 +3511,17 @@ static LLVMValueRef visit_image_atomic(struct nir_to_llvm_context *ctx,
if (ctx->stage == MESA_SHADER_FRAGMENT)
ctx->shader_info->fs.writes_memory = true;
bool is_unsigned = glsl_get_sampler_result_type(type) == GLSL_TYPE_UINT;
switch (instr->intrinsic) {
case nir_intrinsic_image_atomic_add:
atomic_name = "add";
break;
case nir_intrinsic_image_atomic_min:
atomic_name = "smin";
atomic_name = is_unsigned ? "umin" : "smin";
break;
case nir_intrinsic_image_atomic_max:
atomic_name = "smax";
atomic_name = is_unsigned ? "umax" : "smax";
break;
case nir_intrinsic_image_atomic_and:
atomic_name = "and";
@@ -3554,14 +3602,23 @@ static LLVMValueRef visit_image_size(struct nir_to_llvm_context *ctx,
res = ac_build_image_opcode(&ctx->ac, &args);
LLVMValueRef two = LLVMConstInt(ctx->i32, 2, false);
if (glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE &&
glsl_sampler_type_is_array(type)) {
LLVMValueRef two = LLVMConstInt(ctx->i32, 2, false);
LLVMValueRef six = LLVMConstInt(ctx->i32, 6, false);
LLVMValueRef z = LLVMBuildExtractElement(ctx->builder, res, two, "");
z = LLVMBuildSDiv(ctx->builder, z, six, "");
res = LLVMBuildInsertElement(ctx->builder, res, z, two, "");
}
if (ctx->options->chip_class >= GFX9 &&
glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_1D &&
glsl_sampler_type_is_array(type)) {
LLVMValueRef layers = LLVMBuildExtractElement(ctx->builder, res, two, "");
res = LLVMBuildInsertElement(ctx->builder, res, layers,
ctx->ac.i32_1, "");
}
return res;
}
@@ -4418,36 +4475,50 @@ static void visit_tex(struct nir_to_llvm_context *ctx, nir_tex_instr *instr)
/* pack derivatives */
if (ddx || ddy) {
int num_src_deriv_channels, num_dest_deriv_channels;
switch (instr->sampler_dim) {
case GLSL_SAMPLER_DIM_3D:
case GLSL_SAMPLER_DIM_CUBE:
num_deriv_comp = 3;
num_src_deriv_channels = 3;
num_dest_deriv_channels = 3;
break;
case GLSL_SAMPLER_DIM_2D:
default:
num_src_deriv_channels = 2;
num_dest_deriv_channels = 2;
num_deriv_comp = 2;
break;
case GLSL_SAMPLER_DIM_1D:
num_deriv_comp = 1;
num_src_deriv_channels = 1;
if (ctx->options->chip_class >= GFX9) {
num_dest_deriv_channels = 2;
num_deriv_comp = 2;
} else {
num_dest_deriv_channels = 1;
num_deriv_comp = 1;
}
break;
}
for (unsigned i = 0; i < num_deriv_comp; i++) {
for (unsigned i = 0; i < num_src_deriv_channels; i++) {
derivs[i] = to_float(&ctx->ac, llvm_extract_elem(ctx, ddx, i));
derivs[num_deriv_comp + i] = to_float(&ctx->ac, llvm_extract_elem(ctx, ddy, i));
derivs[num_dest_deriv_channels + i] = to_float(&ctx->ac, llvm_extract_elem(ctx, ddy, i));
}
for (unsigned i = num_src_deriv_channels; i < num_dest_deriv_channels; i++) {
derivs[i] = ctx->ac.f32_0;
derivs[num_dest_deriv_channels + i] = ctx->ac.f32_0;
}
}
if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE && coord) {
if (instr->is_array && instr->op != nir_texop_lod)
coords[3] = apply_round_slice(ctx, coords[3]);
for (chan = 0; chan < instr->coord_components; chan++)
coords[chan] = to_float(&ctx->ac, coords[chan]);
if (instr->coord_components == 3)
coords[3] = LLVMGetUndef(ctx->f32);
ac_prepare_cube_coords(&ctx->ac,
instr->op == nir_texop_txd, instr->is_array,
coords, derivs);
instr->op == nir_texop_lod, coords, derivs);
if (num_deriv_comp)
num_deriv_comp--;
}
@@ -4475,6 +4546,25 @@ static void visit_tex(struct nir_to_llvm_context *ctx, nir_tex_instr *instr)
}
address[count++] = coords[2];
}
if (ctx->options->chip_class >= GFX9) {
LLVMValueRef filler;
if (instr->op == nir_texop_txf)
filler = ctx->ac.i32_0;
else
filler = LLVMConstReal(ctx->f32, 0.5);
if (instr->sampler_dim == GLSL_SAMPLER_DIM_1D) {
/* No nir_texop_lod, because it does not take a slice
* even with array textures. */
if (instr->is_array && instr->op != nir_texop_lod ) {
address[count] = address[count - 1];
address[count - 1] = filler;
count++;
} else
address[count++] = filler;
}
}
}
/* Pack LOD */
@@ -4569,6 +4659,14 @@ static void visit_tex(struct nir_to_llvm_context *ctx, nir_tex_instr *instr)
LLVMValueRef z = LLVMBuildExtractElement(ctx->builder, result, two, "");
z = LLVMBuildSDiv(ctx->builder, z, six, "");
result = LLVMBuildInsertElement(ctx->builder, result, z, two, "");
} else if (ctx->options->chip_class >= GFX9 &&
instr->op == nir_texop_txs &&
instr->sampler_dim == GLSL_SAMPLER_DIM_1D &&
instr->is_array) {
LLVMValueRef two = LLVMConstInt(ctx->i32, 2, false);
LLVMValueRef layers = LLVMBuildExtractElement(ctx->builder, result, two, "");
result = LLVMBuildInsertElement(ctx->builder, result, layers,
ctx->ac.i32_1, "");
} else if (instr->dest.ssa.num_components != 4)
result = trim_vector(ctx, result, instr->dest.ssa.num_components);
@@ -5178,6 +5276,7 @@ si_llvm_init_export_args(struct nir_to_llvm_context *ctx,
unsigned index = target - V_008DFC_SQ_EXP_MRT;
unsigned col_format = (ctx->options->key.fs.col_format >> (4 * index)) & 0xf;
bool is_int8 = (ctx->options->key.fs.is_int8 >> index) & 1;
bool is_int10 = (ctx->options->key.fs.is_int10 >> index) & 1;
switch(col_format) {
case V_028714_SPI_SHADER_ZERO:
@@ -5255,11 +5354,13 @@ si_llvm_init_export_args(struct nir_to_llvm_context *ctx,
break;
case V_028714_SPI_SHADER_UINT16_ABGR: {
LLVMValueRef max = LLVMConstInt(ctx->i32, is_int8 ? 255 : 65535, 0);
LLVMValueRef max_rgb = LLVMConstInt(ctx->i32,
is_int8 ? 255 : is_int10 ? 1023 : 65535, 0);
LLVMValueRef max_alpha = !is_int10 ? max_rgb : LLVMConstInt(ctx->i32, 3, 0);
for (unsigned chan = 0; chan < 4; chan++) {
val[chan] = to_integer(&ctx->ac, values[chan]);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntULT, val[chan], max);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntULT, val[chan], chan == 3 ? max_alpha : max_rgb);
}
args->compr = 1;
@@ -5269,14 +5370,18 @@ si_llvm_init_export_args(struct nir_to_llvm_context *ctx,
}
case V_028714_SPI_SHADER_SINT16_ABGR: {
LLVMValueRef max = LLVMConstInt(ctx->i32, is_int8 ? 127 : 32767, 0);
LLVMValueRef min = LLVMConstInt(ctx->i32, is_int8 ? -128 : -32768, 0);
LLVMValueRef max_rgb = LLVMConstInt(ctx->i32,
is_int8 ? 127 : is_int10 ? 511 : 32767, 0);
LLVMValueRef min_rgb = LLVMConstInt(ctx->i32,
is_int8 ? -128 : is_int10 ? -512 : -32768, 0);
LLVMValueRef max_alpha = !is_int10 ? max_rgb : ctx->i32one;
LLVMValueRef min_alpha = !is_int10 ? min_rgb : LLVMConstInt(ctx->i32, -2, 0);
/* Clamp. */
for (unsigned chan = 0; chan < 4; chan++) {
val[chan] = to_integer(&ctx->ac, values[chan]);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSLT, val[chan], max);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSGT, val[chan], min);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSLT, val[chan], chan == 3 ? max_alpha : max_rgb);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSGT, val[chan], chan == 3 ? min_alpha : min_rgb);
}
args->compr = 1;
@@ -5367,11 +5472,11 @@ handle_vs_outputs_post(struct nir_to_llvm_context *ctx,
ctx->outputs[radeon_llvm_reg_index_soa(VARYING_SLOT_VIEWPORT, 0)], "");
}
uint32_t mask = ((outinfo->writes_pointsize == true ? 1 : 0) |
(outinfo->writes_layer == true ? 4 : 0) |
(outinfo->writes_viewport_index == true ? 8 : 0));
if (mask) {
pos_args[1].enabled_channels = mask;
if (outinfo->writes_pointsize ||
outinfo->writes_layer ||
outinfo->writes_viewport_index) {
pos_args[1].enabled_channels = ((outinfo->writes_pointsize == true ? 1 : 0) |
(outinfo->writes_layer == true ? 4 : 0));
pos_args[1].valid_mask = 0;
pos_args[1].done = 0;
pos_args[1].target = V_008DFC_SQ_EXP_POS + 1;
@@ -5385,8 +5490,26 @@ handle_vs_outputs_post(struct nir_to_llvm_context *ctx,
pos_args[1].out[0] = psize_value;
if (outinfo->writes_layer == true)
pos_args[1].out[2] = layer_value;
if (outinfo->writes_viewport_index == true)
pos_args[1].out[3] = viewport_index_value;
if (outinfo->writes_viewport_index == true) {
if (ctx->options->chip_class >= GFX9) {
/* GFX9 has the layer in out.z[10:0] and the viewport
* index in out.z[19:16].
*/
LLVMValueRef v = viewport_index_value;
v = to_integer(&ctx->ac, v);
v = LLVMBuildShl(ctx->builder, v,
LLVMConstInt(ctx->i32, 16, false),
"");
v = LLVMBuildOr(ctx->builder, v,
to_integer(&ctx->ac, pos_args[1].out[2]), "");
pos_args[1].out[2] = to_float(&ctx->ac, v);
pos_args[1].enabled_channels |= 1 << 2;
} else {
pos_args[1].out[3] = viewport_index_value;
pos_args[1].enabled_channels |= 1 << 3;
}
}
}
for (i = 0; i < 4; i++) {
if (pos_args[i].out[0])
@@ -5815,10 +5938,11 @@ si_export_mrt_z(struct nir_to_llvm_context *ctx,
args.enabled_channels |= 0x4;
}
/* SI (except OLAND) has a bug that it only looks
/* SI (except OLAND and HAINAN) has a bug that it only looks
* at the X writemask component. */
if (ctx->options->chip_class == SI &&
ctx->options->family != CHIP_OLAND)
ctx->options->family != CHIP_OLAND &&
ctx->options->family != CHIP_HAINAN)
args.enabled_channels |= 0x1;
ac_build_export(&ctx->ac, &args);
@@ -6041,7 +6165,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
ctx.context = LLVMContextCreate();
ctx.module = LLVMModuleCreateWithNameInContext("shader", ctx.context);
ac_llvm_context_init(&ctx.ac, ctx.context);
ac_llvm_context_init(&ctx.ac, ctx.context, options->chip_class);
ctx.ac.module = ctx.module;
ctx.has_ds_bpermute = ctx.options->chip_class >= VI;
@@ -6375,7 +6499,7 @@ void ac_create_gs_copy_shader(LLVMTargetMachineRef tm,
ctx.options = options;
ctx.shader_info = shader_info;
ac_llvm_context_init(&ctx.ac, ctx.context);
ac_llvm_context_init(&ctx.ac, ctx.context, options->chip_class);
ctx.ac.module = ctx.module;
ctx.is_gs_copy_shader = true;

View File

@@ -57,6 +57,7 @@ struct ac_tcs_variant_key {
struct ac_fs_variant_key {
uint32_t col_format;
uint32_t is_int8;
uint32_t is_int10;
};
union ac_shader_variant_key {

View File

@@ -257,6 +257,18 @@ static int gfx6_compute_level(ADDR_HANDLE addrlib,
AddrSurfInfoIn->width = u_minify(config->info.width, level);
AddrSurfInfoIn->height = u_minify(config->info.height, level);
/* Make GFX6 linear surfaces compatible with GFX9 for hybrid graphics,
* because GFX9 needs linear alignment of 256 bytes.
*/
if (config->info.levels == 1 &&
AddrSurfInfoIn->tileMode == ADDR_TM_LINEAR_ALIGNED &&
AddrSurfInfoIn->bpp) {
unsigned alignment = 256 / (AddrSurfInfoIn->bpp / 8);
assert(util_is_power_of_two(AddrSurfInfoIn->bpp));
AddrSurfInfoIn->width = align(AddrSurfInfoIn->width, alignment);
}
if (config->is_3d)
AddrSurfInfoIn->numSlices = u_minify(config->info.depth, level);
else if (config->is_cube)
@@ -541,15 +553,35 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
AddrSurfInfoIn.flags.noStencil = (surf->flags & RADEON_SURF_SBUFFER) == 0;
AddrSurfInfoIn.flags.compressZ = AddrSurfInfoIn.flags.depth;
/* noStencil = 0 can result in a depth part that is incompatible with
* mipmapped texturing. So set noStencil = 1 when mipmaps are requested (in
* this case, we may end up setting stencil_adjusted).
/* On CI/VI, the DB uses the same pitch and tile mode (except tilesplit)
* for Z and stencil. This can cause a number of problems which we work
* around here:
*
* TODO: update addrlib to a newer version, remove this, and
* use flags.matchStencilTileCfg = 1 as an alternative fix.
* - a depth part that is incompatible with mipmapped texturing
* - at least on Stoney, entirely incompatible Z/S aspects (e.g.
* incorrect tiling applied to the stencil part, stencil buffer
* memory accesses that go out of bounds) even without mipmapping
*
* Some piglit tests that are prone to different types of related
* failures:
* ./bin/ext_framebuffer_multisample-upsample 2 stencil
* ./bin/framebuffer-blit-levels {draw,read} stencil
* ./bin/ext_framebuffer_multisample-unaligned-blit N {depth,stencil} {msaa,upsample,downsample}
* ./bin/fbo-depth-array fs-writes-{depth,stencil} / {depth,stencil}-{clear,layered-clear,draw}
* ./bin/depthstencil-render-miplevels 1024 d=s=z24_s8
*/
if (config->info.levels > 1)
int stencil_tile_idx = -1;
if (AddrSurfInfoIn.flags.depth && !AddrSurfInfoIn.flags.noStencil &&
(config->info.levels > 1 || info->family == CHIP_STONEY)) {
/* Compute stencilTileIdx that is compatible with the (depth)
* tileIdx. This degrades the depth surface if necessary to
* ensure that a matching stencilTileIdx exists. */
AddrSurfInfoIn.flags.matchStencilTileCfg = 1;
/* Keep the depth mip-tail compatible with texturing. */
AddrSurfInfoIn.flags.noStencil = 1;
}
/* Set preferred macrotile parameters. This is usually required
* for shared resources. This is for 2D tiling only. */
@@ -631,12 +663,33 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
if (level > 0)
continue;
/* Check that we actually got a TC-compatible HTILE if
* we requested it (only for level 0, since we're not
* supporting HTILE on higher mip levels anyway). */
assert(AddrSurfInfoOut.tcCompatible ||
!AddrSurfInfoIn.flags.tcCompatible ||
AddrSurfInfoIn.flags.matchStencilTileCfg);
if (AddrSurfInfoIn.flags.matchStencilTileCfg) {
if (!AddrSurfInfoOut.tcCompatible) {
AddrSurfInfoIn.flags.tcCompatible = 0;
surf->flags &= ~RADEON_SURF_TC_COMPATIBLE_HTILE;
}
AddrSurfInfoIn.flags.matchStencilTileCfg = 0;
AddrSurfInfoIn.tileIndex = AddrSurfInfoOut.tileIndex;
stencil_tile_idx = AddrSurfInfoOut.stencilTileIdx;
assert(stencil_tile_idx >= 0);
}
gfx6_surface_settings(info, &AddrSurfInfoOut, surf);
}
}
/* Calculate texture layout information for stencil. */
if (surf->flags & RADEON_SURF_SBUFFER) {
AddrSurfInfoIn.tileIndex = stencil_tile_idx;
AddrSurfInfoIn.bpp = 8;
AddrSurfInfoIn.flags.depth = 0;
AddrSurfInfoIn.flags.stencil = 1;
@@ -835,9 +888,11 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
in->numSamples == 1) {
ADDR2_COMPUTE_DCCINFO_INPUT din = {0};
ADDR2_COMPUTE_DCCINFO_OUTPUT dout = {0};
ADDR2_META_MIP_INFO meta_mip_info[RADEON_SURF_MAX_LEVELS] = {};
din.size = sizeof(ADDR2_COMPUTE_DCCINFO_INPUT);
dout.size = sizeof(ADDR2_COMPUTE_DCCINFO_OUTPUT);
dout.pMipInfo = meta_mip_info;
din.dccKeyFlags.pipeAligned = 1;
din.dccKeyFlags.rbAligned = 1;
@@ -861,6 +916,39 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
surf->u.gfx9.dcc_pitch_max = dout.pitch - 1;
surf->dcc_size = dout.dccRamSize;
surf->dcc_alignment = dout.dccRamBaseAlign;
surf->num_dcc_levels = in->numMipLevels;
/* Disable DCC for levels that are in the mip tail.
*
* There are two issues that this is intended to
* address:
*
* 1. Multiple mip levels may share a cache line. This
* can lead to corruption when switching between
* rendering to different mip levels because the
* RBs don't maintain coherency.
*
* 2. Texturing with metadata after rendering sometimes
* fails with corruption, probably for a similar
* reason.
*
* Working around these issues for all levels in the
* mip tail may be overly conservative, but it's what
* Vulkan does.
*
* Alternative solutions that also work but are worse:
* - Disable DCC entirely.
* - Flush TC L2 after rendering.
*/
for (unsigned i = 0; i < in->numMipLevels; i++) {
if (meta_mip_info[i].inMiptail) {
surf->num_dcc_levels = i;
break;
}
}
if (!surf->num_dcc_levels)
surf->dcc_size = 0;
}
/* FMASK */
@@ -997,6 +1085,11 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
case RADEON_SURF_MODE_1D:
case RADEON_SURF_MODE_2D:
if (surf->flags & RADEON_SURF_IMPORTED) {
AddrSurfInfoIn.swizzleMode = surf->u.gfx9.surf.swizzle_mode;
break;
}
r = gfx9_get_preferred_swizzle_mode(addrlib, &AddrSurfInfoIn, false,
&AddrSurfInfoIn.swizzleMode);
if (r)
@@ -1009,6 +1102,7 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
surf->u.gfx9.resource_type = AddrSurfInfoIn.resourceType;
surf->num_dcc_levels = 0;
surf->surf_size = 0;
surf->dcc_size = 0;
surf->htile_size = 0;
@@ -1025,9 +1119,16 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
/* Calculate texture layout information for stencil. */
if (surf->flags & RADEON_SURF_SBUFFER) {
AddrSurfInfoIn.bpp = 8;
AddrSurfInfoIn.flags.depth = 0;
AddrSurfInfoIn.flags.stencil = 1;
AddrSurfInfoIn.bpp = 8;
if (!AddrSurfInfoIn.flags.depth) {
r = gfx9_get_preferred_swizzle_mode(addrlib, &AddrSurfInfoIn, false,
&AddrSurfInfoIn.swizzleMode);
if (r)
return r;
} else
AddrSurfInfoIn.flags.depth = 0;
r = gfx9_compute_miptree(addrlib, surf, compressed, &AddrSurfInfoIn);
if (r)
@@ -1035,7 +1136,6 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
}
surf->is_linear = surf->u.gfx9.surf.swizzle_mode == ADDR_SW_LINEAR;
surf->num_dcc_levels = surf->dcc_size ? config->info.levels : 0;
switch (surf->u.gfx9.surf.swizzle_mode) {
/* S = standard. */

View File

@@ -2453,6 +2453,8 @@
#define S_008F3C_BORDER_COLOR_PTR(x) (((unsigned)(x) & 0xFFF) << 0)
#define G_008F3C_BORDER_COLOR_PTR(x) (((x) >> 0) & 0xFFF)
#define C_008F3C_BORDER_COLOR_PTR 0xFFFFF000
/* The UPGRADED_DEPTH field is driver-specific and does not exist in hardware. */
#define S_008F3C_UPGRADED_DEPTH(x) (((unsigned)(x) & 0x1) << 29)
#define S_008F3C_BORDER_COLOR_TYPE(x) (((unsigned)(x) & 0x03) << 30)
#define G_008F3C_BORDER_COLOR_TYPE(x) (((x) >> 30) & 0x03)
#define C_008F3C_BORDER_COLOR_TYPE 0x3FFFFFFF

View File

@@ -368,6 +368,10 @@ radv_emit_graphics_blend_state(struct radv_cmd_buffer *cmd_buffer,
radeon_set_context_reg(cmd_buffer->cs, R_028B70_DB_ALPHA_TO_MASK, pipeline->graphics.blend.db_alpha_to_mask);
if (cmd_buffer->device->physical_device->has_rbplus) {
radeon_set_context_reg_seq(cmd_buffer->cs, R_028760_SX_MRT0_BLEND_OPT, 8);
radeon_emit_array(cmd_buffer->cs, pipeline->graphics.blend.sx_mrt_blend_opt, 8);
radeon_set_context_reg_seq(cmd_buffer->cs, R_028754_SX_PS_DOWNCONVERT, 3);
radeon_emit(cmd_buffer->cs, 0); /* R_028754_SX_PS_DOWNCONVERT */
radeon_emit(cmd_buffer->cs, 0); /* R_028758_SX_BLEND_OPT_EPSILON */
@@ -934,6 +938,11 @@ static void
radv_emit_scissor(struct radv_cmd_buffer *cmd_buffer)
{
uint32_t count = cmd_buffer->state.dynamic.scissor.count;
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_PS_PARTIAL_FLUSH;
si_emit_cache_flush(cmd_buffer);
}
si_write_scissors(cmd_buffer->cs, 0, count,
cmd_buffer->state.dynamic.scissor.scissors,
cmd_buffer->state.dynamic.viewport.viewports,
@@ -1007,6 +1016,8 @@ radv_emit_fb_ds_state(struct radv_cmd_buffer *cmd_buffer,
}
radeon_set_context_reg(cmd_buffer->cs, R_028008_DB_DEPTH_VIEW, ds->db_depth_view);
radeon_set_context_reg(cmd_buffer->cs, R_028ABC_DB_HTILE_SURFACE, ds->db_htile_surface);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_set_context_reg_seq(cmd_buffer->cs, R_028014_DB_HTILE_DATA_BASE, 3);
@@ -1043,7 +1054,6 @@ radv_emit_fb_ds_state(struct radv_cmd_buffer *cmd_buffer,
radeon_emit(cmd_buffer->cs, ds->db_depth_size); /* R_028058_DB_DEPTH_SIZE */
radeon_emit(cmd_buffer->cs, ds->db_depth_slice); /* R_02805C_DB_DEPTH_SLICE */
radeon_set_context_reg(cmd_buffer->cs, R_028ABC_DB_HTILE_SURFACE, ds->db_htile_surface);
}
radeon_set_context_reg(cmd_buffer->cs, R_028B78_PA_SU_POLY_OFFSET_DB_FMT_CNTL,
@@ -1208,6 +1218,10 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
struct radv_framebuffer *framebuffer = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
/* this may happen for inherited secondary recording */
if (!framebuffer)
return;
for (i = 0; i < 8; ++i) {
if (i >= subpass->color_count || subpass->color_attachments[i].attachment == VK_ATTACHMENT_UNUSED) {
radeon_set_context_reg(cmd_buffer->cs, R_028C70_CB_COLOR0_INFO + i * 0x3C,
@@ -1247,9 +1261,13 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
}
radv_load_depth_clear_regs(cmd_buffer, image);
} else {
radeon_set_context_reg_seq(cmd_buffer->cs, R_028040_DB_Z_INFO, 2);
radeon_emit(cmd_buffer->cs, S_028040_FORMAT(V_028040_Z_INVALID)); /* R_028040_DB_Z_INFO */
radeon_emit(cmd_buffer->cs, S_028044_FORMAT(V_028044_STENCIL_INVALID)); /* R_028044_DB_STENCIL_INFO */
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9)
radeon_set_context_reg_seq(cmd_buffer->cs, R_028038_DB_Z_INFO, 2);
else
radeon_set_context_reg_seq(cmd_buffer->cs, R_028040_DB_Z_INFO, 2);
radeon_emit(cmd_buffer->cs, S_028040_FORMAT(V_028040_Z_INVALID)); /* DB_Z_INFO */
radeon_emit(cmd_buffer->cs, S_028044_FORMAT(V_028044_STENCIL_INVALID)); /* DB_STENCIL_INFO */
}
radeon_set_context_reg(cmd_buffer->cs, R_028208_PA_SC_WINDOW_SCISSOR_BR,
S_028208_BR_X(framebuffer->width) |
@@ -1971,6 +1989,7 @@ VkResult radv_BeginCommandBuffer(
memset(&cmd_buffer->state, 0, sizeof(cmd_buffer->state));
cmd_buffer->state.last_primitive_reset_en = -1;
cmd_buffer->usage_flags = pBeginInfo->flags;
/* setup initial configuration into command buffer */
if (cmd_buffer->level == VK_COMMAND_BUFFER_LEVEL_PRIMARY) {
@@ -2016,7 +2035,7 @@ void radv_CmdBindVertexBuffers(
/* We have to defer setting up vertex buffer since we need the buffer
* stride from the pipeline. */
assert(firstBinding + bindingCount < MAX_VBS);
assert(firstBinding + bindingCount <= MAX_VBS);
for (uint32_t i = 0; i < bindingCount; i++) {
vb[firstBinding + i].buffer = radv_buffer_from_handle(pBuffers[i]);
vb[firstBinding + i].offset = pOffsets[i];
@@ -2233,8 +2252,13 @@ VkResult radv_EndCommandBuffer(
{
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
if (cmd_buffer->queue_family_index != RADV_QUEUE_TRANSFER)
if (cmd_buffer->queue_family_index != RADV_QUEUE_TRANSFER) {
if (cmd_buffer->device->physical_device->rad_info.chip_class == SI)
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_CS_PARTIAL_FLUSH | RADV_CMD_FLAG_PS_PARTIAL_FLUSH | RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2;
si_emit_cache_flush(cmd_buffer);
}
vk_free(&cmd_buffer->pool->alloc, cmd_buffer->state.attachments);
if (!cmd_buffer->device->ws->cs_finalize(cmd_buffer->cs) ||
cmd_buffer->record_fail)
@@ -2701,7 +2725,7 @@ void radv_CmdDrawIndexed(
radv_cmd_buffer_flush_state(cmd_buffer, true, (instanceCount > 1), false, indexCount);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 15);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 16);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_set_uconfig_reg_idx(cmd_buffer->cs, R_03090C_VGT_INDEX_TYPE,
@@ -2757,6 +2781,8 @@ radv_emit_indirect_draw(struct radv_cmd_buffer *cmd_buffer,
if (count_buffer) {
count_va = cmd_buffer->device->ws->buffer_get_va(count_buffer->bo);
count_va += count_offset + count_buffer->offset;
cmd_buffer->device->ws->cs_add_buffer(cs, count_buffer->bo, 8);
}
if (!draw_count)
@@ -2772,20 +2798,30 @@ radv_emit_indirect_draw(struct radv_cmd_buffer *cmd_buffer,
radeon_emit(cs, indirect_va);
radeon_emit(cs, indirect_va >> 32);
radeon_emit(cs, PKT3(indexed ? PKT3_DRAW_INDEX_INDIRECT_MULTI :
PKT3_DRAW_INDIRECT_MULTI,
8, false));
radeon_emit(cs, 0);
radeon_emit(cs, (base_reg - SI_SH_REG_OFFSET) >> 2);
radeon_emit(cs, ((base_reg + 4) - SI_SH_REG_OFFSET) >> 2);
radeon_emit(cs, (((base_reg + 8) - SI_SH_REG_OFFSET) >> 2) |
S_2C3_DRAW_INDEX_ENABLE(draw_id_enable) |
S_2C3_COUNT_INDIRECT_ENABLE(!!count_va));
radeon_emit(cs, draw_count); /* count */
radeon_emit(cs, count_va); /* count_addr */
radeon_emit(cs, count_va >> 32);
radeon_emit(cs, stride); /* stride */
radeon_emit(cs, di_src_sel);
if (draw_count == 1 && !count_va && !draw_id_enable) {
radeon_emit(cs, PKT3(indexed ? PKT3_DRAW_INDEX_INDIRECT :
PKT3_DRAW_INDIRECT, 3, false));
radeon_emit(cs, 0);
radeon_emit(cs, (base_reg - SI_SH_REG_OFFSET) >> 2);
radeon_emit(cs, ((base_reg + 4) - SI_SH_REG_OFFSET) >> 2);
radeon_emit(cs, di_src_sel);
} else {
radeon_emit(cs, PKT3(indexed ? PKT3_DRAW_INDEX_INDIRECT_MULTI :
PKT3_DRAW_INDIRECT_MULTI,
8, false));
radeon_emit(cs, 0);
radeon_emit(cs, (base_reg - SI_SH_REG_OFFSET) >> 2);
radeon_emit(cs, ((base_reg + 4) - SI_SH_REG_OFFSET) >> 2);
radeon_emit(cs, (((base_reg + 8) - SI_SH_REG_OFFSET) >> 2) |
S_2C3_DRAW_INDEX_ENABLE(draw_id_enable) |
S_2C3_COUNT_INDIRECT_ENABLE(!!count_va));
radeon_emit(cs, draw_count); /* count */
radeon_emit(cs, count_va); /* count_addr */
radeon_emit(cs, count_va >> 32);
radeon_emit(cs, stride); /* stride */
radeon_emit(cs, di_src_sel);
}
radv_cmd_buffer_trace_emit(cmd_buffer);
}

View File

@@ -66,6 +66,7 @@ VkResult radv_CreateDescriptorSetLayout(
set_layout->binding_count = max_binding + 1;
set_layout->shader_stages = 0;
set_layout->dynamic_shader_stages = 0;
set_layout->size = 0;
memset(set_layout->binding, 0, size - sizeof(struct radv_descriptor_set_layout));
@@ -734,8 +735,59 @@ void radv_update_descriptor_sets(
}
}
if (descriptorCopyCount)
radv_finishme("copy descriptors");
for (i = 0; i < descriptorCopyCount; i++) {
const VkCopyDescriptorSet *copyset = &pDescriptorCopies[i];
RADV_FROM_HANDLE(radv_descriptor_set, src_set,
copyset->srcSet);
RADV_FROM_HANDLE(radv_descriptor_set, dst_set,
copyset->dstSet);
const struct radv_descriptor_set_binding_layout *src_binding_layout =
src_set->layout->binding + copyset->srcBinding;
const struct radv_descriptor_set_binding_layout *dst_binding_layout =
dst_set->layout->binding + copyset->dstBinding;
uint32_t *src_ptr = src_set->mapped_ptr;
uint32_t *dst_ptr = dst_set->mapped_ptr;
struct radeon_winsys_bo **src_buffer_list = src_set->descriptors;
struct radeon_winsys_bo **dst_buffer_list = dst_set->descriptors;
src_ptr += src_binding_layout->offset / 4;
dst_ptr += dst_binding_layout->offset / 4;
src_ptr += src_binding_layout->size * copyset->srcArrayElement / 4;
dst_ptr += dst_binding_layout->size * copyset->dstArrayElement / 4;
src_buffer_list += src_binding_layout->buffer_offset;
src_buffer_list += copyset->srcArrayElement;
dst_buffer_list += dst_binding_layout->buffer_offset;
dst_buffer_list += copyset->dstArrayElement;
for (j = 0; j < copyset->descriptorCount; ++j) {
switch (src_binding_layout->type) {
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC: {
unsigned src_idx = copyset->srcArrayElement + j;
unsigned dst_idx = copyset->dstArrayElement + j;
struct radv_descriptor_range *src_range, *dst_range;
src_idx += src_binding_layout->dynamic_offset_offset;
dst_idx += dst_binding_layout->dynamic_offset_offset;
src_range = src_set->dynamic_descriptors + src_idx;
dst_range = dst_set->dynamic_descriptors + dst_idx;
*dst_range = *src_range;
break;
}
default:
memcpy(dst_ptr, src_ptr, src_binding_layout->size);
}
src_ptr += src_binding_layout->size / 4;
dst_ptr += dst_binding_layout->size / 4;
dst_buffer_list[j] = src_buffer_list[j];
++src_buffer_list;
++dst_buffer_list;
}
}
}
void radv_UpdateDescriptorSets(

View File

@@ -235,32 +235,112 @@ is_extension_enabled(const VkExtensionProperties *extensions,
return false;
}
static const char *
get_chip_name(enum radeon_family family)
static void
radv_get_device_name(enum radeon_family family, char *name, size_t name_len)
{
const char *chip_string;
char llvm_string[32] = {};
switch (family) {
case CHIP_TAHITI: return "AMD RADV TAHITI";
case CHIP_PITCAIRN: return "AMD RADV PITCAIRN";
case CHIP_VERDE: return "AMD RADV CAPE VERDE";
case CHIP_OLAND: return "AMD RADV OLAND";
case CHIP_HAINAN: return "AMD RADV HAINAN";
case CHIP_BONAIRE: return "AMD RADV BONAIRE";
case CHIP_KAVERI: return "AMD RADV KAVERI";
case CHIP_KABINI: return "AMD RADV KABINI";
case CHIP_HAWAII: return "AMD RADV HAWAII";
case CHIP_MULLINS: return "AMD RADV MULLINS";
case CHIP_TONGA: return "AMD RADV TONGA";
case CHIP_ICELAND: return "AMD RADV ICELAND";
case CHIP_CARRIZO: return "AMD RADV CARRIZO";
case CHIP_FIJI: return "AMD RADV FIJI";
case CHIP_POLARIS10: return "AMD RADV POLARIS10";
case CHIP_POLARIS11: return "AMD RADV POLARIS11";
case CHIP_POLARIS12: return "AMD RADV POLARIS12";
case CHIP_STONEY: return "AMD RADV STONEY";
case CHIP_VEGA10: return "AMD RADV VEGA";
case CHIP_RAVEN: return "AMD RADV RAVEN";
default: return "AMD RADV unknown";
case CHIP_TAHITI: chip_string = "AMD RADV TAHITI"; break;
case CHIP_PITCAIRN: chip_string = "AMD RADV PITCAIRN"; break;
case CHIP_VERDE: chip_string = "AMD RADV CAPE VERDE"; break;
case CHIP_OLAND: chip_string = "AMD RADV OLAND"; break;
case CHIP_HAINAN: chip_string = "AMD RADV HAINAN"; break;
case CHIP_BONAIRE: chip_string = "AMD RADV BONAIRE"; break;
case CHIP_KAVERI: chip_string = "AMD RADV KAVERI"; break;
case CHIP_KABINI: chip_string = "AMD RADV KABINI"; break;
case CHIP_HAWAII: chip_string = "AMD RADV HAWAII"; break;
case CHIP_MULLINS: chip_string = "AMD RADV MULLINS"; break;
case CHIP_TONGA: chip_string = "AMD RADV TONGA"; break;
case CHIP_ICELAND: chip_string = "AMD RADV ICELAND"; break;
case CHIP_CARRIZO: chip_string = "AMD RADV CARRIZO"; break;
case CHIP_FIJI: chip_string = "AMD RADV FIJI"; break;
case CHIP_POLARIS10: chip_string = "AMD RADV POLARIS10"; break;
case CHIP_POLARIS11: chip_string = "AMD RADV POLARIS11"; break;
case CHIP_POLARIS12: chip_string = "AMD RADV POLARIS12"; break;
case CHIP_STONEY: chip_string = "AMD RADV STONEY"; break;
case CHIP_VEGA10: chip_string = "AMD RADV VEGA"; break;
case CHIP_RAVEN: chip_string = "AMD RADV RAVEN"; break;
default: chip_string = "AMD RADV unknown"; break;
}
if (HAVE_LLVM > 0) {
snprintf(llvm_string, sizeof(llvm_string),
" (LLVM %i.%i.%i)", (HAVE_LLVM >> 8) & 0xff,
HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
}
snprintf(name, name_len, "%s%s", chip_string, llvm_string);
}
static void
radv_physical_device_init_mem_types(struct radv_physical_device *device)
{
STATIC_ASSERT(RADV_MEM_HEAP_COUNT <= VK_MAX_MEMORY_HEAPS);
uint64_t visible_vram_size = MIN2(device->rad_info.vram_size,
device->rad_info.vram_vis_size);
int vram_index = -1, visible_vram_index = -1, gart_index = -1;
device->memory_properties.memoryHeapCount = 0;
if (device->rad_info.vram_size - visible_vram_size > 0) {
vram_index = device->memory_properties.memoryHeapCount++;
device->memory_properties.memoryHeaps[vram_index] = (VkMemoryHeap) {
.size = device->rad_info.vram_size - visible_vram_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
}
if (visible_vram_size) {
visible_vram_index = device->memory_properties.memoryHeapCount++;
device->memory_properties.memoryHeaps[visible_vram_index] = (VkMemoryHeap) {
.size = visible_vram_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
}
if (device->rad_info.gart_size > 0) {
gart_index = device->memory_properties.memoryHeapCount++;
device->memory_properties.memoryHeaps[gart_index] = (VkMemoryHeap) {
.size = device->rad_info.gart_size,
.flags = 0,
};
}
STATIC_ASSERT(RADV_MEM_TYPE_COUNT <= VK_MAX_MEMORY_TYPES);
unsigned type_count = 0;
if (vram_index >= 0) {
device->mem_type_indices[type_count] = RADV_MEM_TYPE_VRAM;
device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
.heapIndex = vram_index,
};
}
if (gart_index >= 0) {
device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_WRITE_COMBINE;
device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
.heapIndex = gart_index,
};
}
if (visible_vram_index >= 0) {
device->mem_type_indices[type_count] = RADV_MEM_TYPE_VRAM_CPU_ACCESS;
device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
.heapIndex = visible_vram_index,
};
}
if (gart_index >= 0) {
device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
.heapIndex = gart_index,
};
}
device->memory_properties.memoryTypeCount = type_count;
}
static VkResult
@@ -311,6 +391,8 @@ radv_physical_device_init(struct radv_physical_device *device,
goto fail;
}
radv_get_device_name(device->rad_info.family, device->name, sizeof(device->name));
if (radv_device_get_cache_uuid(device->rad_info.family, device->uuid)) {
radv_finish_wsi(device);
device->ws->destroy(device->ws);
@@ -336,7 +418,6 @@ radv_physical_device_init(struct radv_physical_device *device,
}
fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n");
device->name = get_chip_name(device->rad_info.family);
radv_get_device_uuid(drm_device, device->device_uuid);
@@ -346,6 +427,7 @@ radv_physical_device_init(struct radv_physical_device *device,
device->rbplus_allowed = device->rad_info.family == CHIP_STONEY;
}
radv_physical_device_init_mem_types(device);
return VK_SUCCESS;
fail:
@@ -902,47 +984,7 @@ void radv_GetPhysicalDeviceMemoryProperties(
{
RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
STATIC_ASSERT(RADV_MEM_TYPE_COUNT <= VK_MAX_MEMORY_TYPES);
pMemoryProperties->memoryTypeCount = RADV_MEM_TYPE_COUNT;
pMemoryProperties->memoryTypes[RADV_MEM_TYPE_VRAM] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
.heapIndex = RADV_MEM_HEAP_VRAM,
};
pMemoryProperties->memoryTypes[RADV_MEM_TYPE_GTT_WRITE_COMBINE] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
.heapIndex = RADV_MEM_HEAP_GTT,
};
pMemoryProperties->memoryTypes[RADV_MEM_TYPE_VRAM_CPU_ACCESS] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
.heapIndex = RADV_MEM_HEAP_VRAM_CPU_ACCESS,
};
pMemoryProperties->memoryTypes[RADV_MEM_TYPE_GTT_CACHED] = (VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
.heapIndex = RADV_MEM_HEAP_GTT,
};
STATIC_ASSERT(RADV_MEM_HEAP_COUNT <= VK_MAX_MEMORY_HEAPS);
pMemoryProperties->memoryHeapCount = RADV_MEM_HEAP_COUNT;
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM] = (VkMemoryHeap) {
.size = physical_device->rad_info.vram_size -
physical_device->rad_info.vram_vis_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM_CPU_ACCESS] = (VkMemoryHeap) {
.size = physical_device->rad_info.vram_vis_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_GTT] = (VkMemoryHeap) {
.size = physical_device->rad_info.gart_size,
.flags = 0,
};
*pMemoryProperties = physical_device->memory_properties;
}
void radv_GetPhysicalDeviceMemoryProperties2KHR(
@@ -1958,10 +2000,6 @@ static VkResult radv_alloc_sem_counts(struct radv_winsys_sem_counts *counts,
if (sem->temp_syncobj) {
counts->syncobj[syncobj_idx++] = sem->temp_syncobj;
if (reset_temp) {
/* after we wait on a temp import - drop it */
sem->temp_syncobj = 0;
}
}
else if (sem->syncobj)
counts->syncobj[syncobj_idx++] = sem->syncobj;
@@ -1982,6 +2020,21 @@ void radv_free_sem_info(struct radv_winsys_sem_info *sem_info)
free(sem_info->signal.sem);
}
static void radv_free_temp_syncobjs(struct radv_device *device,
int num_sems,
const VkSemaphore *sems)
{
for (uint32_t i = 0; i < num_sems; i++) {
RADV_FROM_HANDLE(radv_semaphore, sem, sems[i]);
if (sem->temp_syncobj) {
device->ws->destroy_syncobj(device->ws, sem->temp_syncobj);
sem->temp_syncobj = 0;
}
}
}
VkResult radv_alloc_sem_info(struct radv_winsys_sem_info *sem_info,
int num_wait_sems,
const VkSemaphore *wait_sems,
@@ -2133,6 +2186,9 @@ VkResult radv_QueueSubmit(
}
}
radv_free_temp_syncobjs(queue->device,
pSubmits[i].waitSemaphoreCount,
pSubmits[i].pWaitSemaphores);
radv_free_sem_info(&sem_info);
free(cs_array);
}
@@ -2231,6 +2287,7 @@ VkResult radv_AllocateMemory(
VkResult result;
enum radeon_bo_domain domain;
uint32_t flags = 0;
enum radv_mem_type mem_type_index = device->physical_device->mem_type_indices[pAllocateInfo->memoryTypeIndex];
assert(pAllocateInfo->sType == VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO);
@@ -2273,18 +2330,18 @@ VkResult radv_AllocateMemory(
}
uint64_t alloc_size = align_u64(pAllocateInfo->allocationSize, 4096);
if (pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_WRITE_COMBINE ||
pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_CACHED)
if (mem_type_index == RADV_MEM_TYPE_GTT_WRITE_COMBINE ||
mem_type_index == RADV_MEM_TYPE_GTT_CACHED)
domain = RADEON_DOMAIN_GTT;
else
domain = RADEON_DOMAIN_VRAM;
if (pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_VRAM)
if (mem_type_index == RADV_MEM_TYPE_VRAM)
flags |= RADEON_FLAG_NO_CPU_ACCESS;
else
flags |= RADEON_FLAG_CPU_ACCESS;
if (pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_WRITE_COMBINE)
if (mem_type_index == RADV_MEM_TYPE_GTT_WRITE_COMBINE)
flags |= RADEON_FLAG_GTT_WC;
mem->bo = device->ws->buffer_create(device->ws, alloc_size, device->physical_device->rad_info.max_alignment,
@@ -2294,7 +2351,7 @@ VkResult radv_AllocateMemory(
result = VK_ERROR_OUT_OF_DEVICE_MEMORY;
goto fail;
}
mem->type_index = pAllocateInfo->memoryTypeIndex;
mem->type_index = mem_type_index;
out_success:
*pMem = radv_device_memory_to_handle(mem);
@@ -2378,13 +2435,14 @@ VkResult radv_InvalidateMappedMemoryRanges(
}
void radv_GetBufferMemoryRequirements(
VkDevice device,
VkDevice _device,
VkBuffer _buffer,
VkMemoryRequirements* pMemoryRequirements)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_buffer, buffer, _buffer);
pMemoryRequirements->memoryTypeBits = (1u << RADV_MEM_TYPE_COUNT) - 1;
pMemoryRequirements->memoryTypeBits = (1u << device->physical_device->memory_properties.memoryTypeCount) - 1;
if (buffer->flags & VK_BUFFER_CREATE_SPARSE_BINDING_BIT)
pMemoryRequirements->alignment = 4096;
@@ -2418,13 +2476,14 @@ void radv_GetBufferMemoryRequirements2KHR(
}
void radv_GetImageMemoryRequirements(
VkDevice device,
VkDevice _device,
VkImage _image,
VkMemoryRequirements* pMemoryRequirements)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_image, image, _image);
pMemoryRequirements->memoryTypeBits = (1u << RADV_MEM_TYPE_COUNT) - 1;
pMemoryRequirements->memoryTypeBits = (1u << device->physical_device->memory_properties.memoryTypeCount) - 1;
pMemoryRequirements->size = image->size;
pMemoryRequirements->alignment = image->alignment;
@@ -2811,7 +2870,7 @@ VkResult radv_CreateEvent(
event->bo = device->ws->buffer_create(device->ws, 8, 8,
RADEON_DOMAIN_GTT,
RADEON_FLAG_CPU_ACCESS);
RADEON_FLAG_VA_UNCACHED | RADEON_FLAG_CPU_ACCESS);
if (!event->bo) {
vk_free2(&device->alloc, pAllocator, event);
return VK_ERROR_OUT_OF_DEVICE_MEMORY;
@@ -3077,9 +3136,13 @@ radv_initialise_color_surface(struct radv_device *device,
format != V_028C70_COLOR_24_8) |
S_028C70_NUMBER_TYPE(ntype) |
S_028C70_ENDIAN(endian);
if (iview->image->info.samples > 1)
if (iview->image->fmask.size)
cb->cb_color_info |= S_028C70_COMPRESSION(1);
if ((iview->image->info.samples > 1) && iview->image->fmask.size) {
cb->cb_color_info |= S_028C70_COMPRESSION(1);
if (device->physical_device->rad_info.chip_class == SI) {
unsigned fmask_bankh = util_logbase2(iview->image->fmask.bank_height);
cb->cb_color_attrib |= S_028C74_FMASK_BANK_HEIGHT(fmask_bankh);
}
}
if (iview->image->cmask.size &&
!(device->debug_flags & RADV_DEBUG_NO_FAST_CLEARS))
@@ -3109,15 +3172,15 @@ radv_initialise_color_surface(struct radv_device *device,
}
if (device->physical_device->rad_info.chip_class >= GFX9) {
uint32_t max_slice = radv_surface_layer_count(iview);
unsigned mip0_depth = iview->base_layer + max_slice - 1;
unsigned mip0_depth = iview->image->type == VK_IMAGE_TYPE_3D ?
(iview->extent.depth - 1) : (iview->image->info.array_size - 1);
cb->cb_color_view |= S_028C6C_MIP_LEVEL(iview->base_mip);
cb->cb_color_attrib |= S_028C74_MIP0_DEPTH(mip0_depth) |
S_028C74_RESOURCE_TYPE(iview->image->surface.u.gfx9.resource_type);
cb->cb_color_attrib2 = S_028C68_MIP0_WIDTH(iview->image->info.width - 1) |
S_028C68_MIP0_HEIGHT(iview->image->info.height - 1) |
S_028C68_MAX_MIP(iview->image->info.levels);
cb->cb_color_attrib2 = S_028C68_MIP0_WIDTH(iview->extent.width - 1) |
S_028C68_MIP0_HEIGHT(iview->extent.height - 1) |
S_028C68_MAX_MIP(iview->image->info.levels - 1);
cb->gfx9_epitch = S_0287A0_EPITCH(iview->image->surface.u.gfx9.surf.epitch);
@@ -3246,6 +3309,8 @@ radv_initialise_ds_surface(struct radv_device *device,
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
tile_mode_index = si_tile_mode_index(iview->image, level, true);
ds->db_stencil_info |= S_028044_TILE_MODE_INDEX(tile_mode_index);
if (stencil_only)
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
}
ds->db_depth_size = S_028058_PITCH_TILE_MAX((level_info->nblk_x / 8) - 1) |
@@ -3584,6 +3649,7 @@ VkResult radv_ImportSemaphoreFdKHR(VkDevice _device,
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_semaphore, sem, pImportSemaphoreFdInfo->semaphore);
uint32_t syncobj_handle = 0;
uint32_t *syncobj_dst = NULL;
assert(pImportSemaphoreFdInfo->handleType == VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
int ret = device->ws->import_syncobj(device->ws, pImportSemaphoreFdInfo->fd, &syncobj_handle);
@@ -3591,10 +3657,15 @@ VkResult radv_ImportSemaphoreFdKHR(VkDevice _device,
return VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR;
if (pImportSemaphoreFdInfo->flags & VK_SEMAPHORE_IMPORT_TEMPORARY_BIT_KHR) {
sem->temp_syncobj = syncobj_handle;
syncobj_dst = &sem->temp_syncobj;
} else {
sem->syncobj = syncobj_handle;
syncobj_dst = &sem->syncobj;
}
if (*syncobj_dst)
device->ws->destroy_syncobj(device->ws, *syncobj_dst);
*syncobj_dst = syncobj_handle;
close(pImportSemaphoreFdInfo->fd);
return VK_SUCCESS;
}
@@ -3624,9 +3695,14 @@ void radv_GetPhysicalDeviceExternalSemaphorePropertiesKHR(
const VkPhysicalDeviceExternalSemaphoreInfoKHR* pExternalSemaphoreInfo,
VkExternalSemaphorePropertiesKHR* pExternalSemaphoreProperties)
{
pExternalSemaphoreProperties->exportFromImportedHandleTypes = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
pExternalSemaphoreProperties->compatibleHandleTypes = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
pExternalSemaphoreProperties->externalSemaphoreFeatures = VK_EXTERNAL_SEMAPHORE_FEATURE_EXPORTABLE_BIT_KHR |
VK_EXTERNAL_SEMAPHORE_FEATURE_IMPORTABLE_BIT_KHR;
if (pExternalSemaphoreInfo->handleType == VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR) {
pExternalSemaphoreProperties->exportFromImportedHandleTypes = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
pExternalSemaphoreProperties->compatibleHandleTypes = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
pExternalSemaphoreProperties->externalSemaphoreFeatures = VK_EXTERNAL_SEMAPHORE_FEATURE_EXPORTABLE_BIT_KHR |
VK_EXTERNAL_SEMAPHORE_FEATURE_IMPORTABLE_BIT_KHR;
} else {
pExternalSemaphoreProperties->exportFromImportedHandleTypes = 0;
pExternalSemaphoreProperties->compatibleHandleTypes = 0;
pExternalSemaphoreProperties->externalSemaphoreFeatures = 0;
}
}

View File

@@ -578,6 +578,10 @@ radv_physical_device_get_format_properties(struct radv_physical_device *physical
VK_FORMAT_FEATURE_BLIT_DST_BIT;
tiled |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR |
VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR;
/* GFX9 doesn't support linear depth surfaces */
if (physical_device->rad_info.chip_class >= GFX9)
linear = 0;
}
} else {
bool linear_sampling;
@@ -958,6 +962,12 @@ bool radv_format_pack_clear_color(VkFormat format,
clear_vals[1] = ((uint16_t)util_iround(CLAMP(value->float32[2], 0.0f, 1.0f) * 0xffff)) & 0xffff;
clear_vals[1] |= ((uint16_t)util_iround(CLAMP(value->float32[3], 0.0f, 1.0f) * 0xffff)) << 16;
break;
case VK_FORMAT_R16G16B16A16_SNORM:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], -1.0f, 1.0f) * 0x7fff)) & 0xffff;
clear_vals[0] |= ((uint16_t)util_iround(CLAMP(value->float32[1], -1.0f, 1.0f) * 0x7fff)) << 16;
clear_vals[1] = ((uint16_t)util_iround(CLAMP(value->float32[2], -1.0f, 1.0f) * 0x7fff)) & 0xffff;
clear_vals[1] |= ((uint16_t)util_iround(CLAMP(value->float32[3], -1.0f, 1.0f) * 0x7fff)) << 16;
break;
case VK_FORMAT_A2B10G10R10_UNORM_PACK32:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], 0.0f, 1.0f) * 0x3ff)) & 0x3ff;
clear_vals[0] |= (((uint16_t)util_iround(CLAMP(value->float32[1], 0.0f, 1.0f) * 0x3ff)) & 0x3ff) << 10;

View File

@@ -34,7 +34,7 @@
#include "util/debug.h"
#include "util/u_atomic.h"
static unsigned
radv_choose_tiling(struct radv_device *Device,
radv_choose_tiling(struct radv_device *device,
const struct radv_image_create_info *create_info)
{
const VkImageCreateInfo *pCreateInfo = create_info->vk_info;
@@ -44,12 +44,17 @@ radv_choose_tiling(struct radv_device *Device,
return RADEON_SURF_MODE_LINEAR_ALIGNED;
}
/* Textures with a very small height are recommended to be linear. */
if (pCreateInfo->imageType == VK_IMAGE_TYPE_1D ||
/* Only very thin and long 2D textures should benefit from
* linear_aligned. */
(pCreateInfo->extent.width > 8 && pCreateInfo->extent.height <= 2))
return RADEON_SURF_MODE_LINEAR_ALIGNED;
if (!vk_format_is_compressed(pCreateInfo->format) &&
!vk_format_is_depth_or_stencil(pCreateInfo->format)
&& device->physical_device->rad_info.chip_class <= VI) {
/* this causes hangs in some VK CTS tests on GFX9. */
/* Textures with a very small height are recommended to be linear. */
if (pCreateInfo->imageType == VK_IMAGE_TYPE_1D ||
/* Only very thin and long 2D textures should benefit from
* linear_aligned. */
(pCreateInfo->extent.width > 8 && pCreateInfo->extent.height <= 2))
return RADEON_SURF_MODE_LINEAR_ALIGNED;
}
/* MSAA resources must be 2D tiled. */
if (pCreateInfo->samples > 1)
@@ -115,6 +120,7 @@ radv_init_surface(struct radv_device *device,
VK_IMAGE_USAGE_STORAGE_BIT)) ||
(pCreateInfo->flags & VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT) ||
(pCreateInfo->tiling == VK_IMAGE_TILING_LINEAR) ||
pCreateInfo->mipLevels > 1 || pCreateInfo->arrayLayers > 1 ||
device->physical_device->rad_info.chip_class < VI ||
create_info->scanout || (device->debug_flags & RADV_DEBUG_NO_DCC) ||
!radv_is_colorbuffer_format_supported(pCreateInfo->format, &blendable))
@@ -181,6 +187,11 @@ radv_make_buffer_descriptor(struct radv_device *device,
state[0] = va;
state[1] = S_008F04_BASE_ADDRESS_HI(va >> 32) |
S_008F04_STRIDE(stride);
if (device->physical_device->rad_info.chip_class != VI && stride) {
range /= stride;
}
state[2] = range;
state[3] = S_008F0C_DST_SEL_X(radv_map_swizzle(desc->swizzle[0])) |
S_008F0C_DST_SEL_Y(radv_map_swizzle(desc->swizzle[1])) |
@@ -200,7 +211,6 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
{
uint64_t gpu_address = image->bo ? device->ws->buffer_get_va(image->bo) + image->offset : 0;
uint64_t va = gpu_address;
unsigned pitch = base_level_info->nblk_x * block_width;
enum chip_class chip_class = device->physical_device->rad_info.chip_class;
uint64_t meta_va = 0;
if (chip_class >= GFX9) {
@@ -216,9 +226,6 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
state[0] |= image->surface.u.legacy.tile_swizzle;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
state[3] |= S_008F1C_TILING_INDEX(si_tile_mode_index(image, base_level,
is_stencil));
state[4] |= S_008F20_PITCH_GFX6(pitch - 1);
if (chip_class >= VI) {
state[6] &= C_008F28_COMPRESSION_EN;
@@ -274,10 +281,14 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
}
static unsigned radv_tex_dim(VkImageType image_type, VkImageViewType view_type,
unsigned nr_layers, unsigned nr_samples, bool is_storage_image)
unsigned nr_layers, unsigned nr_samples, bool is_storage_image, bool gfx9)
{
if (view_type == VK_IMAGE_VIEW_TYPE_CUBE || view_type == VK_IMAGE_VIEW_TYPE_CUBE_ARRAY)
return is_storage_image ? V_008F1C_SQ_RSRC_IMG_2D_ARRAY : V_008F1C_SQ_RSRC_IMG_CUBE;
/* GFX9 allocates 1D textures as 2D. */
if (gfx9 && image_type == VK_IMAGE_TYPE_1D)
image_type = VK_IMAGE_TYPE_2D;
switch (image_type) {
case VK_IMAGE_TYPE_1D:
return nr_layers > 1 ? V_008F1C_SQ_RSRC_IMG_1D_ARRAY : V_008F1C_SQ_RSRC_IMG_1D;
@@ -368,7 +379,7 @@ si_make_texture_descriptor(struct radv_device *device,
}
type = radv_tex_dim(image->type, view_type, image->info.array_size, image->info.samples,
is_storage_image);
is_storage_image, device->physical_device->rad_info.chip_class >= GFX9);
if (type == V_008F1C_SQ_RSRC_IMG_1D_ARRAY) {
height = 1;
depth = image->info.array_size;
@@ -414,7 +425,7 @@ si_make_texture_descriptor(struct radv_device *device,
state[4] |= S_008F20_BC_SWIZZLE(bc_swizzle);
state[5] |= S_008F24_MAX_MIP(image->info.samples > 1 ?
util_logbase2(image->info.samples) :
last_level);
image->info.levels - 1);
} else {
state[3] |= S_008F1C_POW2_PAD(image->info.levels > 1);
state[4] |= S_008F20_DEPTH(depth - 1);
@@ -489,7 +500,7 @@ si_make_texture_descriptor(struct radv_device *device,
S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_Z(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_W(V_008F1C_SQ_SEL_X) |
S_008F1C_TYPE(radv_tex_dim(image->type, view_type, 1, 0, false));
S_008F1C_TYPE(radv_tex_dim(image->type, view_type, 1, 0, false, false));
fmask_state[4] = 0;
fmask_state[5] = S_008F24_BASE_ARRAY(first_layer);
fmask_state[6] = 0;
@@ -554,10 +565,11 @@ radv_query_opaque_metadata(struct radv_device *device,
memcpy(&md->metadata[2], desc, sizeof(desc));
/* Dwords [10:..] contain the mipmap level offsets. */
for (i = 0; i <= image->info.levels - 1; i++)
md->metadata[10+i] = image->surface.u.legacy.level[i].offset >> 8;
md->size_metadata = (11 + image->info.levels - 1) * 4;
if (device->physical_device->rad_info.chip_class <= VI) {
for (i = 0; i <= image->info.levels - 1; i++)
md->metadata[10+i] = image->surface.u.legacy.level[i].offset >> 8;
md->size_metadata = (11 + image->info.levels - 1) * 4;
}
}
void
@@ -826,8 +838,10 @@ radv_image_create(VkDevice _device,
if ((pCreateInfo->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) &&
pCreateInfo->mipLevels == 1 &&
!image->surface.dcc_size && image->info.depth == 1 && can_cmask_dcc)
!image->surface.dcc_size && image->info.depth == 1 && can_cmask_dcc &&
!image->surface.is_linear)
radv_image_alloc_cmask(device, image);
if (image->info.samples > 1 && vk_format_is_color(pCreateInfo->format)) {
radv_image_alloc_fmask(device, image);
} else if (vk_format_is_depth(pCreateInfo->format)) {
@@ -856,15 +870,15 @@ radv_image_create(VkDevice _device,
static void
radv_image_view_make_descriptor(struct radv_image_view *iview,
struct radv_device *device,
const VkImageViewCreateInfo* pCreateInfo,
const VkComponentMapping *components,
bool is_storage_image)
{
RADV_FROM_HANDLE(radv_image, image, pCreateInfo->image);
const VkImageSubresourceRange *range = &pCreateInfo->subresourceRange;
struct radv_image *image = iview->image;
bool is_stencil = iview->aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT;
uint32_t blk_w;
uint32_t *descriptor;
uint32_t *fmask_descriptor;
uint32_t hw_level = 0;
if (is_storage_image) {
descriptor = iview->storage_descriptor;
@@ -877,23 +891,32 @@ radv_image_view_make_descriptor(struct radv_image_view *iview,
assert(image->surface.blk_w % vk_format_get_blockwidth(image->vk_format) == 0);
blk_w = image->surface.blk_w / vk_format_get_blockwidth(image->vk_format) * vk_format_get_blockwidth(iview->vk_format);
if (device->physical_device->rad_info.chip_class >= GFX9)
hw_level = iview->base_mip;
si_make_texture_descriptor(device, image, is_storage_image,
iview->type,
iview->vk_format,
&pCreateInfo->components,
0, radv_get_levelCount(image, range) - 1,
range->baseArrayLayer,
range->baseArrayLayer + radv_get_layerCount(image, range) - 1,
components,
hw_level, hw_level + iview->level_count - 1,
iview->base_layer,
iview->base_layer + iview->layer_count - 1,
iview->extent.width,
iview->extent.height,
iview->extent.depth,
descriptor,
fmask_descriptor);
const struct legacy_surf_level *base_level_info = NULL;
if (device->physical_device->rad_info.chip_class <= GFX9) {
if (is_stencil)
base_level_info = &image->surface.u.legacy.stencil_level[iview->base_mip];
else
base_level_info = &image->surface.u.legacy.level[iview->base_mip];
}
si_set_mutable_tex_desc_fields(device, image,
is_stencil ? &image->surface.u.legacy.stencil_level[range->baseMipLevel]
: &image->surface.u.legacy.level[range->baseMipLevel],
range->baseMipLevel,
range->baseMipLevel,
base_level_info,
iview->base_mip,
iview->base_mip,
blk_w, is_stencil, descriptor);
}
@@ -929,23 +952,34 @@ radv_image_view_init(struct radv_image_view *iview,
iview->vk_format = vk_format_depth_only(iview->vk_format);
}
iview->extent = (VkExtent3D) {
.width = radv_minify(image->info.width , range->baseMipLevel),
.height = radv_minify(image->info.height, range->baseMipLevel),
.depth = radv_minify(image->info.depth , range->baseMipLevel),
};
if (device->physical_device->rad_info.chip_class >= GFX9) {
iview->extent = (VkExtent3D) {
.width = image->info.width,
.height = image->info.height,
.depth = image->info.depth,
};
} else {
iview->extent = (VkExtent3D) {
.width = radv_minify(image->info.width , range->baseMipLevel),
.height = radv_minify(image->info.height, range->baseMipLevel),
.depth = radv_minify(image->info.depth , range->baseMipLevel),
};
}
iview->extent.width = round_up_u32(iview->extent.width * vk_format_get_blockwidth(iview->vk_format),
vk_format_get_blockwidth(image->vk_format));
iview->extent.height = round_up_u32(iview->extent.height * vk_format_get_blockheight(iview->vk_format),
vk_format_get_blockheight(image->vk_format));
if (iview->vk_format != image->vk_format) {
iview->extent.width = round_up_u32(iview->extent.width * vk_format_get_blockwidth(iview->vk_format),
vk_format_get_blockwidth(image->vk_format));
iview->extent.height = round_up_u32(iview->extent.height * vk_format_get_blockheight(iview->vk_format),
vk_format_get_blockheight(image->vk_format));
}
iview->base_layer = range->baseArrayLayer;
iview->layer_count = radv_get_layerCount(image, range);
iview->base_mip = range->baseMipLevel;
iview->level_count = radv_get_levelCount(image, range);
radv_image_view_make_descriptor(iview, device, pCreateInfo, false);
radv_image_view_make_descriptor(iview, device, pCreateInfo, true);
radv_image_view_make_descriptor(iview, device, &pCreateInfo->components, false);
radv_image_view_make_descriptor(iview, device, &pCreateInfo->components, true);
}
bool radv_layout_has_htile(const struct radv_image *image,
@@ -1020,23 +1054,34 @@ radv_DestroyImage(VkDevice _device, VkImage _image,
}
void radv_GetImageSubresourceLayout(
VkDevice device,
VkDevice _device,
VkImage _image,
const VkImageSubresource* pSubresource,
VkSubresourceLayout* pLayout)
{
RADV_FROM_HANDLE(radv_image, image, _image);
RADV_FROM_HANDLE(radv_device, device, _device);
int level = pSubresource->mipLevel;
int layer = pSubresource->arrayLayer;
struct radeon_surf *surface = &image->surface;
pLayout->offset = surface->u.legacy.level[level].offset + surface->u.legacy.level[level].slice_size * layer;
pLayout->rowPitch = surface->u.legacy.level[level].nblk_x * surface->bpe;
pLayout->arrayPitch = surface->u.legacy.level[level].slice_size;
pLayout->depthPitch = surface->u.legacy.level[level].slice_size;
pLayout->size = surface->u.legacy.level[level].slice_size;
if (image->type == VK_IMAGE_TYPE_3D)
pLayout->size *= u_minify(image->info.depth, level);
if (device->physical_device->rad_info.chip_class >= GFX9) {
pLayout->offset = surface->u.gfx9.offset[level] + surface->u.gfx9.surf_slice_size * layer;
pLayout->rowPitch = surface->u.gfx9.surf_pitch * surface->bpe;
pLayout->arrayPitch = surface->u.gfx9.surf_slice_size;
pLayout->depthPitch = surface->u.gfx9.surf_slice_size;
pLayout->size = surface->u.gfx9.surf_slice_size;
if (image->type == VK_IMAGE_TYPE_3D)
pLayout->size *= u_minify(image->info.depth, level);
} else {
pLayout->offset = surface->u.legacy.level[level].offset + surface->u.legacy.level[level].slice_size * layer;
pLayout->rowPitch = surface->u.legacy.level[level].nblk_x * surface->bpe;
pLayout->arrayPitch = surface->u.legacy.level[level].slice_size;
pLayout->depthPitch = surface->u.legacy.level[level].slice_size;
pLayout->size = surface->u.legacy.level[level].slice_size;
if (image->type == VK_IMAGE_TYPE_3D)
pLayout->size *= u_minify(image->info.depth, level);
}
}

View File

@@ -477,48 +477,8 @@ radv_meta_build_nir_fs_noop(void)
return b.shader;
}
static nir_ssa_def *radv_meta_build_resolve_srgb_conversion(nir_builder *b,
nir_ssa_def *input)
{
nir_const_value v;
unsigned i;
v.u32[0] = 0x3b4d2e1c; // 0.00313080009
nir_ssa_def *cmp[3];
for (i = 0; i < 3; i++)
cmp[i] = nir_flt(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *ltvals[3];
v.f32[0] = 12.92;
for (i = 0; i < 3; i++)
ltvals[i] = nir_fmul(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *gtvals[3];
for (i = 0; i < 3; i++) {
v.f32[0] = 1.0/2.4;
gtvals[i] = nir_fpow(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
v.f32[0] = 1.055;
gtvals[i] = nir_fmul(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
v.f32[0] = 0.055;
gtvals[i] = nir_fsub(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
}
nir_ssa_def *comp[4];
for (i = 0; i < 3; i++)
comp[i] = nir_bcsel(b, cmp[i], ltvals[i], gtvals[i]);
comp[3] = nir_channels(b, input, 3);
return nir_vec(b, comp, 4);
}
void radv_meta_build_resolve_shader_core(nir_builder *b,
bool is_integer,
bool is_srgb,
int samples,
nir_variable *input_img,
nir_variable *color,
@@ -596,10 +556,4 @@ void radv_meta_build_resolve_shader_core(nir_builder *b,
if (outer_if)
b->cursor = nir_after_cf_node(&outer_if->cf_node);
if (is_srgb) {
nir_ssa_def *newv = nir_load_var(b, color);
newv = radv_meta_build_resolve_srgb_conversion(b, newv);
nir_store_var(b, color, newv, 0xf);
}
}

View File

@@ -234,7 +234,6 @@ nir_shader *radv_meta_build_nir_fs_noop(void);
void radv_meta_build_resolve_shader_core(nir_builder *b,
bool is_integer,
bool is_srgb,
int samples,
nir_variable *input_img,
nir_variable *color,

View File

@@ -269,21 +269,26 @@ meta_emit_blit(struct radv_cmd_buffer *cmd_buffer,
VkOffset3D src_offset_1,
struct radv_image *dest_image,
struct radv_image_view *dest_iview,
VkOffset3D dest_offset_0,
VkOffset3D dest_offset_1,
VkOffset2D dest_offset_0,
VkOffset2D dest_offset_1,
VkRect2D dest_box,
VkFilter blit_filter)
{
struct radv_device *device = cmd_buffer->device;
uint32_t src_width = radv_minify(src_iview->image->info.width, src_iview->base_mip);
uint32_t src_height = radv_minify(src_iview->image->info.height, src_iview->base_mip);
uint32_t src_depth = radv_minify(src_iview->image->info.depth, src_iview->base_mip);
uint32_t dst_width = radv_minify(dest_iview->image->info.width, dest_iview->base_mip);
uint32_t dst_height = radv_minify(dest_iview->image->info.height, dest_iview->base_mip);
assert(src_image->info.samples == dest_image->info.samples);
float vertex_push_constants[5] = {
(float)src_offset_0.x / (float)src_iview->extent.width,
(float)src_offset_0.y / (float)src_iview->extent.height,
(float)src_offset_1.x / (float)src_iview->extent.width,
(float)src_offset_1.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
(float)src_offset_0.x / (float)src_width,
(float)src_offset_0.y / (float)src_height,
(float)src_offset_1.x / (float)src_width,
(float)src_offset_1.y / (float)src_height,
(float)src_offset_0.z / (float)src_depth,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
@@ -310,8 +315,8 @@ meta_emit_blit(struct radv_cmd_buffer *cmd_buffer,
.pAttachments = (VkImageView[]) {
radv_image_view_to_handle(dest_iview),
},
.width = dest_iview->extent.width,
.height = dest_iview->extent.height,
.width = dst_width,
.height = dst_height,
.layers = 1,
}, &cmd_buffer->pool->alloc, &fb);
VkPipeline pipeline;
@@ -512,21 +517,6 @@ void radv_CmdBlitImage(
for (unsigned r = 0; r < regionCount; r++) {
const VkImageSubresourceLayers *src_res = &pRegions[r].srcSubresource;
const VkImageSubresourceLayers *dst_res = &pRegions[r].dstSubresource;
struct radv_image_view src_iview;
radv_image_view_init(&src_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = srcImage,
.viewType = radv_meta_get_view_type(src_image),
.format = src_image->vk_format,
.subresourceRange = {
.aspectMask = src_res->aspectMask,
.baseMipLevel = src_res->mipLevel,
.levelCount = 1,
.baseArrayLayer = src_res->baseArrayLayer,
.layerCount = 1
},
});
unsigned dst_start, dst_end;
if (dest_image->type == VK_IMAGE_TYPE_3D) {
@@ -573,18 +563,17 @@ void radv_CmdBlitImage(
dest_box.extent.width = abs(dst_x1 - dst_x0);
dest_box.extent.height = abs(dst_y1 - dst_y0);
struct radv_image_view dest_iview;
const unsigned num_layers = dst_end - dst_start;
for (unsigned i = 0; i < num_layers; i++) {
const VkOffset3D dest_offset_0 = {
struct radv_image_view dest_iview, src_iview;
const VkOffset2D dest_offset_0 = {
.x = dst_x0,
.y = dst_y0,
.z = dst_start + i ,
};
const VkOffset3D dest_offset_1 = {
const VkOffset2D dest_offset_1 = {
.x = dst_x1,
.y = dst_y1,
.z = dst_start + i ,
};
VkOffset3D src_offset_0 = {
.x = src_x0,
@@ -596,9 +585,10 @@ void radv_CmdBlitImage(
.y = src_y1,
.z = src_start + i * src_z_step,
};
const uint32_t dest_array_slice =
radv_meta_get_iview_layer(dest_image, dst_res,
&dest_offset_0);
const uint32_t dest_array_slice = dst_start + i;
/* 3D images have just 1 layer */
const uint32_t src_array_slice = src_image->type == VK_IMAGE_TYPE_3D ? 0 : src_start + i;
radv_image_view_init(&dest_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
@@ -614,6 +604,20 @@ void radv_CmdBlitImage(
.layerCount = 1
},
});
radv_image_view_init(&src_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = srcImage,
.viewType = radv_meta_get_view_type(src_image),
.format = src_image->vk_format,
.subresourceRange = {
.aspectMask = src_res->aspectMask,
.baseMipLevel = src_res->mipLevel,
.levelCount = 1,
.baseArrayLayer = src_array_slice,
.layerCount = 1
},
});
meta_emit_blit(cmd_buffer,
src_image, &src_iview,
src_offset_0, src_offset_1,
@@ -695,6 +699,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,

View File

@@ -53,7 +53,8 @@ enum blit2d_src_type {
static void
create_iview(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_surf *surf,
struct radv_image_view *iview, VkFormat depth_format)
struct radv_image_view *iview, VkFormat depth_format,
VkImageAspectFlagBits aspects)
{
VkFormat format;
@@ -69,7 +70,7 @@ create_iview(struct radv_cmd_buffer *cmd_buffer,
.viewType = VK_IMAGE_VIEW_TYPE_2D,
.format = format,
.subresourceRange = {
.aspectMask = surf->aspect_mask,
.aspectMask = aspects,
.baseMipLevel = surf->level,
.levelCount = 1,
.baseArrayLayer = surf->layer,
@@ -111,7 +112,8 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_surf *src_img,
struct radv_meta_blit2d_buffer *src_buf,
struct blit2d_src_temps *tmp,
enum blit2d_src_type src_type, VkFormat depth_format)
enum blit2d_src_type src_type, VkFormat depth_format,
VkImageAspectFlagBits aspects)
{
struct radv_device *device = cmd_buffer->device;
@@ -138,7 +140,7 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
&src_buf->pitch);
} else {
create_iview(cmd_buffer, src_img, &tmp->iview, depth_format);
create_iview(cmd_buffer, src_img, &tmp->iview, depth_format, aspects);
radv_meta_push_descriptor_set(cmd_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
device->meta_state.blit2d.p_layouts[src_type],
@@ -175,9 +177,10 @@ blit2d_bind_dst(struct radv_cmd_buffer *cmd_buffer,
uint32_t width,
uint32_t height,
VkFormat depth_format,
struct blit2d_dst_temps *tmp)
struct blit2d_dst_temps *tmp,
VkImageAspectFlagBits aspects)
{
create_iview(cmd_buffer, dst, &tmp->iview, depth_format);
create_iview(cmd_buffer, dst, &tmp->iview, depth_format, aspects);
radv_CreateFramebuffer(radv_device_to_handle(cmd_buffer->device),
&(VkFramebufferCreateInfo) {
@@ -250,106 +253,111 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device;
for (unsigned r = 0; r < num_rects; ++r) {
VkFormat depth_format = 0;
if (dst->aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT)
depth_format = vk_format_stencil_only(dst->image->vk_format);
else if (dst->aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT)
depth_format = vk_format_depth_only(dst->image->vk_format);
struct blit2d_src_temps src_temps;
blit2d_bind_src(cmd_buffer, src_img, src_buf, &src_temps, src_type, depth_format);
unsigned i;
for_each_bit(i, dst->aspect_mask) {
unsigned aspect_mask = 1u << i;
VkFormat depth_format = 0;
if (aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT)
depth_format = vk_format_stencil_only(dst->image->vk_format);
else if (aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT)
depth_format = vk_format_depth_only(dst->image->vk_format);
struct blit2d_src_temps src_temps;
blit2d_bind_src(cmd_buffer, src_img, src_buf, &src_temps, src_type, depth_format, aspect_mask);
struct blit2d_dst_temps dst_temps;
blit2d_bind_dst(cmd_buffer, dst, rects[r].dst_x + rects[r].width,
rects[r].dst_y + rects[r].height, depth_format, &dst_temps);
struct blit2d_dst_temps dst_temps;
blit2d_bind_dst(cmd_buffer, dst, rects[r].dst_x + rects[r].width,
rects[r].dst_y + rects[r].height, depth_format, &dst_temps, aspect_mask);
float vertex_push_constants[4] = {
rects[r].src_x,
rects[r].src_y,
rects[r].src_x + rects[r].width,
rects[r].src_y + rects[r].height,
};
float vertex_push_constants[4] = {
rects[r].src_x,
rects[r].src_y,
rects[r].src_x + rects[r].width,
rects[r].src_y + rects[r].height,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type],
VK_SHADER_STAGE_VERTEX_BIT, 0, 16,
vertex_push_constants);
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type],
VK_SHADER_STAGE_VERTEX_BIT, 0, 16,
vertex_push_constants);
if (dst->aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT) {
unsigned fs_key = radv_format_meta_fs_key(dst_temps.iview.vk_format);
if (aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT) {
unsigned fs_key = radv_format_meta_fs_key(dst_temps.iview.vk_format);
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.render_passes[fs_key],
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
.extent = { rects[r].width, rects[r].height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.render_passes[fs_key],
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
.extent = { rects[r].width, rects[r].height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
bind_pipeline(cmd_buffer, src_type, fs_key);
} else if (dst->aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT) {
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.depth_only_rp,
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
.extent = { rects[r].width, rects[r].height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
bind_pipeline(cmd_buffer, src_type, fs_key);
} else if (aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT) {
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.depth_only_rp,
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
.extent = { rects[r].width, rects[r].height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
bind_depth_pipeline(cmd_buffer, src_type);
bind_depth_pipeline(cmd_buffer, src_type);
} else if (dst->aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT) {
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.stencil_only_rp,
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
.extent = { rects[r].width, rects[r].height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
} else if (aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT) {
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.stencil_only_rp,
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
.extent = { rects[r].width, rects[r].height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
bind_stencil_pipeline(cmd_buffer, src_type);
bind_stencil_pipeline(cmd_buffer, src_type);
} else
unreachable("Processing blit2d with multiple aspects.");
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) {
.x = rects[r].dst_x,
.y = rects[r].dst_y,
.width = rects[r].width,
.height = rects[r].height,
.minDepth = 0.0f,
.maxDepth = 1.0f
});
radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkRect2D) {
.offset = (VkOffset2D) { rects[r].dst_x, rects[r].dst_y },
.extent = (VkExtent2D) { rects[r].width, rects[r].height },
});
radv_CmdDraw(radv_cmd_buffer_to_handle(cmd_buffer), 3, 1, 0, 0);
radv_CmdEndRenderPass(radv_cmd_buffer_to_handle(cmd_buffer));
/* At the point where we emit the draw call, all data from the
* descriptor sets, etc. has been used. We are free to delete it.
*/
blit2d_unbind_dst(cmd_buffer, &dst_temps);
}
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) {
.x = rects[r].dst_x,
.y = rects[r].dst_y,
.width = rects[r].width,
.height = rects[r].height,
.minDepth = 0.0f,
.maxDepth = 1.0f
});
radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkRect2D) {
.offset = (VkOffset2D) { rects[r].dst_x, rects[r].dst_y },
.extent = (VkExtent2D) { rects[r].width, rects[r].height },
});
radv_CmdDraw(radv_cmd_buffer_to_handle(cmd_buffer), 3, 1, 0, 0);
radv_CmdEndRenderPass(radv_cmd_buffer_to_handle(cmd_buffer));
/* At the point where we emit the draw call, all data from the
* descriptor sets, etc. has been used. We are free to delete it.
*/
blit2d_unbind_dst(cmd_buffer, &dst_temps);
}
}
@@ -1134,6 +1142,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,

View File

@@ -754,6 +754,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,
@@ -977,7 +979,7 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
if (iview->image->info.levels > 1)
goto fail;
if (iview->image->surface.u.legacy.level[0].mode < RADEON_SURF_MODE_1D)
if (iview->image->surface.is_linear)
goto fail;
if (!radv_image_extent_compare(iview->image, &iview->extent))
goto fail;
@@ -1174,6 +1176,9 @@ radv_clear_image_layer(struct radv_cmd_buffer *cmd_buffer,
{
VkDevice device_h = radv_device_to_handle(cmd_buffer->device);
struct radv_image_view iview;
uint32_t width = radv_minify(image->info.width, range->baseMipLevel + level);
uint32_t height = radv_minify(image->info.height, range->baseMipLevel + level);
radv_image_view_init(&iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
@@ -1197,9 +1202,9 @@ radv_clear_image_layer(struct radv_cmd_buffer *cmd_buffer,
.pAttachments = (VkImageView[]) {
radv_image_view_to_handle(&iview),
},
.width = iview.extent.width,
.height = iview.extent.height,
.layers = 1
.width = width,
.height = height,
.layers = 1
},
&cmd_buffer->pool->alloc,
&fb);
@@ -1255,8 +1260,8 @@ radv_clear_image_layer(struct radv_cmd_buffer *cmd_buffer,
.renderArea = {
.offset = { 0, 0, },
.extent = {
.width = iview.extent.width,
.height = iview.extent.height,
.width = width,
.height = height,
},
},
.renderPass = pass,
@@ -1275,7 +1280,7 @@ radv_clear_image_layer(struct radv_cmd_buffer *cmd_buffer,
VkClearRect clear_rect = {
.rect = {
.offset = { 0, 0 },
.extent = { iview.extent.width, iview.extent.height },
.extent = { width, height },
},
.baseArrayLayer = range->baseArrayLayer,
.layerCount = 1, /* FINISHME: clear multi-layer framebuffer */

View File

@@ -29,7 +29,9 @@
#include "sid.h"
static VkResult
create_pass(struct radv_device *device)
create_pass(struct radv_device *device,
uint32_t samples,
VkRenderPass *pass)
{
VkResult result;
VkDevice device_h = radv_device_to_handle(device);
@@ -37,7 +39,7 @@ create_pass(struct radv_device *device)
VkAttachmentDescription attachment;
attachment.format = VK_FORMAT_D32_SFLOAT_S8_UINT;
attachment.samples = 1;
attachment.samples = samples;
attachment.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
attachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
attachment.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
@@ -65,14 +67,18 @@ create_pass(struct radv_device *device)
.dependencyCount = 0,
},
alloc,
&device->meta_state.depth_decomp.pass);
pass);
return result;
}
static VkResult
create_pipeline(struct radv_device *device,
VkShaderModule vs_module_h)
VkShaderModule vs_module_h,
uint32_t samples,
VkRenderPass pass,
VkPipeline *decompress_pipeline,
VkPipeline *resummarize_pipeline)
{
VkResult result;
VkDevice device_h = radv_device_to_handle(device);
@@ -129,7 +135,7 @@ create_pipeline(struct radv_device *device,
},
.pMultisampleState = &(VkPipelineMultisampleStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
.rasterizationSamples = 1,
.rasterizationSamples = samples,
.sampleShadingEnable = false,
.pSampleMask = NULL,
.alphaToCoverageEnable = false,
@@ -156,7 +162,7 @@ create_pipeline(struct radv_device *device,
VK_DYNAMIC_STATE_SCISSOR,
},
},
.renderPass = device->meta_state.depth_decomp.pass,
.renderPass = pass,
.subpass = 0,
};
@@ -169,7 +175,7 @@ create_pipeline(struct radv_device *device,
.db_flush_stencil_inplace = true,
},
&device->meta_state.alloc,
&device->meta_state.depth_decomp.decompress_pipeline);
decompress_pipeline);
if (result != VK_SUCCESS)
goto cleanup;
@@ -183,7 +189,7 @@ create_pipeline(struct radv_device *device,
.db_resummarize = true,
},
&device->meta_state.alloc,
&device->meta_state.depth_decomp.resummarize_pipeline);
resummarize_pipeline);
if (result != VK_SUCCESS)
goto cleanup;
@@ -199,29 +205,31 @@ radv_device_finish_meta_depth_decomp_state(struct radv_device *device)
{
struct radv_meta_state *state = &device->meta_state;
VkDevice device_h = radv_device_to_handle(device);
VkRenderPass pass_h = device->meta_state.depth_decomp.pass;
const VkAllocationCallbacks *alloc = &device->meta_state.alloc;
if (pass_h)
radv_DestroyRenderPass(device_h, pass_h,
&device->meta_state.alloc);
VkPipeline pipeline_h = state->depth_decomp.decompress_pipeline;
if (pipeline_h) {
radv_DestroyPipeline(device_h, pipeline_h, alloc);
}
pipeline_h = state->depth_decomp.resummarize_pipeline;
if (pipeline_h) {
radv_DestroyPipeline(device_h, pipeline_h, alloc);
for (uint32_t i = 0; i < ARRAY_SIZE(state->depth_decomp); ++i) {
VkRenderPass pass_h = state->depth_decomp[i].pass;
if (pass_h) {
radv_DestroyRenderPass(device_h, pass_h, alloc);
}
VkPipeline pipeline_h = state->depth_decomp[i].decompress_pipeline;
if (pipeline_h) {
radv_DestroyPipeline(device_h, pipeline_h, alloc);
}
pipeline_h = state->depth_decomp[i].resummarize_pipeline;
if (pipeline_h) {
radv_DestroyPipeline(device_h, pipeline_h, alloc);
}
}
}
VkResult
radv_device_init_meta_depth_decomp_state(struct radv_device *device)
{
struct radv_meta_state *state = &device->meta_state;
VkResult res = VK_SUCCESS;
zero(device->meta_state.depth_decomp);
zero(state->depth_decomp);
struct radv_shader_module vs_module = { .nir = radv_meta_build_nir_vs_generate_vertices() };
if (!vs_module.nir) {
@@ -230,14 +238,22 @@ radv_device_init_meta_depth_decomp_state(struct radv_device *device)
goto fail;
}
res = create_pass(device);
if (res != VK_SUCCESS)
goto fail;
VkShaderModule vs_module_h = radv_shader_module_to_handle(&vs_module);
res = create_pipeline(device, vs_module_h);
if (res != VK_SUCCESS)
goto fail;
for (uint32_t i = 0; i < ARRAY_SIZE(state->depth_decomp); ++i) {
uint32_t samples = 1 << i;
res = create_pass(device, samples, &state->depth_decomp[i].pass);
if (res != VK_SUCCESS)
goto fail;
res = create_pipeline(device, vs_module_h, samples,
state->depth_decomp[i].pass,
&state->depth_decomp[i].decompress_pipeline,
&state->depth_decomp[i].resummarize_pipeline);
if (res != VK_SUCCESS)
goto fail;
}
goto cleanup;
@@ -283,10 +299,15 @@ emit_depth_decomp(struct radv_cmd_buffer *cmd_buffer,
}
enum radv_depth_op {
DEPTH_DECOMPRESS,
DEPTH_RESUMMARIZE,
};
static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
VkImageSubresourceRange *subresourceRange,
VkPipeline pipeline_h)
enum radv_depth_op op)
{
struct radv_meta_saved_state saved_state;
struct radv_meta_saved_pass_state saved_pass_state;
@@ -296,6 +317,9 @@ static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
subresourceRange->baseMipLevel);
uint32_t height = radv_minify(image->info.height,
subresourceRange->baseMipLevel);
uint32_t samples = image->info.samples;
uint32_t samples_log2 = ffs(samples) - 1;
struct radv_meta_state *meta_state = &cmd_buffer->device->meta_state;
if (!image->surface.htile_size)
return;
@@ -339,7 +363,7 @@ static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
radv_CmdBeginRenderPass(cmd_buffer_h,
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = cmd_buffer->device->meta_state.depth_decomp.pass,
.renderPass = meta_state->depth_decomp[samples_log2].pass,
.framebuffer = fb_h,
.renderArea = {
.offset = {
@@ -356,6 +380,18 @@ static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
},
VK_SUBPASS_CONTENTS_INLINE);
VkPipeline pipeline_h;
switch (op) {
case DEPTH_DECOMPRESS:
pipeline_h = meta_state->depth_decomp[samples_log2].decompress_pipeline;
break;
case DEPTH_RESUMMARIZE:
pipeline_h = meta_state->depth_decomp[samples_log2].resummarize_pipeline;
break;
default:
unreachable("unknown operation");
}
emit_depth_decomp(cmd_buffer, &(VkOffset2D){0, 0 }, &(VkExtent2D){width, height}, pipeline_h);
radv_CmdEndRenderPass(cmd_buffer_h);
@@ -371,8 +407,7 @@ void radv_decompress_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
VkImageSubresourceRange *subresourceRange)
{
assert(cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL);
radv_process_depth_image_inplace(cmd_buffer, image, subresourceRange,
cmd_buffer->device->meta_state.depth_decomp.decompress_pipeline);
radv_process_depth_image_inplace(cmd_buffer, image, subresourceRange, DEPTH_DECOMPRESS);
}
void radv_resummarize_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
@@ -380,6 +415,5 @@ void radv_resummarize_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
VkImageSubresourceRange *subresourceRange)
{
assert(cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL);
radv_process_depth_image_inplace(cmd_buffer, image, subresourceRange,
cmd_buffer->device->meta_state.depth_decomp.resummarize_pipeline);
radv_process_depth_image_inplace(cmd_buffer, image, subresourceRange, DEPTH_RESUMMARIZE);
}

View File

@@ -382,6 +382,11 @@ void radv_CmdResolveImage(
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
assert(src_image->info.samples > 1);
if (src_image->info.samples <= 1) {
/* this causes GPU hangs if we get past here */
fprintf(stderr, "radv: Illegal resolve operation (src not multisampled), will hang GPU.");
return;
}
assert(dest_image->info.samples == 1);
if (src_image->info.samples >= 16) {
@@ -607,13 +612,6 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
/* Subpass resolves must respect the render area. We can ignore the
* render area here because vkCmdBeginRenderPass set the render area
* with 3DSTATE_DRAWING_RECTANGLE.
*
* XXX(chadv): Does the hardware really respect
* 3DSTATE_DRAWING_RECTANGLE when draing a 3DPRIM_RECTLIST?
*/
emit_resolve(cmd_buffer,
&(VkOffset2D) { 0, 0 },
&(VkExtent2D) { fb->width, fb->height });

View File

@@ -31,6 +31,45 @@
#include "sid.h"
#include "vk_format.h"
static nir_ssa_def *radv_meta_build_resolve_srgb_conversion(nir_builder *b,
nir_ssa_def *input)
{
nir_const_value v;
unsigned i;
v.u32[0] = 0x3b4d2e1c; // 0.00313080009
nir_ssa_def *cmp[3];
for (i = 0; i < 3; i++)
cmp[i] = nir_flt(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *ltvals[3];
v.f32[0] = 12.92;
for (i = 0; i < 3; i++)
ltvals[i] = nir_fmul(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *gtvals[3];
for (i = 0; i < 3; i++) {
v.f32[0] = 1.0/2.4;
gtvals[i] = nir_fpow(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
v.f32[0] = 1.055;
gtvals[i] = nir_fmul(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
v.f32[0] = 0.055;
gtvals[i] = nir_fsub(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
}
nir_ssa_def *comp[4];
for (i = 0; i < 3; i++)
comp[i] = nir_bcsel(b, cmp[i], ltvals[i], gtvals[i]);
comp[3] = nir_channels(b, input, 1 << 3);
return nir_vec(b, comp, 4);
}
static nir_shader *
build_resolve_compute_shader(struct radv_device *dev, bool is_integer, bool is_srgb, int samples)
{
@@ -88,10 +127,13 @@ build_resolve_compute_shader(struct radv_device *dev, bool is_integer, bool is_s
nir_ssa_def *img_coord = nir_channels(&b, nir_iadd(&b, global_id, &src_offset->dest.ssa), 0x3);
nir_variable *color = nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
radv_meta_build_resolve_shader_core(&b, is_integer, is_srgb, samples,
input_img, color, img_coord);
radv_meta_build_resolve_shader_core(&b, is_integer, samples, input_img,
color, img_coord);
nir_ssa_def *outval = nir_load_var(&b, color);
if (is_srgb)
outval = radv_meta_build_resolve_srgb_conversion(&b, outval);
nir_ssa_def *coord = nir_iadd(&b, global_id, &dst_offset->dest.ssa);
nir_intrinsic_instr *store = nir_intrinsic_instr_create(b.shader, nir_intrinsic_image_store);
store->src[0] = nir_src_for_ssa(coord);
@@ -402,7 +444,7 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(dest_image),
.viewType = radv_meta_get_view_type(dest_image),
.format = dest_image->vk_format,
.format = vk_to_non_srgb_format(dest_image->vk_format),
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = region->dstSubresource.mipLevel,
@@ -479,21 +521,6 @@ radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer *cmd_buffer)
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_subpass resolve_subpass = {
.color_count = 1,
.color_attachments = (VkAttachmentReference[]) { dest_att },
.depth_stencil_attachment = { .attachment = VK_ATTACHMENT_UNUSED },
};
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
/* Subpass resolves must respect the render area. We can ignore the
* render area here because vkCmdBeginRenderPass set the render area
* with 3DSTATE_DRAWING_RECTANGLE.
*
* XXX(chadv): Does the hardware really respect
* 3DSTATE_DRAWING_RECTANGLE when draing a 3DPRIM_RECTLIST?
*/
emit_resolve(cmd_buffer,
src_iview,
dst_iview,

View File

@@ -51,7 +51,7 @@ build_nir_vertex_shader(void)
}
static nir_shader *
build_resolve_fragment_shader(struct radv_device *dev, bool is_integer, bool is_srgb, int samples)
build_resolve_fragment_shader(struct radv_device *dev, bool is_integer, int samples)
{
nir_builder b;
char name[64];
@@ -62,7 +62,7 @@ build_resolve_fragment_shader(struct radv_device *dev, bool is_integer, bool is_
false,
GLSL_TYPE_FLOAT);
snprintf(name, 64, "meta_resolve_fs-%d-%s", samples, is_integer ? "int" : (is_srgb ? "srgb" : "float"));
snprintf(name, 64, "meta_resolve_fs-%d-%s", samples, is_integer ? "int" : "float");
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_strdup(b.shader, name);
@@ -92,8 +92,8 @@ build_resolve_fragment_shader(struct radv_device *dev, bool is_integer, bool is_
nir_ssa_def *img_coord = nir_channels(&b, nir_iadd(&b, pos_int, &src_offset->dest.ssa), 0x3);
nir_variable *color = nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
radv_meta_build_resolve_shader_core(&b, is_integer, is_srgb,samples,
input_img, color, img_coord);
radv_meta_build_resolve_shader_core(&b, is_integer, samples, input_img,
color, img_coord);
nir_ssa_def *outval = nir_load_var(&b, color);
nir_store_var(&b, color_out, outval, 0xf);
@@ -160,6 +160,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,
@@ -175,31 +177,25 @@ create_resolve_pipeline(struct radv_device *device,
VkFormat format)
{
VkResult result;
bool is_integer = false, is_srgb = false;
bool is_integer = false;
uint32_t samples = 1 << samples_log2;
unsigned fs_key = radv_format_meta_fs_key(format);
const VkPipelineVertexInputStateCreateInfo *vi_create_info;
vi_create_info = &normal_vi_create_info;
if (vk_format_is_int(format))
is_integer = true;
else if (vk_format_is_srgb(format))
is_srgb = true;
struct radv_shader_module fs = { .nir = NULL };
fs.nir = build_resolve_fragment_shader(device, is_integer, is_srgb, samples);
fs.nir = build_resolve_fragment_shader(device, is_integer, samples);
struct radv_shader_module vs = {
.nir = build_nir_vertex_shader(),
};
VkRenderPass *rp = is_srgb ?
&device->meta_state.resolve_fragment.rc[samples_log2].srgb_render_pass :
&device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
VkRenderPass *rp = &device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
assert(!*rp);
VkPipeline *pipeline = is_srgb ?
&device->meta_state.resolve_fragment.rc[samples_log2].srgb_pipeline :
&device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
VkPipeline *pipeline = &device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
assert(!*pipeline);
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
@@ -348,8 +344,6 @@ radv_device_init_meta_resolve_fragment_state(struct radv_device *device)
for (unsigned j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) {
res = create_resolve_pipeline(device, i, pipeline_formats[j]);
}
res = create_resolve_pipeline(device, i, VK_FORMAT_R8G8B8A8_SRGB);
}
return res;
@@ -368,12 +362,6 @@ radv_device_finish_meta_resolve_fragment_state(struct radv_device *device)
state->resolve_fragment.rc[i].pipeline[j],
&state->alloc);
}
radv_DestroyRenderPass(radv_device_to_handle(device),
state->resolve_fragment.rc[i].srgb_render_pass,
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_fragment.rc[i].srgb_pipeline,
&state->alloc);
}
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
@@ -430,9 +418,7 @@ emit_resolve(struct radv_cmd_buffer *cmd_buffer,
push_constants);
unsigned fs_key = radv_format_meta_fs_key(dest_iview->vk_format);
VkPipeline pipeline_h = vk_format_is_srgb(dest_iview->vk_format) ?
device->meta_state.resolve_fragment.rc[samples_log2].srgb_pipeline :
device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
VkPipeline pipeline_h = device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
radv_CmdBindPipeline(cmd_buffer_h, VK_PIPELINE_BIND_POINT_GRAPHICS,
pipeline_h);
@@ -483,9 +469,7 @@ void radv_meta_resolve_fragment_image(struct radv_cmd_buffer *cmd_buffer,
radv_fast_clear_flush_image_inplace(cmd_buffer, src_image, &range);
}
rp = vk_format_is_srgb(dest_image->vk_format) ?
device->meta_state.resolve_fragment.rc[samples_log2].srgb_render_pass :
device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
rp = device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t r = 0; r < region_count; ++r) {
@@ -649,13 +633,6 @@ radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer)
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
/* Subpass resolves must respect the render area. We can ignore the
* render area here because vkCmdBeginRenderPass set the render area
* with 3DSTATE_DRAWING_RECTANGLE.
*
* XXX(chadv): Does the hardware really respect
* 3DSTATE_DRAWING_RECTANGLE when draing a 3DPRIM_RECTLIST?
*/
emit_resolve(cmd_buffer,
src_iview,
dest_iview,

View File

@@ -65,6 +65,7 @@ static const struct nir_shader_compiler_options nir_options = {
.lower_unpack_unorm_4x8 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
.lower_ffma = true,
.max_unroll_iterations = 32
};
@@ -161,6 +162,7 @@ radv_optimize_nir(struct nir_shader *shader)
if (nir_opt_trivial_continues(shader)) {
progress = true;
NIR_PASS(progress, shader, nir_copy_prop);
NIR_PASS(progress, shader, nir_opt_remove_phis);
NIR_PASS(progress, shader, nir_opt_dce);
}
NIR_PASS(progress, shader, nir_opt_if);
@@ -273,8 +275,32 @@ radv_shader_compile_to_nir(struct radv_device *device,
nir_shader_gather_info(nir, entry_point->impl);
/* While it would be nice not to have this flag, we are constrained
* by the reality that LLVM 5.0 doesn't have working VGPR indexing
* on GFX9.
*/
bool llvm_has_working_vgpr_indexing =
device->physical_device->rad_info.chip_class <= VI;
/* TODO: Indirect indexing of GS inputs is unimplemented.
*
* TCS and TES load inputs directly from LDS or offchip memory, so
* indirect indexing is trivial.
*/
nir_variable_mode indirect_mask = 0;
indirect_mask |= nir_var_shader_in;
if (!llvm_has_working_vgpr_indexing &&
nir->info.stage != MESA_SHADER_TESS_CTRL)
indirect_mask |= nir_var_shader_out;
/* TODO: We shouldn't need to do this, however LLVM isn't currently
* smart enough to handle indirects without causing excess spilling
* causing the gpu to hang.
*
* See the following thread for more details of the problem:
* https://lists.freedesktop.org/archives/mesa-dev/2017-July/162106.html
*/
indirect_mask |= nir_var_local;
nir_lower_indirect_derefs(nir, indirect_mask);
@@ -852,6 +878,79 @@ static uint32_t si_translate_blend_factor(VkBlendFactor factor)
}
}
static uint32_t si_translate_blend_opt_function(VkBlendOp op)
{
switch (op) {
case VK_BLEND_OP_ADD:
return V_028760_OPT_COMB_ADD;
case VK_BLEND_OP_SUBTRACT:
return V_028760_OPT_COMB_SUBTRACT;
case VK_BLEND_OP_REVERSE_SUBTRACT:
return V_028760_OPT_COMB_REVSUBTRACT;
case VK_BLEND_OP_MIN:
return V_028760_OPT_COMB_MIN;
case VK_BLEND_OP_MAX:
return V_028760_OPT_COMB_MAX;
default:
return V_028760_OPT_COMB_BLEND_DISABLED;
}
}
static uint32_t si_translate_blend_opt_factor(VkBlendFactor factor, bool is_alpha)
{
switch (factor) {
case VK_BLEND_FACTOR_ZERO:
return V_028760_BLEND_OPT_PRESERVE_NONE_IGNORE_ALL;
case VK_BLEND_FACTOR_ONE:
return V_028760_BLEND_OPT_PRESERVE_ALL_IGNORE_NONE;
case VK_BLEND_FACTOR_SRC_COLOR:
return is_alpha ? V_028760_BLEND_OPT_PRESERVE_A1_IGNORE_A0
: V_028760_BLEND_OPT_PRESERVE_C1_IGNORE_C0;
case VK_BLEND_FACTOR_ONE_MINUS_SRC_COLOR:
return is_alpha ? V_028760_BLEND_OPT_PRESERVE_A0_IGNORE_A1
: V_028760_BLEND_OPT_PRESERVE_C0_IGNORE_C1;
case VK_BLEND_FACTOR_SRC_ALPHA:
return V_028760_BLEND_OPT_PRESERVE_A1_IGNORE_A0;
case VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA:
return V_028760_BLEND_OPT_PRESERVE_A0_IGNORE_A1;
case VK_BLEND_FACTOR_SRC_ALPHA_SATURATE:
return is_alpha ? V_028760_BLEND_OPT_PRESERVE_ALL_IGNORE_NONE
: V_028760_BLEND_OPT_PRESERVE_NONE_IGNORE_A0;
default:
return V_028760_BLEND_OPT_PRESERVE_NONE_IGNORE_NONE;
}
}
/**
* Get rid of DST in the blend factors by commuting the operands:
* func(src * DST, dst * 0) ---> func(src * 0, dst * SRC)
*/
static void si_blend_remove_dst(unsigned *func, unsigned *src_factor,
unsigned *dst_factor, unsigned expected_dst,
unsigned replacement_src)
{
if (*src_factor == expected_dst &&
*dst_factor == VK_BLEND_FACTOR_ZERO) {
*src_factor = VK_BLEND_FACTOR_ZERO;
*dst_factor = replacement_src;
/* Commuting the operands requires reversing subtractions. */
if (*func == VK_BLEND_OP_SUBTRACT)
*func = VK_BLEND_OP_REVERSE_SUBTRACT;
else if (*func == VK_BLEND_OP_REVERSE_SUBTRACT)
*func = VK_BLEND_OP_SUBTRACT;
}
}
static bool si_blend_factor_uses_dst(unsigned factor)
{
return factor == VK_BLEND_FACTOR_DST_COLOR ||
factor == VK_BLEND_FACTOR_DST_ALPHA ||
factor == VK_BLEND_FACTOR_SRC_ALPHA_SATURATE ||
factor == VK_BLEND_FACTOR_ONE_MINUS_DST_ALPHA ||
factor == VK_BLEND_FACTOR_ONE_MINUS_DST_COLOR;
}
static bool is_dual_src(VkBlendFactor factor)
{
switch (factor) {
@@ -1067,20 +1166,37 @@ format_is_int8(VkFormat format)
desc->channel[channel].size == 8;
}
static bool
format_is_int10(VkFormat format)
{
const struct vk_format_description *desc = vk_format_description(format);
if (desc->nr_channels != 4)
return false;
for (unsigned i = 0; i < 4; i++) {
if (desc->channel[i].pure_integer && desc->channel[i].size == 10)
return true;
}
return false;
}
unsigned radv_format_meta_fs_key(VkFormat format)
{
unsigned col_format = si_choose_spi_color_format(format, false, false) - 1;
bool is_int8 = format_is_int8(format);
bool is_int10 = format_is_int10(format);
return col_format + (is_int8 ? 3 : 0);
return col_format + (is_int8 ? 3 : is_int10 ? 5 : 0);
}
static unsigned
radv_pipeline_compute_is_int8(const VkGraphicsPipelineCreateInfo *pCreateInfo)
static void
radv_pipeline_compute_get_int_clamp(const VkGraphicsPipelineCreateInfo *pCreateInfo,
unsigned *is_int8, unsigned *is_int10)
{
RADV_FROM_HANDLE(radv_render_pass, pass, pCreateInfo->renderPass);
struct radv_subpass *subpass = pass->subpasses + pCreateInfo->subpass;
unsigned is_int8 = 0;
*is_int8 = 0;
*is_int10 = 0;
for (unsigned i = 0; i < subpass->color_count; ++i) {
struct radv_render_pass_attachment *attachment;
@@ -1091,10 +1207,10 @@ radv_pipeline_compute_is_int8(const VkGraphicsPipelineCreateInfo *pCreateInfo)
attachment = pass->attachments + subpass->color_attachments[i].attachment;
if (format_is_int8(attachment->format))
is_int8 |= 1 << i;
*is_int8 |= 1 << i;
if (format_is_int10(attachment->format))
*is_int10 |= 1 << i;
}
return is_int8;
}
static void
@@ -1132,6 +1248,7 @@ radv_pipeline_init_blend_state(struct radv_pipeline *pipeline,
for (i = 0; i < vkblend->attachmentCount; i++) {
const VkPipelineColorBlendAttachmentState *att = &vkblend->pAttachments[i];
unsigned blend_cntl = 0;
unsigned srcRGB_opt, dstRGB_opt, srcA_opt, dstA_opt;
VkBlendOp eqRGB = att->colorBlendOp;
VkBlendFactor srcRGB = att->srcColorBlendFactor;
VkBlendFactor dstRGB = att->dstColorBlendFactor;
@@ -1139,7 +1256,7 @@ radv_pipeline_init_blend_state(struct radv_pipeline *pipeline,
VkBlendFactor srcA = att->srcAlphaBlendFactor;
VkBlendFactor dstA = att->dstAlphaBlendFactor;
blend->sx_mrt0_blend_opt[i] = S_028760_COLOR_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED) | S_028760_ALPHA_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED);
blend->sx_mrt_blend_opt[i] = S_028760_COLOR_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED) | S_028760_ALPHA_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED);
if (!att->colorWriteMask)
continue;
@@ -1163,6 +1280,50 @@ radv_pipeline_init_blend_state(struct radv_pipeline *pipeline,
dstA = VK_BLEND_FACTOR_ONE;
}
/* Blending optimizations for RB+.
* These transformations don't change the behavior.
*
* First, get rid of DST in the blend factors:
* func(src * DST, dst * 0) ---> func(src * 0, dst * SRC)
*/
si_blend_remove_dst(&eqRGB, &srcRGB, &dstRGB,
VK_BLEND_FACTOR_DST_COLOR,
VK_BLEND_FACTOR_SRC_COLOR);
si_blend_remove_dst(&eqA, &srcA, &dstA,
VK_BLEND_FACTOR_DST_COLOR,
VK_BLEND_FACTOR_SRC_COLOR);
si_blend_remove_dst(&eqA, &srcA, &dstA,
VK_BLEND_FACTOR_DST_ALPHA,
VK_BLEND_FACTOR_SRC_ALPHA);
/* Look up the ideal settings from tables. */
srcRGB_opt = si_translate_blend_opt_factor(srcRGB, false);
dstRGB_opt = si_translate_blend_opt_factor(dstRGB, false);
srcA_opt = si_translate_blend_opt_factor(srcA, true);
dstA_opt = si_translate_blend_opt_factor(dstA, true);
/* Handle interdependencies. */
if (si_blend_factor_uses_dst(srcRGB))
dstRGB_opt = V_028760_BLEND_OPT_PRESERVE_NONE_IGNORE_NONE;
if (si_blend_factor_uses_dst(srcA))
dstA_opt = V_028760_BLEND_OPT_PRESERVE_NONE_IGNORE_NONE;
if (srcRGB == VK_BLEND_FACTOR_SRC_ALPHA_SATURATE &&
(dstRGB == VK_BLEND_FACTOR_ZERO ||
dstRGB == VK_BLEND_FACTOR_SRC_ALPHA ||
dstRGB == VK_BLEND_FACTOR_SRC_ALPHA_SATURATE))
dstRGB_opt = V_028760_BLEND_OPT_PRESERVE_NONE_IGNORE_A0;
/* Set the final value. */
blend->sx_mrt_blend_opt[i] =
S_028760_COLOR_SRC_OPT(srcRGB_opt) |
S_028760_COLOR_DST_OPT(dstRGB_opt) |
S_028760_COLOR_COMB_FCN(si_translate_blend_opt_function(eqRGB)) |
S_028760_ALPHA_SRC_OPT(srcA_opt) |
S_028760_ALPHA_DST_OPT(dstA_opt) |
S_028760_ALPHA_COMB_FCN(si_translate_blend_opt_function(eqA));
blend_cntl |= S_028780_ENABLE(1);
blend_cntl |= S_028780_COLOR_COMB_FCN(si_translate_blend_function(eqRGB));
@@ -1186,8 +1347,14 @@ radv_pipeline_init_blend_state(struct radv_pipeline *pipeline,
dstRGB == VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA)
blend_need_alpha |= 1 << i;
}
for (i = vkblend->attachmentCount; i < 8; i++)
for (i = vkblend->attachmentCount; i < 8; i++) {
blend->cb_blend_control[i] = 0;
blend->sx_mrt_blend_opt[i] = S_028760_COLOR_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED) | S_028760_ALPHA_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED);
}
/* disable RB+ for now */
if (pipeline->device->physical_device->has_rbplus)
blend->cb_color_control |= S_028808_DISABLE_DUAL_QUAD(1);
if (blend->cb_target_mask)
blend->cb_color_control |= S_028808_MODE(mode);
@@ -2053,9 +2220,11 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
}
if (modules[MESA_SHADER_FRAGMENT]) {
union ac_shader_variant_key key;
union ac_shader_variant_key key = {0};
key.fs.col_format = pipeline->graphics.blend.spi_shader_col_format;
key.fs.is_int8 = radv_pipeline_compute_is_int8(pCreateInfo);
if (pipeline->device->physical_device->rad_info.chip_class < VI)
radv_pipeline_compute_get_int_clamp(pCreateInfo, &key.fs.is_int8, &key.fs.is_int10);
const VkPipelineShaderStageCreateInfo *stage = pStages[MESA_SHADER_FRAGMENT];
@@ -2180,6 +2349,9 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
S_02880C_EXEC_ON_HIER_FAIL(ps->info.fs.writes_memory) |
S_02880C_EXEC_ON_NOOP(ps->info.fs.writes_memory);
if (pipeline->device->physical_device->has_rbplus)
pipeline->graphics.db_shader_control |= S_02880C_DUAL_QUAD_DISABLE(1);
pipeline->graphics.shader_z_format =
ps->info.fs.writes_sample_mask ? V_028710_SPI_SHADER_32_ABGR :
ps->info.fs.writes_stencil ? V_028710_SPI_SHADER_32_GR :

View File

@@ -118,6 +118,9 @@ radv_pipeline_cache_search_unlocked(struct radv_pipeline_cache *cache,
const uint32_t mask = cache->table_size - 1;
const uint32_t start = (*(uint32_t *) sha1);
if (cache->table_size == 0)
return NULL;
for (uint32_t i = 0; i < cache->table_size; i++) {
const uint32_t index = (start + i) & mask;
struct cache_entry *entry = cache->hash_table[index];

View File

@@ -84,7 +84,7 @@ typedef uint32_t xcb_window_t;
#define MAX_PUSH_DESCRIPTORS 32
#define MAX_DYNAMIC_BUFFERS 16
#define MAX_SAMPLES_LOG2 4
#define NUM_META_FS_KEYS 11
#define NUM_META_FS_KEYS 13
#define RADV_MAX_DRM_DEVICES 8
#define NUM_DEPTH_CLEAR_PIPELINES 3
@@ -266,7 +266,7 @@ struct radv_physical_device {
struct radeon_winsys *ws;
struct radeon_info rad_info;
char path[20];
const char * name;
char name[VK_MAX_PHYSICAL_DEVICE_NAME_SIZE];
uint8_t uuid[VK_UUID_SIZE];
uint8_t device_uuid[VK_UUID_SIZE];
@@ -276,6 +276,9 @@ struct radv_physical_device {
bool has_rbplus; /* if RB+ register exist */
bool rbplus_allowed; /* if RB+ is allowed */
VkPhysicalDeviceMemoryProperties memory_properties;
enum radv_mem_type mem_type_indices[RADV_MEM_TYPE_COUNT];
};
struct radv_instance {
@@ -433,8 +436,6 @@ struct radv_meta_state {
VkPipelineLayout p_layout;
struct {
VkRenderPass srgb_render_pass;
VkPipeline srgb_pipeline;
VkRenderPass render_pass[NUM_META_FS_KEYS];
VkPipeline pipeline[NUM_META_FS_KEYS];
} rc[MAX_SAMPLES_LOG2];
@@ -444,7 +445,7 @@ struct radv_meta_state {
VkPipeline decompress_pipeline;
VkPipeline resummarize_pipeline;
VkRenderPass pass;
} depth_decomp;
} depth_decomp[1 + MAX_SAMPLES_LOG2];
struct {
VkPipeline cmask_eliminate_pipeline;
@@ -1002,7 +1003,7 @@ struct radv_depth_stencil_state {
struct radv_blend_state {
uint32_t cb_color_control;
uint32_t cb_target_mask;
uint32_t sx_mrt0_blend_opt[8];
uint32_t sx_mrt_blend_opt[8];
uint32_t cb_blend_control[8];
uint32_t spi_shader_col_format;
@@ -1215,14 +1216,14 @@ struct radv_image {
/* Set when bound */
struct radeon_winsys_bo *bo;
VkDeviceSize offset;
uint32_t dcc_offset;
uint32_t htile_offset;
uint64_t dcc_offset;
uint64_t htile_offset;
struct radeon_surf surface;
struct radv_fmask_info fmask;
struct radv_cmask_info cmask;
uint32_t clear_value_offset;
uint32_t dcc_pred_offset;
uint64_t clear_value_offset;
uint64_t dcc_pred_offset;
};
/* Whether the image has a htile that is known consistent with the contents of
@@ -1280,6 +1281,7 @@ struct radv_image_view {
uint32_t base_layer;
uint32_t layer_count;
uint32_t base_mip;
uint32_t level_count;
VkExtent3D extent; /**< Extent of VkImageViewCreateInfo::baseMipLevel. */
uint32_t descriptor[8];

View File

@@ -653,7 +653,7 @@ static void radv_query_shader(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device;
struct radv_meta_saved_compute_state saved_state;
radv_meta_save_compute(&saved_state, cmd_buffer, 4);
radv_meta_save_compute(&saved_state, cmd_buffer, 16);
struct radv_buffer dst_buffer = {
.bo = dst_bo,
@@ -737,7 +737,7 @@ static void radv_query_shader(struct radv_cmd_buffer *cmd_buffer,
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_CS_PARTIAL_FLUSH;
radv_meta_restore_compute(&saved_state, cmd_buffer, 4);
radv_meta_restore_compute(&saved_state, cmd_buffer, 16);
}
VkResult radv_CreateQueryPool(

View File

@@ -51,7 +51,8 @@ enum radeon_bo_flag { /* bitfield */
RADEON_FLAG_GTT_WC = (1 << 0),
RADEON_FLAG_CPU_ACCESS = (1 << 1),
RADEON_FLAG_NO_CPU_ACCESS = (1 << 2),
RADEON_FLAG_VIRTUAL = (1 << 3)
RADEON_FLAG_VIRTUAL = (1 << 3),
RADEON_FLAG_VA_UNCACHED = (1 << 4),
};
enum radeon_bo_usage { /* bitfield */

View File

@@ -154,6 +154,7 @@ radv_wsi_image_create(VkDevice device_h,
VkImage image_h;
struct radv_image *image;
int fd;
RADV_FROM_HANDLE(radv_device, device, device_h);
result = radv_image_create(device_h,
&(struct radv_image_create_info) {
@@ -192,12 +193,26 @@ radv_wsi_image_create(VkDevice device_h,
.image = image_h
};
/* Find the first VRAM memory type, or GART for PRIME images. */
int memory_type_index = -1;
for (int i = 0; i < device->physical_device->memory_properties.memoryTypeCount; ++i) {
bool is_local = !!(device->physical_device->memory_properties.memoryTypes[i].propertyFlags & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
if ((linear && !is_local) || (!linear && is_local)) {
memory_type_index = i;
break;
}
}
/* fallback */
if (memory_type_index == -1)
memory_type_index = 0;
result = radv_AllocateMemory(device_h,
&(VkMemoryAllocateInfo) {
.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
.pNext = &ded_alloc,
.allocationSize = image->size,
.memoryTypeIndex = linear ? 1 : 0,
.memoryTypeIndex = memory_type_index,
},
NULL /* XXX: pAllocator */,
&memory_h);
@@ -211,7 +226,6 @@ radv_wsi_image_create(VkDevice device_h,
* or the fd for the linear image if a copy is required.
*/
if (!needs_linear_copy || (needs_linear_copy && linear)) {
RADV_FROM_HANDLE(radv_device, device, device_h);
RADV_FROM_HANDLE(radv_device_memory, memory, memory_h);
if (!radv_get_memory_fd(device, memory, &fd))
goto fail_alloc_memory;
@@ -224,7 +238,11 @@ radv_wsi_image_create(VkDevice device_h,
*memory_p = memory_h;
*size = image->size;
*offset = image->offset;
*row_pitch = surface->u.legacy.level[0].nblk_x * surface->bpe;
if (device->physical_device->rad_info.chip_class >= GFX9)
*row_pitch = surface->u.gfx9.surf_pitch * surface->bpe;
else
*row_pitch = surface->u.legacy.level[0].nblk_x * surface->bpe;
return VK_SUCCESS;
fail_alloc_memory:
radv_FreeMemory(device_h, memory_h, pAllocator);

View File

@@ -1133,15 +1133,20 @@ si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer)
void
si_emit_set_predication_state(struct radv_cmd_buffer *cmd_buffer, uint64_t va)
{
uint32_t val = 0;
uint32_t op = 0;
if (va)
val = (((va >> 32) & 0xff) |
PRED_OP(PREDICATION_OP_BOOL64)|
PREDICATION_DRAW_VISIBLE);
radeon_emit(cmd_buffer->cs, PKT3(PKT3_SET_PREDICATION, 1, 0));
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, val);
op = PRED_OP(PREDICATION_OP_BOOL64) | PREDICATION_DRAW_VISIBLE;
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_emit(cmd_buffer->cs, PKT3(PKT3_SET_PREDICATION, 2, 0));
radeon_emit(cmd_buffer->cs, op);
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
} else {
radeon_emit(cmd_buffer->cs, PKT3(PKT3_SET_PREDICATION, 1, 0));
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, op | ((va >> 32) & 0xFF));
}
}
/* Set this if you want the 3D engine to wait until CP DMA is done.

View File

@@ -465,4 +465,27 @@ vk_format_get_component_bits(VkFormat format,
}
}
static inline VkFormat
vk_to_non_srgb_format(VkFormat format)
{
switch(format) {
case VK_FORMAT_R8_SRGB :
return VK_FORMAT_R8_UNORM;
case VK_FORMAT_R8G8_SRGB:
return VK_FORMAT_R8G8_UNORM;
case VK_FORMAT_R8G8B8_SRGB:
return VK_FORMAT_R8G8B8_UNORM;
case VK_FORMAT_B8G8R8_SRGB:
return VK_FORMAT_B8G8R8_UNORM;
case VK_FORMAT_R8G8B8A8_SRGB :
return VK_FORMAT_R8G8B8A8_UNORM;
case VK_FORMAT_B8G8R8A8_SRGB:
return VK_FORMAT_B8G8R8A8_UNORM;
case VK_FORMAT_A8B8G8R8_SRGB_PACK32:
return VK_FORMAT_A8B8G8R8_UNORM_PACK32;
default:
return format;
}
}
#endif /* VK_FORMAT_H */

View File

@@ -39,6 +39,23 @@
static void radv_amdgpu_winsys_bo_destroy(struct radeon_winsys_bo *_bo);
static int
radv_amdgpu_bo_va_op(amdgpu_device_handle dev,
amdgpu_bo_handle bo,
uint64_t offset,
uint64_t size,
uint64_t addr,
uint64_t flags,
uint32_t ops)
{
size = ALIGN(size, getpagesize());
flags |= (AMDGPU_VM_PAGE_READABLE |
AMDGPU_VM_PAGE_WRITEABLE |
AMDGPU_VM_PAGE_EXECUTABLE);
return amdgpu_bo_va_op_raw(dev, bo, offset, size, addr,
flags, ops);
}
static void
radv_amdgpu_winsys_virtual_map(struct radv_amdgpu_winsys_bo *bo,
const struct radv_amdgpu_map_range *range)
@@ -49,8 +66,8 @@ radv_amdgpu_winsys_virtual_map(struct radv_amdgpu_winsys_bo *bo,
return; /* TODO: PRT mapping */
p_atomic_inc(&range->bo->ref_count);
int r = amdgpu_bo_va_op(range->bo->bo, range->bo_offset, range->size,
range->offset + bo->va, 0, AMDGPU_VA_OP_MAP);
int r = radv_amdgpu_bo_va_op(bo->ws->dev, range->bo->bo, range->bo_offset, range->size,
range->offset + bo->va, 0, AMDGPU_VA_OP_MAP);
if (r)
abort();
}
@@ -64,8 +81,8 @@ radv_amdgpu_winsys_virtual_unmap(struct radv_amdgpu_winsys_bo *bo,
if (!range->bo)
return; /* TODO: PRT mapping */
int r = amdgpu_bo_va_op(range->bo->bo, range->bo_offset, range->size,
range->offset + bo->va, 0, AMDGPU_VA_OP_UNMAP);
int r = radv_amdgpu_bo_va_op(bo->ws->dev, range->bo->bo, range->bo_offset, range->size,
range->offset + bo->va, 0, AMDGPU_VA_OP_UNMAP);
if (r)
abort();
radv_amdgpu_winsys_bo_destroy((struct radeon_winsys_bo *)range->bo);
@@ -149,6 +166,7 @@ radv_amdgpu_winsys_bo_virtual_bind(struct radeon_winsys_bo *_parent,
if (parent->ranges[first].bo == bo && (!bo || offset - bo_offset == parent->ranges[first].offset - parent->ranges[first].bo_offset)) {
size += offset - parent->ranges[first].offset;
offset = parent->ranges[first].offset;
bo_offset = parent->ranges[first].bo_offset;
remove_first = true;
}
@@ -234,7 +252,7 @@ static void radv_amdgpu_winsys_bo_destroy(struct radeon_winsys_bo *_bo)
bo->ws->num_buffers--;
pthread_mutex_unlock(&bo->ws->global_bo_list_lock);
}
amdgpu_bo_va_op(bo->bo, 0, bo->size, bo->va, 0, AMDGPU_VA_OP_UNMAP);
radv_amdgpu_bo_va_op(bo->ws->dev, bo->bo, 0, bo->size, bo->va, 0, AMDGPU_VA_OP_UNMAP);
amdgpu_bo_free(bo->bo);
}
amdgpu_va_range_free(bo->va_handle);
@@ -322,7 +340,11 @@ radv_amdgpu_winsys_bo_create(struct radeon_winsys *_ws,
goto error_bo_alloc;
}
r = amdgpu_bo_va_op(buf_handle, 0, size, va, 0, AMDGPU_VA_OP_MAP);
uint32_t va_flags = 0;
if ((flags & RADEON_FLAG_VA_UNCACHED) && ws->info.chip_class >= GFX9)
va_flags |= AMDGPU_VM_MTYPE_UC;
r = radv_amdgpu_bo_va_op(ws->dev, buf_handle, 0, size, va, va_flags, AMDGPU_VA_OP_MAP);
if (r)
goto error_va_map;
@@ -398,7 +420,7 @@ radv_amdgpu_winsys_bo_from_fd(struct radeon_winsys *_ws,
if (r)
goto error_query;
r = amdgpu_bo_va_op(result.buf_handle, 0, result.alloc_size, va, 0, AMDGPU_VA_OP_MAP);
r = radv_amdgpu_bo_va_op(ws->dev, result.buf_handle, 0, result.alloc_size, va, 0, AMDGPU_VA_OP_MAP);
if (r)
goto error_va_map;

View File

@@ -86,6 +86,7 @@ LIBGLSL_FILES = \
glsl/lower_buffer_access.cpp \
glsl/lower_buffer_access.h \
glsl/lower_const_arrays_to_uniforms.cpp \
glsl/lower_cs_derived.cpp \
glsl/lower_discard.cpp \
glsl/lower_discard_flow.cpp \
glsl/lower_distance.cpp \
@@ -140,7 +141,9 @@ LIBGLSL_FILES = \
glsl/program.h \
glsl/propagate_invariance.cpp \
glsl/s_expression.cpp \
glsl/s_expression.h
glsl/s_expression.h \
glsl/string_to_uint_map.cpp \
glsl/string_to_uint_map.h
LIBGLSL_SHADER_CACHE_FILES = \
glsl/shader_cache.cpp \

View File

@@ -224,19 +224,28 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
val = ((ir_swizzle *)val)->val;
}
while (val->ir_type == ir_type_dereference_array) {
val = ((ir_dereference_array *)val)->array;
for (;;) {
if (val->ir_type == ir_type_dereference_array) {
val = ((ir_dereference_array *)val)->array;
} else if (val->ir_type == ir_type_dereference_record &&
!state->es_shader) {
val = ((ir_dereference_record *)val)->record;
} else
break;
}
if (!val->as_dereference_variable() ||
val->variable_referenced()->data.mode != ir_var_shader_in) {
ir_variable *var = NULL;
if (const ir_dereference_variable *deref_var = val->as_dereference_variable())
var = deref_var->variable_referenced();
if (!var || var->data.mode != ir_var_shader_in) {
_mesa_glsl_error(&loc, state,
"parameter `%s` must be a shader input",
formal->name);
return false;
}
val->variable_referenced()->data.must_be_shader_input = 1;
var->data.must_be_shader_input = 1;
}
/* Verify that 'out' and 'inout' actual parameters are lvalues. */
@@ -663,8 +672,13 @@ generate_array_index(void *mem_ctx, exec_list *instructions,
ir_variable *sub_var = NULL;
*function_name = array->primary_expression.identifier;
match_subroutine_by_name(*function_name, actual_parameters,
state, &sub_var);
if (!match_subroutine_by_name(*function_name, actual_parameters,
state, &sub_var)) {
_mesa_glsl_error(&loc, state, "Unknown subroutine `%s'",
*function_name);
*function_name = NULL; /* indicate error condition to caller */
return NULL;
}
ir_rvalue *outer_array_idx = idx->hir(instructions, state);
return new(mem_ctx) ir_dereference_array(sub_var, outer_array_idx);

View File

@@ -4495,7 +4495,7 @@ process_initializer(ir_variable *var, ast_declaration *decl,
} else {
if (var->type->is_numeric()) {
/* Reduce cascading errors. */
var->constant_value = type->qualifier.flags.q.constant
rhs = var->constant_value = type->qualifier.flags.q.constant
? ir_constant::zero(state, var->type) : NULL;
}
}

View File

@@ -46,6 +46,9 @@ grow_to_fit(struct blob *blob, size_t additional)
size_t to_allocate;
uint8_t *new_data;
if (blob->out_of_memory)
return false;
if (blob->size + additional <= blob->allocated)
return true;
@@ -57,8 +60,10 @@ grow_to_fit(struct blob *blob, size_t additional)
to_allocate = MAX2(to_allocate, blob->allocated + additional);
new_data = realloc(blob->data, to_allocate);
if (new_data == NULL)
if (new_data == NULL) {
blob->out_of_memory = true;
return false;
}
blob->data = new_data;
blob->allocated = to_allocate;
@@ -104,6 +109,7 @@ blob_create()
blob->data = NULL;
blob->allocated = 0;
blob->size = 0;
blob->out_of_memory = false;
return blob;
}
@@ -207,6 +213,9 @@ blob_reader_init(struct blob_reader *blob, uint8_t *data, size_t size)
static bool
ensure_can_read(struct blob_reader *blob, size_t size)
{
if (blob->overrun)
return false;
if (blob->current < blob->end && blob->end - blob->current >= size)
return true;

View File

@@ -55,6 +55,12 @@ struct blob {
/** The number of bytes that have actual data written to them. */
size_t size;
/**
* True if we've ever failed to realloc or if we go pas the end of a fixed
* allocation blob.
*/
bool out_of_memory;
};
/* When done reading, the caller can ensure that everything was consumed by

View File

@@ -90,9 +90,9 @@ static const struct gl_builtin_uniform_element gl_LightSource_elements[] = {
SWIZZLE_Y,
SWIZZLE_Z,
SWIZZLE_Z)},
{"spotCosCutoff", {STATE_LIGHT, 0, STATE_SPOT_DIRECTION}, SWIZZLE_WWWW},
{"spotCutoff", {STATE_LIGHT, 0, STATE_SPOT_CUTOFF}, SWIZZLE_XXXX},
{"spotExponent", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_WWWW},
{"spotCutoff", {STATE_LIGHT, 0, STATE_SPOT_CUTOFF}, SWIZZLE_XXXX},
{"spotCosCutoff", {STATE_LIGHT, 0, STATE_SPOT_DIRECTION}, SWIZZLE_WWWW},
{"constantAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_XXXX},
{"linearAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_YYYY},
{"quadraticAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_ZZZZ},
@@ -1290,15 +1290,10 @@ builtin_variable_generator::generate_cs_special_vars()
uvec3_t, "gl_LocalGroupSizeARB");
}
if (state->ctx->Const.LowerCsDerivedVariables) {
add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0);
add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0);
} else {
add_system_value(SYSTEM_VALUE_GLOBAL_INVOCATION_ID,
uvec3_t, "gl_GlobalInvocationID");
add_system_value(SYSTEM_VALUE_LOCAL_INVOCATION_INDEX,
uint_t, "gl_LocalInvocationIndex");
}
add_system_value(SYSTEM_VALUE_GLOBAL_INVOCATION_ID,
uvec3_t, "gl_GlobalInvocationID");
add_system_value(SYSTEM_VALUE_LOCAL_INVOCATION_INDEX,
uint_t, "gl_LocalInvocationIndex");
}
@@ -1469,84 +1464,3 @@ _mesa_glsl_initialize_variables(exec_list *instructions,
break;
}
}
/**
* Initialize compute shader variables with values that are derived from other
* compute shader variable.
*/
static void
initialize_cs_derived_variables(gl_shader *shader,
ir_function_signature *const main_sig)
{
assert(shader->Stage == MESA_SHADER_COMPUTE);
ir_variable *gl_GlobalInvocationID =
shader->symbols->get_variable("gl_GlobalInvocationID");
assert(gl_GlobalInvocationID);
ir_variable *gl_WorkGroupID =
shader->symbols->get_variable("gl_WorkGroupID");
assert(gl_WorkGroupID);
ir_variable *gl_WorkGroupSize =
shader->symbols->get_variable("gl_WorkGroupSize");
if (gl_WorkGroupSize == NULL) {
void *const mem_ctx = ralloc_parent(shader->ir);
gl_WorkGroupSize = new(mem_ctx) ir_variable(glsl_type::uvec3_type,
"gl_WorkGroupSize",
ir_var_auto);
gl_WorkGroupSize->data.how_declared = ir_var_declared_implicitly;
gl_WorkGroupSize->data.read_only = true;
shader->ir->push_head(gl_WorkGroupSize);
}
ir_variable *gl_LocalInvocationID =
shader->symbols->get_variable("gl_LocalInvocationID");
assert(gl_LocalInvocationID);
/* gl_GlobalInvocationID =
* gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
*/
ir_instruction *inst =
assign(gl_GlobalInvocationID,
add(mul(gl_WorkGroupID, gl_WorkGroupSize),
gl_LocalInvocationID));
main_sig->body.push_head(inst);
/* gl_LocalInvocationIndex =
* gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
* gl_LocalInvocationID.y * gl_WorkGroupSize.x +
* gl_LocalInvocationID.x;
*/
ir_expression *index_z =
mul(mul(swizzle_z(gl_LocalInvocationID), swizzle_x(gl_WorkGroupSize)),
swizzle_y(gl_WorkGroupSize));
ir_expression *index_y =
mul(swizzle_y(gl_LocalInvocationID), swizzle_x(gl_WorkGroupSize));
ir_expression *index_y_plus_z = add(index_y, index_z);
operand index_x(swizzle_x(gl_LocalInvocationID));
ir_expression *index_x_plus_y_plus_z = add(index_y_plus_z, index_x);
ir_variable *gl_LocalInvocationIndex =
shader->symbols->get_variable("gl_LocalInvocationIndex");
assert(gl_LocalInvocationIndex);
inst = assign(gl_LocalInvocationIndex, index_x_plus_y_plus_z);
main_sig->body.push_head(inst);
}
/**
* Initialize builtin variables with values based on other builtin variables.
* These are initialized in the main function.
*/
void
_mesa_glsl_initialize_derived_variables(struct gl_context *ctx,
gl_shader *shader)
{
/* We only need to set CS variables currently. */
if (shader->Stage == MESA_SHADER_COMPUTE &&
ctx->Const.LowerCsDerivedVariables) {
ir_function_signature *const main_sig =
_mesa_get_main_function_signature(shader->symbols);
if (main_sig != NULL)
initialize_cs_derived_variables(shader, main_sig);
}
}

View File

@@ -1863,6 +1863,49 @@ set_shader_inout_layout(struct gl_shader *shader,
shader->bound_image = state->bound_image_specified;
}
/* src can be NULL if only the symbols found in the exec_list should be
* copied
*/
void
_mesa_glsl_copy_symbols_from_table(struct exec_list *shader_ir,
struct glsl_symbol_table *src,
struct glsl_symbol_table *dest)
{
foreach_in_list (ir_instruction, ir, shader_ir) {
switch (ir->ir_type) {
case ir_type_function:
dest->add_function((ir_function *) ir);
break;
case ir_type_variable: {
ir_variable *const var = (ir_variable *) ir;
if (var->data.mode != ir_var_temporary)
dest->add_variable(var);
break;
}
default:
break;
}
}
if (src != NULL) {
/* Explicitly copy the gl_PerVertex interface definitions because these
* are needed to check they are the same during the interstage link.
* They cant necessarily be found via the exec_list because the members
* might not be referenced. The GL spec still requires that they match
* in that case.
*/
const glsl_type *iface =
src->get_interface("gl_PerVertex", ir_var_shader_in);
if (iface)
dest->add_interface(iface->name, iface, ir_var_shader_in);
iface = src->get_interface("gl_PerVertex", ir_var_shader_out);
if (iface)
dest->add_interface(iface->name, iface, ir_var_shader_out);
}
}
extern "C" {
static void
@@ -1937,6 +1980,7 @@ do_late_parsing_checks(struct _mesa_glsl_parse_state *state)
static void
opt_shader_and_create_symbol_table(struct gl_context *ctx,
struct glsl_symbol_table *source_symbols,
struct gl_shader *shader)
{
assert(shader->CompileStatus != compile_failure &&
@@ -1994,24 +2038,8 @@ opt_shader_and_create_symbol_table(struct gl_context *ctx,
* We don't have to worry about types or interface-types here because those
* are fly-weights that are looked up by glsl_type.
*/
foreach_in_list (ir_instruction, ir, shader->ir) {
switch (ir->ir_type) {
case ir_type_function:
shader->symbols->add_function((ir_function *) ir);
break;
case ir_type_variable: {
ir_variable *const var = (ir_variable *) ir;
if (var->data.mode != ir_var_temporary)
shader->symbols->add_variable(var);
break;
}
default:
break;
}
}
_mesa_glsl_initialize_derived_variables(ctx, shader);
_mesa_glsl_copy_symbols_from_table(shader->ir, source_symbols,
shader->symbols);
}
void
@@ -2048,7 +2076,9 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
return;
if (shader->CompileStatus == compiled_no_opts) {
opt_shader_and_create_symbol_table(ctx, shader);
opt_shader_and_create_symbol_table(ctx,
NULL, /* source_symbols */
shader);
shader->CompileStatus = compile_success;
return;
}
@@ -2109,7 +2139,7 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
lower_subroutine(shader->ir, state);
if (!ctx->Cache || force_recompile)
opt_shader_and_create_symbol_table(ctx, shader);
opt_shader_and_create_symbol_table(ctx, state->symbols, shader);
else {
reparent_ir(shader->ir, shader->ir);
shader->CompileStatus = compiled_no_opts;

View File

@@ -948,6 +948,11 @@ extern int glcpp_preprocess(void *ctx, const char **shader, char **info_log,
extern void _mesa_destroy_shader_compiler(void);
extern void _mesa_destroy_shader_compiler_caches(void);
extern void
_mesa_glsl_copy_symbols_from_table(struct exec_list *shader_ir,
struct glsl_symbol_table *src,
struct glsl_symbol_table *dest);
#ifdef __cplusplus
}
#endif

View File

@@ -1158,8 +1158,6 @@ nir_visitor::visit(ir_call *ir)
case nir_intrinsic_vote_eq: {
nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32, NULL);
instr->variables[0] = evaluate_deref(&instr->instr, ir->return_deref);
ir_rvalue *value = (ir_rvalue *) ir->actual_parameters.get_head();
instr->src[0] = nir_src_for_ssa(evaluate_rvalue(value));

View File

@@ -2402,10 +2402,6 @@ extern void
_mesa_glsl_initialize_variables(exec_list *instructions,
struct _mesa_glsl_parse_state *state);
extern void
_mesa_glsl_initialize_derived_variables(struct gl_context *ctx,
gl_shader *shader);
extern void
reparent_ir(exec_list *list, void *mem_ctx);

View File

@@ -725,6 +725,8 @@ ir_swizzle::constant_expression_value(struct hash_table *variable_context)
case GLSL_TYPE_FLOAT: data.f[i] = v->value.f[swiz_idx[i]]; break;
case GLSL_TYPE_BOOL: data.b[i] = v->value.b[swiz_idx[i]]; break;
case GLSL_TYPE_DOUBLE:data.d[i] = v->value.d[swiz_idx[i]]; break;
case GLSL_TYPE_UINT64:data.u64[i] = v->value.u64[swiz_idx[i]]; break;
case GLSL_TYPE_INT64: data.i64[i] = v->value.i64[swiz_idx[i]]; break;
default: assert(!"Should not get here."); break;
}
}

View File

@@ -165,6 +165,7 @@ void optimize_dead_builtin_variables(exec_list *instructions,
bool lower_tess_level(gl_linked_shader *shader);
bool lower_vertex_id(gl_linked_shader *shader);
bool lower_cs_derived(gl_linked_shader *shader);
bool lower_blend_equation_advanced(gl_linked_shader *shader);
bool lower_subroutine(exec_list *instructions, struct _mesa_glsl_parse_state *state);

View File

@@ -207,7 +207,7 @@ link_assign_atomic_counter_resources(struct gl_context *ctx,
active_atomic_buffer *abs =
find_active_atomic_counters(ctx, prog, &num_buffers);
prog->data->AtomicBuffers = rzalloc_array(prog, gl_active_atomic_buffer,
prog->data->AtomicBuffers = rzalloc_array(prog->data, gl_active_atomic_buffer,
num_buffers);
prog->data->NumAtomicBuffers = num_buffers;
@@ -270,7 +270,7 @@ link_assign_atomic_counter_resources(struct gl_context *ctx,
struct gl_program *gl_prog = prog->_LinkedShaders[j]->Program;
gl_prog->info.num_abos = num_atomic_buffers[j];
gl_prog->sh.AtomicBuffers =
rzalloc_array(prog, gl_active_atomic_buffer *,
rzalloc_array(gl_prog, gl_active_atomic_buffer *,
num_atomic_buffers[j]);
unsigned intra_stage_idx = 0;

View File

@@ -364,6 +364,35 @@ validate_interstage_inout_blocks(struct gl_shader_program *prog,
consumer->Stage != MESA_SHADER_FRAGMENT) ||
consumer->Stage == MESA_SHADER_GEOMETRY;
/* Check that block re-declarations of gl_PerVertex are compatible
* across shaders: From OpenGL Shading Language 4.5, section
* "7.1 Built-In Language Variables", page 130 of the PDF:
*
* "If multiple shaders using members of a built-in block belonging
* to the same interface are linked together in the same program,
* they must all redeclare the built-in block in the same way, as
* described in section 4.3.9 “Interface Blocks” for interface-block
* matching, or a link-time error will result."
*
* This is done explicitly outside of iterating the member variable
* declarations because it is possible that the variables are not used and
* so they would have been optimised out.
*/
const glsl_type *consumer_iface =
consumer->symbols->get_interface("gl_PerVertex",
ir_var_shader_in);
const glsl_type *producer_iface =
producer->symbols->get_interface("gl_PerVertex",
ir_var_shader_out);
if (producer_iface && consumer_iface &&
interstage_member_mismatch(prog, consumer_iface, producer_iface)) {
linker_error(prog, "Incompatible or missing gl_PerVertex re-declaration "
"in consecutive shaders");
return;
}
/* Add output interfaces from the producer to the symbol table. */
foreach_in_list(ir_instruction, node, producer->ir) {
ir_variable *var = node->as_variable();

View File

@@ -25,7 +25,7 @@
#include "ir.h"
#include "linker.h"
#include "ir_uniform.h"
#include "util/string_to_uint_map.h"
#include "string_to_uint_map.h"
/* These functions are put in a "private" namespace instead of being marked
* static so that the unit tests can access them. See

View File

@@ -27,7 +27,7 @@
#include "ir_uniform.h"
#include "glsl_symbol_table.h"
#include "program.h"
#include "util/string_to_uint_map.h"
#include "string_to_uint_map.h"
#include "ir_array_refcount.h"
/**
@@ -1319,7 +1319,7 @@ link_assign_uniform_storage(struct gl_context *ctx,
union gl_constant_value *data;
if (prog->data->UniformStorage == NULL) {
prog->data->UniformStorage = rzalloc_array(prog,
prog->data->UniformStorage = rzalloc_array(prog->data,
struct gl_uniform_storage,
prog->data->NumUniformStorage);
data = rzalloc_array(prog->data->UniformStorage,
@@ -1385,13 +1385,6 @@ link_assign_uniform_storage(struct gl_context *ctx,
sizeof(shader->Program->sh.SamplerTargets));
}
/* If this is a fallback compile for a cache miss we already have the
* correct uniform mappings and we don't want to reinitialise uniforms so
* just return now.
*/
if (prog->data->cache_fallback)
return;
#ifndef NDEBUG
for (unsigned i = 0; i < prog->data->NumUniformStorage; i++) {
assert(prog->data->UniformStorage[i].storage != NULL ||
@@ -1416,11 +1409,9 @@ void
link_assign_uniform_locations(struct gl_shader_program *prog,
struct gl_context *ctx)
{
if (!prog->data->cache_fallback) {
ralloc_free(prog->data->UniformStorage);
prog->data->UniformStorage = NULL;
prog->data->NumUniformStorage = 0;
}
ralloc_free(prog->data->UniformStorage);
prog->data->UniformStorage = NULL;
prog->data->NumUniformStorage = 0;
if (prog->UniformHash != NULL) {
prog->UniformHash->clear();

View File

@@ -165,10 +165,12 @@ process_xfb_layout_qualifiers(void *mem_ctx, const gl_linked_shader *sh,
if (var->data.from_named_ifc_block) {
type = var->get_interface_type();
/* Find the member type before it was altered by lowering */
const glsl_type *type_wa = type->without_array();
member_type =
type->fields.structure[type->field_index(var->name)].type;
name = ralloc_strdup(NULL, type->without_array()->name);
type_wa->fields.structure[type_wa->field_index(var->name)].type;
name = ralloc_strdup(NULL, type_wa->name);
} else {
type = var->type;
member_type = NULL;
@@ -1119,7 +1121,6 @@ store_tfeedback_info(struct gl_context *ctx, struct gl_shader_program *prog,
if (has_xfb_qualifiers) {
for (unsigned j = 0; j < MAX_FEEDBACK_BUFFERS; j++) {
if (prog->TransformFeedback.BufferStride[j]) {
buffers |= 1 << j;
explicit_stride[j] = true;
xfb_prog->sh.LinkedTransformFeedback->Buffers[j].Stride =
prog->TransformFeedback.BufferStride[j] / 4;
@@ -1144,10 +1145,24 @@ store_tfeedback_info(struct gl_context *ctx, struct gl_shader_program *prog,
num_buffers++;
buffer_stream_id = -1;
continue;
} else if (tfeedback_decls[i].is_varying()) {
}
if (has_xfb_qualifiers) {
buffer = tfeedback_decls[i].get_buffer();
} else {
buffer = num_buffers;
}
if (tfeedback_decls[i].is_varying()) {
if (buffer_stream_id == -1) {
/* First varying writing to this buffer: remember its stream */
buffer_stream_id = (int) tfeedback_decls[i].get_stream_id();
/* Only mark a buffer as active when there is a varying
* attached to it. This behaviour is based on a revised version
* of section 13.2.2 of the GL 4.6 spec.
*/
buffers |= 1 << buffer;
} else if (buffer_stream_id !=
(int) tfeedback_decls[i].get_stream_id()) {
/* Varying writes to the same buffer from a different stream */
@@ -1163,13 +1178,6 @@ store_tfeedback_info(struct gl_context *ctx, struct gl_shader_program *prog,
}
}
if (has_xfb_qualifiers) {
buffer = tfeedback_decls[i].get_buffer();
} else {
buffer = num_buffers;
}
buffers |= 1 << buffer;
if (!tfeedback_decls[i].store(ctx, prog,
xfb_prog->sh.LinkedTransformFeedback,
buffer, num_buffers, num_outputs,
@@ -2072,7 +2080,8 @@ reserved_varying_slot(struct gl_linked_shader *stage,
var_slot = var->data.location - VARYING_SLOT_VAR0;
unsigned num_elements = get_varying_type(var, stage->Stage)
->count_attribute_slots(stage->Stage == MESA_SHADER_VERTEX);
->count_attribute_slots(io_mode == ir_var_shader_in &&
stage->Stage == MESA_SHADER_VERTEX);
for (unsigned i = 0; i < num_elements; i++) {
if (var_slot >= 0 && var_slot < MAX_VARYINGS_INCL_PATCH)
slots |= UINT64_C(1) << var_slot;

View File

@@ -75,7 +75,7 @@
#include "program/program.h"
#include "util/mesa-sha1.h"
#include "util/set.h"
#include "util/string_to_uint_map.h"
#include "string_to_uint_map.h"
#include "linker.h"
#include "link_varyings.h"
#include "ir_optimization.h"
@@ -1128,10 +1128,16 @@ cross_validate_globals(struct gl_shader_program *prog,
if (prog->IsES && (prog->data->Version != 310 ||
!var->get_interface_type()) &&
existing->data.precision != var->data.precision) {
linker_error(prog, "declarations for %s `%s` have "
"mismatching precision qualifiers\n",
mode_string(var), var->name);
return;
if ((existing->data.used && var->data.used) || prog->data->Version >= 300) {
linker_error(prog, "declarations for %s `%s` have "
"mismatching precision qualifiers\n",
mode_string(var), var->name);
return;
} else {
linker_warning(prog, "declarations for %s `%s` have "
"mismatching precision qualifiers\n",
mode_string(var), var->name);
}
}
} else
variables->add_variable(var);
@@ -1202,8 +1208,8 @@ interstage_cross_validate_uniform_blocks(struct gl_shader_program *prog,
}
for (unsigned int j = 0; j < sh_num_blocks; j++) {
int index = link_cross_validate_uniform_block(prog, &blks, num_blks,
sh_blks[j]);
int index = link_cross_validate_uniform_block(prog->data, &blks,
num_blks, sh_blks[j]);
if (index == -1) {
linker_error(prog, "buffer block `%s' has mismatching "
@@ -1262,21 +1268,11 @@ interstage_cross_validate_uniform_blocks(struct gl_shader_program *prog,
* Populates a shaders symbol table with all global declarations
*/
static void
populate_symbol_table(gl_linked_shader *sh)
populate_symbol_table(gl_linked_shader *sh, glsl_symbol_table *symbols)
{
sh->symbols = new(sh) glsl_symbol_table;
foreach_in_list(ir_instruction, inst, sh->ir) {
ir_variable *var;
ir_function *func;
if ((func = inst->as_function()) != NULL) {
sh->symbols->add_function(func);
} else if ((var = inst->as_variable()) != NULL) {
if (var->data.mode != ir_var_temporary)
sh->symbols->add_variable(var);
}
}
_mesa_glsl_copy_symbols_from_table(sh->ir, symbols, sh->symbols);
}
@@ -2277,8 +2273,7 @@ link_intrastage_shaders(void *mem_ctx,
return NULL;
}
if (!prog->data->cache_fallback)
_mesa_reference_shader_program_data(ctx, &gl_prog->sh.data, prog->data);
_mesa_reference_shader_program_data(ctx, &gl_prog->sh.data, prog->data);
/* Don't use _mesa_reference_program() just take ownership */
linked->Program = gl_prog;
@@ -2298,7 +2293,7 @@ link_intrastage_shaders(void *mem_ctx,
link_bindless_layout_qualifiers(prog, gl_prog, shader_list, num_shaders);
populate_symbol_table(linked);
populate_symbol_table(linked, shader_list[0]->symbols);
/* The pointer to the main function in the final linked shader (i.e., the
* copy of the original shader that contained the main function).
@@ -2336,35 +2331,33 @@ link_intrastage_shaders(void *mem_ctx,
v.run(linked->ir);
v.fixup_unnamed_interface_types();
if (!prog->data->cache_fallback) {
/* Link up uniform blocks defined within this stage. */
link_uniform_blocks(mem_ctx, ctx, prog, linked, &ubo_blocks,
&num_ubo_blocks, &ssbo_blocks, &num_ssbo_blocks);
/* Link up uniform blocks defined within this stage. */
link_uniform_blocks(mem_ctx, ctx, prog, linked, &ubo_blocks,
&num_ubo_blocks, &ssbo_blocks, &num_ssbo_blocks);
if (!prog->data->LinkStatus) {
_mesa_delete_linked_shader(ctx, linked);
return NULL;
}
/* Copy ubo blocks to linked shader list */
linked->Program->sh.UniformBlocks =
ralloc_array(linked, gl_uniform_block *, num_ubo_blocks);
ralloc_steal(linked, ubo_blocks);
for (unsigned i = 0; i < num_ubo_blocks; i++) {
linked->Program->sh.UniformBlocks[i] = &ubo_blocks[i];
}
linked->Program->info.num_ubos = num_ubo_blocks;
/* Copy ssbo blocks to linked shader list */
linked->Program->sh.ShaderStorageBlocks =
ralloc_array(linked, gl_uniform_block *, num_ssbo_blocks);
ralloc_steal(linked, ssbo_blocks);
for (unsigned i = 0; i < num_ssbo_blocks; i++) {
linked->Program->sh.ShaderStorageBlocks[i] = &ssbo_blocks[i];
}
linked->Program->info.num_ssbos = num_ssbo_blocks;
if (!prog->data->LinkStatus) {
_mesa_delete_linked_shader(ctx, linked);
return NULL;
}
/* Copy ubo blocks to linked shader list */
linked->Program->sh.UniformBlocks =
ralloc_array(linked, gl_uniform_block *, num_ubo_blocks);
ralloc_steal(linked, ubo_blocks);
for (unsigned i = 0; i < num_ubo_blocks; i++) {
linked->Program->sh.UniformBlocks[i] = &ubo_blocks[i];
}
linked->Program->info.num_ubos = num_ubo_blocks;
/* Copy ssbo blocks to linked shader list */
linked->Program->sh.ShaderStorageBlocks =
ralloc_array(linked, gl_uniform_block *, num_ssbo_blocks);
ralloc_steal(linked, ssbo_blocks);
for (unsigned i = 0; i < num_ssbo_blocks; i++) {
linked->Program->sh.ShaderStorageBlocks[i] = &ssbo_blocks[i];
}
linked->Program->info.num_ssbos = num_ssbo_blocks;
/* At this point linked should contain all of the linked IR, so
* validate it to make sure nothing went wrong.
*/
@@ -2384,6 +2377,9 @@ link_intrastage_shaders(void *mem_ctx,
if (ctx->Const.VertexID_is_zero_based)
lower_vertex_id(linked);
if (ctx->Const.LowerCsDerivedVariables)
lower_cs_derived(linked);
#ifdef DEBUG
/* Compute the source checksum. */
linked->SourceChecksum = 0;
@@ -2661,12 +2657,14 @@ assign_attribute_or_color_locations(void *mem_ctx,
} to_assign[32];
assert(max_index <= 32);
/* Temporary array for the set of attributes that have locations assigned.
/* Temporary array for the set of attributes that have locations assigned,
* for the purpose of checking overlapping slots/components of (non-ES)
* fragment shader outputs.
*/
ir_variable *assigned[16];
ir_variable *assigned[12 * 4]; /* (max # of FS outputs) * # components */
unsigned assigned_attr = 0;
unsigned num_attr = 0;
unsigned assigned_attr = 0;
foreach_in_list(ir_instruction, node, sh->ir) {
ir_variable *const var = node->as_variable();
@@ -2905,6 +2903,18 @@ assign_attribute_or_color_locations(void *mem_ctx,
}
}
if (target_index == MESA_SHADER_FRAGMENT && !prog->IsES) {
/* Only track assigned variables for non-ES fragment shaders
* to avoid overflowing the array.
*
* At most one variable per fragment output component should
* reach this.
*/
assert(assigned_attr < ARRAY_SIZE(assigned));
assigned[assigned_attr] = var;
assigned_attr++;
}
used_locations |= (use_mask << attr);
/* From the GL 4.5 core spec, section 11.1.1 (Vertex Attributes):
@@ -2931,9 +2941,6 @@ assign_attribute_or_color_locations(void *mem_ctx,
double_storage_locations |= (use_mask << attr);
}
assigned[assigned_attr] = var;
assigned_attr++;
continue;
}
@@ -3613,7 +3620,7 @@ add_program_resource(struct gl_shader_program *prog,
return true;
prog->data->ProgramResourceList =
reralloc(prog,
reralloc(prog->data,
prog->data->ProgramResourceList,
gl_program_resource,
prog->data->NumProgramResourceList + 1);
@@ -3808,6 +3815,7 @@ add_shader_variable(const struct gl_context *ctx,
GLenum programInterface, ir_variable *var,
const char *name, const glsl_type *type,
bool use_implicit_location, int location,
bool inouts_share_location,
const glsl_type *outermost_struct_type = NULL)
{
const glsl_type *interface_type = var->get_interface_type();
@@ -3870,7 +3878,7 @@ add_shader_variable(const struct gl_context *ctx,
stage_mask, programInterface,
var, field_name, field->type,
use_implicit_location, field_location,
outermost_struct_type))
false, outermost_struct_type))
return false;
field_location += field->type->count_attribute_slots(false);
@@ -3878,6 +3886,43 @@ add_shader_variable(const struct gl_context *ctx,
return true;
}
case GLSL_TYPE_ARRAY: {
/* The ARB_program_interface_query spec says:
*
* "For an active variable declared as an array of basic types, a
* single entry will be generated, with its name string formed by
* concatenating the name of the array and the string "[0]"."
*
* "For an active variable declared as an array of an aggregate data
* type (structures or arrays), a separate entry will be generated
* for each active array element, unless noted immediately below.
* The name of each entry is formed by concatenating the name of
* the array, the "[" character, an integer identifying the element
* number, and the "]" character. These enumeration rules are
* applied recursively, treating each enumerated array element as a
* separate active variable."
*/
const struct glsl_type *array_type = type->fields.array;
if (array_type->base_type == GLSL_TYPE_STRUCT ||
array_type->base_type == GLSL_TYPE_ARRAY) {
unsigned elem_location = location;
unsigned stride = inouts_share_location ? 0 :
array_type->count_attribute_slots(false);
for (unsigned i = 0; i < type->length; i++) {
char *elem = ralloc_asprintf(shProg, "%s[%d]", name, i);
if (!add_shader_variable(ctx, shProg, resource_set,
stage_mask, programInterface,
var, elem, array_type,
use_implicit_location, elem_location,
false, outermost_struct_type))
return false;
elem_location += stride;
}
return true;
}
/* fallthrough */
}
default: {
/* The ARB_program_interface_query spec says:
*
@@ -3898,6 +3943,20 @@ add_shader_variable(const struct gl_context *ctx,
}
}
static bool
inout_has_same_location(const ir_variable *var, unsigned stage)
{
if (!var->data.patch &&
((var->data.mode == ir_var_shader_out &&
stage == MESA_SHADER_TESS_CTRL) ||
(var->data.mode == ir_var_shader_in &&
(stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_TESS_EVAL ||
stage == MESA_SHADER_GEOMETRY))))
return true;
else
return false;
}
static bool
add_interface_variables(const struct gl_context *ctx,
struct gl_shader_program *shProg,
@@ -3954,7 +4013,8 @@ add_interface_variables(const struct gl_context *ctx,
if (!add_shader_variable(ctx, shProg, resource_set,
1 << stage, programInterface,
var, var->name, var->type, vs_input_or_fs_output,
var->data.location - loc_bias))
var->data.location - loc_bias,
inout_has_same_location(var, stage)))
return false;
}
return true;
@@ -3992,7 +4052,8 @@ add_packed_varyings(const struct gl_context *ctx,
if (!add_shader_variable(ctx, shProg, resource_set,
stage_mask,
iface, var, var->name, var->type, false,
var->data.location - VARYING_SLOT_VAR0))
var->data.location - VARYING_SLOT_VAR0,
inout_has_same_location(var, stage)))
return false;
}
}
@@ -4018,7 +4079,8 @@ add_fragdata_arrays(const struct gl_context *ctx,
if (!add_shader_variable(ctx, shProg, resource_set,
1 << MESA_SHADER_FRAGMENT,
GL_PROGRAM_OUTPUT, var, var->name, var->type,
true, var->data.location - FRAG_RESULT_DATA0))
true, var->data.location - FRAG_RESULT_DATA0,
false))
return false;
}
}
@@ -4581,14 +4643,12 @@ link_and_validate_uniforms(struct gl_context *ctx,
update_array_sizes(prog);
link_assign_uniform_locations(prog, ctx);
if (!prog->data->cache_fallback) {
link_assign_atomic_counter_resources(ctx, prog);
link_calculate_subroutine_compat(prog);
check_resources(ctx, prog);
check_subroutine_resources(prog);
check_image_resources(ctx, prog);
link_check_atomic_counter_resources(ctx, prog);
}
link_assign_atomic_counter_resources(ctx, prog);
link_calculate_subroutine_compat(prog);
check_resources(ctx, prog);
check_subroutine_resources(prog);
check_image_resources(ctx, prog);
link_check_atomic_counter_resources(ctx, prog);
}
static bool
@@ -4902,10 +4962,8 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
last = i;
}
if (!prog->data->cache_fallback) {
check_explicit_uniform_locations(ctx, prog);
link_assign_subroutine_types(prog);
}
check_explicit_uniform_locations(ctx, prog);
link_assign_subroutine_types(prog);
if (!prog->data->LinkStatus)
goto done;
@@ -4960,15 +5018,13 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
if (prog->SeparateShader)
disable_varying_optimizations_for_sso(prog);
if (!prog->data->cache_fallback) {
/* Process UBOs */
if (!interstage_cross_validate_uniform_blocks(prog, false))
goto done;
/* Process UBOs */
if (!interstage_cross_validate_uniform_blocks(prog, false))
goto done;
/* Process SSBOs */
if (!interstage_cross_validate_uniform_blocks(prog, true))
goto done;
}
/* Process SSBOs */
if (!interstage_cross_validate_uniform_blocks(prog, true))
goto done;
/* Do common optimization before assigning storage for attributes,
* uniforms, and varyings. Later optimization could possibly make

View File

@@ -0,0 +1,234 @@
/*
* Copyright © 2017 Ilia Mirkin
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
/**
* \file lower_cs_derived.cpp
*
* For hardware that does not support the gl_GlobalInvocationID and
* gl_LocalInvocationIndex system values, replace them with fresh
* globals. Note that we can't rely on gl_WorkGroupSize or
* gl_LocalGroupSizeARB being available, since they may only have been defined
* in a non-main shader.
*
* [ This can happen if only a secondary shader has the layout(local_size_*)
* declaration. ]
*
* This is meant to be run post-linking.
*/
#include "glsl_symbol_table.h"
#include "ir_hierarchical_visitor.h"
#include "ir.h"
#include "ir_builder.h"
#include "linker.h"
#include "program/prog_statevars.h"
#include "builtin_functions.h"
using namespace ir_builder;
namespace {
class lower_cs_derived_visitor : public ir_hierarchical_visitor {
public:
explicit lower_cs_derived_visitor(gl_linked_shader *shader)
: progress(false),
shader(shader),
local_size_variable(shader->Program->info.cs.local_size_variable),
gl_WorkGroupSize(NULL),
gl_WorkGroupID(NULL),
gl_LocalInvocationID(NULL),
gl_GlobalInvocationID(NULL),
gl_LocalInvocationIndex(NULL)
{
main_sig = _mesa_get_main_function_signature(shader->symbols);
assert(main_sig);
}
virtual ir_visitor_status visit(ir_dereference_variable *);
ir_variable *add_system_value(
int slot, const glsl_type *type, const char *name);
void find_sysvals();
void make_gl_GlobalInvocationID();
void make_gl_LocalInvocationIndex();
bool progress;
private:
gl_linked_shader *shader;
bool local_size_variable;
ir_function_signature *main_sig;
ir_rvalue *gl_WorkGroupSize;
ir_variable *gl_WorkGroupID;
ir_variable *gl_LocalInvocationID;
ir_variable *gl_GlobalInvocationID;
ir_variable *gl_LocalInvocationIndex;
};
} /* anonymous namespace */
ir_variable *
lower_cs_derived_visitor::add_system_value(
int slot, const glsl_type *type, const char *name)
{
ir_variable *var = new(shader) ir_variable(type, name, ir_var_system_value);
var->data.how_declared = ir_var_declared_implicitly;
var->data.read_only = true;
var->data.location = slot;
var->data.explicit_location = true;
var->data.explicit_index = 0;
shader->ir->push_head(var);
return var;
}
void
lower_cs_derived_visitor::find_sysvals()
{
if (gl_WorkGroupSize != NULL)
return;
ir_variable *WorkGroupSize;
if (local_size_variable)
WorkGroupSize = shader->symbols->get_variable("gl_LocalGroupSizeARB");
else
WorkGroupSize = shader->symbols->get_variable("gl_WorkGroupSize");
if (WorkGroupSize)
gl_WorkGroupSize = new(shader) ir_dereference_variable(WorkGroupSize);
gl_WorkGroupID = shader->symbols->get_variable("gl_WorkGroupID");
gl_LocalInvocationID = shader->symbols->get_variable("gl_LocalInvocationID");
/*
* These may be missing due to either dead code elimination, or, in the
* case of the group size, due to the layout being declared in a non-main
* shader. Re-create them.
*/
if (!gl_WorkGroupID)
gl_WorkGroupID = add_system_value(
SYSTEM_VALUE_WORK_GROUP_ID, glsl_type::uvec3_type, "gl_WorkGroupID");
if (!gl_LocalInvocationID)
gl_LocalInvocationID = add_system_value(
SYSTEM_VALUE_LOCAL_INVOCATION_ID, glsl_type::uvec3_type,
"gl_LocalInvocationID");
if (!WorkGroupSize) {
if (local_size_variable) {
gl_WorkGroupSize = new(shader) ir_dereference_variable(
add_system_value(
SYSTEM_VALUE_LOCAL_GROUP_SIZE, glsl_type::uvec3_type,
"gl_LocalGroupSizeARB"));
} else {
ir_constant_data data;
memset(&data, 0, sizeof(data));
for (int i = 0; i < 3; i++)
data.u[i] = shader->Program->info.cs.local_size[i];
gl_WorkGroupSize = new(shader) ir_constant(glsl_type::uvec3_type, &data);
}
}
}
void
lower_cs_derived_visitor::make_gl_GlobalInvocationID()
{
if (gl_GlobalInvocationID != NULL)
return;
find_sysvals();
/* gl_GlobalInvocationID =
* gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
*/
gl_GlobalInvocationID = new(shader) ir_variable(
glsl_type::uvec3_type, "__GlobalInvocationID", ir_var_temporary);
shader->ir->push_head(gl_GlobalInvocationID);
ir_instruction *inst =
assign(gl_GlobalInvocationID,
add(mul(gl_WorkGroupID, gl_WorkGroupSize->clone(shader, NULL)),
gl_LocalInvocationID));
main_sig->body.push_head(inst);
}
void
lower_cs_derived_visitor::make_gl_LocalInvocationIndex()
{
if (gl_LocalInvocationIndex != NULL)
return;
find_sysvals();
/* gl_LocalInvocationIndex =
* gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
* gl_LocalInvocationID.y * gl_WorkGroupSize.x +
* gl_LocalInvocationID.x;
*/
gl_LocalInvocationIndex = new(shader)
ir_variable(glsl_type::uint_type, "__LocalInvocationIndex", ir_var_temporary);
shader->ir->push_head(gl_LocalInvocationIndex);
ir_expression *index_z =
mul(mul(swizzle_z(gl_LocalInvocationID), swizzle_x(gl_WorkGroupSize->clone(shader, NULL))),
swizzle_y(gl_WorkGroupSize->clone(shader, NULL)));
ir_expression *index_y =
mul(swizzle_y(gl_LocalInvocationID), swizzle_x(gl_WorkGroupSize->clone(shader, NULL)));
ir_expression *index_y_plus_z = add(index_y, index_z);
operand index_x(swizzle_x(gl_LocalInvocationID));
ir_expression *index_x_plus_y_plus_z = add(index_y_plus_z, index_x);
ir_instruction *inst =
assign(gl_LocalInvocationIndex, index_x_plus_y_plus_z);
main_sig->body.push_head(inst);
}
ir_visitor_status
lower_cs_derived_visitor::visit(ir_dereference_variable *ir)
{
if (ir->var->data.mode == ir_var_system_value &&
ir->var->data.location == SYSTEM_VALUE_GLOBAL_INVOCATION_ID) {
make_gl_GlobalInvocationID();
ir->var = gl_GlobalInvocationID;
progress = true;
}
if (ir->var->data.mode == ir_var_system_value &&
ir->var->data.location == SYSTEM_VALUE_LOCAL_INVOCATION_INDEX) {
make_gl_LocalInvocationIndex();
ir->var = gl_LocalInvocationIndex;
progress = true;
}
return visit_continue;
}
bool
lower_cs_derived(gl_linked_shader *shader)
{
if (shader->Stage != MESA_SHADER_COMPUTE)
return false;
lower_cs_derived_visitor v(shader);
v.run(shader->ir);
return v.progress;
}

View File

@@ -358,13 +358,21 @@ lower_instructions_visitor::ldexp_to_arith(ir_expression *ir)
* into
*
* extracted_biased_exp = rshift(bitcast_f2i(abs(x)), exp_shift);
* resulting_biased_exp = extracted_biased_exp + exp;
* resulting_biased_exp = min(extracted_biased_exp + exp, 255);
*
* if (resulting_biased_exp < 1 || x == 0.0f) {
* return copysign(0.0, x);
* if (extracted_biased_exp >= 255)
* return x; // +/-inf, NaN
*
* sign_mantissa = bitcast_f2u(x) & sign_mantissa_mask;
*
* if (min(resulting_biased_exp, extracted_biased_exp) < 1)
* resulting_biased_exp = 0;
* if (resulting_biased_exp >= 255 ||
* min(resulting_biased_exp, extracted_biased_exp) < 1) {
* sign_mantissa &= sign_mask;
* }
*
* return bitcast_u2f((bitcast_f2u(x) & sign_mantissa_mask) |
* return bitcast_u2f(sign_mantissa |
* lshift(i2u(resulting_biased_exp), exp_shift));
*
* which we can't actually implement as such, since the GLSL IR doesn't
@@ -372,45 +380,58 @@ lower_instructions_visitor::ldexp_to_arith(ir_expression *ir)
* using conditional-select:
*
* extracted_biased_exp = rshift(bitcast_f2i(abs(x)), exp_shift);
* resulting_biased_exp = extracted_biased_exp + exp;
* resulting_biased_exp = min(extracted_biased_exp + exp, 255);
*
* is_not_zero_or_underflow = logic_and(nequal(x, 0.0f),
* gequal(resulting_biased_exp, 1);
* x = csel(is_not_zero_or_underflow, x, copysign(0.0f, x));
* resulting_biased_exp = csel(is_not_zero_or_underflow,
* resulting_biased_exp, 0);
* sign_mantissa = bitcast_f2u(x) & sign_mantissa_mask;
*
* return bitcast_u2f((bitcast_f2u(x) & sign_mantissa_mask) |
* lshift(i2u(resulting_biased_exp), exp_shift));
* flush_to_zero = lequal(min(resulting_biased_exp, extracted_biased_exp), 0);
* resulting_biased_exp = csel(flush_to_zero, 0, resulting_biased_exp)
* zero_mantissa = logic_or(flush_to_zero,
* gequal(resulting_biased_exp, 255));
* sign_mantissa = csel(zero_mantissa, sign_mantissa & sign_mask, sign_mantissa);
*
* result = sign_mantissa |
* lshift(i2u(resulting_biased_exp), exp_shift));
*
* return csel(extracted_biased_exp >= 255, x, bitcast_u2f(result));
*
* The definition of ldexp in the GLSL spec says:
*
* "If this product is too large to be represented in the
* floating-point type, the result is undefined."
*
* However, the definition of ldexp in the GLSL ES spec does not contain
* this sentence, so we do need to handle overflow correctly.
*
* There is additional language limiting the defined range of exp, but this
* is merely to allow implementations that store 2^exp in a temporary
* variable.
*/
const unsigned vec_elem = ir->type->vector_elements;
/* Types */
const glsl_type *ivec = glsl_type::get_instance(GLSL_TYPE_INT, vec_elem, 1);
const glsl_type *uvec = glsl_type::get_instance(GLSL_TYPE_UINT, vec_elem, 1);
const glsl_type *bvec = glsl_type::get_instance(GLSL_TYPE_BOOL, vec_elem, 1);
/* Constants */
ir_constant *zeroi = ir_constant::zero(ir, ivec);
ir_constant *sign_mask = new(ir) ir_constant(0x80000000u, vec_elem);
ir_constant *exp_shift = new(ir) ir_constant(23, vec_elem);
/* Temporary variables */
ir_variable *x = new(ir) ir_variable(ir->type, "x", ir_var_temporary);
ir_variable *exp = new(ir) ir_variable(ivec, "exp", ir_var_temporary);
ir_variable *zero_sign_x = new(ir) ir_variable(ir->type, "zero_sign_x",
ir_var_temporary);
ir_variable *result = new(ir) ir_variable(uvec, "result", ir_var_temporary);
ir_variable *extracted_biased_exp =
new(ir) ir_variable(ivec, "extracted_biased_exp", ir_var_temporary);
ir_variable *resulting_biased_exp =
new(ir) ir_variable(ivec, "resulting_biased_exp", ir_var_temporary);
ir_variable *is_not_zero_or_underflow =
new(ir) ir_variable(bvec, "is_not_zero_or_underflow", ir_var_temporary);
ir_variable *sign_mantissa =
new(ir) ir_variable(uvec, "sign_mantissa", ir_var_temporary);
ir_variable *flush_to_zero =
new(ir) ir_variable(bvec, "flush_to_zero", ir_var_temporary);
ir_variable *zero_mantissa =
new(ir) ir_variable(bvec, "zero_mantissa", ir_var_temporary);
ir_instruction &i = *base_ir;
@@ -423,58 +444,82 @@ lower_instructions_visitor::ldexp_to_arith(ir_expression *ir)
/* Extract the biased exponent from <x>. */
i.insert_before(extracted_biased_exp);
i.insert_before(assign(extracted_biased_exp,
rshift(bitcast_f2i(abs(x)), exp_shift)));
rshift(bitcast_f2i(abs(x)),
new(ir) ir_constant(23, vec_elem))));
/* The definition of ldexp in the GLSL 4.60 spec says:
*
* "If exp is greater than +128 (single-precision) or +1024
* (double-precision), the value returned is undefined. If exp is less
* than -126 (single-precision) or -1022 (double-precision), the value
* returned may be flushed to zero."
*
* So we do not have to guard against the possibility of addition overflow,
* which could happen when exp is close to INT_MAX. Addition underflow
* cannot happen (the worst case is 0 + (-INT_MAX)).
*/
i.insert_before(resulting_biased_exp);
i.insert_before(assign(resulting_biased_exp,
add(extracted_biased_exp, exp)));
min2(add(extracted_biased_exp, exp),
new(ir) ir_constant(255, vec_elem))));
/* Test if result is ±0.0, subnormal, or underflow by checking if the
* resulting biased exponent would be less than 0x1. If so, the result is
* 0.0 with the sign of x. (Actually, invert the conditions so that
* immediate values are the second arguments, which is better for i965)
*/
i.insert_before(zero_sign_x);
i.insert_before(assign(zero_sign_x,
bitcast_u2f(bit_and(bitcast_f2u(x), sign_mask))));
i.insert_before(sign_mantissa);
i.insert_before(assign(sign_mantissa,
bit_and(bitcast_f2u(x),
new(ir) ir_constant(0x807fffffu, vec_elem))));
i.insert_before(is_not_zero_or_underflow);
i.insert_before(assign(is_not_zero_or_underflow,
logic_and(nequal(x, new(ir) ir_constant(0.0f, vec_elem)),
gequal(resulting_biased_exp,
new(ir) ir_constant(0x1, vec_elem)))));
i.insert_before(assign(x, csel(is_not_zero_or_underflow,
x, zero_sign_x)));
i.insert_before(assign(resulting_biased_exp,
csel(is_not_zero_or_underflow,
resulting_biased_exp, zeroi)));
/* We could test for overflows by checking if the resulting biased exponent
* would be greater than 0xFE. Turns out we don't need to because the GLSL
* spec says:
/* We flush to zero if the original or resulting biased exponent is 0,
* indicating a +/-0.0 or subnormal input or output.
*
* "If this product is too large to be represented in the
* floating-point type, the result is undefined."
* The mantissa is set to 0 if the resulting biased exponent is 255, since
* an overflow should produce a +/-inf result.
*
* Note that NaN inputs are handled separately.
*/
i.insert_before(flush_to_zero);
i.insert_before(assign(flush_to_zero,
lequal(min2(resulting_biased_exp,
extracted_biased_exp),
ir_constant::zero(ir, ivec))));
i.insert_before(assign(resulting_biased_exp,
csel(flush_to_zero,
ir_constant::zero(ir, ivec),
resulting_biased_exp)));
ir_constant *exp_shift_clone = exp_shift->clone(ir, NULL);
i.insert_before(zero_mantissa);
i.insert_before(assign(zero_mantissa,
logic_or(flush_to_zero,
equal(resulting_biased_exp,
new(ir) ir_constant(255, vec_elem)))));
i.insert_before(assign(sign_mantissa,
csel(zero_mantissa,
bit_and(sign_mantissa,
new(ir) ir_constant(0x80000000u, vec_elem)),
sign_mantissa)));
/* Don't generate new IR that would need to be lowered in an additional
* pass.
*/
i.insert_before(result);
if (!lowering(INSERT_TO_SHIFTS)) {
ir_constant *exp_width = new(ir) ir_constant(8, vec_elem);
ir->operation = ir_unop_bitcast_i2f;
ir->operands[0] = bitfield_insert(bitcast_f2i(x), resulting_biased_exp,
exp_shift_clone, exp_width);
ir->operands[1] = NULL;
i.insert_before(assign(result,
bitfield_insert(sign_mantissa,
i2u(resulting_biased_exp),
new(ir) ir_constant(23u, vec_elem),
new(ir) ir_constant(8u, vec_elem))));
} else {
ir_constant *sign_mantissa_mask = new(ir) ir_constant(0x807fffffu, vec_elem);
ir->operation = ir_unop_bitcast_u2f;
ir->operands[0] = bit_or(bit_and(bitcast_f2u(x), sign_mantissa_mask),
lshift(i2u(resulting_biased_exp), exp_shift_clone));
i.insert_before(assign(result,
bit_or(sign_mantissa,
lshift(i2u(resulting_biased_exp),
new(ir) ir_constant(23, vec_elem)))));
}
ir->operation = ir_triop_csel;
ir->operands[0] = gequal(extracted_biased_exp,
new(ir) ir_constant(255, vec_elem));
ir->operands[1] = new(ir) ir_dereference_variable(x);
ir->operands[2] = bitcast_u2f(result);
this->progress = true;
}

View File

@@ -115,6 +115,7 @@ public:
void run(exec_list *instructions);
virtual ir_visitor_status visit_leave(ir_assignment *);
virtual ir_visitor_status visit_leave(ir_expression *);
virtual void handle_rvalue(ir_rvalue **rvalue);
};
@@ -238,6 +239,23 @@ flatten_named_interface_blocks_declarations::visit_leave(ir_assignment *ir)
return rvalue_visit(ir);
}
ir_visitor_status
flatten_named_interface_blocks_declarations::visit_leave(ir_expression *ir)
{
ir_visitor_status status = rvalue_visit(ir);
if (ir->operation == ir_unop_interpolate_at_centroid ||
ir->operation == ir_binop_interpolate_at_offset ||
ir->operation == ir_binop_interpolate_at_sample) {
const ir_rvalue *val = ir->operands[0];
/* This disables varying packing for this input. */
val->variable_referenced()->data.must_be_shader_input = 1;
}
return status;
}
void
flatten_named_interface_blocks_declarations::handle_rvalue(ir_rvalue **rvalue)
{

View File

@@ -151,7 +151,36 @@ ir_vec_index_to_cond_assign_visitor::convert_vector_extract_to_cond_assign(ir_rv
{
ir_expression *const expr = ir->as_expression();
if (expr == NULL || expr->operation != ir_binop_vector_extract)
if (expr == NULL)
return ir;
if (expr->operation == ir_unop_interpolate_at_centroid ||
expr->operation == ir_binop_interpolate_at_offset ||
expr->operation == ir_binop_interpolate_at_sample) {
/* Lower interpolateAtXxx(some_vec[idx], ...) to
* interpolateAtXxx(some_vec, ...)[idx] before lowering to conditional
* assignments, to maintain the rule that the interpolant is an l-value
* referring to a (part of a) shader input.
*
* This is required when idx is dynamic (otherwise it gets lowered to
* a swizzle).
*/
ir_expression *const interpolant = expr->operands[0]->as_expression();
if (!interpolant || interpolant->operation != ir_binop_vector_extract)
return ir;
ir_rvalue *vec_input = interpolant->operands[0];
ir_expression *const vec_interpolate =
new(base_ir) ir_expression(expr->operation, vec_input->type,
vec_input, expr->operands[1]);
return convert_vec_index_to_cond_assign(ralloc_parent(ir),
vec_interpolate,
interpolant->operands[1],
ir->type);
}
if (expr->operation != ir_binop_vector_extract)
return ir;
return convert_vec_index_to_cond_assign(ralloc_parent(ir),

View File

@@ -237,6 +237,12 @@ ir_constant_propagation_visitor::constant_propagation(ir_rvalue **rvalue) {
case GLSL_TYPE_BOOL:
data.b[i] = found->constant->value.b[rhs_channel];
break;
case GLSL_TYPE_UINT64:
data.u64[i] = found->constant->value.u64[rhs_channel];
break;
case GLSL_TYPE_INT64:
data.i64[i] = found->constant->value.i64[rhs_channel];
break;
default:
assert(!"not reached");
break;

View File

@@ -62,23 +62,6 @@ optimize_dead_builtin_variables(exec_list *instructions,
* information, so removing these variables from the user shader will
* cause problems later.
*
* For compute shaders, gl_GlobalInvocationID has some dependencies, so
* we avoid removing these dependencies.
*
* We also avoid removing gl_GlobalInvocationID at this stage because it
* might be used by a linked shader. In this case it still needs to be
* initialized by the main function.
*
* gl_GlobalInvocationID =
* gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
*
* Similarly, we initialize gl_LocalInvocationIndex in the main function:
*
* gl_LocalInvocationIndex =
* gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
* gl_LocalInvocationID.y * gl_WorkGroupSize.x +
* gl_LocalInvocationID.x;
*
* Matrix uniforms with "Transpose" are not eliminated because there's
* an optimization pass that can turn references to the regular matrix
* into references to the transpose matrix. Eliminating the transpose
@@ -90,11 +73,6 @@ optimize_dead_builtin_variables(exec_list *instructions,
*/
if (strcmp(var->name, "gl_ModelViewProjectionMatrix") == 0
|| strcmp(var->name, "gl_Vertex") == 0
|| strcmp(var->name, "gl_WorkGroupID") == 0
|| strcmp(var->name, "gl_WorkGroupSize") == 0
|| strcmp(var->name, "gl_LocalInvocationID") == 0
|| strcmp(var->name, "gl_GlobalInvocationID") == 0
|| strcmp(var->name, "gl_LocalInvocationIndex") == 0
|| strstr(var->name, "Transpose") != NULL)
continue;

View File

@@ -59,7 +59,7 @@
#include "program.h"
#include "shader_cache.h"
#include "util/mesa-sha1.h"
#include "util/string_to_uint_map.h"
#include "string_to_uint_map.h"
extern "C" {
#include "main/enums.h"
@@ -74,11 +74,26 @@ compile_shaders(struct gl_context *ctx, struct gl_shader_program *prog) {
}
}
static void
get_struct_type_field_and_pointer_sizes(size_t *s_field_size,
size_t *s_field_ptrs)
{
*s_field_size = sizeof(glsl_struct_field);
*s_field_ptrs =
sizeof(((glsl_struct_field *)0)->type) +
sizeof(((glsl_struct_field *)0)->name);
}
static void
encode_type_to_blob(struct blob *blob, const glsl_type *type)
{
uint32_t encoding;
if (!type) {
blob_write_uint32(blob, 0);
return;
}
switch (type->base_type) {
case GLSL_TYPE_UINT:
case GLSL_TYPE_INT:
@@ -122,11 +137,18 @@ encode_type_to_blob(struct blob *blob, const glsl_type *type)
blob_write_uint32(blob, (type->base_type) << 24);
blob_write_string(blob, type->name);
blob_write_uint32(blob, type->length);
blob_write_bytes(blob, type->fields.structure,
sizeof(glsl_struct_field) * type->length);
size_t s_field_size, s_field_ptrs;
get_struct_type_field_and_pointer_sizes(&s_field_size, &s_field_ptrs);
for (unsigned i = 0; i < type->length; i++) {
encode_type_to_blob(blob, type->fields.structure[i].type);
blob_write_string(blob, type->fields.structure[i].name);
/* Write the struct field skipping the pointers */
blob_write_bytes(blob,
((char *)&type->fields.structure[i]) + s_field_ptrs,
s_field_size - s_field_ptrs);
}
if (type->is_interface()) {
@@ -149,6 +171,11 @@ static const glsl_type *
decode_type_from_blob(struct blob_reader *blob)
{
uint32_t u = blob_read_uint32(blob);
if (u == 0) {
return NULL;
}
glsl_base_type base_type = (glsl_base_type) (u >> 24);
switch (base_type) {
@@ -182,22 +209,33 @@ decode_type_from_blob(struct blob_reader *blob)
case GLSL_TYPE_INTERFACE: {
char *name = blob_read_string(blob);
unsigned num_fields = blob_read_uint32(blob);
glsl_struct_field *fields = (glsl_struct_field *)
blob_read_bytes(blob, sizeof(glsl_struct_field) * num_fields);
size_t s_field_size, s_field_ptrs;
get_struct_type_field_and_pointer_sizes(&s_field_size, &s_field_ptrs);
glsl_struct_field *fields =
(glsl_struct_field *) malloc(s_field_size * num_fields);
for (unsigned i = 0; i < num_fields; i++) {
fields[i].type = decode_type_from_blob(blob);
fields[i].name = blob_read_string(blob);
blob_copy_bytes(blob, ((uint8_t *) &fields[i]) + s_field_ptrs,
s_field_size - s_field_ptrs);
}
const glsl_type *t;
if (base_type == GLSL_TYPE_INTERFACE) {
enum glsl_interface_packing packing =
(glsl_interface_packing) blob_read_uint32(blob);
bool row_major = blob_read_uint32(blob);
return glsl_type::get_interface_instance(fields, num_fields,
packing, row_major, name);
t = glsl_type::get_interface_instance(fields, num_fields, packing,
row_major, name);
} else {
return glsl_type::get_record_instance(fields, num_fields, name);
t = glsl_type::get_record_instance(fields, num_fields, name);
}
free(fields);
return t;
}
case GLSL_TYPE_VOID:
case GLSL_TYPE_ERROR:
@@ -555,6 +593,17 @@ read_xfb(struct blob_reader *metadata, struct gl_shader_program *shProg)
MAX_FEEDBACK_BUFFERS);
}
static bool
has_uniform_storage(struct gl_shader_program *prog, unsigned idx)
{
if (!prog->data->UniformStorage[idx].builtin &&
!prog->data->UniformStorage[idx].is_shader_storage &&
prog->data->UniformStorage[idx].block_index == -1)
return true;
return false;
}
static void
write_uniforms(struct blob *metadata, struct gl_shader_program *prog)
{
@@ -566,8 +615,6 @@ write_uniforms(struct blob *metadata, struct gl_shader_program *prog)
encode_type_to_blob(metadata, prog->data->UniformStorage[i].type);
blob_write_uint32(metadata, prog->data->UniformStorage[i].array_elements);
blob_write_string(metadata, prog->data->UniformStorage[i].name);
blob_write_uint32(metadata, prog->data->UniformStorage[i].storage -
prog->data->UniformDataSlots);
blob_write_uint32(metadata, prog->data->UniformStorage[i].builtin);
blob_write_uint32(metadata, prog->data->UniformStorage[i].remap_location);
blob_write_uint32(metadata, prog->data->UniformStorage[i].block_index);
@@ -586,6 +633,12 @@ write_uniforms(struct blob *metadata, struct gl_shader_program *prog)
prog->data->UniformStorage[i].top_level_array_size);
blob_write_uint32(metadata,
prog->data->UniformStorage[i].top_level_array_stride);
if (has_uniform_storage(prog, i)) {
blob_write_uint32(metadata, prog->data->UniformStorage[i].storage -
prog->data->UniformDataSlots);
}
blob_write_bytes(metadata, prog->data->UniformStorage[i].opaque,
sizeof(prog->data->UniformStorage[i].opaque));
}
@@ -597,9 +650,7 @@ write_uniforms(struct blob *metadata, struct gl_shader_program *prog)
*/
blob_write_uint32(metadata, prog->data->NumHiddenUniforms);
for (unsigned i = 0; i < prog->data->NumUniformStorage; i++) {
if (!prog->data->UniformStorage[i].builtin &&
!prog->data->UniformStorage[i].is_shader_storage &&
prog->data->UniformStorage[i].block_index == -1) {
if (has_uniform_storage(prog, i)) {
unsigned vec_size =
prog->data->UniformStorage[i].type->component_slots() *
MAX2(prog->data->UniformStorage[i].array_elements, 1);
@@ -619,7 +670,7 @@ read_uniforms(struct blob_reader *metadata, struct gl_shader_program *prog)
prog->data->NumUniformStorage = blob_read_uint32(metadata);
prog->data->NumUniformDataSlots = blob_read_uint32(metadata);
uniforms = rzalloc_array(prog, struct gl_uniform_storage,
uniforms = rzalloc_array(prog->data, struct gl_uniform_storage,
prog->data->NumUniformStorage);
prog->data->UniformStorage = uniforms;
@@ -633,7 +684,6 @@ read_uniforms(struct blob_reader *metadata, struct gl_shader_program *prog)
uniforms[i].type = decode_type_from_blob(metadata);
uniforms[i].array_elements = blob_read_uint32(metadata);
uniforms[i].name = ralloc_strdup(prog, blob_read_string (metadata));
uniforms[i].storage = data + blob_read_uint32(metadata);
uniforms[i].builtin = blob_read_uint32(metadata);
uniforms[i].remap_location = blob_read_uint32(metadata);
uniforms[i].block_index = blob_read_uint32(metadata);
@@ -651,6 +701,10 @@ read_uniforms(struct blob_reader *metadata, struct gl_shader_program *prog)
uniforms[i].top_level_array_stride = blob_read_uint32(metadata);
prog->UniformHash->put(i, uniforms[i].name);
if (has_uniform_storage(prog, i)) {
uniforms[i].storage = data + blob_read_uint32(metadata);
}
memcpy(uniforms[i].opaque,
blob_read_bytes(metadata, sizeof(uniforms[i].opaque)),
sizeof(uniforms[i].opaque));
@@ -659,9 +713,7 @@ read_uniforms(struct blob_reader *metadata, struct gl_shader_program *prog)
/* Restore uniform values. */
prog->data->NumHiddenUniforms = blob_read_uint32(metadata);
for (unsigned i = 0; i < prog->data->NumUniformStorage; i++) {
if (!prog->data->UniformStorage[i].builtin &&
!prog->data->UniformStorage[i].is_shader_storage &&
prog->data->UniformStorage[i].block_index == -1) {
if (has_uniform_storage(prog, i)) {
unsigned vec_size =
prog->data->UniformStorage[i].type->component_slots() *
MAX2(prog->data->UniformStorage[i].array_elements, 1);
@@ -867,6 +919,18 @@ write_shader_subroutine_index(struct blob *metadata,
}
}
static void
get_shader_var_and_pointer_sizes(size_t *s_var_size, size_t *s_var_ptrs,
const gl_shader_variable *var)
{
*s_var_size = sizeof(gl_shader_variable);
*s_var_ptrs =
sizeof(var->type) +
sizeof(var->interface_type) +
sizeof(var->outermost_struct_type) +
sizeof(var->name);
}
static void
write_program_resource_data(struct blob *metadata,
struct gl_shader_program *prog,
@@ -878,16 +942,19 @@ write_program_resource_data(struct blob *metadata,
case GL_PROGRAM_INPUT:
case GL_PROGRAM_OUTPUT: {
const gl_shader_variable *var = (gl_shader_variable *)res->Data;
blob_write_bytes(metadata, var, sizeof(gl_shader_variable));
encode_type_to_blob(metadata, var->type);
if (var->interface_type)
encode_type_to_blob(metadata, var->interface_type);
if (var->outermost_struct_type)
encode_type_to_blob(metadata, var->outermost_struct_type);
encode_type_to_blob(metadata, var->interface_type);
encode_type_to_blob(metadata, var->outermost_struct_type);
blob_write_string(metadata, var->name);
size_t s_var_size, s_var_ptrs;
get_shader_var_and_pointer_sizes(&s_var_size, &s_var_ptrs, var);
/* Write gl_shader_variable skipping over the pointers */
blob_write_bytes(metadata, ((char *)var) + s_var_ptrs,
s_var_size - s_var_ptrs);
break;
}
case GL_UNIFORM_BLOCK:
@@ -978,17 +1045,18 @@ read_program_resource_data(struct blob_reader *metadata,
case GL_PROGRAM_OUTPUT: {
gl_shader_variable *var = ralloc(prog, struct gl_shader_variable);
blob_copy_bytes(metadata, (uint8_t *) var, sizeof(gl_shader_variable));
var->type = decode_type_from_blob(metadata);
if (var->interface_type)
var->interface_type = decode_type_from_blob(metadata);
if (var->outermost_struct_type)
var->outermost_struct_type = decode_type_from_blob(metadata);
var->interface_type = decode_type_from_blob(metadata);
var->outermost_struct_type = decode_type_from_blob(metadata);
var->name = ralloc_strdup(prog, blob_read_string(metadata));
size_t s_var_size, s_var_ptrs;
get_shader_var_and_pointer_sizes(&s_var_size, &s_var_ptrs, var);
blob_copy_bytes(metadata, ((uint8_t *) var) + s_var_ptrs,
s_var_size - s_var_ptrs);
res->Data = var;
break;
}
@@ -1058,7 +1126,7 @@ read_program_resource_list(struct blob_reader *metadata,
prog->data->NumProgramResourceList = blob_read_uint32(metadata);
prog->data->ProgramResourceList =
ralloc_array(prog, gl_program_resource,
ralloc_array(prog->data, gl_program_resource,
prog->data->NumProgramResourceList);
for (unsigned i = 0; i < prog->data->NumProgramResourceList; i++) {
@@ -1148,18 +1216,20 @@ write_shader_metadata(struct blob *metadata, gl_linked_shader *shader)
blob_write_bytes(metadata, glprog->sh.ImageUnits,
sizeof(glprog->sh.ImageUnits));
size_t ptr_size = sizeof(GLvoid *);
blob_write_uint32(metadata, glprog->sh.NumBindlessSamplers);
blob_write_uint32(metadata, glprog->sh.HasBoundBindlessSampler);
for (i = 0; i < glprog->sh.NumBindlessSamplers; i++) {
blob_write_bytes(metadata, &glprog->sh.BindlessSamplers[i],
sizeof(struct gl_bindless_sampler));
sizeof(struct gl_bindless_sampler) - ptr_size);
}
blob_write_uint32(metadata, glprog->sh.NumBindlessImages);
blob_write_uint32(metadata, glprog->sh.HasBoundBindlessImage);
for (i = 0; i < glprog->sh.NumBindlessImages; i++) {
blob_write_bytes(metadata, &glprog->sh.BindlessImages[i],
sizeof(struct gl_bindless_image));
sizeof(struct gl_bindless_image) - ptr_size);
}
write_shader_parameters(metadata, glprog->Parameters);
@@ -1187,6 +1257,8 @@ read_shader_metadata(struct blob_reader *metadata,
blob_copy_bytes(metadata, (uint8_t *) glprog->sh.ImageUnits,
sizeof(glprog->sh.ImageUnits));
size_t ptr_size = sizeof(GLvoid *);
glprog->sh.NumBindlessSamplers = blob_read_uint32(metadata);
glprog->sh.HasBoundBindlessSampler = blob_read_uint32(metadata);
if (glprog->sh.NumBindlessSamplers > 0) {
@@ -1196,7 +1268,7 @@ read_shader_metadata(struct blob_reader *metadata,
for (i = 0; i < glprog->sh.NumBindlessSamplers; i++) {
blob_copy_bytes(metadata, (uint8_t *) &glprog->sh.BindlessSamplers[i],
sizeof(struct gl_bindless_sampler));
sizeof(struct gl_bindless_sampler) - ptr_size);
}
}
@@ -1209,7 +1281,7 @@ read_shader_metadata(struct blob_reader *metadata,
for (i = 0; i < glprog->sh.NumBindlessImages; i++) {
blob_copy_bytes(metadata, (uint8_t *) &glprog->sh.BindlessImages[i],
sizeof(struct gl_bindless_image));
sizeof(struct gl_bindless_image) - ptr_size);
}
}
@@ -1224,6 +1296,14 @@ create_binding_str(const char *key, unsigned value, void *closure)
ralloc_asprintf_append(bindings_str, "%s:%u,", key, value);
}
static void
get_shader_info_and_pointer_sizes(size_t *s_info_size, size_t *s_info_ptrs,
shader_info *info)
{
*s_info_size = sizeof(shader_info);
*s_info_ptrs = sizeof(info->name) + sizeof(info->label);
}
static void
create_linked_shader_and_program(struct gl_context *ctx,
gl_shader_stage stage,
@@ -1242,12 +1322,16 @@ create_linked_shader_and_program(struct gl_context *ctx,
read_shader_metadata(metadata, glprog, linked);
glprog->info.name = ralloc_strdup(glprog, blob_read_string(metadata));
glprog->info.label = ralloc_strdup(glprog, blob_read_string(metadata));
size_t s_info_size, s_info_ptrs;
get_shader_info_and_pointer_sizes(&s_info_size, &s_info_ptrs,
&glprog->info);
/* Restore shader info */
blob_copy_bytes(metadata, (uint8_t *) &glprog->info, sizeof(shader_info));
if (glprog->info.name)
glprog->info.name = ralloc_strdup(glprog, blob_read_string(metadata));
if (glprog->info.label)
glprog->info.label = ralloc_strdup(glprog, blob_read_string(metadata));
blob_copy_bytes(metadata, ((uint8_t *) &glprog->info) + s_info_ptrs,
s_info_size - s_info_ptrs);
_mesa_reference_shader_program_data(ctx, &glprog->sh.data, prog->data);
_mesa_reference_program(ctx, &linked->Program, glprog);
@@ -1286,14 +1370,24 @@ shader_cache_write_program_metadata(struct gl_context *ctx,
if (sh) {
write_shader_metadata(metadata, sh);
/* Store nir shader info */
blob_write_bytes(metadata, &sh->Program->info, sizeof(shader_info));
if (sh->Program->info.name)
blob_write_string(metadata, sh->Program->info.name);
else
blob_write_string(metadata, "");
if (sh->Program->info.label)
blob_write_string(metadata, sh->Program->info.label);
else
blob_write_string(metadata, "");
size_t s_info_size, s_info_ptrs;
get_shader_info_and_pointer_sizes(&s_info_size, &s_info_ptrs,
&sh->Program->info);
/* Store shader info */
blob_write_bytes(metadata,
((char *) &sh->Program->info) + s_info_ptrs,
s_info_size - s_info_ptrs);
}
}
@@ -1339,7 +1433,7 @@ shader_cache_read_program_metadata(struct gl_context *ctx,
return false;
struct disk_cache *cache = ctx->Cache;
if (!cache || prog->data->cache_fallback || prog->data->skip_cache)
if (!cache || prog->data->skip_cache)
return false;
/* Include bindings when creating sha1. These bindings change the resulting

View File

@@ -36,7 +36,7 @@
#include "loop_analysis.h"
#include "standalone_scaffolding.h"
#include "standalone.h"
#include "util/string_to_uint_map.h"
#include "string_to_uint_map.h"
#include "util/set.h"
#include "linker.h"
#include "glsl_parser_extras.h"

View File

@@ -25,7 +25,7 @@
#include "main/mtypes.h"
#include "main/macros.h"
#include "util/ralloc.h"
#include "util/string_to_uint_map.h"
#include "string_to_uint_map.h"
#include "uniform_initializer_utils.h"
namespace linker {

View File

@@ -1975,10 +1975,10 @@ nir_system_value_from_intrinsic(nir_intrinsic_op intrin)
return SYSTEM_VALUE_HELPER_INVOCATION;
case nir_intrinsic_load_view_index:
return SYSTEM_VALUE_VIEW_INDEX;
case SYSTEM_VALUE_SUBGROUP_SIZE:
return nir_intrinsic_load_subgroup_size;
case SYSTEM_VALUE_SUBGROUP_INVOCATION:
return nir_intrinsic_load_subgroup_invocation;
case nir_intrinsic_load_subgroup_size:
return SYSTEM_VALUE_SUBGROUP_SIZE;
case nir_intrinsic_load_subgroup_invocation:
return SYSTEM_VALUE_SUBGROUP_INVOCATION;
case nir_intrinsic_load_subgroup_eq_mask:
return SYSTEM_VALUE_SUBGROUP_EQ_MASK;
case nir_intrinsic_load_subgroup_ge_mask:

View File

@@ -41,9 +41,9 @@
#include "compiler/shader_info.h"
#include <stdio.h>
#ifdef DEBUG
#ifndef NDEBUG
#include "util/debug.h"
#endif /* DEBUG */
#endif /* NDEBUG */
#include "nir_opcodes.h"
@@ -1204,7 +1204,6 @@ typedef struct {
* - nir_texop_txf_ms
* - nir_texop_txs
* - nir_texop_lod
* - nir_texop_tg4
* - nir_texop_query_levels
* - nir_texop_texture_samples
* - nir_texop_samples_identical
@@ -2299,7 +2298,7 @@ nir_variable *nir_variable_clone(const nir_variable *c, nir_shader *shader);
nir_deref *nir_deref_clone(const nir_deref *deref, void *mem_ctx);
nir_deref_var *nir_deref_var_clone(const nir_deref_var *deref, void *mem_ctx);
#ifdef DEBUG
#ifndef NDEBUG
void nir_validate_shader(nir_shader *shader);
void nir_metadata_set_validation_flag(nir_shader *shader);
void nir_metadata_check_validation_flag(nir_shader *shader);
@@ -2329,7 +2328,7 @@ static inline void nir_metadata_set_validation_flag(nir_shader *shader) { (void)
static inline void nir_metadata_check_validation_flag(nir_shader *shader) { (void) shader; }
static inline bool should_clone_nir(void) { return false; }
static inline bool should_print_nir(void) { return false; }
#endif /* DEBUG */
#endif /* NDEBUG */
#define _PASS(nir, do_pass) do { \
do_pass \

View File

@@ -121,9 +121,9 @@ BARRIER(memory_barrier_shared)
INTRINSIC(discard_if, 1, ARR(1), false, 0, 0, 0, xx, xx, xx, 0)
/** ARB_shader_group_vote intrinsics */
INTRINSIC(vote_any, 1, ARR(1), true, 1, 1, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(vote_all, 1, ARR(1), true, 1, 1, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(vote_eq, 1, ARR(1), true, 1, 1, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(vote_any, 1, ARR(1), true, 1, 0, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(vote_all, 1, ARR(1), true, 1, 0, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(vote_eq, 1, ARR(1), true, 1, 0, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
/**
* Basic Geometry Shader intrinsics.
@@ -433,7 +433,7 @@ INTRINSIC(load_interpolated_input, 2, ARR(2, 1), true, 0, 0,
/* src[] = { buffer_index, offset }. No const_index */
LOAD(ssbo, 2, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
/* src[] = { offset }. const_index[] = { base, component } */
LOAD(output, 1, 1, BASE, COMPONENT, xx, NIR_INTRINSIC_CAN_ELIMINATE)
LOAD(output, 1, 2, BASE, COMPONENT, xx, NIR_INTRINSIC_CAN_ELIMINATE)
/* src[] = { vertex, offset }. const_index[] = { base, component } */
LOAD(per_vertex_output, 2, 1, BASE, COMPONENT, xx, NIR_INTRINSIC_CAN_ELIMINATE)
/* src[] = { offset }. const_index[] = { base } */

View File

@@ -59,7 +59,7 @@ nir_metadata_preserve(nir_function_impl *impl, nir_metadata preserved)
impl->valid_metadata &= preserved;
}
#ifdef DEBUG
#ifndef NDEBUG
/**
* Make sure passes properly invalidate metadata (part 1).
*

View File

@@ -308,7 +308,7 @@ for (unsigned bit = 0; bit < 32; bit++) {
unop_convert("ufind_msb", tint32, tuint32, """
dst = -1;
for (int bit = 31; bit > 0; bit--) {
for (int bit = 31; bit >= 0; bit--) {
if ((src0 >> bit) & 1) {
dst = bit;
break;
@@ -717,12 +717,12 @@ opcode("bitfield_insert", 0, tuint32, [0, 0, 0, 0],
unsigned base = src0, insert = src1;
int offset = src2, bits = src3;
if (bits == 0) {
dst = 0;
dst = base;
} else if (offset < 0 || bits < 0 || bits + offset > 32) {
dst = 0;
} else {
unsigned mask = ((1ull << bits) - 1) << offset;
dst = (base & ~mask) | ((insert << bits) & mask);
dst = (base & ~mask) | ((insert << offset) & mask);
}
""")

View File

@@ -250,8 +250,8 @@ optimizations = [
(('ishr', a, 0), a),
(('ushr', 0, a), 0),
(('ushr', a, 0), a),
(('iand', 0xff, ('ushr', a, 24)), ('ushr', a, 24)),
(('iand', 0xffff, ('ushr', a, 16)), ('ushr', a, 16)),
(('iand', 0xff, ('ushr@32', a, 24)), ('ushr', a, 24)),
(('iand', 0xffff, ('ushr@32', a, 16)), ('ushr', a, 16)),
# Exponential/logarithmic identities
(('~fexp2', ('flog2', a)), a), # 2^lg2(a) = a
(('~flog2', ('fexp2', a)), a), # lg2(2^a) = a

View File

@@ -28,6 +28,26 @@
* \file nir_opt_intrinsics.c
*/
static nir_ssa_def *
high_subgroup_mask(nir_builder *b,
nir_ssa_def *count,
uint64_t base_mask)
{
/* group_mask could probably be calculated more efficiently but we want to
* be sure not to shift by 64 if the subgroup size is 64 because the GLSL
* shift operator is undefined in that case. In any case if we were worried
* about efficency this should probably be done further down because the
* subgroup size is likely to be known at compile time.
*/
nir_ssa_def *subgroup_size = nir_load_subgroup_size(b);
nir_ssa_def *all_bits = nir_imm_int64(b, ~0ull);
nir_ssa_def *shift = nir_isub(b, nir_imm_int(b, 64), subgroup_size);
nir_ssa_def *group_mask = nir_ushr(b, all_bits, shift);
nir_ssa_def *higher_bits = nir_ishl(b, nir_imm_int64(b, base_mask), count);
return nir_iand(b, higher_bits, group_mask);
}
static bool
opt_intrinsics_impl(nir_function_impl *impl)
{
@@ -95,10 +115,10 @@ opt_intrinsics_impl(nir_function_impl *impl)
replacement = nir_ishl(&b, nir_imm_int64(&b, 1ull), count);
break;
case nir_intrinsic_load_subgroup_ge_mask:
replacement = nir_ishl(&b, nir_imm_int64(&b, ~0ull), count);
replacement = high_subgroup_mask(&b, count, ~0ull);
break;
case nir_intrinsic_load_subgroup_gt_mask:
replacement = nir_ishl(&b, nir_imm_int64(&b, ~1ull), count);
replacement = high_subgroup_mask(&b, count, ~1ull);
break;
case nir_intrinsic_load_subgroup_le_mask:
replacement = nir_inot(&b, nir_ishl(&b, nir_imm_int64(&b, ~1ull), count));

View File

@@ -35,7 +35,7 @@
/* Since this file is just a pile of asserts, don't bother compiling it if
* we're not building a debug build.
*/
#ifdef DEBUG
#ifndef NDEBUG
/*
* Per-register validation state.

View File

@@ -32,14 +32,14 @@ extern "C" {
#endif
typedef struct shader_info {
/** The shader stage, such as MESA_SHADER_VERTEX. */
gl_shader_stage stage;
const char *name;
/* Descriptive name provided by the client; may be NULL */
const char *label;
/** The shader stage, such as MESA_SHADER_VERTEX. */
gl_shader_stage stage;
/* Number of textures used by this shader */
unsigned num_textures;
/* Number of uniform buffers used by this shader */

View File

@@ -721,7 +721,7 @@ translate_image_format(SpvImageFormat format)
case SpvImageFormatRg32ui: return 0x823C; /* GL_RG32UI */
case SpvImageFormatRg16ui: return 0x823A; /* GL_RG16UI */
case SpvImageFormatRg8ui: return 0x8238; /* GL_RG8UI */
case SpvImageFormatR16ui: return 0x823A; /* GL_RG16UI */
case SpvImageFormatR16ui: return 0x8234; /* GL_R16UI */
case SpvImageFormatR8ui: return 0x8232; /* GL_R8UI */
default:
assert(!"Invalid image format");
@@ -1490,6 +1490,8 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
struct vtn_value *val =
vtn_push_value(b, w[2], vtn_value_type_sampled_image);
val->sampled_image = ralloc(b, struct vtn_sampled_image);
val->sampled_image->type =
vtn_value(b, w[1], vtn_value_type_type)->type;
val->sampled_image->image =
vtn_value(b, w[3], vtn_value_type_pointer)->pointer;
val->sampled_image->sampler =
@@ -1516,16 +1518,12 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
sampled = *sampled_val->sampled_image;
} else {
assert(sampled_val->value_type == vtn_value_type_pointer);
sampled.type = sampled_val->pointer->type;
sampled.image = NULL;
sampled.sampler = sampled_val->pointer;
}
const struct glsl_type *image_type;
if (sampled.image) {
image_type = sampled.image->var->var->interface_type;
} else {
image_type = sampled.sampler->var->var->interface_type;
}
const struct glsl_type *image_type = sampled.type->type;
const enum glsl_sampler_dim sampler_dim = glsl_get_sampler_dim(image_type);
const bool is_array = glsl_sampler_type_is_array(image_type);
const bool is_shadow = glsl_sampler_type_is_shadow(image_type);
@@ -1757,6 +1755,7 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
case nir_texop_txb:
case nir_texop_txl:
case nir_texop_txd:
case nir_texop_tg4:
/* These operations require a sampler */
instr->sampler = nir_deref_var_clone(sampler, instr);
break;
@@ -1764,7 +1763,6 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
case nir_texop_txf_ms:
case nir_texop_txs:
case nir_texop_lod:
case nir_texop_tg4:
case nir_texop_query_levels:
case nir_texop_texture_samples:
case nir_texop_samples_identical:
@@ -2034,6 +2032,7 @@ vtn_handle_image(struct vtn_builder *b, SpvOp opcode,
case SpvOpAtomicIDecrement:
case SpvOpAtomicExchange:
case SpvOpAtomicIAdd:
case SpvOpAtomicISub:
case SpvOpAtomicSMin:
case SpvOpAtomicUMin:
case SpvOpAtomicSMax:
@@ -2801,7 +2800,8 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
case SpvOpMemoryModel:
assert(w[1] == SpvAddressingModelLogical);
assert(w[2] == SpvMemoryModelGLSL450);
assert(w[2] == SpvMemoryModelSimple ||
w[2] == SpvMemoryModelGLSL450);
break;
case SpvOpEntryPoint: {

Some files were not shown because too many files have changed in this diff Show More