Compare commits

...

85 Commits

Author SHA1 Message Date
Emil Velikov
b26488dead docs: add release notes for 18.3.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-18 18:23:55 +00:00
Emil Velikov
a41881fcaa Update version to 18.3.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-18 18:19:54 +00:00
Eric Anholt
55f3a4fac3 vc4: Fix copy-and-paste fail in backport of NEON asm fixes.
One of the cpu pointers wasn't marked as read-write, causing gcc to complain:

../src/gallium/drivers/vc4/vc4_tiling_lt.c:181:17: error: output operand constraint lacks ‘=’
                 __asm__ volatile (

Cc: Emil Velikov <emil.l.velikov@gmail.com>
Fixes: 813f0a8296 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
2019-02-16 13:46:37 +00:00
Dylan Baker
d000488c2e meson: Add dependency on genxml to anvil
Currently the Intel "anvil" driver races with the generation of genxml
files, while i965 has an explicit dependency. This patch adds the same
dependency to anvil.

Fixes: d1992255bb
       ("meson: Add build Intel "anv" vulkan driver")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 279060cd32)
2019-02-15 11:40:11 +00:00
Samuel Pitoiset
4aa92b54e5 radv: always export gl_SampleMask when the fragment shader uses it
For some reasons, this breaks trees rendering in Project Cars.

Fixes: 85010585cd ("radv: only enable gl_SampleMask if MSAA is enabled too")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109401
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 334da034d8)
2019-02-15 11:40:11 +00:00
Dylan Baker
08ab660bf5 get-pick-list: Add --pretty=medium to the arguments for Cc patches
Because none of them have been picked up for 19.0 due to this bug
being reintroduced.

v2: - Fix fixes tags

Fixes: e6b3a3b201
       ("bin/get-pick-list.sh: handle "typod" usecase.")
Fixes: fac10169bb
       ("bin/get-pick-list.sh: prefix output with "[stable] "")
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit aff52dd2c6)
2019-02-15 11:40:11 +00:00
Oscar Blumberg
4bb51927aa radeonsi: Fix guardband computation for large render targets
Stop using 12.12 quantization for viewports that are not contained in
the lower 4k corner of the render target as the hardware needs to keep
both absolute and relative coordinates representable.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3c540e0a74)
2019-02-15 11:40:11 +00:00
Juan A. Suarez Romero
7662965ce9 anv/cmd_buffer: check for NULL framebuffer
This can happen when we record a VkCmdDraw in a secondary buffer that
was created inheriting from the primary buffer, but with the framebuffer
set to NULL in the VkCommandBufferInheritanceInfo.

Vulkan 1.1.81 spec says that "the application must ensure (using scissor
if neccesary) that all rendering is contained in the render area [...]
[which] must be contained within the framebuffer dimesions".

While this should be done by the application, commit 465e5a86 added the
clamp to the framebuffer size, in case of application does not do it.
But this requires to know the framebuffer dimensions.

If we do not have a framebuffer at that moment, the best compromise we
can do is to just apply the scissor as it is, and let the application to
ensure the rendering is contained in the render area.

v2: do not clamp to framebuffer if there isn't a framebuffer

v3 (Jason):
- clamp earlier in the conditional
- clamp to render area if command buffer is primary

v4: clamp also x and y to render area (Jason)

v5: rename used variables (Jason)

Fixes: 465e5a86 ("anv: Clamp scissors to the framebuffer boundary")
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 1ad26f9417)
2019-02-15 11:40:11 +00:00
Emil Velikov
6cea56e2c2 cherry-ignore: radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8
stable The commit addresses functionality not present in branch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-15 11:40:09 +00:00
Rodrigo Vivi
5b48a26072 intel: Add more PCI Device IDs for Coffee Lake and Ice Lake.
Align with kernel commits:

5e0f5a58b167 ("drm/i915/cfl: Adding another PCI Device ID.")
03ca3cf8e9aa ("drm/i915/icl: Adding few more device IDs for Ice Lake")

Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 56c3b4971d)
2019-02-15 11:39:41 +00:00
Mario Kleiner
d3f49ece4e egl/wayland-drm: Only announce formats via wl_drm which the driver supports.
Check if a pixel format is supported by the Wayland servers gpu driver
before exposing it to the client via wl_drm, so we avoid reporting formats
to the client which the server gpu can't handle.

Restrict this reporting to the new color depth 30 formats for now, as the
ARGB/XRGB8888 and RGB565 formats are probably supported by every gpu under
the sun.

Atm. this is mostly useful to allow proper PRIME renderoffload for depth
30 formats on the typical Intel iGPU + NVidia dGPU "NVidia Optimus" laptop
combo.

Tested on Intel, AMD, NVidia with single-gpu setup and on a Intel + NVidia
Optimus setup.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 820dfcea43)
2019-02-15 11:39:41 +00:00
Mario Kleiner
ecad528a11 egl/wayland: Allow client->server format conversion for PRIME offload. (v2)
Support PRIME render offload between a Wayland server gpu and a Wayland
client gpu with different channel ordering for their color formats,
e.g., between Intel drivers which currently only support ARGB2101010
and XRGB2101010 import/display and nouveau which only supports ABGR2101010
rendering and display on nv-50 and later.

In the wl_visuals table, we also store for each format an alternate
sibling format which stores colors at the same precision, but with
different channel ordering, e.g., ARGB2101010 <-> ABGR2101010.

If a given client-gpu renderable format is not supported by the server
for import, but the alternate format is supported by the server, expose
the client-gpu renderable format as a valid EGLConfig to the client. At
eglSwapBuffers time, during the blitImage() detiling blit from the client
backbuffer to the linear buffer, the client format is converted to the
server supported format. As we have to do a copy for PRIME anyway,
this channel swizzling conversion comes essentially for free.

Note that even if a server gpu in principle does support sampling
from the clients native format, this conversion will be a performance
advantage if it allows to convert to the servers preferred format
for direct scanout, as the Wayland compositor may then be able to
directly page-flip a fullscreen client wl_buffer onto the primary
plane, or onto a hardware overlay plane, avoiding an extra data copy
for desktop composition.

Tested so far under Weston with: nouveau single-gpu, Intel single-gpu,
AMD single-gpu, "Optimus" Intel server iGPU for display + NVidia
client dGPU for rendering.

v2: Implement minor review comments by Eric Engestrom: Add some
    comment and assert, and some style fixes for clarity.
    No functional change.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit a34b0d68bb)
2019-02-15 11:39:41 +00:00
Iago Toral Quiroga
f036a040bb intel/compiler: do not copy-propagate strided regions to ddx/ddy arguments
The implementation of these opcodes in the generator assumes that their
arguments are packed, and it generates register regions based on that
assumption.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 3918943211)
2019-02-15 11:39:41 +00:00
Samuel Pitoiset
5694279c14 radv: fix compiler issues with GCC 9
"The C standard says that compound literals which occur inside of
the body of a function have automatic storage duration associated
with the enclosing block. Older GCC releases were putting such
compound literals into the scope of the whole function, so their
lifetime actually ended at the end of containing function. This
has been fixed in GCC 9. Code that relied on this extended lifetime
needs to be fixed, move the compound literals to whatever scope
they need to accessible in."

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 129a9f4937)
2019-02-15 11:39:41 +00:00
Kenneth Graunke
75340edb27 st/mesa: Limit GL_MAX_[NATIVE_]PROGRAM_PARAMETERS_ARB to 2048
Piglit's vp-max-array test creates a vertex program containing a uniform
array sized to the value of GL_MAX_NATIVE_PROGRAM_PARAMETERS_ARB.  Mesa
will then add additional state-var parameters for things like the MVP
matrix.

radeonsi currently exposes a value of 4096, derived from constant buffer
upload size.  This means the array will have 4096 elements, and the
extra MVP state-vars would get a prog_src_register::Index of over 4096.

Unfortunately, prog_src_register::Index is a signed 13-bit integer, so
values beyond 4096 end up turning into negative numbers.  Negative
source indexes are only valid for relative addressing, so this ends up
generating illegal IR.

In prog_to_nir, this would cause an out of bounds array access.
st_mesa_to_tgsi checks for a negative value, assumes it's bogus,
and remaps it to parameter 0 in order to get something in-range.
This isn't right - instead of reading the MVP matrix, it would read
the first element of the vertex program's large array.  But the test
only checks that the program compiles, so we never noticed that it
was broken.

This patch limits the size of the program limits, with the understanding
that we may need to generate additional state-vars internally.  i965 has
exposed 1024 for this limit for years, so I don't expect lowering it to
2048 will cause any practical problems for radeonsi or other drivers.

Fixes vp-max-array with prog_to_nir.c.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit f45dd6d31b)
2019-02-15 11:39:41 +00:00
Leo Liu
dafa02c980 st/va/vp9: set max reference as default of VP9 reference number
If there is no information about number of render targets

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a0a52a0367)
2019-02-15 11:39:21 +00:00
Leo Liu
36258308a7 st/va: fix the incorrect max profiles report
Add "PIPE_VIDEO_PROFILE_MAX" to enum, so it will make sure here will
be correct when adding more profiles in the future.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109107

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 21cdb828a3)
2019-02-15 11:38:47 +00:00
Marek Olšák
f1eccd091d winsys/amdgpu: don't drop manually added fence dependencies
wow, it's hard to believe that fence and syncobjs dependencies were ignored.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit ddfe209a0d)
2019-02-15 11:38:47 +00:00
Marek Olšák
945aa87408 radeonsi: fix EXPLICIT_FLUSH for flush offsets > 0
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 61c678d4bc)
2019-02-15 11:38:47 +00:00
Marek Olšák
b3b0a97f69 gallium/u_threaded: fix EXPLICIT_FLUSH for flush offsets > 0
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 4522f01d4e)
2019-02-15 11:38:47 +00:00
Jason Ekstrand
3545986962 nir/deref: Rematerialize parents in rematerialize_derefs_in_use_blocks
When nir_rematerialize_derefs_in_use_blocks_impl was first written, I
attempted to optimize things a bit by not bothering to re-materialize
the sources of deref instructions figuring that the final caller would
take care of that.  However, in the case of more complex deref chains
where the first link or two lives in block A and then another link and
the load/store_deref intrinsic live in block B it doesn't work.  The
code in rematerialize_deref_in_block looks at the tail of the chain,
sees that it's already in block B and skips it, not realizing that part
of the chain also lives in block A.

The easy solution here is to just rematerialize deref sources of deref
instructions as well.  This may potentially lead to a few more deref
instructions being created by the conditions required for that to
actually happen are fairly unlikely and, thanks to the caching, it's all
linear time regardless.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603
Fixes: 7d1d1208c2 "nir: Add a small pass to rematerialize derefs per-block"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 9e6a6ef0d4)
2019-02-15 11:38:47 +00:00
Ilia Mirkin
a9c0e146ef nvc0: we have 16k-sized framebuffers, fix default scissors
For some reason we don't use view volume clipping by default, and use
scissors instead. These scissors were set to an 8k max fb size, while
the driver advertises 16k-sized framebuffers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cc79a1483f)
2019-02-15 11:38:47 +00:00
Emil Velikov
541eb984ea cherry-ignore: add more 19.0 only nominations from Ilia
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-15 11:38:13 +00:00
Kristian H. Kristensen
fb63b1b3bf freedreno/a6xx: Emit blitter dst with OUT_RELOCW
We're writing to the bo and the kernel needs to know for
fd_bo_cpu_prep() to work.

Fixes: f93e431272 ("freedreno/a6xx: Enable blitter")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
(cherry picked from commit 357ea7da51)
2019-02-14 12:28:47 +00:00
Bas Nieuwenhuizen
08834a3721 amd/common: Use correct writemask for shared memory stores.
The check was for 1 bit being set, which is clearly not what we want.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3c24fc64c7)
2019-02-14 12:28:47 +00:00
Bas Nieuwenhuizen
f04d57ff1f radv: Only look at pImmutableSamples if the descriptor has a sampler.
Equivalent of ANV patch c7f4a2867c

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 39ab4e12f7)
2019-02-14 12:28:47 +00:00
Eric Engestrom
45c3bf14ca xvmc: fix string comparison
Fixes: 6fca18696d "g3dvl: Update XvMC unit tests."
Cc: Younes Manton <younes.m@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 40b53a7203)
2019-02-14 12:28:47 +00:00
Eric Engestrom
2180aa1bb2 xvmc: fix string comparison
Fixes: c7b65dcaff "xvmc: Define some Xv attribs to allow users
                             to specify color standard and procamp"
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 110a6e1839)
2019-02-14 12:28:47 +00:00
Bart Oldeman
fdb66dd155 gallium-xlib: query MIT-SHM before using it.
When Mesa is compiled for gallium-xlib using e.g.
./configure --enable-glx=gallium-xlib --disable-dri --disable-gbm
-disable-egl
and is used by an X server (usually remotely via SSH X11 forwarding)
that does not support MIT-SHM such as XMing or MobaXterm, OpenGL
clients report error messages such as
Xlib:  extension "MIT-SHM" missing on display "localhost:11.0".
ad infinitum.

The reason is that the code in src/gallium/winsys/sw/xlib uses
MIT-SHM without checking for its existence, unlike the code
in src/glx/drisw_glx.c and src/mesa/drivers/x11/xm_api.c.
I copied the same check using XQueryExtension, and tested with
glxgears on MobaXterm.

This issue was reported before here:
https://lists.freedesktop.org/archives/mesa-users/2016-July/001183.html

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a203eaa4f4)
2019-02-14 12:28:47 +00:00
Emil Velikov
e868c77615 cherry-ignore: nv50,nvc0: add explicit settings for recent caps
stable Explicit 19.0 only nomination.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-14 12:28:28 +00:00
Marek Olšák
a19ddce953 meson: drop the xcb-xrandr version requirement
autotools doesn't have any requirement. This fixes meson on Ubuntu 16.04.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 1e85cfb91a)
2019-02-12 12:53:18 +00:00
Jason Ekstrand
7bf9cf29dc intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b4f0d062cd)
2019-02-12 12:53:14 +00:00
Ernestas Kulik
e0eba40ae4 v3d: Fix leak in resource setup error path
Reported by Coverity: in the case of unsupported modifier request, the
code does not jump to the “fail” label to destroy the acquired resource.

CID: 1435704
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
(cherry picked from commit 90458bef54)
2019-02-12 12:53:12 +00:00
Ernestas Kulik
1a2b227fce vc4: Fix leak in HW queries error path
Reported by Coverity: in the case where there exist hardware and
non-hardware queries, the code does not jump to err_free_query and leaks
the query.

CID: 1430194
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 9ea90ffb98 ("broadcom/vc4: Add support for HW perfmon")
(cherry picked from commit f6e49d5ad0)
2019-02-12 12:53:09 +00:00
Jason Ekstrand
6beaa2d7fb intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf()
Like all the other sends, it's just mlen * REG_SIZE.

Fixes: 3cbc02e469 "intel: Use TXS for image_size when we have..."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit cf42b0f9e2)
2019-02-12 12:53:06 +00:00
Rob Clark
434f19a8dc freedreno: stop frob'ing pipe_resource::nr_samples
Previously we tried to normalize nr_samples to MAX2(1, nr_samples) to
avoid having to deal with 0 vs 1 everywhere.  But this causes problems
in mesa/st, for example st_finalize_texture() will think there is a
nr_samples mismatch and recreate the texture.  Somehow this manifests
as corrupt x11 font rendering on generations that do not support MSAA
(but apparently works fine on a5xx and a6xx which do support MSAA.)

Fixes: cf0c7258ee freedreno/a5xx: MSAA
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit c3baa077bf)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/freedreno/freedreno_batch_cache.c
2019-02-12 12:52:32 +00:00
Emil Velikov
7475d7727f docs: add sha256 checksums for 18.3.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-31 21:08:36 +00:00
Emil Velikov
190a79f462 docs: add release notes for 18.3.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-31 20:58:09 +00:00
Danylo Piliaiev
871aea89fd glsl: Fix copying function's out to temp if dereferenced by array
Function's out variable could be an array dereferenced by an array:
 func(v[w[i]]);
or something more complicated.

Copy index in any case.

Fixes: 76c27e47b9 ("glsl: Copy function out to temp if we don't directly ref a variable")

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0862929bf6)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109488
Nominated-by: Matt Turner <mattst88@gmail.com>
2019-01-31 12:06:17 +00:00
Timothy Arceri
f2c1d7acd0 glsl: Copy function out to temp if we don't directly ref a variable
Otherwise we can end up with IR that looks like this:

    (
      (declare (temporary ) vec4 f@8)
      (assign  (xyzw) (var_ref f@8)  (var_ref f) )
      (call f16  ((swiz y (var_ref f@8) )))

      (assign  (xyzw) (var_ref f)  (var_ref f@8) )
    ))

When we really need:

      (declare (temporary ) float inout_tmp)
      (assign  (x) (var_ref inout_tmp)  (swiz y (var_ref f) ))
      (call f16  ((var_ref inout_tmp) ))

      (assign  (y) (var_ref f)  (swiz y (swiz xxxx (var_ref inout_tmp) )))
      (declare (temporary ) void void_var)

The GLSL IR function inlining code seemed to produce correct code
even without this but we need the correct IR for GLSL IR -> NIR to
be able to understand whats going on.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 76c27e47b9)
Nominated-by: Matt Turner <mattst88@gmail.com>
2019-01-31 12:05:54 +00:00
Tomeu Vizoso
5e8af9e609 etnaviv: Consolidate buffer references from framebuffers
We were leaking surfaces because the references taken in
etna_set_framebuffer_state weren't being released on context destroy.

Instead of just directly releasing those references in
etna_context_destroy, use the util_copy_framebuffer_state helper.

Take the chance to remove the duplicated buffer references in
compiled_framebuffer_state to avoid confusion.

The leak can be reproduced with a client that continuously creates and
destroys contexts.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit bf1dfcc3e8)
[Emil: resolve trivial conflict - dummy_rt does not exist in branch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/gallium/drivers/etnaviv/etnaviv_context.c
2019-01-30 17:33:50 +00:00
Eric Anholt
f072585522 vc4: Enable NEON asm on meson cross-builds.
The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson
cross-builds because of the need to run host binaries on the build system.
vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my
cross-builds.

Fixes: ebcb4c2156 ("meson: Enable VC4's NEON assembly support.")
(cherry picked from commit 932ed9c00b)
2019-01-30 17:33:50 +00:00
Vinson Lee
f275e16c9e meson: Fix typo.
meson.build:166:21: ERROR:  Unknown method "verson_compare" for a string.

Fixes: c1efa240c9 ("meson: Add warnings and errors when using ICC")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit be5b271ea7)
2019-01-30 17:33:50 +00:00
Carsten Haitzler (Rasterman)
813f0a8296 vc4: Declare the cpu pointers as being modified in NEON asm.
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified.  Fixes
crashes on texture upload/download with current gcc.

We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.

(commit message by anholt)

Fixes: 4d30024238 ("vc4: Use NEON to speed up utile loads on Pi2.")
(cherry picked from commit 300d3ae8b1)
[Emil: apply the patch to vc4_tiling_lt.c instead of v3d_cpu_tiling.h]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/broadcom/common/v3d_cpu_tiling.h

Squashed with commit:

vc4: Declare the last cpu pointer as being modified in NEON asm.

Earlier commit addressed 7 of the 8 instances available.

v2: Rebase patch back to master (by anholt)

Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com>
Cc: Eric Anholt <eric@anholt.net>
Fixes: 300d3ae8b1 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 385843ac3c)

Conflicts:
	src/broadcom/common/v3d_cpu_tiling.h
2019-01-30 17:33:23 +00:00
Carsten Haitzler (Rasterman)
b280cdb59e vc4: Use named parameters for the NEON inline asm.
This makes the asm code more intelligible and clarifies the functional
change in the next commit.

(commit message and commit squashing by anholt)
(cherry picked from commiti 522f688471)
[Emil: apply the patch to vc4_tiling_lt.c instead of v3d_cpu_tiling.h]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
        src/broadcom/common/v3d_cpu_tiling.h
2019-01-29 19:30:12 +00:00
Timothy Arceri
3b9e9e4723 glsl: use remap location when serialising uniform program resource data
This allows us to avoid expensive string compares since we already have
a map to the pointers.

These compares were taking ~30 seconds for a single shader compile
in Godot due to it using 64,000+ uniforms.

Fixes: c4cff5f402 ("glsl: add basic support for resource list to shader cache")

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109229
(cherry picked from commit fb78a6cb72)
2019-01-29 17:44:27 +00:00
Timothy Arceri
12586d5846 radv/ac: fix some fp16 handling
Fixes: b722b29f10 ("radv: add support for 16bit input/output")

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 0907ae35ad)
2019-01-29 17:44:27 +00:00
Niklas Haas
e362fe26ea radv: correctly use vulkan 1.0 by default
From the vulkan spec 3.2 "Instances":

"Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing an
apiVersion of 0 is equivalent to providing an apiVersion of
VK_MAKE_VERSION(1,0,0)."

Fixes: ffa15861ef "radv: UseEnumerateInstanceVersion for the default version."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit d12dc39396)
2019-01-29 17:44:27 +00:00
Axel Davy
220490cf5f st/nine: Immediately upload user provided textures
Fixes regression caused by
42d672fa6a
st/nine: Bind src not dst in nine_context_box_upload

Before that patch, for user provided textures,
when the texture was destroyed, the safety
check for pending uploads, which according to
the code "Following condition cannot happen currently",
was flushing the queue and thus triggering the upload.

After the patch, the texture destruction was delayed after
the upload. However the user frees the texture buffer,
as it thinks the texture released.

Instead of reverting the faulty patch,
this patch instead flushes the csmt queue right away
after queuing the upload for this type of textures.
This is more future-proof, as we may want to bind the
surface for other reasons in the future.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d7433c22e6)
2019-01-29 17:44:27 +00:00
Dylan Baker
991f9ea553 meson: Add warnings and errors when using ICC
ICC tries to be helpful by not erroring when it sees something that it
doesn't understand, which is completely the opposite of helpful. Meson
0.49.0 does much better at handling this by really trying to make ICC
error, but there are some things in mesa that still get ignored until
0.49.1

v2: - Fix id check, which is 'intel' not 'icc'

Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
(cherry picked from commit c1efa240c9)
2019-01-29 17:44:27 +00:00
Lionel Landwerlin
84f59f6bbc anv: fix invalid binding table index computation
The ++ operator strikes again.

Fixes: f92c5bc8f3 ("anv/device: fix maximum number of images supported")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 4149d41f2e)
2019-01-29 17:44:27 +00:00
Emil Velikov
e1374ce107 cherry-ignore: WARNING: Commit XXX lists invalid sha
warn The commits refer stale sha, yet don't fix anything in particular.

98984b7cdd
9f86f1da7c

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-29 17:44:27 +00:00
Timothy Arceri
0b4d381ee0 ac/nir_to_llvm: fix clamp shadow reference for more hardware
Fixes the following piglit test on my VEGA and matches the behaviour in the
tgsi backend.

tests/spec/glsl-1.10/execution/samplers/glsl-fs-shadow2D-clamp-z.shader_test

Fixes: 625dcbbc45 ("amd/common: pass address components individually to ac_build_image_intrinsic")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 5d66f7103f)
2019-01-29 17:44:27 +00:00
Eric Engestrom
6050d6f1cf meson/vdpau: add missing soversion
This mirrors what autotools does in src/gallium/state_trackers/vdpau/Makefile.am
and src/gallium/targets/vdpau/Makefile.am:

  VDPAU_MAJOR = 1
  VDPAU_MINOR = 0
  libvdpau_gallium_la_LDFLAGS = -version-number $(VDPAU_MAJOR):$(VDPAU_MINOR)

Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Fixes: 68076b8747 "meson: build gallium vdpau state tracker"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 69e9440367)
2019-01-29 17:44:27 +00:00
Dylan Baker
cadab68f95 meson: Fix compiler checks for SWR with ICC
This is a bit fragile, as the way this "fixes" the check is to move the
one that we know is correct before the one that is incorrectly reported
as working. In meson 0.49.1 (which isn't out yet) this is fixed that the
incorrect check is reported as a failure.

Fixes: e0b037d697
       ("meson: Build SWR driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109129
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 7cb7f35bc7)
2019-01-29 17:44:27 +00:00
Dylan Baker
8f45b22c11 meson: fix swr KNL build
There's a typo in one of the #defines that breaks compilation.

Fixes: e0b037d697
       ("meson: Build SWR driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109023
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 3ba7ab8d2c)
2019-01-29 17:44:27 +00:00
Dave Airlie
fea0bca1be gallium: use put image shm2 path (v2)
This fixes the drisw paths to use the new shm2 interface, so that
we don't trigger the X server overflow checks when the x offset is non-zero.

This just hides the versioning in drisw, and either passes the src_x
or adds the offset fixup for the fallback path.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 1f6b92b476)
2019-01-29 17:44:27 +00:00
Dave Airlie
32c0f59c48 glx: add support for putimageshm2 path (v2)
v2: pass x,0 in as the offset coords at glx level not earlier

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 00af91ca46)
2019-01-29 17:44:27 +00:00
Dave Airlie
2733d26011 dri_interface: add put shm image2 (v2)
This adds a new interface to the swrast interface to fix an shm put image bug.

The current code adds the x,y src offsets into the offset parameters,
however if the x offset is > 0, and the put image copies up to the height
of the image, this can trigger an X server validation check to fail and
the renderering to get BadMatch.

This patch fixes it to pass the x offset coord in as a src x.

We cannot pass the Y coordinate due to the horrible code mangling the
image w/h vs stride in swrastXPutImage.

v2: drop srcx,y from api

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit db83a2b40f)
2019-01-29 17:44:27 +00:00
Marek Olšák
8a6c154496 st/mesa: purge framebuffers when unbinding a context
This fixes pipe_surface "leaks".

Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit c89e8470e5)
2019-01-29 17:44:27 +00:00
Rob Clark
9d45651005 loader: fix the no-modifiers case
Normally modifiers take precendence over use flags, as they are more
explicit.  But if the driver supports modifiers, but the xserver does
not, then we should fallback to the old mechanism of allocating a buffer
using 'use' flags.

Fixes: 069fdd5f9f
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
(cherry picked from commit c56fe4118a)
2019-01-29 17:44:27 +00:00
Marek Olšák
77ac39c359 radeonsi: fix rendering to tiny viewports where the viewport center is > 8K
This fixes an assertion failure with GL CTS when cts-runner is used.
(not a specific test)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108877
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4c4c8bb1f0)
2019-01-29 17:44:27 +00:00
Marek Olšák
ae91c29a25 radeonsi: fix a u_blitter crash after a shader with FBFETCH
This fixes an assertion failure with GL CTS when cts-runner is used.
(not a specific test)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108877
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit caa2dcd730)
2019-01-29 17:44:27 +00:00
Jason Ekstrand
b6cd30de3a nir/xfb: Fix offset accounting for dvec3/4
Before, we were double-counting the component slots when we had a dvec3
or dvec4.  Instead, just add them in once and manually offset the
recorded output offset.

Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 4f99ac9144)
2019-01-29 17:44:27 +00:00
Eric Engestrom
a1605e77d2 configure: EGL requirements only apply if EGL is built
Issue was hit with this configuration:
  --disable-{egl,gbm} --with-platform=drm

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 3208fd2e46 ("configure: move platform handling further up")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 610f956fde)
2019-01-29 17:44:27 +00:00
Jason Ekstrand
f5b6f5ad64 anv: Only parse pImmutableSamplers if the descriptor has samplers
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit c7f4a2867c)
2019-01-29 17:44:27 +00:00
Karol Herbst
93db1e7153 glsl/lower_output_reads: set invariant and precise flags on temporaries
fixes a couple of deqp tests (on nvc0 and potential other drivers):
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_3

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 987744be98)
2019-01-29 17:44:27 +00:00
Timothy Arceri
313c1487b7 ac/nir_to_llvm: fix interpolateAt* for arrays
This builds on the recent interpolate fix by Rhys ee8488ea3b.

This fixes the arb_gpu_shader5 interpolateAt* tests that contain
arrays.

Fixes: ee8488ea3b ("ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics")

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9e669ed22b)
2019-01-29 17:44:27 +00:00
Karol Herbst
98a661f2b1 nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions
fixes dEQP-GLES2.functional.shaders.invariance.mediump.loop_3

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 30b5c9eda2)
2019-01-29 17:44:27 +00:00
Bas Nieuwenhuizen
ea2bf29ed9 nir: Account for atomics in copy propagation.
Otherwise writes get propagated across atomics if no barrier is
used. Without barrier writes should still be visible in the same
invocation, so an atomic has to be considered a write.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
(cherry picked from commit 8424cd8fbd)
2019-01-29 17:44:27 +00:00
Iago Toral Quiroga
252beed945 anv/device: fix maximum number of images supported
We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.

Furthermore, despite this, we are exposing up to 64 images per shader
stage on all gens, gen8 included.

This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+ while making sure that only pre-SKL gens
use push constant space to handle images.

v2:
 - <= instead of < in the assert (Eric, Lionel)
 - Change the way the assertion is written (Eric)

v3:
 - Revert the way the assertion is written to the form it had in v1,
   the version in v2 was not equivalent and was incorrect. (Lionel)

v4:
 - gen9+ doesn't need push constants for images at all (Jason)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)
(cherry picked from commit f92c5bc8f3)
2019-01-29 17:44:27 +00:00
Jason Ekstrand
5f25cfdaf6 anv/nir: Rework arguments to apply_pipeline_layout
Instead of taking a whole pipeline (which could be anything!), just take
a physical device and robust_buffer_access boolean.  This makes it
easier to verify that only the things in the hash actually affect
pipeline compilation.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit a24654b49d)
2019-01-29 17:43:42 +00:00
Eric Anholt
92273935a5 vc4: Don't leak the GPU fd for renderonly usage.
Noticed while debugging V3D -- the ro->gpu_fd was freshly opened in ro
setup, and it needs to stay open until screen close (since it may be used
by renderonly) and should be the same one used by the vc4 screen.

Fixes: 7029ec05e2 ("gallium: Add renderonly-based support for pl111+vc4.")
(cherry picked from commit 99ef66c325)
2019-01-29 15:20:43 +00:00
Dylan Baker
8f1c75e9a0 meson: allow building dri driver without window system if osmesa is classic
This was already enabled for gallium based osmesa with gallium drivers
in 9d10581897, so do the same for classic
driver with classic osmesa.

Fixes: cbbd5bb889
       ("meson: build classic osmesa")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 431e9abaab)
2019-01-29 15:19:49 +00:00
Bruce Cherniak
e38d275a86 gallium/swr: Fix multi-context sync fence deadlock.
Various recreation scenarios lead to API thread getting stuck in
swr_fence_finish().  This is a multi-context issue, whereby one context
overwrites the fence read-value with a previous sync's lesser value.
The fence sync value is supposed to be always increasing.

In swr_fence_cb(), only update the "read" value if the new value is
greater.

(This may seem like we're not waiting on the other context to finish, but
had we needed for it to finish there would have been a wait prior to
submitting a new sync.)

cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ed7673afd2)
2019-01-29 15:19:40 +00:00
Pierre Moreau
f0eee7df43 meson: Fix with_gallium_icd to with_opencl_icd
`with_gallium_icd` is never used throughout the different Meson build
files, whereas `with_opencl_icd` tracks whether or not `gallium-opencl`
was set to "icd".

Fixes: 42ea0631f1
         ("meson: build clover")
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 0b736f7fd4)
2019-01-29 15:19:25 +00:00
Bas Nieuwenhuizen
bd9edb5f2e radv: Set partial_vs_wave for pipelines with just GS, not tess.
Looking at -pro we need to enable it for pipelines with just a
GS too.

This seems to reduce the hangs from
https://bugs.freedesktop.org/show_bug.cgi?id=109242 on a RX 550 to
the point where I can't reproduce, after the false start with the
wd_switch_on_eop patch due to flakiness.

(but people are reporting it does not fix the issue completely for
 them on polaris 11)

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 568e7a2998)
2019-01-29 15:19:03 +00:00
Samuel Pitoiset
cad3d0735d radv: clean up setting partial_es_wave for distributed tess on VI
Only needed when the pipeline actually uses tessellation. I don't
think that changes anything, except improving readability.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit d9d14346c2)
2019-01-29 15:18:57 +00:00
Marek Olšák
4b91802bef radeonsi: also apply the GS hang workaround to draws without tessellation
ported from AMDVLK.

Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 5183e794af)
2019-01-29 15:12:43 +00:00
Bas Nieuwenhuizen
5d2cfa64c1 radv: Only use 32 KiB per threadgroup on Stoney.
Causes hangs on some machines.

What works for dEQP-VK.tessellation.shader_input_output.barrier:

- running num_patches = 6 (which limits LDS to 32 KiB)
- running num_patches = 8, and artificially cutting LDS size at 32 KiB.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 76b12fa564)
2019-01-29 15:12:39 +00:00
Andres Gomez
220705036c bin/get-pick-list.sh: fix redirection in sh
"&>" is bash specific.

Fixes: e0dbfc9953 ("bin/get-pick-list.sh: warn when commit lists invalid sha")
Cc: Juan A. Suarez <jasuarez@igalia.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 3ec9ab80b8)
2019-01-29 15:12:36 +00:00
Andres Gomez
fa11468db4 bin/get-pick-list.sh: fix the oneline printing
"--summary" will also print extended header information such as
creations, renames and mode changes.

Let's just use "--no-patch", which suppresses the diff output.

v2: Use "--no-patch" instead of the "-s" abbreviation (Eric).

Fixes: 559c32d241 ("bin/get-pick-list.sh: simplify git oneline printing")
Cc: Juan A. Suarez <jasuarez@igalia.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 716ed41a36)
2019-01-29 15:12:32 +00:00
Emil Velikov
029dced476 cherry-ignore: spirv: Handle arbitrary bit sizes for deref array indices
stable The commits aren't suitable in their present form.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-29 15:10:34 +00:00
Emil Velikov
ec40bc62a5 cherry-ignore: radv: Fix multiview depth clears
fixes: This commit requires commits aeaf8dbd09 and 7484bc894b which
did not land in branch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-29 14:58:58 +00:00
Emil Velikov
8320a07221 docs: add sha256 checksums for 18.3.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-17 11:30:49 +00:00
77 changed files with 1286 additions and 400 deletions

View File

@@ -1 +1 @@
18.3.2
18.3.4

View File

@@ -2,3 +2,37 @@
c02390f8fcd367c7350db568feabb2f062efca14 egl/wayland: rather obvious build fix
# fixes: The commit addresses b4476138d5ad3f8d30c14ee61f2f375edfdbab2a
ff6f1dd0d3c6b4c15ca51b478b2884d14f6a1e06 meson: libfreedreno depends upon libdrm (for fence support)
# fixes: This commit requires commits aeaf8dbd097 and 7484bc894b9 which did not
# land in branch.
f67dea5e19ef14187be0e8d0f61b1f764c7ccb4f radv: Fix multiview depth clears
# stable The commits aren't suitable in their present form.
bfe31c5e461a1330d6f606bf5310685eff1198dd nir/builder: Add nir_i2i and nir_u2u helpers which take a bit size
abfe674c54bee6f8fdcae411b07db89c10b9d530 spirv: Handle arbitrary bit sizes for deref array indices
# warn The commits refer stale sha, yet don't fix anything in particular.
98984b7cdd79c15cc7331c791f8be61e873b8bbd Revert "mapi/new: sort by slot number"
9f86f1da7c68b5b900cd6f60925610ff1225a72d egl: add glvnd entrypoints for EGL_MESA_query_driver
# stable Explicit 19.0 only nomination.
38f542783faa360020b77fdd76b97f207a9e0068 v50,nvc0: add explicit settings for recent caps
# stable Explicit 19.0 only nominations.
399215eb7a0517463e5757c598d6cff6ae2301d0 nvc0: add support for handling indirect draws with attrib conversion
4443b6ddf2e08d06f3d0457cf20a2e04244cde37 nvc0/ir: always use CG mode for loads from atomic-only buffers
5de5beedf21306b01730085f8e03d8f424729016 nvc0/ir: fix second tex argument after levelZero optimization
162352e6711b3ceab114686f7a3248074339e7f7 nvc0: fix 3d images on kepler
e00799d3dc0595dc3998dbf199ceec8b1eece966 nv50,nvc0: use condition for occlusion queries when already complete
6adb9b38bfb1f6ee4c94596bf0744225aa8e967a nvc0: stick zero values for the compute invocation counts
04593d9a73ea257a36cc3b9fb5cd41427beaaea5 gk110/ir: Add rcp f64 implementation
7937408052a1896f0b08b0110bb8a1790eeee351 gk110/ir: Add rsq f64 implementation
656ad060518d067a3b311db8c2de2a396fb41898 gk110/ir: Use the new rcp/rsq in library
12669d29705a26478aa691cb454149628be65f17 gk104/ir: Use the new rcp/rsq in library
815a8e59c6d462a7008653ea9e3010d40b6ba589 gm107/ir: add fp64 rcp
cce495572136a606dd2a35e79f45080c3796e2cc gm107/ir: add fp64 rsq
6010d7b8e8bee1bcea2b329cf6d3b44c5fc3ca66 gallium: add PIPE_CAP_MAX_VARYINGS
cbd1ad6165f0aea7fb7c6fd1b36ad5317dd65cb7 st/mesa: require RGBA2, RGB4, and RGBA4 to be renderable
# stable The commit addresses functionality not present in branch
1b8983c25be19073c02fe9630e949be55f8280fa radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8

View File

@@ -13,12 +13,12 @@
is_stable_nomination()
{
git show --summary "$1" | grep -q -i -o "CC:.*mesa-stable"
git show --pretty=medium --summary "$1" | grep -q -i -o "CC:.*mesa-stable"
}
is_typod_nomination()
{
git show --summary "$1" | grep -q -i -o "CC:.*mesa-dev"
git show --pretty=medium --summary "$1" | grep -q -i -o "CC:.*mesa-dev"
}
fixes=
@@ -44,7 +44,7 @@ is_sha_nomination()
# Treat only the current line
id=`echo "$fixes" | tail -n $fixes_count | head -n 1 | cut -d : -f 2`
fixes_count=$(($fixes_count-1))
if ! git show $id &>/dev/null; then
if ! git show $id >/dev/null 2>&1; then
echo WARNING: Commit $1 lists invalid sha $id
fi
done
@@ -143,7 +143,7 @@ do
esac
printf "[ %8s ] " "$tag"
git --no-pager show --summary --oneline $sha
git --no-pager show --no-patch --oneline $sha
done
rm -f already_picked

View File

@@ -1864,6 +1864,7 @@ for plat in $platforms; do
;;
drm)
test "x$enable_egl" = "xyes" &&
test "x$enable_gbm" = "xno" &&
AC_MSG_ERROR([EGL platform drm needs gbm])
DEFINES="$DEFINES -DHAVE_DRM_PLATFORM"

View File

@@ -31,7 +31,8 @@ Compatibility contexts may report a lower version depending on each driver.
<h2>SHA256 checksums</h2>
<pre>
TBD
1cde4fafd40cd1ad4ee3a13b364b7a0175a08b7afdd127fb46f918c1e1dfd4b0 mesa-18.3.2.tar.gz
f7ce7181c07b6d8e0132da879af1729523a6c8aa87f79a9d59dfd064024cfb35 mesa-18.3.2.tar.xz
</pre>

208
docs/relnotes/18.3.3.html Normal file
View File

@@ -0,0 +1,208 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.3.3 Release Notes / January 31, 2019</h1>
<p>
Mesa 18.3.3 is a bug fix release which fixes bugs found since the 18.3.2 release.
</p>
<p>
Mesa 18.3.3 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
6b9893942fe8011c7736d51448deb6ef80ece2257e0fac27b02e997a6605d5e4 mesa-18.3.3.tar.gz
2ab6886a6966c532ccbcc3b240925e681464b658244f0cbed752615af3936299 mesa-18.3.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108877">Bug 108877</a> - OpenGL CTS gl43 test cases were interrupted due to segment fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109023">Bug 109023</a> - error: inlining failed in call to always_inline __m512 _mm512_and_ps(__m512, __m512): target specific option mismatch</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109129">Bug 109129</a> - format_types.h:1220: undefined reference to `_mm256_cvtps_ph'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109229">Bug 109229</a> - glLinkProgram locks up for ~30 seconds</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109242">Bug 109242</a> - [RADV] The Witcher 3 system freeze</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109488">Bug 109488</a> - Mesa 18.3.2 crash on a specific fragment shader (assert triggered) / already fixed on the master branch.</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (2):</p>
<ul>
<li>bin/get-pick-list.sh: fix the oneline printing</li>
<li>bin/get-pick-list.sh: fix redirection in sh</li>
</ul>
<p>Axel Davy (1):</p>
<ul>
<li>st/nine: Immediately upload user provided textures</li>
</ul>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Only use 32 KiB per threadgroup on Stoney.</li>
<li>radv: Set partial_vs_wave for pipelines with just GS, not tess.</li>
<li>nir: Account for atomics in copy propagation.</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>gallium/swr: Fix multi-context sync fence deadlock.</li>
</ul>
<p>Carsten Haitzler (Rasterman) (2):</p>
<ul>
<li>vc4: Use named parameters for the NEON inline asm.</li>
<li>vc4: Declare the cpu pointers as being modified in NEON asm.</li>
</ul>
<p>Danylo Piliaiev (1):</p>
<ul>
<li>glsl: Fix copying function's out to temp if dereferenced by array</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>dri_interface: add put shm image2 (v2)</li>
<li>glx: add support for putimageshm2 path (v2)</li>
<li>gallium: use put image shm2 path (v2)</li>
</ul>
<p>Dylan Baker (4):</p>
<ul>
<li>meson: allow building dri driver without window system if osmesa is classic</li>
<li>meson: fix swr KNL build</li>
<li>meson: Fix compiler checks for SWR with ICC</li>
<li>meson: Add warnings and errors when using ICC</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 18.3.2</li>
<li>cherry-ignore: radv: Fix multiview depth clears</li>
<li>cherry-ignore: spirv: Handle arbitrary bit sizes for deref array indices</li>
<li>cherry-ignore: WARNING: Commit XXX lists invalid sha</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>vc4: Don't leak the GPU fd for renderonly usage.</li>
<li>vc4: Enable NEON asm on meson cross-builds.</li>
</ul>
<p>Eric Engestrom (2):</p>
<ul>
<li>configure: EGL requirements only apply if EGL is built</li>
<li>meson/vdpau: add missing soversion</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>anv/device: fix maximum number of images supported</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>anv/nir: Rework arguments to apply_pipeline_layout</li>
<li>anv: Only parse pImmutableSamplers if the descriptor has samplers</li>
<li>nir/xfb: Fix offset accounting for dvec3/4</li>
</ul>
<p>Karol Herbst (2):</p>
<ul>
<li>nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions</li>
<li>glsl/lower_output_reads: set invariant and precise flags on temporaries</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: fix invalid binding table index computation</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>radeonsi: also apply the GS hang workaround to draws without tessellation</li>
<li>radeonsi: fix a u_blitter crash after a shader with FBFETCH</li>
<li>radeonsi: fix rendering to tiny viewports where the viewport center is &gt; 8K</li>
<li>st/mesa: purge framebuffers when unbinding a context</li>
</ul>
<p>Niklas Haas (1):</p>
<ul>
<li>radv: correctly use vulkan 1.0 by default</li>
</ul>
<p>Pierre Moreau (1):</p>
<ul>
<li>meson: Fix with_gallium_icd to with_opencl_icd</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>loader: fix the no-modifiers case</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: clean up setting partial_es_wave for distributed tess on VI</li>
</ul>
<p>Timothy Arceri (5):</p>
<ul>
<li>ac/nir_to_llvm: fix interpolateAt* for arrays</li>
<li>ac/nir_to_llvm: fix clamp shadow reference for more hardware</li>
<li>radv/ac: fix some fp16 handling</li>
<li>glsl: use remap location when serialising uniform program resource data</li>
<li>glsl: Copy function out to temp if we don't directly ref a variable</li>
</ul>
<p>Tomeu Vizoso (1):</p>
<ul>
<li>etnaviv: Consolidate buffer references from framebuffers</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>meson: Fix typo.</li>
</ul>
</div>
</body>
</html>

179
docs/relnotes/18.3.4.html Normal file
View File

@@ -0,0 +1,179 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.3.4 Release Notes / February 18, 2019</h1>
<p>
Mesa 18.3.4 is a bug fix release which fixes bugs found since the 18.3.3 release.
</p>
<p>
Mesa 18.3.4 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109107">Bug 109107</a> - gallium/st/va: change va max_profiles when using Radeon VCN Hardware</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109401">Bug 109401</a> - [DXVK] Project Cars rendering problems</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109543">Bug 109543</a> - After upgrade mesa to 19.0.0~rc1 all vulkan based application stop working [&quot;vulkan-cube&quot; received SIGSEGV in radv_pipeline_init_blend_state at ../src/amd/vulkan/radv_pipeline.c:699]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109603">Bug 109603</a> - nir_instr_as_deref: Assertion `parent &amp;&amp; parent-&gt;type == nir_instr_type_deref' failed.</li>
</ul>
<h2>Changes</h2>
<p>Bart Oldeman (1):</p>
<ul>
<li>gallium-xlib: query MIT-SHM before using it.</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>radv: Only look at pImmutableSamples if the descriptor has a sampler.</li>
<li>amd/common: Use correct writemask for shared memory stores.</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>get-pick-list: Add --pretty=medium to the arguments for Cc patches</li>
<li>meson: Add dependency on genxml to anvil</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 18.3.3</li>
<li>cherry-ignore: nv50,nvc0: add explicit settings for recent caps</li>
<li>cherry-ignore: add more 19.0 only nominations from Ilia</li>
<li>cherry-ignore: radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8</li>
<li>Update version to 18.3.4</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: Fix copy-and-paste fail in backport of NEON asm fixes.</li>
</ul>
<p>Eric Engestrom (2):</p>
<ul>
<li>xvmc: fix string comparison</li>
<li>xvmc: fix string comparison</li>
</ul>
<p>Ernestas Kulik (2):</p>
<ul>
<li>vc4: Fix leak in HW queries error path</li>
<li>v3d: Fix leak in resource setup error path</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>intel/compiler: do not copy-propagate strided regions to ddx/ddy arguments</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nvc0: we have 16k-sized framebuffers, fix default scissors</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf()</li>
<li>intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode</li>
<li>nir/deref: Rematerialize parents in rematerialize_derefs_in_use_blocks</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>anv/cmd_buffer: check for NULL framebuffer</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>st/mesa: Limit GL_MAX_[NATIVE_]PROGRAM_PARAMETERS_ARB to 2048</li>
</ul>
<p>Kristian H. Kristensen (1):</p>
<ul>
<li>freedreno/a6xx: Emit blitter dst with OUT_RELOCW</li>
</ul>
<p>Leo Liu (2):</p>
<ul>
<li>st/va: fix the incorrect max profiles report</li>
<li>st/va/vp9: set max reference as default of VP9 reference number</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>meson: drop the xcb-xrandr version requirement</li>
<li>gallium/u_threaded: fix EXPLICIT_FLUSH for flush offsets &gt; 0</li>
<li>radeonsi: fix EXPLICIT_FLUSH for flush offsets &gt; 0</li>
<li>winsys/amdgpu: don't drop manually added fence dependencies</li>
</ul>
<p>Mario Kleiner (2):</p>
<ul>
<li>egl/wayland: Allow client-&gt;server format conversion for PRIME offload. (v2)</li>
<li>egl/wayland-drm: Only announce formats via wl_drm which the driver supports.</li>
</ul>
<p>Oscar Blumberg (1):</p>
<ul>
<li>radeonsi: Fix guardband computation for large render targets</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno: stop frob'ing pipe_resource::nr_samples</li>
</ul>
<p>Rodrigo Vivi (1):</p>
<ul>
<li>intel: Add more PCI Device IDs for Coffee Lake and Ice Lake.</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>radv: fix compiler issues with GCC 9</li>
<li>radv: always export gl_SampleMask when the fragment shader uses it</li>
</ul>
</div>
</body>
</html>

View File

@@ -589,7 +589,7 @@ struct __DRIdamageExtensionRec {
* SWRast Loader extension.
*/
#define __DRI_SWRAST_LOADER "DRI_SWRastLoader"
#define __DRI_SWRAST_LOADER_VERSION 4
#define __DRI_SWRAST_LOADER_VERSION 5
struct __DRIswrastLoaderExtensionRec {
__DRIextension base;
@@ -649,6 +649,23 @@ struct __DRIswrastLoaderExtensionRec {
void (*getImageShm)(__DRIdrawable *readable,
int x, int y, int width, int height,
int shmid, void *loaderPrivate);
/**
* Put shm image to drawable (v2)
*
* The original version fixes srcx/y to 0, and expected
* the offset to be adjusted. This version allows src x,y
* to not be included in the offset. This is needed to
* avoid certain overflow checks in the X server, that
* result in lost rendering.
*
* \since 5
*/
void (*putImageShm2)(__DRIdrawable *drawable, int op,
int x, int y,
int width, int height, int stride,
int shmid, char *shmaddr, unsigned offset,
void *loaderPrivate);
};
/**

View File

@@ -171,6 +171,7 @@ CHIPSET(0x3185, glk_2x6, "Intel(R) UHD Graphics 600 (Geminilake 2x6)")
CHIPSET(0x3E90, cfl_gt1, "Intel(R) UHD Graphics 610 (Coffeelake 2x6 GT1)")
CHIPSET(0x3E93, cfl_gt1, "Intel(R) UHD Graphics 610 (Coffeelake 2x6 GT1)")
CHIPSET(0x3E99, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
CHIPSET(0x3E9C, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
CHIPSET(0x3E91, cfl_gt2, "Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2)")
CHIPSET(0x3E92, cfl_gt2, "Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2)")
CHIPSET(0x3E96, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
@@ -203,6 +204,10 @@ CHIPSET(0x5A54, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
CHIPSET(0x8A50, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
CHIPSET(0x8A51, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
CHIPSET(0x8A52, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
CHIPSET(0x8A56, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
CHIPSET(0x8A57, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
CHIPSET(0x8A58, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
CHIPSET(0x8A59, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
CHIPSET(0x8A5A, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
CHIPSET(0x8A5B, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
CHIPSET(0x8A5C, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")

View File

@@ -1,4 +1,4 @@
# Copyright © 2017-2018 Intel Corporation
# Copyright © 2017-2019 Intel Corporation
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -165,6 +165,14 @@ with_gallium_svga = _drivers.contains('svga')
with_gallium_virgl = _drivers.contains('virgl')
with_gallium_swr = _drivers.contains('swr')
if cc.get_id() == 'intel'
if meson.version().version_compare('< 0.49.0')
error('Meson does not have sufficient support of ICC before 0.49.0 to compile mesa')
elif with_gallium_swr and meson.version().version_compare('== 0.49.0')
warning('Meson as of 0.49.0 is sufficient for compiling mesa with ICC, but there are some caveats with SWR. 0.49.1 should resolve all of these')
endif
endif
with_gallium = _drivers.length() != 0 and _drivers != ['']
if with_gallium and system_has_kms_drm
@@ -385,8 +393,8 @@ if with_any_vk and (with_platform_x11 and not with_dri3)
error('Vulkan drivers require dri3 for X11 support')
endif
if with_dri
if with_glx == 'disabled' and not with_egl and not with_gbm
error('building dri drivers require at least one windowing system')
if with_glx == 'disabled' and not with_egl and not with_gbm and with_osmesa != 'classic'
error('building dri drivers require at least one windowing system or classic osmesa')
endif
endif
@@ -671,7 +679,7 @@ if _opencl != 'disabled'
else
dep_clc = null_dep
with_gallium_opencl = false
with_gallium_icd = false
with_opencl_icd = false
endif
gl_pkgconfig_c_flags = []
@@ -1399,7 +1407,7 @@ if with_platform_x11
dep_xcb_xfixes = dependency('xcb-xfixes')
endif
if with_xlib_lease
dep_xcb_xrandr = dependency('xcb-randr', version : '>= 1.12')
dep_xcb_xrandr = dependency('xcb-randr')
dep_xlib_xrandr = dependency('xrandr', version : '>= 1.3')
endif
endif

View File

@@ -2072,7 +2072,7 @@ visit_store_var(struct ac_nir_context *ctx,
int writemask = instr->const_index[0];
LLVMValueRef address = get_src(ctx, instr->src[0]);
LLVMValueRef val = get_src(ctx, instr->src[1]);
if (util_is_power_of_two_nonzero(writemask)) {
if (writemask == (1u << ac_get_llvm_num_components(val)) - 1) {
val = LLVMBuildBitCast(
ctx->ac.builder, val,
LLVMGetElementType(LLVMTypeOf(address)), "");
@@ -2802,15 +2802,16 @@ static LLVMValueRef visit_interp(struct ac_nir_context *ctx,
const nir_intrinsic_instr *instr)
{
LLVMValueRef result[4];
LLVMValueRef interp_param, attr_number;
LLVMValueRef interp_param;
unsigned location;
unsigned chan;
LLVMValueRef src_c0 = NULL;
LLVMValueRef src_c1 = NULL;
LLVMValueRef src0 = NULL;
nir_variable *var = nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
int input_index = ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
nir_deref_instr *deref_instr = nir_instr_as_deref(instr->src[0].ssa->parent_instr);
nir_variable *var = nir_deref_instr_get_variable(deref_instr);
int input_base = ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
switch (instr->intrinsic) {
case nir_intrinsic_interp_deref_at_centroid:
location = INTERP_CENTROID;
@@ -2840,7 +2841,6 @@ static LLVMValueRef visit_interp(struct ac_nir_context *ctx,
src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, halfval, "");
}
interp_param = ctx->abi->lookup_interp_param(ctx->abi, var->data.interpolation, location);
attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);
if (location == INTERP_CENTER) {
LLVMValueRef ij_out[2];
@@ -2878,26 +2878,65 @@ static LLVMValueRef visit_interp(struct ac_nir_context *ctx,
}
LLVMValueRef array_idx = ctx->ac.i32_0;
while(deref_instr->deref_type != nir_deref_type_var) {
if (deref_instr->deref_type == nir_deref_type_array) {
unsigned array_size = glsl_get_aoa_size(deref_instr->type);
if (!array_size)
array_size = 1;
LLVMValueRef offset;
nir_const_value *const_value = nir_src_as_const_value(deref_instr->arr.index);
if (const_value) {
offset = LLVMConstInt(ctx->ac.i32, array_size * const_value->u32[0], false);
} else {
LLVMValueRef indirect = get_src(ctx, deref_instr->arr.index);
offset = LLVMBuildMul(ctx->ac.builder, indirect,
LLVMConstInt(ctx->ac.i32, array_size, false), "");
}
array_idx = LLVMBuildAdd(ctx->ac.builder, array_idx, offset, "");
deref_instr = nir_src_as_deref(deref_instr->parent);
} else {
unreachable("Unsupported deref type");
}
}
unsigned input_array_size = glsl_get_aoa_size(var->type);
if (!input_array_size)
input_array_size = 1;
for (chan = 0; chan < 4; chan++) {
LLVMValueRef gather = LLVMGetUndef(LLVMVectorType(ctx->ac.f32, input_array_size));
LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, chan, false);
if (interp_param) {
interp_param = LLVMBuildBitCast(ctx->ac.builder,
interp_param, ctx->ac.v2f32, "");
LLVMValueRef i = LLVMBuildExtractElement(
ctx->ac.builder, interp_param, ctx->ac.i32_0, "");
LLVMValueRef j = LLVMBuildExtractElement(
ctx->ac.builder, interp_param, ctx->ac.i32_1, "");
for (unsigned idx = 0; idx < input_array_size; ++idx) {
LLVMValueRef v, attr_number;
result[chan] = ac_build_fs_interp(&ctx->ac,
llvm_chan, attr_number,
ctx->abi->prim_mask, i, j);
} else {
result[chan] = ac_build_fs_interp_mov(&ctx->ac,
LLVMConstInt(ctx->ac.i32, 2, false),
llvm_chan, attr_number,
ctx->abi->prim_mask);
attr_number = LLVMConstInt(ctx->ac.i32, input_base + idx, false);
if (interp_param) {
interp_param = LLVMBuildBitCast(ctx->ac.builder,
interp_param, ctx->ac.v2f32, "");
LLVMValueRef i = LLVMBuildExtractElement(
ctx->ac.builder, interp_param, ctx->ac.i32_0, "");
LLVMValueRef j = LLVMBuildExtractElement(
ctx->ac.builder, interp_param, ctx->ac.i32_1, "");
v = ac_build_fs_interp(&ctx->ac, llvm_chan, attr_number,
ctx->abi->prim_mask, i, j);
} else {
v = ac_build_fs_interp_mov(&ctx->ac, LLVMConstInt(ctx->ac.i32, 2, false),
llvm_chan, attr_number, ctx->abi->prim_mask);
}
gather = LLVMBuildInsertElement(ctx->ac.builder, gather, v,
LLVMConstInt(ctx->ac.i32, idx, false), "");
}
result[chan] = LLVMBuildExtractElement(ctx->ac.builder, gather, array_idx, "");
}
return ac_build_varying_gather_values(&ctx->ac, result, instr->num_components,
var->data.location_frac);
@@ -3460,7 +3499,7 @@ static void visit_tex(struct ac_nir_context *ctx, nir_tex_instr *instr)
* It's unnecessary if the original texture format was
* Z32_FLOAT, but we don't know that here.
*/
if (args.compare && ctx->ac.chip_class == VI && ctx->abi->clamp_shadow_reference)
if (args.compare && ctx->ac.chip_class >= VI && ctx->abi->clamp_shadow_reference)
args.compare = ac_build_clamp(&ctx->ac, ac_to_float(&ctx->ac, args.compare));
/* pack derivatives */
@@ -3851,7 +3890,7 @@ ac_handle_shader_output_decl(struct ac_llvm_context *ctx,
}
}
bool is_16bit = glsl_type_is_16bit(variable->type);
bool is_16bit = glsl_type_is_16bit(glsl_without_array(variable->type));
LLVMTypeRef type = is_16bit ? ctx->f16 : ctx->f32;
for (unsigned i = 0; i < attrib_count; ++i) {
for (unsigned chan = 0; chan < 4; chan++) {

View File

@@ -84,7 +84,9 @@ VkResult radv_CreateDescriptorSetLayout(
uint32_t immutable_sampler_count = 0;
for (uint32_t j = 0; j < pCreateInfo->bindingCount; j++) {
max_binding = MAX2(max_binding, pCreateInfo->pBindings[j].binding);
if (pCreateInfo->pBindings[j].pImmutableSamplers)
if ((pCreateInfo->pBindings[j].descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER ||
pCreateInfo->pBindings[j].descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER) &&
pCreateInfo->pBindings[j].pImmutableSamplers)
immutable_sampler_count += pCreateInfo->pBindings[j].descriptorCount;
}
@@ -182,7 +184,9 @@ VkResult radv_CreateDescriptorSetLayout(
set_layout->has_variable_descriptors = true;
}
if (binding->pImmutableSamplers) {
if ((binding->descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER ||
binding->descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER) &&
binding->pImmutableSamplers) {
set_layout->binding[b].immutable_samplers_offset = samplers_offset;
set_layout->binding[b].immutable_samplers_equal =
has_equal_immutable_samplers(binding->pImmutableSamplers, binding->descriptorCount);

View File

@@ -525,7 +525,7 @@ VkResult radv_CreateInstance(
pCreateInfo->pApplicationInfo->apiVersion != 0) {
client_version = pCreateInfo->pApplicationInfo->apiVersion;
} else {
radv_EnumerateInstanceVersion(&client_version);
client_version = VK_API_VERSION_1_0;
}
instance = vk_zalloc2(&default_alloc, pAllocator, sizeof(*instance), 8,

View File

@@ -849,54 +849,60 @@ build_pipeline(struct radv_device *device,
.subpass = 0,
};
switch(aspect) {
case VK_IMAGE_ASPECT_COLOR_BIT:
vk_pipeline_info.pColorBlendState = &(VkPipelineColorBlendStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = (VkPipelineColorBlendAttachmentState []) {
{ .colorWriteMask =
VK_COLOR_COMPONENT_A_BIT |
VK_COLOR_COMPONENT_R_BIT |
VK_COLOR_COMPONENT_G_BIT |
VK_COLOR_COMPONENT_B_BIT },
VkPipelineColorBlendStateCreateInfo color_blend_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = (VkPipelineColorBlendAttachmentState []) {
{
.colorWriteMask = VK_COLOR_COMPONENT_A_BIT |
VK_COLOR_COMPONENT_R_BIT |
VK_COLOR_COMPONENT_G_BIT |
VK_COLOR_COMPONENT_B_BIT },
}
};
VkPipelineDepthStencilStateCreateInfo depth_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
.depthTestEnable = true,
.depthWriteEnable = true,
.depthCompareOp = VK_COMPARE_OP_ALWAYS,
};
VkPipelineDepthStencilStateCreateInfo stencil_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
.depthTestEnable = false,
.depthWriteEnable = false,
.stencilTestEnable = true,
.front = {
.failOp = VK_STENCIL_OP_REPLACE,
.passOp = VK_STENCIL_OP_REPLACE,
.depthFailOp = VK_STENCIL_OP_REPLACE,
.compareOp = VK_COMPARE_OP_ALWAYS,
.compareMask = 0xff,
.writeMask = 0xff,
.reference = 0
},
.back = {
.failOp = VK_STENCIL_OP_REPLACE,
.passOp = VK_STENCIL_OP_REPLACE,
.depthFailOp = VK_STENCIL_OP_REPLACE,
.compareOp = VK_COMPARE_OP_ALWAYS,
.compareMask = 0xff,
.writeMask = 0xff,
.reference = 0
},
.depthCompareOp = VK_COMPARE_OP_ALWAYS,
};
switch(aspect) {
case VK_IMAGE_ASPECT_COLOR_BIT:
vk_pipeline_info.pColorBlendState = &color_blend_info;
break;
case VK_IMAGE_ASPECT_DEPTH_BIT:
vk_pipeline_info.pDepthStencilState = &(VkPipelineDepthStencilStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
.depthTestEnable = true,
.depthWriteEnable = true,
.depthCompareOp = VK_COMPARE_OP_ALWAYS,
};
vk_pipeline_info.pDepthStencilState = &depth_info;
break;
case VK_IMAGE_ASPECT_STENCIL_BIT:
vk_pipeline_info.pDepthStencilState = &(VkPipelineDepthStencilStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
.depthTestEnable = false,
.depthWriteEnable = false,
.stencilTestEnable = true,
.front = {
.failOp = VK_STENCIL_OP_REPLACE,
.passOp = VK_STENCIL_OP_REPLACE,
.depthFailOp = VK_STENCIL_OP_REPLACE,
.compareOp = VK_COMPARE_OP_ALWAYS,
.compareMask = 0xff,
.writeMask = 0xff,
.reference = 0
},
.back = {
.failOp = VK_STENCIL_OP_REPLACE,
.passOp = VK_STENCIL_OP_REPLACE,
.depthFailOp = VK_STENCIL_OP_REPLACE,
.compareOp = VK_COMPARE_OP_ALWAYS,
.compareMask = 0xff,
.writeMask = 0xff,
.reference = 0
},
.depthCompareOp = VK_COMPARE_OP_ALWAYS,
};
vk_pipeline_info.pDepthStencilState = &stencil_info;
break;
default:
unreachable("Unhandled aspect");

View File

@@ -256,7 +256,16 @@ get_tcs_num_patches(struct radv_shader_context *ctx)
/* Make sure that the data fits in LDS. This assumes the shaders only
* use LDS for the inputs and outputs.
*/
hardware_lds_size = ctx->options->chip_class >= CIK ? 65536 : 32768;
hardware_lds_size = 32768;
/* Looks like STONEY hangs if we use more than 32 KiB LDS in a single
* threadgroup, even though there is more than 32 KiB LDS.
*
* Test: dEQP-VK.tessellation.shader_input_output.barrier
*/
if (ctx->options->chip_class >= CIK && ctx->options->family != CHIP_STONEY)
hardware_lds_size = 65536;
num_patches = MIN2(num_patches, hardware_lds_size / (input_patch_size + output_patch_size));
/* Make sure the output data fits in the offchip buffer */
num_patches = MIN2(num_patches, (ctx->options->tess_offchip_block_dw_size * 4) / output_patch_size);
@@ -2160,7 +2169,7 @@ handle_fs_input_decl(struct radv_shader_context *ctx,
interp = lookup_interp_param(&ctx->abi, variable->data.interpolation, interp_type);
}
bool is_16bit = glsl_type_is_16bit(variable->type);
bool is_16bit = glsl_type_is_16bit(glsl_without_array(variable->type));
LLVMTypeRef type = is_16bit ? ctx->ac.i16 : ctx->ac.i32;
if (interp == NULL)
interp = LLVMGetUndef(type);

View File

@@ -3179,11 +3179,11 @@ radv_compute_db_shader_control(const struct radv_device *device,
bool disable_rbplus = device->physical_device->has_rbplus &&
!device->physical_device->rbplus_allowed;
/* Do not enable the gl_SampleMask fragment shader output if MSAA is
* disabled.
/* It shouldn't be needed to export gl_SampleMask when MSAA is disabled
* but this appears to break Project Cars (DXVK). See
* https://bugs.freedesktop.org/show_bug.cgi?id=109401
*/
bool mask_export_enable = ms->num_samples > 1 &&
ps->info.info.ps.writes_sample_mask;
bool mask_export_enable = ps->info.info.ps.writes_sample_mask;
return S_02880C_Z_EXPORT_ENABLE(ps->info.info.ps.writes_z) |
S_02880C_STENCIL_TEST_VAL_EXPORT_ENABLE(ps->info.info.ps.writes_stencil) |
@@ -3371,14 +3371,8 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
else
ia_multi_vgt_param.primgroup_size = 128; /* recommended without a GS */
ia_multi_vgt_param.partial_es_wave = false;
if (pipeline->device->has_distributed_tess) {
if (radv_pipeline_has_gs(pipeline)) {
if (device->physical_device->rad_info.chip_class <= VI)
ia_multi_vgt_param.partial_es_wave = true;
}
}
/* GS requirement. */
ia_multi_vgt_param.partial_es_wave = false;
if (radv_pipeline_has_gs(pipeline) && device->physical_device->rad_info.chip_class <= VI)
if (SI_GS_PER_ES / ia_multi_vgt_param.primgroup_size >= pipeline->device->gs_table_depth - 3)
ia_multi_vgt_param.partial_es_wave = true;
@@ -3424,13 +3418,8 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
/* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
if (device->has_distributed_tess) {
if (radv_pipeline_has_gs(pipeline)) {
if (device->physical_device->rad_info.family == CHIP_TONGA ||
device->physical_device->rad_info.family == CHIP_FIJI ||
device->physical_device->rad_info.family == CHIP_POLARIS10 ||
device->physical_device->rad_info.family == CHIP_POLARIS11 ||
device->physical_device->rad_info.family == CHIP_POLARIS12 ||
device->physical_device->rad_info.family == CHIP_VEGAM)
ia_multi_vgt_param.partial_vs_wave = true;
if (device->physical_device->rad_info.chip_class <= VI)
ia_multi_vgt_param.partial_es_wave = true;
} else {
ia_multi_vgt_param.partial_vs_wave = true;
}
@@ -3448,6 +3437,26 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
ia_multi_vgt_param.partial_vs_wave = true;
}
if (radv_pipeline_has_gs(pipeline)) {
/* On these chips there is the possibility of a hang if the
* pipeline uses a GS and partial_vs_wave is not set.
*
* This mostly does not hit 4-SE chips, as those typically set
* ia_switch_on_eoi and then partial_vs_wave is set for pipelines
* with GS due to another workaround.
*
* Reproducer: https://bugs.freedesktop.org/show_bug.cgi?id=109242
*/
if (device->physical_device->rad_info.family == CHIP_TONGA ||
device->physical_device->rad_info.family == CHIP_FIJI ||
device->physical_device->rad_info.family == CHIP_POLARIS10 ||
device->physical_device->rad_info.family == CHIP_POLARIS11 ||
device->physical_device->rad_info.family == CHIP_POLARIS12 ||
device->physical_device->rad_info.family == CHIP_VEGAM) {
ia_multi_vgt_param.partial_vs_wave = true;
}
}
ia_multi_vgt_param.base =
S_028AA8_PRIMGROUP_SIZE(ia_multi_vgt_param.primgroup_size - 1) |
/* The following field was moved to VGT_SHADER_STAGES_EN in GFX9. */

View File

@@ -363,31 +363,29 @@ copy_index_derefs_to_temps(ir_instruction *ir, void *data)
ir = a->array->as_dereference();
ir_rvalue *idx = a->array_index;
if (idx->as_dereference_variable()) {
ir_variable *var = idx->variable_referenced();
ir_variable *var = idx->variable_referenced();
/* If the index is read only it cannot change so there is no need
* to copy it.
*/
if (var->data.read_only || var->data.memory_read_only)
return;
/* If the index is read only it cannot change so there is no need
* to copy it.
*/
if (!var || var->data.read_only || var->data.memory_read_only)
return;
ir_variable *tmp = new(d->mem_ctx) ir_variable(idx->type, "idx_tmp",
ir_var_temporary);
d->before_instructions->push_tail(tmp);
ir_variable *tmp = new(d->mem_ctx) ir_variable(idx->type, "idx_tmp",
ir_var_temporary);
d->before_instructions->push_tail(tmp);
ir_dereference_variable *const deref_tmp_1 =
new(d->mem_ctx) ir_dereference_variable(tmp);
ir_assignment *const assignment =
new(d->mem_ctx) ir_assignment(deref_tmp_1,
idx->clone(d->mem_ctx, NULL));
d->before_instructions->push_tail(assignment);
ir_dereference_variable *const deref_tmp_1 =
new(d->mem_ctx) ir_dereference_variable(tmp);
ir_assignment *const assignment =
new(d->mem_ctx) ir_assignment(deref_tmp_1,
idx->clone(d->mem_ctx, NULL));
d->before_instructions->push_tail(assignment);
/* Replace the array index with a dereference of the new temporary */
ir_dereference_variable *const deref_tmp_2 =
new(d->mem_ctx) ir_dereference_variable(tmp);
a->array_index = deref_tmp_2;
}
/* Replace the array index with a dereference of the new temporary */
ir_dereference_variable *const deref_tmp_2 =
new(d->mem_ctx) ir_dereference_variable(tmp);
a->array_index = deref_tmp_2;
}
}
@@ -402,7 +400,8 @@ fix_parameter(void *mem_ctx, ir_rvalue *actual, const glsl_type *formal_type,
* nothing needs to be done to fix the parameter.
*/
if (formal_type == actual->type
&& (expr == NULL || expr->operation != ir_binop_vector_extract))
&& (expr == NULL || expr->operation != ir_binop_vector_extract)
&& actual->as_dereference_variable())
return;
/* An array index could also be an out variable so we need to make a copy
@@ -456,7 +455,7 @@ fix_parameter(void *mem_ctx, ir_rvalue *actual, const glsl_type *formal_type,
ir_dereference_variable *const deref_tmp_1 =
new(mem_ctx) ir_dereference_variable(tmp);
ir_assignment *const assignment =
new(mem_ctx) ir_assignment(deref_tmp_1, actual);
new(mem_ctx) ir_assignment(deref_tmp_1, actual->clone(mem_ctx, NULL));
before_instructions->push_tail(assignment);
}

View File

@@ -101,6 +101,10 @@ output_read_remover::visit(ir_dereference_variable *ir)
void *var_ctx = ralloc_parent(ir->var);
temp = new(var_ctx) ir_variable(ir->var->type, ir->var->name,
ir_var_temporary);
/* copy flags which affect arithematical precision */
temp->data.invariant = ir->var->data.invariant;
temp->data.precise = ir->var->data.precise;
temp->data.precision = ir->var->data.precision;
_mesa_hash_table_insert(replacements, ir->var, temp);
ir->var->insert_after(temp);
}

View File

@@ -764,6 +764,12 @@ get_shader_var_and_pointer_sizes(size_t *s_var_size, size_t *s_var_ptrs,
sizeof(var->name);
}
enum uniform_type
{
uniform_remapped,
uniform_not_remapped
};
static void
write_program_resource_data(struct blob *metadata,
struct gl_shader_program *prog,
@@ -816,12 +822,19 @@ write_program_resource_data(struct blob *metadata,
case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
case GL_UNIFORM:
for (unsigned i = 0; i < prog->data->NumUniformStorage; i++) {
if (strcmp(((gl_uniform_storage *)res->Data)->name,
prog->data->UniformStorage[i].name) == 0) {
blob_write_uint32(metadata, i);
break;
if (((gl_uniform_storage *)res->Data)->builtin ||
res->Type != GL_UNIFORM) {
blob_write_uint32(metadata, uniform_not_remapped);
for (unsigned i = 0; i < prog->data->NumUniformStorage; i++) {
if (strcmp(((gl_uniform_storage *)res->Data)->name,
prog->data->UniformStorage[i].name) == 0) {
blob_write_uint32(metadata, i);
break;
}
}
} else {
blob_write_uint32(metadata, uniform_remapped);
blob_write_uint32(metadata, ((gl_uniform_storage *)res->Data)->remap_location);
}
break;
case GL_ATOMIC_COUNTER_BUFFER:
@@ -906,9 +919,15 @@ read_program_resource_data(struct blob_reader *metadata,
case GL_COMPUTE_SUBROUTINE_UNIFORM:
case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
case GL_UNIFORM:
res->Data = &prog->data->UniformStorage[blob_read_uint32(metadata)];
case GL_UNIFORM: {
enum uniform_type type = (enum uniform_type) blob_read_uint32(metadata);
if (type == uniform_not_remapped) {
res->Data = &prog->data->UniformStorage[blob_read_uint32(metadata)];
} else {
res->Data = prog->UniformRemapTable[blob_read_uint32(metadata)];
}
break;
}
case GL_ATOMIC_COUNTER_BUFFER:
res->Data = &prog->data->AtomicBuffers[blob_read_uint32(metadata)];
break;

View File

@@ -490,10 +490,9 @@ nir_rematerialize_derefs_in_use_blocks_impl(nir_function_impl *impl)
_mesa_hash_table_clear(state.cache, NULL);
nir_foreach_instr_safe(instr, block) {
if (instr->type == nir_instr_type_deref) {
nir_deref_instr_remove_if_unused(nir_instr_as_deref(instr));
if (instr->type == nir_instr_type_deref &&
nir_deref_instr_remove_if_unused(nir_instr_as_deref(instr)))
continue;
}
state.builder.cursor = nir_before_instr(instr);
nir_foreach_src(instr, rematerialize_deref_src, &state);

View File

@@ -76,13 +76,13 @@ add_var_xfb_outputs(nir_xfb_info *xfb,
nir_xfb_output_info *output = &xfb->outputs[xfb->output_count++];
output->buffer = var->data.xfb_buffer;
output->offset = *offset;
output->offset = *offset + s * 16;
output->location = *location;
output->component_mask = (comp_mask >> (s * 4)) & 0xf;
(*location)++;
*offset += comp_slots * 4;
}
*offset += comp_slots * 4;
}
}

View File

@@ -143,9 +143,19 @@ gather_vars_written(struct copy_prop_var_state *state,
written->modes = nir_var_shader_out;
break;
case nir_intrinsic_deref_atomic_add:
case nir_intrinsic_deref_atomic_imin:
case nir_intrinsic_deref_atomic_umin:
case nir_intrinsic_deref_atomic_imax:
case nir_intrinsic_deref_atomic_umax:
case nir_intrinsic_deref_atomic_and:
case nir_intrinsic_deref_atomic_or:
case nir_intrinsic_deref_atomic_xor:
case nir_intrinsic_deref_atomic_exchange:
case nir_intrinsic_deref_atomic_comp_swap:
case nir_intrinsic_store_deref:
case nir_intrinsic_copy_deref: {
/* Destination in _both_ store_deref and copy_deref is src[0]. */
/* Destination in all of store_deref, copy_deref and the atomics is src[0]. */
nir_deref_instr *dst = nir_src_as_deref(intrin->src[0]);
uintptr_t mask = intrin->intrinsic == nir_intrinsic_store_deref ?
@@ -750,6 +760,19 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
break;
}
case nir_intrinsic_deref_atomic_add:
case nir_intrinsic_deref_atomic_imin:
case nir_intrinsic_deref_atomic_umin:
case nir_intrinsic_deref_atomic_imax:
case nir_intrinsic_deref_atomic_umax:
case nir_intrinsic_deref_atomic_and:
case nir_intrinsic_deref_atomic_or:
case nir_intrinsic_deref_atomic_xor:
case nir_intrinsic_deref_atomic_exchange:
case nir_intrinsic_deref_atomic_comp_swap:
kill_aliases(copies, nir_src_as_deref(intrin->src[0]), 0xf);
break;
default:
break;
}

View File

@@ -2819,7 +2819,8 @@ dri2_bind_wayland_display_wl(_EGLDriver *drv, _EGLDisplay *disp,
const struct wayland_drm_callbacks wl_drm_callbacks = {
.authenticate = (int(*)(void *, uint32_t)) dri2_dpy->vtbl->authenticate,
.reference_buffer = dri2_wl_reference_buffer,
.release_buffer = dri2_wl_release_buffer
.release_buffer = dri2_wl_release_buffer,
.is_format_supported = dri2_wl_is_format_supported
};
int flags = 0;
uint64_t cap;

View File

@@ -457,6 +457,8 @@ EGLBoolean
dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay *disp);
void
dri2_teardown_wayland(struct dri2_egl_display *dri2_dpy);
bool
dri2_wl_is_format_supported(void* user_data, uint32_t format);
#else
static inline EGLBoolean
dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay *disp)

View File

@@ -59,49 +59,57 @@ static const struct dri2_wl_visual {
uint32_t wl_drm_format;
uint32_t wl_shm_format;
int dri_image_format;
/* alt_dri_image_format is a substitute wl_buffer format to use for a
* wl-server unsupported dri_image_format, ie. some other dri_image_format in
* the table, of the same precision but with different channel ordering, or
* __DRI_IMAGE_FORMAT_NONE if an alternate format is not needed or supported.
* The code checks if alt_dri_image_format can be used as a fallback for a
* dri_image_format for a given wl-server implementation.
*/
int alt_dri_image_format;
int bpp;
unsigned int rgba_masks[4];
} dri2_wl_visuals[] = {
{
"XRGB2101010",
WL_DRM_FORMAT_XRGB2101010, WL_SHM_FORMAT_XRGB2101010,
__DRI_IMAGE_FORMAT_XRGB2101010, 32,
__DRI_IMAGE_FORMAT_XRGB2101010, __DRI_IMAGE_FORMAT_XBGR2101010, 32,
{ 0x3ff00000, 0x000ffc00, 0x000003ff, 0x00000000 }
},
{
"ARGB2101010",
WL_DRM_FORMAT_ARGB2101010, WL_SHM_FORMAT_ARGB2101010,
__DRI_IMAGE_FORMAT_ARGB2101010, 32,
__DRI_IMAGE_FORMAT_ARGB2101010, __DRI_IMAGE_FORMAT_ABGR2101010, 32,
{ 0x3ff00000, 0x000ffc00, 0x000003ff, 0xc0000000 }
},
{
"XBGR2101010",
WL_DRM_FORMAT_XBGR2101010, WL_SHM_FORMAT_XBGR2101010,
__DRI_IMAGE_FORMAT_XBGR2101010, 32,
__DRI_IMAGE_FORMAT_XBGR2101010, __DRI_IMAGE_FORMAT_XRGB2101010, 32,
{ 0x000003ff, 0x000ffc00, 0x3ff00000, 0x00000000 }
},
{
"ABGR2101010",
WL_DRM_FORMAT_ABGR2101010, WL_SHM_FORMAT_ABGR2101010,
__DRI_IMAGE_FORMAT_ABGR2101010, 32,
__DRI_IMAGE_FORMAT_ABGR2101010, __DRI_IMAGE_FORMAT_ARGB2101010, 32,
{ 0x000003ff, 0x000ffc00, 0x3ff00000, 0xc0000000 }
},
{
"XRGB8888",
WL_DRM_FORMAT_XRGB8888, WL_SHM_FORMAT_XRGB8888,
__DRI_IMAGE_FORMAT_XRGB8888, 32,
__DRI_IMAGE_FORMAT_XRGB8888, __DRI_IMAGE_FORMAT_NONE, 32,
{ 0x00ff0000, 0x0000ff00, 0x000000ff, 0x00000000 }
},
{
"ARGB8888",
WL_DRM_FORMAT_ARGB8888, WL_SHM_FORMAT_ARGB8888,
__DRI_IMAGE_FORMAT_ARGB8888, 32,
__DRI_IMAGE_FORMAT_ARGB8888, __DRI_IMAGE_FORMAT_NONE, 32,
{ 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000 }
},
{
"RGB565",
WL_DRM_FORMAT_RGB565, WL_SHM_FORMAT_RGB565,
__DRI_IMAGE_FORMAT_RGB565, 16,
__DRI_IMAGE_FORMAT_RGB565, __DRI_IMAGE_FORMAT_NONE, 16,
{ 0xf800, 0x07e0, 0x001f, 0x0000 }
},
};
@@ -166,6 +174,24 @@ dri2_wl_visual_idx_from_shm_format(uint32_t shm_format)
return -1;
}
bool
dri2_wl_is_format_supported(void* user_data, uint32_t format)
{
_EGLDisplay *disp = (_EGLDisplay *) user_data;
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
int j = dri2_wl_visual_idx_from_fourcc(format);
if (j == -1)
return false;
for (int i = 0; dri2_dpy->driver_configs[i]; i++)
if (j == dri2_wl_visual_idx_from_config(dri2_dpy,
dri2_dpy->driver_configs[i]))
return true;
return false;
}
static int
roundtrip(struct dri2_egl_display *dri2_dpy)
{
@@ -461,15 +487,29 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
int use_flags;
int visual_idx;
unsigned int dri_image_format;
unsigned int linear_dri_image_format;
uint64_t *modifiers;
int num_modifiers;
visual_idx = dri2_wl_visual_idx_from_fourcc(dri2_surf->format);
assert(visual_idx != -1);
dri_image_format = dri2_wl_visuals[visual_idx].dri_image_format;
linear_dri_image_format = dri_image_format;
modifiers = u_vector_tail(&dri2_dpy->wl_modifiers[visual_idx]);
num_modifiers = u_vector_length(&dri2_dpy->wl_modifiers[visual_idx]);
/* Substitute dri image format if server does not support original format */
if (!(dri2_dpy->formats & (1 << visual_idx)))
linear_dri_image_format = dri2_wl_visuals[visual_idx].alt_dri_image_format;
/* These asserts hold, as long as dri2_wl_visuals[] is self-consistent and
* the PRIME substitution logic in dri2_wl_add_configs_for_visuals() is free
* of bugs.
*/
assert(linear_dri_image_format != __DRI_IMAGE_FORMAT_NONE);
assert(dri2_dpy->formats &
(1 << dri2_wl_visual_idx_from_dri_image_format(linear_dri_image_format)));
/* There might be a buffer release already queued that wasn't processed */
wl_display_dispatch_queue_pending(dri2_dpy->wl_dpy, dri2_surf->wl_queue);
@@ -516,7 +556,7 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
dri2_dpy->image->createImageWithModifiers(dri2_dpy->dri_screen,
dri2_surf->base.Width,
dri2_surf->base.Height,
dri_image_format,
linear_dri_image_format,
&linear_mod,
1,
NULL);
@@ -525,7 +565,7 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
dri2_dpy->image->createImage(dri2_dpy->dri_screen,
dri2_surf->base.Width,
dri2_surf->base.Height,
dri_image_format,
linear_dri_image_format,
use_flags |
__DRI_IMAGE_USE_LINEAR,
NULL);
@@ -1298,8 +1338,11 @@ dri2_wl_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *disp)
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
unsigned int format_count[ARRAY_SIZE(dri2_wl_visuals)] = { 0 };
unsigned int count = 0;
bool assigned;
for (unsigned i = 0; dri2_dpy->driver_configs[i]; i++) {
assigned = false;
for (unsigned j = 0; j < ARRAY_SIZE(dri2_wl_visuals); j++) {
struct dri2_egl_config *dri2_conf;
@@ -1312,6 +1355,43 @@ dri2_wl_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *disp)
if (dri2_conf->base.ConfigID == count + 1)
count++;
format_count[j]++;
assigned = true;
}
}
if (!assigned && dri2_dpy->is_different_gpu) {
struct dri2_egl_config *dri2_conf;
int alt_dri_image_format, c, s;
/* No match for config. Try if we can blitImage convert to a visual */
c = dri2_wl_visual_idx_from_config(dri2_dpy,
dri2_dpy->driver_configs[i]);
if (c == -1)
continue;
/* Find optimal target visual for blitImage conversion, if any. */
alt_dri_image_format = dri2_wl_visuals[c].alt_dri_image_format;
s = dri2_wl_visual_idx_from_dri_image_format(alt_dri_image_format);
if (s == -1 || !(dri2_dpy->formats & (1 << s)))
continue;
/* Visual s works for the Wayland server, and c can be converted into s
* by our client gpu during PRIME blitImage conversion to a linear
* wl_buffer, so add visual c as supported by the client renderer.
*/
dri2_conf = dri2_add_config(disp, dri2_dpy->driver_configs[i],
count + 1, EGL_WINDOW_BIT, NULL,
dri2_wl_visuals[c].rgba_masks);
if (dri2_conf) {
if (dri2_conf->base.ConfigID == count + 1)
count++;
format_count[c]++;
if (format_count[c] == 1)
_eglLog(_EGL_DEBUG, "Client format %s to server format %s via "
"PRIME blitImage.", dri2_wl_visuals[c].format_name,
dri2_wl_visuals[s].format_name);
}
}
}

View File

@@ -111,6 +111,8 @@ drm_create_buffer(struct wl_client *client, struct wl_resource *resource,
uint32_t stride, uint32_t format)
{
switch (format) {
case WL_DRM_FORMAT_ABGR2101010:
case WL_DRM_FORMAT_XBGR2101010:
case WL_DRM_FORMAT_ARGB2101010:
case WL_DRM_FORMAT_XRGB2101010:
case WL_DRM_FORMAT_ARGB8888:
@@ -210,10 +212,31 @@ bind_drm(struct wl_client *client, void *data, uint32_t version, uint32_t id)
wl_resource_set_implementation(resource, &drm_interface, data, NULL);
wl_resource_post_event(resource, WL_DRM_DEVICE, drm->device_name);
wl_resource_post_event(resource, WL_DRM_FORMAT,
WL_DRM_FORMAT_ARGB2101010);
wl_resource_post_event(resource, WL_DRM_FORMAT,
WL_DRM_FORMAT_XRGB2101010);
if (drm->callbacks.is_format_supported(drm->user_data,
WL_DRM_FORMAT_ARGB2101010)) {
wl_resource_post_event(resource, WL_DRM_FORMAT,
WL_DRM_FORMAT_ARGB2101010);
}
if (drm->callbacks.is_format_supported(drm->user_data,
WL_DRM_FORMAT_XRGB2101010)) {
wl_resource_post_event(resource, WL_DRM_FORMAT,
WL_DRM_FORMAT_XRGB2101010);
}
if (drm->callbacks.is_format_supported(drm->user_data,
WL_DRM_FORMAT_ABGR2101010)) {
wl_resource_post_event(resource, WL_DRM_FORMAT,
WL_DRM_FORMAT_ABGR2101010);
}
if (drm->callbacks.is_format_supported(drm->user_data,
WL_DRM_FORMAT_XBGR2101010)) {
wl_resource_post_event(resource, WL_DRM_FORMAT,
WL_DRM_FORMAT_XBGR2101010);
}
wl_resource_post_event(resource, WL_DRM_FORMAT,
WL_DRM_FORMAT_ARGB8888);
wl_resource_post_event(resource, WL_DRM_FORMAT,

View File

@@ -14,6 +14,8 @@ struct wayland_drm_callbacks {
struct wl_drm_buffer *buffer);
void (*release_buffer)(void *user_data, struct wl_drm_buffer *buffer);
bool (*is_format_supported)(void *user_data, uint32_t format);
};

View File

@@ -1524,7 +1524,8 @@ tc_buffer_do_flush_region(struct threaded_context *tc,
if (ttrans->staging) {
struct pipe_box src_box;
u_box_1d(ttrans->offset + box->x % tc->map_buffer_alignment,
u_box_1d(ttrans->offset + ttrans->b.box.x % tc->map_buffer_alignment +
(box->x - ttrans->b.box.x),
box->width, &src_box);
/* Copy the staging buffer into the original one. */

View File

@@ -60,6 +60,8 @@ etna_context_destroy(struct pipe_context *pctx)
{
struct etna_context *ctx = etna_context(pctx);
util_copy_framebuffer_state(&ctx->framebuffer_s, NULL);
if (ctx->primconvert)
util_primconvert_destroy(ctx->primconvert);
@@ -296,10 +298,10 @@ etna_draw_vbo(struct pipe_context *pctx, const struct pipe_draw_info *info)
if (DBG_ENABLED(ETNA_DBG_FLUSH_ALL))
pctx->flush(pctx, NULL, 0);
if (ctx->framebuffer.cbuf)
etna_resource(ctx->framebuffer.cbuf->texture)->seqno++;
if (ctx->framebuffer.zsbuf)
etna_resource(ctx->framebuffer.zsbuf->texture)->seqno++;
if (ctx->framebuffer_s.cbufs[0])
etna_resource(ctx->framebuffer_s.cbufs[0]->texture)->seqno++;
if (ctx->framebuffer_s.zsbuf)
etna_resource(ctx->framebuffer_s.zsbuf->texture)->seqno++;
if (info->index_size && indexbuf != info->index.resource)
pipe_resource_reference(&indexbuf, NULL);
}

View File

@@ -182,7 +182,6 @@ struct compiled_viewport_state {
/* Compiled pipe_framebuffer_state */
struct compiled_framebuffer_state {
struct pipe_surface *cbuf, *zsbuf; /* keep reference to surfaces */
uint32_t GL_MULTI_SAMPLE_CONFIG;
uint32_t PE_COLOR_FORMAT;
uint32_t PE_DEPTH_CONFIG;

View File

@@ -37,6 +37,7 @@
#include "etnaviv_surface.h"
#include "etnaviv_translate.h"
#include "etnaviv_util.h"
#include "util/u_framebuffer.h"
#include "util/u_helpers.h"
#include "util/u_inlines.h"
#include "util/u_math.h"
@@ -130,7 +131,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
assert(res->layout & ETNA_LAYOUT_BIT_TILE); /* Cannot render to linear surfaces */
etna_update_render_resource(pctx, cbuf->base.texture);
pipe_surface_reference(&cs->cbuf, &cbuf->base);
cs->PE_COLOR_FORMAT =
VIVS_PE_COLOR_FORMAT_FORMAT(translate_rs_format(cbuf->base.format)) |
VIVS_PE_COLOR_FORMAT_COMPONENTS__MASK |
@@ -182,7 +182,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
nr_samples_color = cbuf->base.texture->nr_samples;
} else {
pipe_surface_reference(&cs->cbuf, NULL);
/* Clearing VIVS_PE_COLOR_FORMAT_COMPONENTS__MASK and
* VIVS_PE_COLOR_FORMAT_OVERWRITE prevents us from overwriting the
* color target */
@@ -201,7 +200,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
etna_update_render_resource(pctx, zsbuf->base.texture);
pipe_surface_reference(&cs->zsbuf, &zsbuf->base);
assert(res->layout &ETNA_LAYOUT_BIT_TILE); /* Cannot render to linear surfaces */
uint32_t depth_format = translate_depth_format(zsbuf->base.format);
@@ -252,7 +250,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
nr_samples_depth = zsbuf->base.texture->nr_samples;
} else {
pipe_surface_reference(&cs->zsbuf, NULL);
cs->PE_DEPTH_CONFIG = VIVS_PE_DEPTH_CONFIG_DEPTH_MODE_NONE;
cs->PE_DEPTH_ADDR.bo = NULL;
cs->PE_DEPTH_STRIDE = 0;
@@ -325,7 +322,8 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
*/
cs->PE_LOGIC_OP = VIVS_PE_LOGIC_OP_SINGLE_BUFFER(ctx->specs.single_buffer ? 3 : 0);
ctx->framebuffer_s = *sv; /* keep copy of original structure */
/* keep copy of original structure */
util_copy_framebuffer_state(&ctx->framebuffer_s, sv);
ctx->dirty |= ETNA_DIRTY_FRAMEBUFFER | ETNA_DIRTY_DERIVE_TS;
}

View File

@@ -430,7 +430,7 @@ emit_blit_texture(struct fd_ringbuffer *ring, const struct pipe_blit_info *info)
OUT_RING(ring, A6XX_RB_2D_DST_INFO_COLOR_FORMAT(dfmt) |
A6XX_RB_2D_DST_INFO_TILE_MODE(dtile) |
A6XX_RB_2D_DST_INFO_COLOR_SWAP(dswap));
OUT_RELOC(ring, dst->bo, doff, 0, 0); /* RB_2D_DST_LO/HI */
OUT_RELOCW(ring, dst->bo, doff, 0, 0); /* RB_2D_DST_LO/HI */
OUT_RING(ring, A6XX_RB_2D_DST_SIZE_PITCH(dpitch));
OUT_RING(ring, 0x00000000);
OUT_RING(ring, 0x00000000);

View File

@@ -839,8 +839,7 @@ fd_resource_create(struct pipe_screen *pscreen,
rsc->internal_format = format;
rsc->cpp = util_format_get_blocksize(format);
prsc->nr_samples = MAX2(1, prsc->nr_samples);
rsc->cpp *= prsc->nr_samples;
rsc->cpp *= fd_resource_nr_samples(prsc);
assert(rsc->cpp);
@@ -924,9 +923,9 @@ fd_resource_from_handle(struct pipe_screen *pscreen,
if (!rsc->bo)
goto fail;
prsc->nr_samples = MAX2(1, prsc->nr_samples);
rsc->internal_format = tmpl->format;
rsc->cpp = prsc->nr_samples * util_format_get_blocksize(tmpl->format);
rsc->cpp = util_format_get_blocksize(tmpl->format);
rsc->cpp *= fd_resource_nr_samples(prsc);
slice->pitch = handle->stride / rsc->cpp;
slice->offset = handle->offset;
slice->size0 = handle->stride * prsc->height0;

View File

@@ -178,6 +178,15 @@ fd_resource_level_linear(struct pipe_resource *prsc, int level)
return false;
}
/* access # of samples, with 0 normalized to 1 (which is what we care about
* most of the time)
*/
static inline unsigned
fd_resource_nr_samples(struct pipe_resource *prsc)
{
return MAX2(1, prsc->nr_samples);
}
void fd_blitter_pipe_begin(struct fd_context *ctx, bool render_cond, bool discard,
enum fd_render_stage stage);
void fd_blitter_pipe_end(struct fd_context *ctx);

View File

@@ -31,6 +31,7 @@
#include "freedreno_texture.h"
#include "freedreno_context.h"
#include "freedreno_resource.h"
#include "freedreno_util.h"
static void
@@ -83,7 +84,7 @@ static void set_sampler_views(struct fd_texture_stateobj *tex,
tex->num_textures = util_last_bit(tex->valid_textures);
for (i = 0; i < tex->num_textures; i++) {
uint nr_samples = tex->textures[i]->texture->nr_samples;
uint nr_samples = fd_resource_nr_samples(tex->textures[i]->texture);
samplers |= (nr_samples >> 1) << (i * 2);
}

View File

@@ -1044,7 +1044,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
break;
}
case OP_MUL:
if (i->dType == TYPE_F32)
if (i->dType == TYPE_F32 && !i->precise)
tryCollapseChainedMULs(i, s, imm0);
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH) {

View File

@@ -1279,8 +1279,8 @@ nvc0_screen_create(struct nouveau_device *dev)
for (i = 0; i < NVC0_MAX_VIEWPORTS; i++) {
BEGIN_NVC0(push, NVC0_3D(SCISSOR_ENABLE(i)), 3);
PUSH_DATA (push, 1);
PUSH_DATA (push, 8192 << 16);
PUSH_DATA (push, 8192 << 16);
PUSH_DATA (push, 16384 << 16);
PUSH_DATA (push, 16384 << 16);
}
#define MK_MACRO(m, n) i = nvc0_graph_set_macro(screen, m, i, sizeof(n), n);

View File

@@ -521,10 +521,13 @@ static void si_buffer_do_flush_region(struct pipe_context *ctx,
struct r600_resource *rbuffer = r600_resource(transfer->resource);
if (stransfer->staging) {
unsigned src_offset = stransfer->offset +
transfer->box.x % SI_MAP_BUFFER_ALIGNMENT +
(box->x - transfer->box.x);
/* Copy the staging buffer into the original one. */
si_copy_buffer((struct si_context*)ctx, transfer->resource,
&stransfer->staging->b.b, box->x,
stransfer->offset + box->x % SI_MAP_BUFFER_ALIGNMENT,
&stransfer->staging->b.b, box->x, src_offset,
box->width);
}

View File

@@ -348,20 +348,11 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen,
key->u.uses_gs)
partial_vs_wave = true;
/* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
/* Needed for 028B6C_DISTRIBUTION_MODE != 0. (implies >= VI) */
if (sscreen->has_distributed_tess) {
if (key->u.uses_gs) {
if (sscreen->info.chip_class <= VI)
if (sscreen->info.chip_class == VI)
partial_es_wave = true;
/* GPU hang workaround. */
if (sscreen->info.family == CHIP_TONGA ||
sscreen->info.family == CHIP_FIJI ||
sscreen->info.family == CHIP_POLARIS10 ||
sscreen->info.family == CHIP_POLARIS11 ||
sscreen->info.family == CHIP_POLARIS12 ||
sscreen->info.family == CHIP_VEGAM)
partial_vs_wave = true;
} else {
partial_vs_wave = true;
}
@@ -417,6 +408,18 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen,
if (sscreen->info.max_se == 4 && !wd_switch_on_eop)
ia_switch_on_eoi = true;
/* HW engineers suggested that PARTIAL_VS_WAVE_ON should be set
* to work around a GS hang.
*/
if (key->u.uses_gs &&
(sscreen->info.family == CHIP_TONGA ||
sscreen->info.family == CHIP_FIJI ||
sscreen->info.family == CHIP_POLARIS10 ||
sscreen->info.family == CHIP_POLARIS11 ||
sscreen->info.family == CHIP_POLARIS12 ||
sscreen->info.family == CHIP_VEGAM))
partial_vs_wave = true;
/* Required by Hawaii and, for some special cases, by VI. */
if (ia_switch_on_eoi &&
(sscreen->info.family == CHIP_HAWAII ||

View File

@@ -1662,7 +1662,7 @@ static inline void si_shader_selector_key(struct pipe_context *ctx,
key->part.ps.epilog.alpha_func = si_get_alpha_test_func(sctx);
/* ps_uses_fbfetch is true only if the color buffer is bound. */
if (sctx->ps_uses_fbfetch) {
if (sctx->ps_uses_fbfetch && !sctx->blitter->running) {
struct pipe_surface *cb0 = sctx->framebuffer.state.cbufs[0];
struct pipe_resource *tex = cb0->texture;

View File

@@ -146,6 +146,8 @@ static void si_emit_one_scissor(struct si_context *ctx,
S_028254_BR_Y(final.maxy));
}
#define MAX_PA_SU_HARDWARE_SCREEN_OFFSET 8176
static void si_emit_guardband(struct si_context *ctx)
{
const struct si_state_rasterizer *rs = ctx->queued.named.rasterizer;
@@ -179,13 +181,22 @@ static void si_emit_guardband(struct si_context *ctx)
int hw_screen_offset_x = (vp_as_scissor.maxx + vp_as_scissor.minx) / 2;
int hw_screen_offset_y = (vp_as_scissor.maxy + vp_as_scissor.miny) / 2;
const unsigned hw_screen_offset_max = 8176;
/* SI-CI need to align the offset to an ubertile consisting of all SEs. */
const unsigned hw_screen_offset_alignment =
ctx->chip_class >= VI ? 16 : MAX2(ctx->screen->se_tile_repeat, 16);
hw_screen_offset_x = CLAMP(hw_screen_offset_x, 0, hw_screen_offset_max);
hw_screen_offset_y = CLAMP(hw_screen_offset_y, 0, hw_screen_offset_max);
/* Indexed by quantization modes */
static unsigned max_viewport_size[] = {65535, 16383, 4095};
/* Ensure that the whole viewport stays representable in
* absolute coordinates.
* See comment in si_set_viewport_states.
*/
assert(vp_as_scissor.maxx <= max_viewport_size[vp_as_scissor.quant_mode] &&
vp_as_scissor.maxy <= max_viewport_size[vp_as_scissor.quant_mode]);
hw_screen_offset_x = CLAMP(hw_screen_offset_x, 0, MAX_PA_SU_HARDWARE_SCREEN_OFFSET);
hw_screen_offset_y = CLAMP(hw_screen_offset_y, 0, MAX_PA_SU_HARDWARE_SCREEN_OFFSET);
/* Align the screen offset by dropping the low bits. */
hw_screen_offset_x &= ~(hw_screen_offset_alignment - 1);
@@ -218,7 +229,6 @@ static void si_emit_guardband(struct si_context *ctx)
*
* The viewport range is [-max_viewport_size/2, max_viewport_size/2].
*/
static unsigned max_viewport_size[] = {65535, 16383, 4095};
assert(vp_as_scissor.quant_mode < ARRAY_SIZE(max_viewport_size));
max_range = max_viewport_size[vp_as_scissor.quant_mode] / 2;
left = (-max_range - vp.translate[0]) / vp.scale[0];
@@ -332,6 +342,22 @@ static void si_set_viewport_states(struct pipe_context *pctx,
unsigned h = scissor->maxy - scissor->miny;
unsigned max_extent = MAX2(w, h);
int max_corner = MAX2(scissor->maxx, scissor->maxy);
unsigned center_x = (scissor->maxx + scissor->minx) / 2;
unsigned center_y = (scissor->maxy + scissor->miny) / 2;
unsigned max_center = MAX2(center_x, center_y);
/* PA_SU_HARDWARE_SCREEN_OFFSET can't center viewports whose
* center start farther than MAX_PA_SU_HARDWARE_SCREEN_OFFSET.
* (for example, a 1x1 viewport in the lower right corner of
* 16Kx16K) Such viewports need a greater guardband, so they
* have to use a worse quantization mode.
*/
unsigned distance_off_center =
MAX2(0, (int)max_center - MAX_PA_SU_HARDWARE_SCREEN_OFFSET);
max_extent += distance_off_center;
/* Determine the best quantization mode (subpixel precision),
* but also leave enough space for the guardband.
*
@@ -343,7 +369,22 @@ static void si_set_viewport_states(struct pipe_context *pctx,
if (ctx->family == CHIP_RAVEN)
max_extent = 16384; /* Use QUANT_MODE == 16_8. */
if (max_extent <= 1024) /* 4K scanline area for guardband */
/* Another constraint is that all coordinates in the viewport
* are representable in fixed point with respect to the
* surface origin.
*
* It means that PA_SU_HARDWARE_SCREEN_OFFSET can't be given
* an offset that would make the upper corner of the viewport
* greater than the maximum representable number post
* quantization, ie 2^quant_bits.
*
* This does not matter for 14.10 and 16.8 formats since the
* offset is already limited at 8k, but it means we can't use
* 12.12 if we are drawing to some pixels outside the lower
* 4k x 4k of the render target.
*/
if (max_extent <= 1024 && max_corner < 4096) /* 4K scanline area for guardband */
scissor->quant_mode = SI_QUANT_MODE_12_12_FIXED_POINT_1_4096TH;
else if (max_extent <= 4096) /* 16K scanline area for guardband */
scissor->quant_mode = SI_QUANT_MODE_14_10_FIXED_POINT_1_1024TH;

View File

@@ -190,11 +190,7 @@ swr_arch_libs = []
swr_arch_defines = []
swr_avx_args = cpp.first_supported_argument(
'-target-cpu=sandybridge', '-mavx', '-march=core-avx', '-tp=sandybridge',
prefix : '''
#if !defined(__AVX__)
# error
#endif ''',
'-mavx', '-target-cpu=sandybridge', '-march=core-avx', '-tp=sandybridge',
)
if swr_avx_args == []
error('Cannot find AVX support for swr. (these are required for SWR an all architectures.)')
@@ -215,18 +211,10 @@ endif
if with_swr_arches.contains('avx2')
swr_avx2_args = cpp.first_supported_argument(
'-target-cpu=haswell', '-march=core-avx2', '-tp=haswell',
prefix : '''
#if !defined(__AVX2__)
# error
#endif ''',
'-march=core-avx2', '-target-cpu=haswell', '-tp=haswell',
)
if swr_avx2_args == []
if cpp.has_argument(['-mavx2', '-mfma', '-mbmi2', '-mf16c'],
prefix : '''
#if !defined(__AVX2__)
# error
#endif ''')
if cpp.has_argument(['-mavx2', '-mfma', '-mbmi2', '-mf16c'])
swr_avx2_args = ['-mavx2', '-mfma', '-mbmi2', '-mf16c']
else
error('Cannot find AVX2 support for swr.')
@@ -248,11 +236,7 @@ endif
if with_swr_arches.contains('knl')
swr_knl_args = cpp.first_supported_argument(
'-target-cpu=mic-knl', '-march=knl', '-xMIC-AVX512',
prefix : '''
#if !defined(__AVX512F__) || !defined(__AVX512ER__)
# error
#endif ''',
'-march=knl', '-target-cpu=mic-knl', '-xMIC-AVX512',
)
if swr_knl_args == []
error('Cannot find KNL support for swr.')
@@ -264,7 +248,7 @@ if with_swr_arches.contains('knl')
[files_swr_common, files_swr_arch],
cpp_args : [
swr_cpp_args, swr_knl_args, '-DKNOB_ARCH=KNOB_ARCH_AVX512',
'-DKNOB_ARCH_KNIGHTS',
'-DSIMD_ARCH_KNIGHTS',
],
link_args : [ld_args_gc_sections],
include_directories : [swr_incs],
@@ -276,11 +260,7 @@ endif
if with_swr_arches.contains('skx')
swr_skx_args = cpp.first_supported_argument(
'-target-cpu=x86-skylake', '-march=skylake-avx512', '-xCORE-AVX512',
prefix : '''
#if !defined(__AVX512F__) || !defined(__AVX512BW__)
# error
#endif ''',
'-march=skylake-avx512', '-target-cpu=x86-skylake', '-xCORE-AVX512',
)
if swr_skx_args == []
error('Cannot find SKX support for swr.')

View File

@@ -50,7 +50,9 @@ swr_fence_cb(uint64_t userData, uint64_t userData2, uint64_t userData3)
swr_fence_do_work(fence);
/* Correct value is in SwrSync data, and not the fence write field. */
fence->read = userData2;
/* Contexts may not finish in order, but fence value always increases */
if (fence->read < userData2)
fence->read = userData2;
}
/*

View File

@@ -669,7 +669,7 @@ v3d_resource_create_with_modifiers(struct pipe_screen *pscreen,
rsc->tiled = false;
} else {
fprintf(stderr, "Unsupported modifier requested\n");
return NULL;
goto fail;
}
rsc->internal_format = prsc->format;

View File

@@ -81,8 +81,10 @@ files_libvc4 = files(
'vc4_uniforms.c',
)
vc4_c_args = []
libvc4_neon = []
if with_asm_arch == 'arm'
if host_machine.cpu_family() == 'arm'
libvc4_neon = static_library(
'vc4_neon',
'vc4_tiling_lt_neon.c',
@@ -91,12 +93,12 @@ if with_asm_arch == 'arm'
],
c_args : '-mfpu=neon',
)
vc4_c_args += '-DUSE_ARM_ASM'
endif
simpenrose_c_args = []
dep_simpenrose = dependency('simpenrose', required : false)
if dep_simpenrose.found()
simpenrose_c_args = '-DUSE_VC4_SIMULATOR'
vc4_c_args += '-DUSE_VC4_SIMULATOR'
endif
libvc4 = static_library(
@@ -107,7 +109,7 @@ libvc4 = static_library(
inc_gallium_drivers, inc_drm_uapi,
],
link_with: libvc4_neon,
c_args : [c_vis_args, simpenrose_c_args],
c_args : [c_vis_args, vc4_c_args],
cpp_args : [cpp_vis_args],
dependencies : [dep_simpenrose, dep_libdrm, dep_valgrind, idep_nir_headers],
build_by_default : false,

View File

@@ -132,7 +132,7 @@ vc4_create_batch_query(struct pipe_context *pctx, unsigned num_queries,
/* We can't mix HW and non-HW queries. */
if (nhwqueries && nhwqueries != num_queries)
return NULL;
goto err_free_query;
if (!nhwqueries)
return (struct pipe_query *)query;

View File

@@ -73,42 +73,46 @@ vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp)
/* Load from the GPU in one shot, no interleave, to
* d0-d7.
*/
"vldm %0, {q0, q1, q2, q3}\n"
"vldm %[gpu], {q0, q1, q2, q3}\n"
/* Store each 8-byte line to cpu-side destination,
* incrementing it by the stride each time.
*/
"vst1.8 d0, [%1], %2\n"
"vst1.8 d1, [%1], %2\n"
"vst1.8 d2, [%1], %2\n"
"vst1.8 d3, [%1], %2\n"
"vst1.8 d4, [%1], %2\n"
"vst1.8 d5, [%1], %2\n"
"vst1.8 d6, [%1], %2\n"
"vst1.8 d7, [%1]\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu_stride)
"vst1.8 d0, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d1, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d2, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d3, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d4, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d5, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d6, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d7, [%[cpu]]\n"
: [cpu] "+r"(cpu)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "q0", "q1", "q2", "q3");
} else {
assert(gpu_stride == 16);
void *cpu2 = cpu + 8;
__asm__ volatile (
/* Load from the GPU in one shot, no interleave, to
* d0-d7.
*/
"vldm %0, {q0, q1, q2, q3};\n"
"vldm %[gpu], {q0, q1, q2, q3};\n"
/* Store each 16-byte line in 2 parts to the cpu-side
* destination. (vld1 can only store one d-register
* at a time).
*/
"vst1.8 d0, [%1], %3\n"
"vst1.8 d1, [%2], %3\n"
"vst1.8 d2, [%1], %3\n"
"vst1.8 d3, [%2], %3\n"
"vst1.8 d4, [%1], %3\n"
"vst1.8 d5, [%2], %3\n"
"vst1.8 d6, [%1]\n"
"vst1.8 d7, [%2]\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
"vst1.8 d0, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d1, [%[cpu2]],%[cpu_stride]\n"
"vst1.8 d2, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d3, [%[cpu2]],%[cpu_stride]\n"
"vst1.8 d4, [%[cpu]], %[cpu_stride]\n"
"vst1.8 d5, [%[cpu2]],%[cpu_stride]\n"
"vst1.8 d6, [%[cpu]]\n"
"vst1.8 d7, [%[cpu2]]\n"
: [cpu] "+r"(cpu),
[cpu2] "+r"(cpu2)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "q0", "q1", "q2", "q3");
}
#elif defined (PIPE_ARCH_AARCH64)
@@ -117,42 +121,46 @@ vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp)
/* Load from the GPU in one shot, no interleave, to
* d0-d7.
*/
"ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
"ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
/* Store each 8-byte line to cpu-side destination,
* incrementing it by the stride each time.
*/
"st1 {v0.D}[0], [%1], %2\n"
"st1 {v0.D}[1], [%1], %2\n"
"st1 {v1.D}[0], [%1], %2\n"
"st1 {v1.D}[1], [%1], %2\n"
"st1 {v2.D}[0], [%1], %2\n"
"st1 {v2.D}[1], [%1], %2\n"
"st1 {v3.D}[0], [%1], %2\n"
"st1 {v3.D}[1], [%1]\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu_stride)
"st1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
"st1 {v0.D}[1], [%[cpu]], %[cpu_stride]\n"
"st1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
"st1 {v1.D}[1], [%[cpu]], %[cpu_stride]\n"
"st1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
"st1 {v2.D}[1], [%[cpu]], %[cpu_stride]\n"
"st1 {v3.D}[0], [%[cpu]], %[cpu_stride]\n"
"st1 {v3.D}[1], [%[cpu]]\n"
: [cpu] "+r"(cpu)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "v0", "v1", "v2", "v3");
} else {
assert(gpu_stride == 16);
void *cpu2 = cpu + 8;
__asm__ volatile (
/* Load from the GPU in one shot, no interleave, to
* d0-d7.
*/
"ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
"ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
/* Store each 16-byte line in 2 parts to the cpu-side
* destination. (vld1 can only store one d-register
* at a time).
*/
"st1 {v0.D}[0], [%1], %3\n"
"st1 {v0.D}[1], [%2], %3\n"
"st1 {v1.D}[0], [%1], %3\n"
"st1 {v1.D}[1], [%2], %3\n"
"st1 {v2.D}[0], [%1], %3\n"
"st1 {v2.D}[1], [%2], %3\n"
"st1 {v3.D}[0], [%1]\n"
"st1 {v3.D}[1], [%2]\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
"st1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
"st1 {v0.D}[1], [%[cpu2]],%[cpu_stride]\n"
"st1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
"st1 {v1.D}[1], [%[cpu2]],%[cpu_stride]\n"
"st1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
"st1 {v2.D}[1], [%[cpu2]],%[cpu_stride]\n"
"st1 {v3.D}[0], [%[cpu]]\n"
"st1 {v3.D}[1], [%[cpu2]]\n"
: [cpu] "+r"(cpu),
[cpu2] "+r"(cpu2)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "v0", "v1", "v2", "v3");
}
#else
@@ -174,40 +182,44 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp)
/* Load each 8-byte line from cpu-side source,
* incrementing it by the stride each time.
*/
"vld1.8 d0, [%1], %2\n"
"vld1.8 d1, [%1], %2\n"
"vld1.8 d2, [%1], %2\n"
"vld1.8 d3, [%1], %2\n"
"vld1.8 d4, [%1], %2\n"
"vld1.8 d5, [%1], %2\n"
"vld1.8 d6, [%1], %2\n"
"vld1.8 d7, [%1]\n"
"vld1.8 d0, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d1, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d2, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d3, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d4, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d5, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d6, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d7, [%[cpu]]\n"
/* Load from the GPU in one shot, no interleave, to
* d0-d7.
*/
"vstm %0, {q0, q1, q2, q3}\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu_stride)
"vstm %[gpu], {q0, q1, q2, q3}\n"
: [cpu] "+r"(cpu)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "q0", "q1", "q2", "q3");
} else {
assert(gpu_stride == 16);
void *cpu2 = cpu + 8;
__asm__ volatile (
/* Load each 16-byte line in 2 parts from the cpu-side
* destination. (vld1 can only store one d-register
* at a time).
*/
"vld1.8 d0, [%1], %3\n"
"vld1.8 d1, [%2], %3\n"
"vld1.8 d2, [%1], %3\n"
"vld1.8 d3, [%2], %3\n"
"vld1.8 d4, [%1], %3\n"
"vld1.8 d5, [%2], %3\n"
"vld1.8 d6, [%1]\n"
"vld1.8 d7, [%2]\n"
"vld1.8 d0, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d1, [%[cpu2]],%[cpu_stride]\n"
"vld1.8 d2, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d3, [%[cpu2]],%[cpu_stride]\n"
"vld1.8 d4, [%[cpu]], %[cpu_stride]\n"
"vld1.8 d5, [%[cpu2]],%[cpu_stride]\n"
"vld1.8 d6, [%[cpu]]\n"
"vld1.8 d7, [%[cpu2]]\n"
/* Store to the GPU in one shot, no interleave. */
"vstm %0, {q0, q1, q2, q3}\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
"vstm %[gpu], {q0, q1, q2, q3}\n"
: [cpu] "+r"(cpu),
[cpu2] "+r"(cpu2)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "q0", "q1", "q2", "q3");
}
#elif defined (PIPE_ARCH_AARCH64)
@@ -216,38 +228,42 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp)
/* Load each 8-byte line from cpu-side source,
* incrementing it by the stride each time.
*/
"ld1 {v0.D}[0], [%1], %2\n"
"ld1 {v0.D}[1], [%1], %2\n"
"ld1 {v1.D}[0], [%1], %2\n"
"ld1 {v1.D}[1], [%1], %2\n"
"ld1 {v2.D}[0], [%1], %2\n"
"ld1 {v2.D}[1], [%1], %2\n"
"ld1 {v3.D}[0], [%1], %2\n"
"ld1 {v3.D}[1], [%1]\n"
"ld1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
"ld1 {v0.D}[1], [%[cpu]], %[cpu_stride]\n"
"ld1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
"ld1 {v1.D}[1], [%[cpu]], %[cpu_stride]\n"
"ld1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
"ld1 {v2.D}[1], [%[cpu]], %[cpu_stride]\n"
"ld1 {v3.D}[0], [%[cpu]], %[cpu_stride]\n"
"ld1 {v3.D}[1], [%[cpu]]\n"
/* Store to the GPU in one shot, no interleave. */
"st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu_stride)
"st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
: [cpu] "+r"(cpu)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "v0", "v1", "v2", "v3");
} else {
assert(gpu_stride == 16);
void *cpu2 = cpu + 8;
__asm__ volatile (
/* Load each 16-byte line in 2 parts from the cpu-side
* destination. (vld1 can only store one d-register
* at a time).
*/
"ld1 {v0.D}[0], [%1], %3\n"
"ld1 {v0.D}[1], [%2], %3\n"
"ld1 {v1.D}[0], [%1], %3\n"
"ld1 {v1.D}[1], [%2], %3\n"
"ld1 {v2.D}[0], [%1], %3\n"
"ld1 {v2.D}[1], [%2], %3\n"
"ld1 {v3.D}[0], [%1]\n"
"ld1 {v3.D}[1], [%2]\n"
"ld1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
"ld1 {v0.D}[1], [%[cpu2]],%[cpu_stride]\n"
"ld1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
"ld1 {v1.D}[1], [%[cpu2]],%[cpu_stride]\n"
"ld1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
"ld1 {v2.D}[1], [%[cpu2]],%[cpu_stride]\n"
"ld1 {v3.D}[0], [%[cpu]]\n"
"ld1 {v3.D}[1], [%[cpu2]]\n"
/* Store to the GPU in one shot, no interleave. */
"st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
:
: "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
"st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
: [cpu] "+r"(cpu),
[cpu2] "+r"(cpu2)
: [gpu] "r"(gpu),
[cpu_stride] "r"(cpu_stride)
: "v0", "v1", "v2", "v3");
}
#else

View File

@@ -70,7 +70,8 @@ enum pipe_video_profile
PIPE_VIDEO_PROFILE_HEVC_MAIN_444,
PIPE_VIDEO_PROFILE_JPEG_BASELINE,
PIPE_VIDEO_PROFILE_VP9_PROFILE0,
PIPE_VIDEO_PROFILE_VP9_PROFILE2
PIPE_VIDEO_PROFILE_VP9_PROFILE2,
PIPE_VIDEO_PROFILE_MAX
};
/* Video caps, can be different for each codec/profile */

View File

@@ -20,7 +20,7 @@ struct drisw_loader_funcs
void (*put_image2) (struct dri_drawable *dri_drawable,
void *data, int x, int y, unsigned width, unsigned height, unsigned stride);
void (*put_image_shm) (struct dri_drawable *dri_drawable,
int shmid, char *shmaddr, unsigned offset,
int shmid, char *shmaddr, unsigned offset, unsigned offset_x,
int x, int y, unsigned width, unsigned height, unsigned stride);
};

View File

@@ -79,15 +79,21 @@ put_image2(__DRIdrawable *dPriv, void *data, int x, int y,
static inline void
put_image_shm(__DRIdrawable *dPriv, int shmid, char *shmaddr,
unsigned offset, int x, int y,
unsigned offset, unsigned offset_x, int x, int y,
unsigned width, unsigned height, unsigned stride)
{
__DRIscreen *sPriv = dPriv->driScreenPriv;
const __DRIswrastLoaderExtension *loader = sPriv->swrast_loader;
loader->putImageShm(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
x, y, width, height, stride,
shmid, shmaddr, offset, dPriv->loaderPrivate);
/* if we have the newer interface, don't have to add the offset_x here. */
if (loader->base.version > 4 && loader->putImageShm2)
loader->putImageShm2(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
x, y, width, height, stride,
shmid, shmaddr, offset, dPriv->loaderPrivate);
else
loader->putImageShm(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
x, y, width, height, stride,
shmid, shmaddr, offset + offset_x, dPriv->loaderPrivate);
}
static inline void
@@ -179,12 +185,13 @@ drisw_put_image2(struct dri_drawable *drawable,
static inline void
drisw_put_image_shm(struct dri_drawable *drawable,
int shmid, char *shmaddr, unsigned offset,
unsigned offset_x,
int x, int y, unsigned width, unsigned height,
unsigned stride)
{
__DRIdrawable *dPriv = drawable->dPriv;
put_image_shm(dPriv, shmid, shmaddr, offset, x, y, width, height, stride);
put_image_shm(dPriv, shmid, shmaddr, offset, offset_x, x, y, width, height, stride);
}
static inline void

View File

@@ -668,6 +668,19 @@ NineSurface9_CopyMemToDefault( struct NineSurface9 *This,
From->data, From->stride,
0, /* depth = 1 */
&src_box);
if (From->texture == D3DRTYPE_TEXTURE) {
struct NineTexture9 *tex =
NineTexture9(From->base.base.container);
/* D3DPOOL_SYSTEMMEM with buffer content passed
* from the user: execute the upload right now.
* It is possible it is enough to delay upload
* until the surface refcount is 0, but the
* bind refcount may not be 0, and thus the dtor
* is not executed (and doesn't trigger the
* pending_uploads_counter check). */
if (!tex->managed_buffer)
nine_csmt_process(This->base.base.device);
}
if (This->data_conversion)
(void) util_format_translate(This->format_conversion,

View File

@@ -175,7 +175,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
ctx->version_minor = 1;
*ctx->vtable = vtable;
*ctx->vtable_vpp = vtable_vpp;
ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - PIPE_VIDEO_PROFILE_UNKNOWN;
ctx->max_profiles = PIPE_VIDEO_PROFILE_MAX - PIPE_VIDEO_PROFILE_UNKNOWN - 1;
ctx->max_entrypoints = 2;
ctx->max_attributes = 1;
ctx->max_image_formats = VL_VA_MAX_IMAGE_FORMATS;

View File

@@ -28,6 +28,8 @@
#include "vl/vl_vlc.h"
#include "va_private.h"
#define NUM_VP9_REFS 8
void vlVaHandlePictureParameterBufferVP9(vlVaDriver *drv, vlVaContext *context, vlVaBuffer *buf)
{
VADecPictureParameterBufferVP9 *vp9 = buf->data;
@@ -79,8 +81,11 @@ void vlVaHandlePictureParameterBufferVP9(vlVaDriver *drv, vlVaContext *context,
context->desc.vp9.picture_parameter.bit_depth = vp9->bit_depth;
for (i = 0 ; i < 8 ; i++)
for (i = 0 ; i < NUM_VP9_REFS ; i++)
vlVaGetReferenceFrame(drv, vp9->reference_frames[i], &context->desc.vp9.ref[i]);
if (!context->decoder && !context->templat.max_references)
context->templat.max_references = NUM_VP9_REFS;
}
void vlVaHandleSliceParameterBufferVP9(vlVaContext *context, vlVaBuffer *buf)

View File

@@ -18,13 +18,20 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
VDPAU_MAJOR = 1
VDPAU_MINOR = 0
libvdpau_st = static_library(
'vdpau_st',
files(
'bitmap.c', 'decode.c', 'device.c', 'ftab.c', 'htab.c', 'mixer.c',
'output.c', 'preemption.c', 'presentation.c', 'query.c', 'surface.c',
),
c_args : [c_vis_args, '-DVER_MAJOR=1', '-DVER_MINOR=0'],
c_args : [
c_vis_args,
'-DVER_MAJOR=@0@'.format(VDPAU_MAJOR),
'-DVER_MINOR=@0@'.format(VDPAU_MINOR),
],
include_directories : [
inc_include, inc_src, inc_util, inc_gallium, inc_gallium_aux,
],

View File

@@ -90,15 +90,15 @@ Status XvMCSetAttribute(Display *dpy, XvMCContext *context, Atom attribute, int
if (!attr)
return XvMCBadContext;
if (strcmp(attr, XV_BRIGHTNESS))
if (strcmp(attr, XV_BRIGHTNESS) == 0)
context_priv->procamp.brightness = value / 1000.0f;
else if (strcmp(attr, XV_CONTRAST))
else if (strcmp(attr, XV_CONTRAST) == 0)
context_priv->procamp.contrast = value / 1000.0f + 1.0f;
else if (strcmp(attr, XV_SATURATION))
else if (strcmp(attr, XV_SATURATION) == 0)
context_priv->procamp.saturation = value / 1000.0f + 1.0f;
else if (strcmp(attr, XV_HUE))
else if (strcmp(attr, XV_HUE) == 0)
context_priv->procamp.hue = value / 1000.0f;
else if (strcmp(attr, XV_COLORSPACE))
else if (strcmp(attr, XV_COLORSPACE) == 0)
context_priv->color_standard = value ?
VL_CSC_COLOR_STANDARD_BT_601 :
VL_CSC_COLOR_STANDARD_BT_709;
@@ -134,15 +134,15 @@ Status XvMCGetAttribute(Display *dpy, XvMCContext *context, Atom attribute, int
if (!attr)
return XvMCBadContext;
if (strcmp(attr, XV_BRIGHTNESS))
if (strcmp(attr, XV_BRIGHTNESS) == 0)
*value = context_priv->procamp.brightness * 1000;
else if (strcmp(attr, XV_CONTRAST))
else if (strcmp(attr, XV_CONTRAST) == 0)
*value = context_priv->procamp.contrast * 1000 - 1000;
else if (strcmp(attr, XV_SATURATION))
else if (strcmp(attr, XV_SATURATION) == 0)
*value = context_priv->procamp.saturation * 1000 + 1000;
else if (strcmp(attr, XV_HUE))
else if (strcmp(attr, XV_HUE) == 0)
*value = context_priv->procamp.hue * 1000;
else if (strcmp(attr, XV_COLORSPACE))
else if (strcmp(attr, XV_COLORSPACE) == 0)
*value = context_priv->color_standard == VL_CSC_COLOR_STANDARD_BT_709;
else
return BadName;

View File

@@ -123,11 +123,11 @@ void ParseArgs(int argc, char **argv, struct Config *config)
while (token && !fail)
{
if (strcmp(token, "i"))
if (strcmp(token, "i") == 0)
config->mb_types |= MB_TYPE_I;
else if (strcmp(token, "p"))
else if (strcmp(token, "p") == 0)
config->mb_types |= MB_TYPE_P;
else if (strcmp(token, "b"))
else if (strcmp(token, "b") == 0)
config->mb_types |= MB_TYPE_B;
else
fail = 1;

View File

@@ -54,13 +54,14 @@ libvdpau_gallium = shared_library(
dep_thread, driver_r300, driver_r600, driver_radeonsi, driver_nouveau,
],
link_depends : vdpau_link_depends,
soversion : '@0@.@1@.0'.format(VDPAU_MAJOR, VDPAU_MINOR),
)
foreach d : [[with_gallium_r300, 'r300'],
[with_gallium_r600, 'r600'],
[with_gallium_radeonsi, 'radeonsi'],
[with_gallium_nouveau, 'nouveau']]
if d[0]
vdpau_drivers += 'libvdpau_@0@.so.1.0.0'.format(d[1])
vdpau_drivers += 'libvdpau_@0@.so.@1@.@2@.0'.format(d[1], VDPAU_MAJOR, VDPAU_MINOR)
endif
endforeach

View File

@@ -1217,8 +1217,6 @@ static void amdgpu_add_fence_dependencies_bo_lists(struct amdgpu_cs *acs)
{
struct amdgpu_cs_context *cs = acs->csc;
cs->num_fence_dependencies = 0;
amdgpu_add_fence_dependencies_bo_list(acs, cs->fence, cs->num_real_buffers, cs->real_buffers);
amdgpu_add_fence_dependencies_bo_list(acs, cs->fence, cs->num_slab_buffers, cs->slab_buffers);
amdgpu_add_fence_dependencies_bo_list(acs, cs->fence, cs->num_sparse_buffers, cs->sparse_buffers);

View File

@@ -244,15 +244,20 @@ dri_sw_displaytarget_display(struct sw_winsys *ws,
unsigned width, height, x = 0, y = 0;
unsigned blsize = util_format_get_blocksize(dri_sw_dt->format);
unsigned offset = 0;
unsigned offset_x = 0;
char *data = dri_sw_dt->data;
bool is_shm = dri_sw_dt->shmid != -1;
/* Set the width to 'stride / cpp'.
*
* PutImage correctly clips to the width of the dst drawable.
*/
if (box) {
offset = (dri_sw_dt->stride * box->y) + box->x * blsize;
offset = dri_sw_dt->stride * box->y;
offset_x = box->x * blsize;
data += offset;
/* don't add x offset for shm, the put_image_shm will deal with it */
if (!is_shm)
data += offset_x;
x = box->x;
y = box->y;
width = box->width;
@@ -262,8 +267,8 @@ dri_sw_displaytarget_display(struct sw_winsys *ws,
height = dri_sw_dt->height;
}
if (dri_sw_dt->shmid != -1) {
dri_sw_ws->lf->put_image_shm(dri_drawable, dri_sw_dt->shmid, dri_sw_dt->data, offset,
if (is_shm) {
dri_sw_ws->lf->put_image_shm(dri_drawable, dri_sw_dt->shmid, dri_sw_dt->data, offset, offset_x,
x, y, width, height, dri_sw_dt->stride);
return;
}

View File

@@ -396,6 +396,7 @@ xlib_displaytarget_create(struct sw_winsys *winsys,
{
struct xlib_displaytarget *xlib_dt;
unsigned nblocksy, size;
int ignore;
xlib_dt = CALLOC_STRUCT(xlib_displaytarget);
if (!xlib_dt)
@@ -410,7 +411,8 @@ xlib_displaytarget_create(struct sw_winsys *winsys,
xlib_dt->stride = align(util_format_get_stride(format, width), alignment);
size = xlib_dt->stride * nblocksy;
if (!debug_get_option_xlib_no_shm()) {
if (!debug_get_option_xlib_no_shm() &&
XQueryExtension(xlib_dt->display, "MIT-SHM", &ignore, &ignore, &ignore)) {
xlib_dt->data = alloc_shm(xlib_dt, size);
if (xlib_dt->data) {
xlib_dt->shm = True;

View File

@@ -37,5 +37,5 @@ vc4_drm_screen_create(int fd)
struct pipe_screen *
vc4_drm_screen_create_renderonly(struct renderonly *ro)
{
return vc4_screen_create(fcntl(ro->gpu_fd, F_DUPFD_CLOEXEC, 3), ro);
return vc4_screen_create(ro->gpu_fd, ro);
}

View File

@@ -201,7 +201,8 @@ bytes_per_line(unsigned pitch_bits, unsigned mul)
static void
swrastXPutImage(__DRIdrawable * draw, int op,
int x, int y, int w, int h, int stride,
int srcx, int srcy, int x, int y,
int w, int h, int stride,
int shmid, char *data, void *loaderPrivate)
{
struct drisw_drawable *pdp = loaderPrivate;
@@ -235,12 +236,12 @@ swrastXPutImage(__DRIdrawable * draw, int op,
if (pdp->shminfo.shmid >= 0) {
ximage->width = ximage->bytes_per_line / ((ximage->bits_per_pixel + 7)/ 8);
ximage->height = h;
XShmPutImage(dpy, drawable, gc, ximage, 0, 0, x, y, w, h, False);
XShmPutImage(dpy, drawable, gc, ximage, srcx, srcy, x, y, w, h, False);
XSync(dpy, False);
} else {
ximage->width = w;
ximage->height = h;
XPutImage(dpy, drawable, gc, ximage, 0, 0, x, y, w, h);
XPutImage(dpy, drawable, gc, ximage, srcx, srcy, x, y, w, h);
}
ximage->data = NULL;
}
@@ -254,7 +255,21 @@ swrastPutImageShm(__DRIdrawable * draw, int op,
struct drisw_drawable *pdp = loaderPrivate;
pdp->shminfo.shmaddr = shmaddr;
swrastXPutImage(draw, op, x, y, w, h, stride, shmid,
swrastXPutImage(draw, op, 0, 0, x, y, w, h, stride, shmid,
shmaddr + offset, loaderPrivate);
}
static void
swrastPutImageShm2(__DRIdrawable * draw, int op,
int x, int y,
int w, int h, int stride,
int shmid, char *shmaddr, unsigned offset,
void *loaderPrivate)
{
struct drisw_drawable *pdp = loaderPrivate;
pdp->shminfo.shmaddr = shmaddr;
swrastXPutImage(draw, op, x, 0, x, y, w, h, stride, shmid,
shmaddr + offset, loaderPrivate);
}
@@ -263,7 +278,7 @@ swrastPutImage2(__DRIdrawable * draw, int op,
int x, int y, int w, int h, int stride,
char *data, void *loaderPrivate)
{
swrastXPutImage(draw, op, x, y, w, h, stride, -1,
swrastXPutImage(draw, op, 0, 0, x, y, w, h, stride, -1,
data, loaderPrivate);
}
@@ -272,7 +287,7 @@ swrastPutImage(__DRIdrawable * draw, int op,
int x, int y, int w, int h,
char *data, void *loaderPrivate)
{
swrastXPutImage(draw, op, x, y, w, h, 0, -1,
swrastXPutImage(draw, op, 0, 0, x, y, w, h, 0, -1,
data, loaderPrivate);
}
@@ -340,7 +355,7 @@ swrastGetImageShm(__DRIdrawable * read,
}
static const __DRIswrastLoaderExtension swrastLoaderExtension_shm = {
.base = {__DRI_SWRAST_LOADER, 4 },
.base = {__DRI_SWRAST_LOADER, 5 },
.getDrawableInfo = swrastGetDrawableInfo,
.putImage = swrastPutImage,
@@ -349,6 +364,7 @@ static const __DRIswrastLoaderExtension swrastLoaderExtension_shm = {
.getImage2 = swrastGetImage2,
.putImageShm = swrastPutImageShm,
.getImageShm = swrastGetImageShm,
.putImageShm2 = swrastPutImageShm2,
};
static const __DRIextension *loader_extensions_shm[] = {

View File

@@ -251,6 +251,7 @@ fs_inst::is_send_from_grf() const
case SHADER_OPCODE_TYPED_ATOMIC:
case SHADER_OPCODE_TYPED_SURFACE_READ:
case SHADER_OPCODE_TYPED_SURFACE_WRITE:
case SHADER_OPCODE_IMAGE_SIZE:
case SHADER_OPCODE_URB_WRITE_SIMD8:
case SHADER_OPCODE_URB_WRITE_SIMD8_PER_SLOT:
case SHADER_OPCODE_URB_WRITE_SIMD8_MASKED:
@@ -892,6 +893,7 @@ fs_inst::size_read(int arg) const
case SHADER_OPCODE_TYPED_ATOMIC:
case SHADER_OPCODE_TYPED_SURFACE_READ:
case SHADER_OPCODE_TYPED_SURFACE_WRITE:
case SHADER_OPCODE_IMAGE_SIZE:
case FS_OPCODE_INTERPOLATE_AT_SAMPLE:
case FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET:
case SHADER_OPCODE_BYTE_SCATTERED_WRITE:

View File

@@ -371,6 +371,20 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
return true;
}
static bool
instruction_requires_packed_data(fs_inst *inst)
{
switch (inst->opcode) {
case FS_OPCODE_DDX_FINE:
case FS_OPCODE_DDX_COARSE:
case FS_OPCODE_DDY_FINE:
case FS_OPCODE_DDY_COARSE:
return true;
default:
return false;
}
}
bool
fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
{
@@ -417,6 +431,13 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
return false;
/* Some instructions implemented in the generator backend, such as
* derivatives, assume that their operands are packed so we can't
* generally propagate strided regions to them.
*/
if (instruction_requires_packed_data(inst) && entry->src.stride > 1)
return false;
/* Bail if the result of composing both strides would exceed the
* hardware limit.
*/

View File

@@ -667,15 +667,14 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
* messages adding a node interference to the grf127_send_hack_node.
* This node has a fixed asignment to grf127.
*
* We don't apply it to SIMD16 because previous code avoids any register
* overlap between sources and destination.
* We don't apply it to SIMD16 instructions because previous code avoids
* any register overlap between sources and destination.
*/
ra_set_node_reg(g, grf127_send_hack_node, 127);
if (dispatch_width == 8) {
foreach_block_and_inst(block, fs_inst, inst, cfg) {
if (inst->is_send_from_grf() && inst->dst.file == VGRF)
ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
}
foreach_block_and_inst(block, fs_inst, inst, cfg) {
if (inst->exec_size < 16 && inst->is_send_from_grf() &&
inst->dst.file == VGRF)
ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
}
if (spilled_any_registers) {

View File

@@ -94,7 +94,22 @@ VkResult anv_CreateDescriptorSetLayout(
uint32_t immutable_sampler_count = 0;
for (uint32_t j = 0; j < pCreateInfo->bindingCount; j++) {
max_binding = MAX2(max_binding, pCreateInfo->pBindings[j].binding);
if (pCreateInfo->pBindings[j].pImmutableSamplers)
/* From the Vulkan 1.1.97 spec for VkDescriptorSetLayoutBinding:
*
* "If descriptorType specifies a VK_DESCRIPTOR_TYPE_SAMPLER or
* VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER type descriptor, then
* pImmutableSamplers can be used to initialize a set of immutable
* samplers. [...] If descriptorType is not one of these descriptor
* types, then pImmutableSamplers is ignored.
*
* We need to be careful here and only parse pImmutableSamplers if we
* have one of the right descriptor types.
*/
VkDescriptorType desc_type = pCreateInfo->pBindings[j].descriptorType;
if ((desc_type == VK_DESCRIPTOR_TYPE_SAMPLER ||
desc_type == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) &&
pCreateInfo->pBindings[j].pImmutableSamplers)
immutable_sampler_count += pCreateInfo->pBindings[j].descriptorCount;
}
@@ -153,6 +168,12 @@ VkResult anv_CreateDescriptorSetLayout(
if (binding == NULL)
continue;
/* We temporarily stashed the pointer to the binding in the
* immutable_samplers pointer. Now that we've pulled it back out
* again, we reset immutable_samplers to NULL.
*/
set_layout->binding[b].immutable_samplers = NULL;
if (binding->descriptorCount == 0)
continue;
@@ -170,6 +191,15 @@ VkResult anv_CreateDescriptorSetLayout(
set_layout->binding[b].stage[s].sampler_index = sampler_count[s];
sampler_count[s] += binding->descriptorCount;
}
if (binding->pImmutableSamplers) {
set_layout->binding[b].immutable_samplers = samplers;
samplers += binding->descriptorCount;
for (uint32_t i = 0; i < binding->descriptorCount; i++)
set_layout->binding[b].immutable_samplers[i] =
anv_sampler_from_handle(binding->pImmutableSamplers[i]);
}
break;
default:
break;
@@ -221,17 +251,6 @@ VkResult anv_CreateDescriptorSetLayout(
break;
}
if (binding->pImmutableSamplers) {
set_layout->binding[b].immutable_samplers = samplers;
samplers += binding->descriptorCount;
for (uint32_t i = 0; i < binding->descriptorCount; i++)
set_layout->binding[b].immutable_samplers[i] =
anv_sampler_from_handle(binding->pImmutableSamplers[i]);
} else {
set_layout->binding[b].immutable_samplers = NULL;
}
set_layout->shader_stages |= binding->stageFlags;
}

View File

@@ -980,9 +980,12 @@ void anv_GetPhysicalDeviceProperties(
const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
128 : 16;
const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : MAX_IMAGES;
VkSampleCountFlags sample_counts =
isl_device_get_sample_counts(&pdevice->isl_dev);
VkPhysicalDeviceLimits limits = {
.maxImageDimension1D = (1 << 14),
.maxImageDimension2D = (1 << 14),
@@ -1002,7 +1005,7 @@ void anv_GetPhysicalDeviceProperties(
.maxPerStageDescriptorUniformBuffers = 64,
.maxPerStageDescriptorStorageBuffers = 64,
.maxPerStageDescriptorSampledImages = max_samplers,
.maxPerStageDescriptorStorageImages = 64,
.maxPerStageDescriptorStorageImages = max_images,
.maxPerStageDescriptorInputAttachments = 64,
.maxPerStageResources = 250,
.maxDescriptorSetSamplers = 6 * max_samplers, /* number of stages * maxPerStageDescriptorSamplers */
@@ -1011,7 +1014,7 @@ void anv_GetPhysicalDeviceProperties(
.maxDescriptorSetStorageBuffers = 6 * 64, /* number of stages * maxPerStageDescriptorStorageBuffers */
.maxDescriptorSetStorageBuffersDynamic = MAX_DYNAMIC_BUFFERS / 2,
.maxDescriptorSetSampledImages = 6 * max_samplers, /* number of stages * maxPerStageDescriptorSampledImages */
.maxDescriptorSetStorageImages = 6 * 64, /* number of stages * maxPerStageDescriptorStorageImages */
.maxDescriptorSetStorageImages = 6 * max_images, /* number of stages * maxPerStageDescriptorStorageImages */
.maxDescriptorSetInputAttachments = 256,
.maxVertexInputAttributes = MAX_VBS,
.maxVertexInputBindings = MAX_VBS,

View File

@@ -40,7 +40,8 @@ bool anv_nir_lower_multiview(nir_shader *shader, uint32_t view_mask);
bool anv_nir_lower_ycbcr_textures(nir_shader *shader,
struct anv_pipeline_layout *layout);
void anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
void anv_nir_apply_pipeline_layout(const struct anv_physical_device *pdevice,
bool robust_buffer_access,
struct anv_pipeline_layout *layout,
nir_shader *shader,
struct brw_stage_prog_data *prog_data,

View File

@@ -428,7 +428,8 @@ setup_vec4_uniform_value(uint32_t *params, uint32_t offset, unsigned n)
}
void
anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
anv_nir_apply_pipeline_layout(const struct anv_physical_device *pdevice,
bool robust_buffer_access,
struct anv_pipeline_layout *layout,
nir_shader *shader,
struct brw_stage_prog_data *prog_data,
@@ -439,7 +440,7 @@ anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
struct apply_pipeline_layout_state state = {
.shader = shader,
.layout = layout,
.add_bounds_checks = pipeline->device->robust_buffer_access,
.add_bounds_checks = robust_buffer_access,
};
void *mem_ctx = ralloc_context(NULL);
@@ -518,8 +519,8 @@ anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
}
}
if (map->image_count > 0) {
assert(map->image_count <= MAX_IMAGES);
if (map->image_count > 0 && pdevice->compiler->devinfo->gen < 9) {
assert(map->image_count <= MAX_GEN8_IMAGES);
assert(shader->num_uniforms == prog_data->nr_params * 4);
state.first_image_uniform = shader->num_uniforms;
uint32_t *param = brw_stage_prog_data_add_params(prog_data,

View File

@@ -532,7 +532,9 @@ anv_pipeline_lower_nir(struct anv_pipeline *pipeline,
/* Apply the actual pipeline layout to UBOs, SSBOs, and textures */
if (layout) {
anv_nir_apply_pipeline_layout(pipeline, layout, nir, prog_data,
anv_nir_apply_pipeline_layout(&pipeline->device->instance->physicalDevice,
pipeline->device->robust_buffer_access,
layout, nir, prog_data,
&stage->bind_map);
}

View File

@@ -157,7 +157,8 @@ struct gen_l3_config;
#define MAX_SCISSORS 16
#define MAX_PUSH_CONSTANTS_SIZE 128
#define MAX_DYNAMIC_BUFFERS 16
#define MAX_IMAGES 8
#define MAX_IMAGES 64
#define MAX_GEN8_IMAGES 8
#define MAX_PUSH_DESCRIPTORS 32 /* Minimum requirement */
/* The kernel relocation API has a limitation of a 32-bit delta value
@@ -1874,7 +1875,7 @@ struct anv_push_constants {
uint32_t base_work_group_id[3];
/* Image data for image_load_store on pre-SKL */
struct brw_image_param images[MAX_IMAGES];
struct brw_image_param images[MAX_GEN8_IMAGES];
};
struct anv_dynamic_state {

View File

@@ -70,12 +70,36 @@ gen7_cmd_buffer_emit_scissor(struct anv_cmd_buffer *cmd_buffer)
};
const int max = 0xffff;
uint32_t y_min = s->offset.y;
uint32_t x_min = s->offset.x;
uint32_t y_max = s->offset.y + s->extent.height - 1;
uint32_t x_max = s->offset.x + s->extent.width - 1;
/* Do this math using int64_t so overflow gets clamped correctly. */
if (cmd_buffer->level == VK_COMMAND_BUFFER_LEVEL_PRIMARY) {
y_min = clamp_int64((uint64_t) y_min,
cmd_buffer->state.render_area.offset.y, max);
x_min = clamp_int64((uint64_t) x_min,
cmd_buffer->state.render_area.offset.x, max);
y_max = clamp_int64((uint64_t) y_max, 0,
cmd_buffer->state.render_area.offset.y +
cmd_buffer->state.render_area.extent.height - 1);
x_max = clamp_int64((uint64_t) x_max, 0,
cmd_buffer->state.render_area.offset.x +
cmd_buffer->state.render_area.extent.width - 1);
} else if (fb) {
y_min = clamp_int64((uint64_t) y_min, 0, max);
x_min = clamp_int64((uint64_t) x_min, 0, max);
y_max = clamp_int64((uint64_t) y_max, 0, fb->height - 1);
x_max = clamp_int64((uint64_t) x_max, 0, fb->width - 1);
}
struct GEN7_SCISSOR_RECT scissor = {
/* Do this math using int64_t so overflow gets clamped correctly. */
.ScissorRectangleYMin = clamp_int64(s->offset.y, 0, max),
.ScissorRectangleXMin = clamp_int64(s->offset.x, 0, max),
.ScissorRectangleYMax = clamp_int64((uint64_t) s->offset.y + s->extent.height - 1, 0, fb->height - 1),
.ScissorRectangleXMax = clamp_int64((uint64_t) s->offset.x + s->extent.width - 1, 0, fb->width - 1)
.ScissorRectangleYMin = y_min,
.ScissorRectangleXMin = x_min,
.ScissorRectangleYMax = y_max,
.ScissorRectangleXMax = x_max
};
if (s->extent.width <= 0 || s->extent.height <= 0) {

View File

@@ -1998,6 +1998,7 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
gl_shader_stage stage,
struct anv_state *bt_state)
{
const struct gen_device_info *devinfo = &cmd_buffer->device->info;
struct anv_subpass *subpass = cmd_buffer->state.subpass;
struct anv_cmd_pipeline_state *pipe_state;
struct anv_pipeline *pipeline;
@@ -2055,7 +2056,8 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
if (map->surface_count == 0)
goto out;
if (map->image_count > 0) {
/* We only use push constant space for images before gen9 */
if (map->image_count > 0 && devinfo->gen < 9) {
VkResult result =
anv_cmd_buffer_ensure_push_constant_field(cmd_buffer, stage, images);
if (result != VK_SUCCESS)
@@ -2168,11 +2170,15 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
surface_state = sstate.state;
assert(surface_state.alloc_size);
add_surface_state_relocs(cmd_buffer, sstate);
if (devinfo->gen < 9) {
assert(image < MAX_GEN8_IMAGES);
struct brw_image_param *image_param =
&cmd_buffer->state.push_constants[stage]->images[image];
struct brw_image_param *image_param =
&cmd_buffer->state.push_constants[stage]->images[image++];
*image_param = desc->image_view->planes[binding->plane].storage_image_param;
*image_param =
desc->image_view->planes[binding->plane].storage_image_param;
}
image++;
break;
}
@@ -2217,11 +2223,14 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
assert(surface_state.alloc_size);
add_surface_reloc(cmd_buffer, surface_state,
desc->buffer_view->address);
if (devinfo->gen < 9) {
assert(image < MAX_GEN8_IMAGES);
struct brw_image_param *image_param =
&cmd_buffer->state.push_constants[stage]->images[image];
struct brw_image_param *image_param =
&cmd_buffer->state.push_constants[stage]->images[image++];
*image_param = desc->buffer_view->storage_image_param;
*image_param = desc->buffer_view->storage_image_param;
}
image++;
break;
default:

View File

@@ -1,4 +1,4 @@
# Copyright © 2017-2018 Intel Corporation
# Copyright © 2017-2019 Intel Corporation
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -176,7 +176,10 @@ endif
libanv_common = static_library(
'anv_common',
[libanv_files, anv_entrypoints, anv_extensions_c, anv_extensions_h, sha1_h],
[
libanv_files, anv_entrypoints, anv_extensions_c, anv_extensions_h, sha1_h,
gen_xml_pack,
],
include_directories : [
inc_common, inc_intel, inc_compiler, inc_drm_uapi, inc_vulkan_util,
inc_vulkan_wsi,

View File

@@ -1273,12 +1273,20 @@ dri3_alloc_render_buffer(struct loader_dri3_drawable *draw, unsigned int format,
free(mod_reply);
buffer->image = draw->ext->image->createImageWithModifiers(draw->dri_screen,
width, height,
format,
modifiers,
count,
buffer);
/* don't use createImageWithModifiers() if we have no
* modifiers, other things depend on the use flags when
* there are no modifiers to know that a buffer can be
* shared.
*/
if (modifiers) {
buffer->image = draw->ext->image->createImageWithModifiers(draw->dri_screen,
width, height,
format,
modifiers,
count,
buffer);
}
free(modifiers);
}
#endif

View File

@@ -222,8 +222,13 @@ void st_init_limits(struct pipe_screen *screen,
pc->MaxUniformComponents = MIN2(pc->MaxUniformComponents,
MAX_UNIFORMS * 4);
/* For ARB programs, prog_src_register::Index is a signed 13-bit number.
* This gives us a limit of 4096 values - but we may need to generate
* internal values in addition to what the source program uses. So, we
* drop the limit one step lower, to 2048, to be safe.
*/
pc->MaxParameters =
pc->MaxNativeParameters = pc->MaxUniformComponents / 4;
pc->MaxNativeParameters = MIN2(pc->MaxUniformComponents / 4, 2048);
pc->MaxInputComponents =
screen->get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_INPUTS) * 4;
pc->MaxOutputComponents =

View File

@@ -1071,7 +1071,12 @@ st_api_make_current(struct st_api *stapi, struct st_context_iface *stctxi,
st_framebuffers_purge(st);
}
else {
GET_CURRENT_CONTEXT(ctx);
ret = _mesa_make_current(NULL, NULL, NULL);
if (ctx)
st_framebuffers_purge(ctx->st);
}
return ret;