Compare commits

..

244 Commits

Author SHA1 Message Date
Emil Velikov
f60875e211 docs: add release notes for 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-19 12:13:25 +01:00
Emil Velikov
5ab872d64a Update version to 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-19 12:10:00 +01:00
Chuck Atkins
9bc4ee1c8e configure.ac: Reduce zlib requirement from 1.2.8 to 1.2.3.
Testing with zlib versions 1.2.{3,4,5,6,7,8} showed no difference in
functionality, correctness, or zlib API usage and 1.2.3 is the oldest
version available in still actively deployed production Linux
distributions (RHEL/CentOS 6 and SuSE 11).

Build 17.1.1 against the system supplied zlib-devel packages for 1.2.3
in EL6 and 1.2.7 on EL7. I then swapped out the zlib version at runtime
via LD_LIBRARY_PATH with ones build from the release tarballs from
zlib.net

Testwise - I ran the piglit shader profile with --quick addded to the
tests since I figured that would exercise the shader cache, which would
in turn use zlib.

Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
[Emil Velikov: add hunk about version/piglit testing]
Acked-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ad69b037b1)
2017-06-14 12:47:58 +01:00
Nicolas Dechesne
b708c2961e util/rand_xor: add missing include statements
Fixes for:

src/util/rand_xor.c:60:13: error: implicit declaration of function 'open' [-Werror=implicit-function-declaration]
    int fd = open("/dev/urandom", O_RDONLY);
             ^~~~
src/util/rand_xor.c:60:34: error: 'O_RDONLY' undeclared (first use in this function)
    int fd = open("/dev/urandom", O_RDONLY);
                                  ^~~~~~~~

Signed-off-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit adadadc151)
2017-06-14 12:47:58 +01:00
Dave Airlie
538975fdf8 glsl/lower_distance: only set max_array_access for 1D clip dist arrays
The max_array_access field applies to the first dimension, which means
we only want to set it for the 1D clip dist arrays.

This fixes an ir_validate assert seen with
KHR-GL44.cull_distance.functional
on nouveau and radeon with debug builds.

Fixes: a08c4ebbe (glsl: rewrite clip/cull distance lowering pass)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 53587b7105)
2017-06-14 12:47:58 +01:00
Grazvydas Ignotas
3734a7de6c radv: fix trace dumping for !use_ib_bos
Fixes trace dumping crash for SI or when RADV_DEBUG=noibs is set.

Fixes: 97dfff5410 "radv: Dump command buffer on hang."
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit fae3b13905)
2017-06-14 12:47:58 +01:00
Dave Airlie
423dab9d32 radv: set fmask state to all 0s when no fmask. (v2)
The shader reads the descriptor to decide if it should take the
fmask value, however we weren't initing it always, which meant
random crap, esp with MSAA depth textures.

Fixes random hangs with:
dEQP-VK.glsl.builtin_var.fragdepth.*

v2: check fmask_state is not NULL

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 51553c0bea)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/vulkan/radv_image.c
2017-06-14 12:47:58 +01:00
Bas Nieuwenhuizen
18fd7249c5 radv: Remove SI num RB override for occlusion queries.
radeonsi doesn't have it anymore either.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 59c2e2a061)
2017-06-14 12:47:58 +01:00
Nicolai Hähnle
f66de22af4 radv: fewer than 8 RBs are possible
This fixes the subsequent assertion on Bonaire.

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 388d36dfd1)
2017-06-14 12:47:58 +01:00
Dave Airlie
8bd7d8c042 radv: expose integrated device type for APUs.
This just sets the vulkan device type depending on whether
this is an APU or GPU.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
(cherry picked from commit 2890a71158)
2017-06-14 12:47:58 +01:00
Bas Nieuwenhuizen
ffb46c8826 radv: Dirty all descriptors sets when changing the pipeline.
Sets could have been ignored during previous descriptor set flush
due to the shader not using them and therefore no SGPR being assigned.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: ae61ddabe8 "radv: move userdata sgpr ownership to compiler side."
(cherry picked from commit 4415a46be2)

Conflicts:
	src/amd/vulkan/radv_cmd_buffer.c
	src/amd/vulkan/radv_meta.c
2017-06-14 12:47:58 +01:00
Bas Nieuwenhuizen
c8226d3782 radv: Set both compute and graphics SGPRS on descriptor set flush.
We clear the descriptors_dirty array afterwards, so the SGPRs for
the other pipeline don't get updated on the flush for that other
draw/dispatch, so we have to make sure we do it immediately.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: ae61ddabe8 "radv: move userdata sgpr ownership to compiler side."
(cherry picked from commit 5fb8bb3065)
[Emil Velikov: drop radv_flush_indirect_descriptor_sets hunk - missing
in branch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/vulkan/radv_cmd_buffer.c
2017-06-14 12:47:57 +01:00
Tapani Pälli
6f062ba893 egl: fix _eglQuerySurface in EGL_BUFFER_AGE_EXT case
Specification states that in case of error, value should not be
written, patch changes buffer age queries to return -1 in case of
error so that we can skip changing the value.

In addition, small change to droid_query_buffer_age to return 0
in case buffer does not have a back buffer available.

Fixes:
   dEQP-EGL.functional.negative_partial_update.not_postable_surface

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8fac894f9b)
2017-06-14 12:47:57 +01:00
Tim Rowley
891dafc8e7 swr: relax c++ requirement from c++14 to c++11
Remove c++14 generic lambda to keep compiler requirement at c++11.

No regressions on piglit or vtk test suites.

Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

CC: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0b80b02502)
2017-06-14 12:47:57 +01:00
Marek Olšák
2a7279fa8f radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2)
The workaround causes a massive performance decrease on 1-SE parts.
(Cape Verde, Hainan, Oland)

The performance regression is already part of 17.0 and 17.1.

v2: check tess_uses_prim_id

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 391673af7a)
[Emil Velikov: s/tcs_tes_uses_prim_id/tess_uses_prim_id/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-14 12:47:43 +01:00
Jason Ekstrand
ae960d7dee i965: Mark depth surfaces as needing a HiZ resolve after blitting
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 5097fcbfdc)
2017-06-14 10:40:22 +01:00
Jason Ekstrand
3a193c009b i965: Perform HiZ flush/stall prior to HiZ resolves
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit acbd02450b)
2017-06-14 10:40:22 +01:00
Jason Ekstrand
4889bb6af3 i965: Move the pre-depth-clear flush/stalls to intel_hiz_exec
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit acb9a2ef8f)
2017-06-14 10:40:22 +01:00
Jason Ekstrand
845c238ce2 i965/blorp: Take a layer range in intel_hiz_exec
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 252b004a51)
2017-06-14 10:40:22 +01:00
Thomas Hellstrom
67acca073a dri3/GLX: Fix drawable invalidation v2
A number of internal VMware apitrace traces image comparisons fail with
dri3 because the viewport transformation becomes incorrect after an X
drawable resize. The incorrect viewport transformation sometimes persist
until the second draw-call after a swapBuffer.

Comparing with the dri2 glx code there are a couple of places where dri2
invalidates the drawable in the absence of server-triggered invalidation,
where dri3 doesn't do that. When these invalidation points are added to
dri3, the image comparisons become correct.

v2:
Addressed review comment by Michel Dänzer.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-and-tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 1253d58983)
2017-06-14 10:40:22 +01:00
Marek Olšák
7b10ed6a12 radeonsi: fix a GPU hang with tessellation on 2-CU configs
Only harvested Stoney has 2 CUs. Tested on 2-CU Stoney and Fiji forced
to 2 CUs.

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
(cherry picked from commit 6c655cfeb4)
2017-06-14 10:40:22 +01:00
Marek Olšák
5a8d7ef65a st/mesa: don't load cached TGSI shaders on demand
This fixes a performance issue with the shader cache that delayed Gallium
shader create calls until draw calls.

I'd like this in stable, but it's not a showstopper.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 2ec50f98a9)
2017-06-14 10:40:22 +01:00
Lyude
98564569d0 nvc0: disable BGRA8 images on Fermi
BGRA8 image stores on Fermi don't work, which results in breaking
PBO downloads, such that they always return 0x0. Discovered this
through a glamor bug, and confirmed it does indeed break a good number
of piglit tests such as spec/arb_pixel_buffer_object/pbo-read-argb8888

Fixes: 8e7893eb53 ("nvc0: add support for BGRA8 images")
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 245912b684)
2017-06-14 10:40:22 +01:00
Brian Paul
6348a02e27 xlib: fix glXGetCurrentDisplay() failure
glXGetCurrentDisplay() has been broken for years and nobody noticed until
recently.  This change adds a new XMesaGetCurrentDisplay() that the GLX
emulation API can call, just as we did for glXGetCurrentContext().

Tested by hacking glxgears to call glXGetCurrentContext() before and
after glXMakeCurrent() to verify the return value is NULL beforehand and
the same as the opened display afterward.

Also tested by Tom Hudson with his tests programs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100988
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Tom Hudson <tom.hudson.phd@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit c6ba85a8c0)
2017-06-14 10:40:22 +01:00
Jose Fonseca
c24bdf046e automake: Link all libGL.so variants with -Bsymbolic.
We were linking src/glx with -Bsymbolic, but not the classic/gallium X11
libGL.so.

But it's always a good idea to build all libGL.so and all DRI drivers
with -Bsymbolic, otherwise they might resolve symbols from the 3rd party
application executable or shared libraries, which is _never_ what we
want.

In particular, this can happen when intercepting OpenGL calls with
apitrace, before
63194b2573

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ce5e83b8a0)
2017-06-14 10:40:22 +01:00
Chad Versace
15b5e5996a i965/dri: Fix bad GL error in intel_create_winsys_renderbuffer()
This function never occurs in the callchain of a GL function. It occurs
only in the callchain of eglCreate*Surface and the analogous paths for
GLX.  Therefore, even if a  thread does have a bound GL context,
emitting a GL error here is wrong. A misplaced GL error, when no GL
call is made, can confuse clients.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 9d996e94fb)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/mesa/drivers/dri/i965/intel_fbo.c
2017-06-14 10:39:50 +01:00
Lucas Stach
8cfaa8ad66 etnaviv: always do cpu_fini in transfer_unmap
The cpu_fini() call pushes the buffer back into the GPU domain, which needs
to be done for all buffers, not just the ones with CPU written content. The
etnaviv kernel driver currently doesn't validate this, but may start to do
so at a later point in time. If there is a temporary resource the fini needs
to happen before the RS uses this one as the source for the upload.

Also remove an invalid comment about flushing CPU caches, cpu_fini takes
care of everything involved in this.

Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
(cherry picked from commit cab5996c26)
2017-06-14 09:49:45 +01:00
Juan A. Suarez Romero
4908b1e909 docs: add sha256 checksums for 17.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-05 21:15:43 +00:00
Juan A. Suarez Romero
97f6404e50 docs: add release notes for 17.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-05 20:27:24 +00:00
Juan A. Suarez Romero
eada8963c1 Update version to 17.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-05 20:15:30 +00:00
Jason Ekstrand
ae55ab84b5 anv: Require vertex buffers to come from a 32-bit heap
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 39adea9330)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-03 20:37:13 +02:00
Juan A. Suarez Romero
57cdaa3dcc Revert "cherry-ignore: anv: Require vertex buffers to come from a 32-bit heap"
This reverts commit b3e48a07c0.
2017-06-03 20:37:13 +02:00
Jason Ekstrand
2c23389922 i965: Rework Sandy Bridge HiZ and stencil layouts
Sandy Bridge does not technically support mipmapped depth/stencil.  In
order to work around this, we allocate what are effectively completely
separate images for each miplevel, ensure that they are page-aligned,
and manually offset to them.  Prior to layered rendering, this was a
simple matter of setting a large enough halign/valign.

With the advent of layered rendering, however, things got more
complicated.  Now, things weren't as simple as just handing a surface
off to the hardware.  Any miplevel of a normally mipmapped surface can
be considered as just an array surface given the right qpitch.  However,
the hardware gives us no capability to specify qpitch so this won't
work.  Instead, the chosen solution was to use a new "all slices at each
LOD" layout which laid things out as a mipmap of arrays rather than an
array of mipmaps.  This way you can easily offset to any of the
miplevels and each is a valid array.

Unfortunately, the "all slices at each lod" concept missed one
fundamental thing about SNB HiZ and stencil hardware:  It doesn't just
always act as if you're always working with a non-mipmapped surface, it
acts as if you're always working on a non-mipmapped surface of the same
size as LOD0.  In other words, even though it may only write the
upper-left corner of each array slice, the qpitch for the array is for a
surface the size of LOD0 of the depth surface.  This mistake causes us
to under-allocate HiZ and stencil in some cases and also to accidentally
allow different miplevels to overlap.  Sadly, piglit test coverage
didn't quite catch this until I started making changes to the resolve
code that caused additional HiZ resolves in certain tests.

This commit switches Sandy Bridge HiZ and stencil over to a new scheme
that lays out the non-zero miplevels horizontally below LOD0.  This way
they can all have the same qpitch without interfering with each other.
Technically, the miplevels still overlap, but things are spaced out
enough that each page is only in the "written area" of one LOD.

Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 10903d2289)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-02 23:04:01 +02:00
Jason Ekstrand
f967ae7b3f anv: Advertise both 32-bit and 48-bit heaps when we have enough memory
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 50d0eb5096)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-02 23:04:01 +02:00
Jason Ekstrand
de8ebbcf1e anv: Refactor memory type setup
This makes us walk over the heaps one at a time and add the types for
LLC and !LLC to each heap.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 34581fdd4f)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-02 23:04:01 +02:00
Jason Ekstrand
2562b3252b anv: Make supports_48bit_addresses a heap property
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b83b1af6f6)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/intel/vulkan/anv_device.c
2017-06-02 23:04:01 +02:00
Jason Ekstrand
86a8854b11 anv: Stop setting BO flags in bo_init_new
The idea behind doing this was to make it easier to set various flags.
However, we have enough custom flag settings floating around the driver
that this is more of a nuisance than a help.  This commit has the
following functional changes:

 1) The workaround_bo created in anv_CreateDevice loses both flags.
    This shouldn't matter because it's very small and entirely internal
    to the driver.

 2) The bo created in anv_CreateDmaBufImageINTEL loses the
    EXEC_OBJECT_ASYNC flag.  In retrospect, it never should have gotten
    EXEC_OBJECT_ASYNC in the first place.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 00df1cd9d6)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/intel/vulkan/anv_allocator.c
	src/intel/vulkan/anv_device.c
	src/intel/vulkan/anv_queue.c
2017-06-02 23:04:01 +02:00
Jason Ekstrand
0f042901e3 anv: Add valid_bufer_usage to the memory type metadata
Instead of returning valid types as just a number, we now walk the list
and check the buffer's usage against the usage flags we store in the new
anv_memory_type structure.  Currently, valid_buffer_usage == ~0.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f7736ccf53)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/intel/vulkan/anv_device.c
	src/intel/vulkan/anv_private.h
2017-06-02 23:04:01 +02:00
Jason Ekstrand
15bc6d4d21 anv: Determine the type of mapping based on type metadata
Before, we were just comparing the type index to 0.  Now we actually
look the type up in the table and check its properties to determine what
kind of mapping we want to do.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 92325a7efc)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/intel/vulkan/anv_device.c
	src/intel/vulkan/anv_private.h
2017-06-02 23:04:01 +02:00
Jason Ekstrand
0aa1e6acde anv: Set EXEC_OBJECT_ASYNC when available
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 35e626bd0e)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Squashed with:

anv/tests: Create a dummy instance as well as device

This fixes crashes caused by 35e626bd0e
which made us start referencing the instance in the allocators.  With
this commit, the tests now happily pass again.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100877
Tested-by: Vinson Lee <vlee@freedesktop.org>
(cherry picked from commit 6ef1bd4fa5)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-02 23:04:01 +02:00
Juan A. Suarez Romero
b7b3c0fce7 Revert "cherry-ignore: anv: [...]"
Revert "cherry-ignore: anv: Refactor memory type setup"
This reverts commit eab4a503a0.

Revert "cherry-ignore: anv: Add valid_bufer_usage to the memory type metadata"
This reverts commit c31e814a85.

Revert "cherry-ignore: anv: Advertise both 32-bit and 48-bit heaps when we have enough memory"
This reverts commit e391144853.

Revert "cherry-ignore: anv: Make supports_48bit_addresses a heap property"
This reverts commit dbadd06632.

Revert "cherry-ignore: anv: Stop setting BO flags in bo_init_new"
This reverts commit 07867f72cf.

Revert "cherry-ignore: anv: Determine the type of mapping based on type metadata"
This reverts commit 9299466b83.
2017-06-02 23:03:47 +02:00
Juan A. Suarez Romero
eab4a503a0 cherry-ignore: anv: Refactor memory type setup 2017-06-01 10:09:44 +02:00
Juan A. Suarez Romero
c31e814a85 cherry-ignore: anv: Add valid_bufer_usage to the memory type metadata 2017-06-01 10:09:44 +02:00
Juan A. Suarez Romero
7a636d8ff1 cherry-ignore: radv: fix regression in descriptor set freeing 2017-06-01 10:09:44 +02:00
Juan A. Suarez Romero
b3e48a07c0 cherry-ignore: anv: Require vertex buffers to come from a 32-bit heap 2017-06-01 10:09:44 +02:00
Juan A. Suarez Romero
e391144853 cherry-ignore: anv: Advertise both 32-bit and 48-bit heaps when we have enough memory 2017-06-01 10:09:44 +02:00
Juan A. Suarez Romero
dbadd06632 cherry-ignore: anv: Make supports_48bit_addresses a heap property 2017-06-01 10:09:44 +02:00
Juan A. Suarez Romero
07867f72cf cherry-ignore: anv: Stop setting BO flags in bo_init_new 2017-06-01 10:09:44 +02:00
Juan A. Suarez Romero
9299466b83 cherry-ignore: anv: Determine the type of mapping based on type metadata 2017-06-01 10:09:43 +02:00
Juan A. Suarez Romero
103ec6e231 cherry-ignore: radeonsi: load patch_id for TES-as-ES when exporting for PS 2017-06-01 10:09:43 +02:00
Ian Romanick
f1d487f4f0 r100: Use _mesa_get_format_base_format in radeon_update_wrapper
The wrapper is for a renderbuffer around a texture.  Textures can have
formats (e.g., 3) that aren't valide for API generated renderbuffers.
_mesa_base_fbo_format will return 0, but _mesa_get_format_base_format
will return the base format of RGB.

Fixes a crashes in piglit tests fbo-alphatest-formats (all subtests
pass) and fbo-colormask-formats (some subtests pass, some fail).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 303b47f253)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 10:09:43 +02:00
Ian Romanick
57b38cc077 r100,r200: Don't assume glVisual is non-NULL during context creation
Thanks to EGL_MESA_configless_context, the visual pointer can be NULL.

Fixes a segfault (or assertion failure) in piglit's
egl-configless-context test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c24881d39c)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 10:09:43 +02:00
Ian Romanick
ca8481c41c r100: Don't assume that the base mipmap of a texture exists
Fixes crashes in piglit's gl-1.2-texture-base-level.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 2dcec62075)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 10:09:43 +02:00
Tapani Pälli
5b81c0524b egl/android: fix segfault within swap_buffers
Function droid_swap_buffers may get called without dri2_surf->buffer set,
in these cases we don't have a back buffer set either. Patch fixes segfault
seen with 3DMark that uses android.opengl.GLSurfaceView for rendering it's UI.

backtrace:
   #00 pc 00013f88  /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104)
   #01 pc 000117b2  /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50)
   #02 pc 000058b2  /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386)
   #03 pc 00011329  /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553)
   #04 pc 000118e7  /system/lib/libEGL.so (eglSwapBuffers+55)
   #05 pc 000754dc  /system/lib/libandroid_runtime.so

Note, this is v1 as v2 caused dEQP regressions.

Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f347bac30f)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 10:09:43 +02:00
Juan A. Suarez Romero
ed38bcedfb Revert "android: fix segfault within swap_buffers"
This reverts commit 4d4558411d.

See https://lists.freedesktop.org/archives/mesa-stable/2017-June/006408.html

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 10:09:43 +02:00
Nicolas Boichat
012c198bb7 configure.ac: Also match -androideabi tuple
On ARM Android platforms, the host_os tuple should be linux-androideabi,
so let's match both -android and -androideabi (or any other
-android* tuple) to determine if we should do an Android build.

Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit f6ac3d0db6)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:10 +02:00
Nicolai Hähnle
1534330eba st/mesa: remove redundant stfb->iface checks
stfb->iface is always non-NULL for an st_framebuffer. These checks
were incorrect, relying on out-of-bounds memory access in the
surface-less case of EGL_KHR_surfaceless_context.

v2: remove redundant stread check (Marek)

Reviewed-by: Marek Olšák <marek@olsak@amd.com> (v2)
(cherry picked from commit 9d346af322)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:10 +02:00
Bartosz Tomczyk
580a2e6c15 mesa: Avoid leaking surface in st_renderbuffer_delete
v2: add comment in code

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100741
Fixes: a5e733c6b5 mesa: drop current draw/read buffer when ctx is released
Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fd6c2a3f3e)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:10 +02:00
Bas Nieuwenhuizen
ce2b96dd8b radv: Reserve space for descriptor and push constant user SGPR setting.
flush_compute_state doesn't reserve a large chunk, so these need their own reservation.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
(cherry picked from commit 18efb404cf)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/amd/vulkan/radv_cmd_buffer.c
2017-06-01 04:25:10 +02:00
Emil Velikov
e4c4b86f70 egl/wayland: select the format based on the interface used
Rather than misleadingly depending on DRI2 for the WL_DRM vs WL_SHM
formats, use the wl_drm and wl_shm interface respectively.

Fixes: a1727aa75e ("egl/wayland: Don't use DRM format codes for SHM")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 6ef0fc400c)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:10 +02:00
Rob Clark
da7bde9d9a freedreno: fix fence creation fail if no rendering
Android tries to create a FENCE_FD fence without any rendering.  And
then falls over when that fails.  So just always create an initial
batch.

Fixes: e4ad8695 ("freedreno: fix crash when flush() but no rendering")
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 8fc9702a1b)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:10 +02:00
Timothy Arceri
7e9129f487 st/mesa: don't mark the program as in cache_fallback when there is cache miss
When we fallback currently the gl_program objects are re-allocated.

This is likely to change when the i965 cache lands, but for now
this fixes a crash when using MESA_GLSL=cache_fb. This env var
simulates the fallback path taken when a tgsi cache item doesn't
exist due to being evicted previously or some kind of error.

Unlike i965 we are always falling back at link time so it's safe to
just re-allocate everything. We will be unnecessarily freeing and
re-allocate a bunch of things here but it's probably not a huge deal,
and can be changed when the i965 code lands.

Fixes: 0e9991f957 ("glsl: don't reference shader prog data during cache fallback")

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 80e643345e)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:10 +02:00
Daniel Stone
d6d334c5fb egl/wayland: Ensure we get a back buffer
Commit 9ca6711faa changed the Wayland winsys to only block for the
frame callback inside SwapBuffers, rather than get_back_bo. get_back_bo
would perform a single non-blocking Wayland event dispatch, to try to
find any release events which we had pulled off the wire but not
actually processed. The blocking dispatch was moved to SwapBuffers.

This removed a guarantee that we would've processed all events inside
get_back_bo(), and introduced a failure whereby the server could've sent
a buffer release event, but we wouldn't have read it. In clients
unconstrained by SwapInterval (rendering ~as fast as possible), which
were being displayed directly without composition (buffer release delayed),
this could lead to get_back_bo() failing because there were no free
buffers available to it.

The drawing rightly failed, but this was papered over because of the
path in eglSwapBuffers() which attempts to guarantee a BO, in order to
support calling SwapBuffers twice in a row with no rendering actually
having been performed.

Since eglSwapBuffers will perform a blocking dispatch of Wayland
events, a buffer release would have arrived by that point, and we
could then choose a buffer to post to the server. The effect was that
frames were displayed out-of-order, since we grabbed a frame with random
past content to display to the compositor.

Ideally get_back_bo() failing should store a failure flag inside the
surface and cause the next SwapBuffers to fail, but for the meantime,
restore the correct behaviour such that get_back_bo() no longer fails.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98833
Fixes: 9ca6711faa ("Revert "wayland: Block for the frame callback in get_back_bo not dri2_swap_buffers"")
(cherry picked from commit 1f2d0093bf)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:10 +02:00
Emil Velikov
4af9c15052 radv: automake: list shared libraries after the static ones
Analogous to previous commit - the compiler can discard xcb + wayland
libs, since there is no user (the static libraries) before it on the
command line.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
(cherry picked from commit 2b6ad89d86)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:09 +02:00
Emil Velikov
007611b3c0 anv: automake: list shared libraries after the static ones
The compiler can discard the shared ones from the link chain, since
there is no user (the static libraries) before it on the command line.

Cc: mesa-stable@lists.freedesktop.org
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
(cherry picked from commit 3e8790bff0)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:09 +02:00
Jason Ekstrand
39c70f902c i965: Round copy size to the nearest block in intel_miptree_copy
The width and height of the copy don't have to be aligned to the block
size if they specify the right or bottom edges of the image.  (See also
the comment and asserts right above).  We need to round them up when we
do the division in order to get it 100% right.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0901d0bc4c)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:09 +02:00
Jason Ekstrand
af08c1672e i965/blorp: Do and end-of-pipe sync on both sides of fast-clear ops
We've discovered in the Vulkan driver that simply doing the end-of-pipe
sync afterwards is insufficient.  The specific requirement stated in the
PRM is that you have to do one every time you transition between the
tree modes of "clear", "render", and "resolve".  This is GL, so we could
track it but any attempt to do so would most likely get it wrong.  For
now, it's easier to just assume that every fast-clear op is an island
and do the sync both before and after.

This also removes the unneeded flush and stall after slow-clear
operations.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 441cd7a81d)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:08 +02:00
Jason Ekstrand
2f8ede04d9 anv: Set image memory types based on the type count
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 10fad58b31)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-01 04:25:07 +02:00
Jason Ekstrand
e064f7d826 anv: Set up memory types and heaps during physical device init
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c1f4343807)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Conflicts:
	src/intel/vulkan/anv_device.c
	src/intel/vulkan/anv_private.h
2017-06-01 04:16:53 +02:00
Jason Ekstrand
deecf2a49a anv: Predicate 48bit support on gen >= 8
This doesn't matter right now since it only affects whether or not we
set the kernel bit but, if we ever do anything else based on it, we'll
want it to be correct per-gen.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit eceaf7e234)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:04 +02:00
Jason Ekstrand
fafa17bd19 anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hack
Up until now, we've been memsetting the auxiliary surface to 0 at
BindImageMemory time to ensure that it is properly initialized.
However, this isn't correct because apps are allowed to freely alias
memory between different images and buffers so long as they properly
track whether or not a particular image is valid and, if it isn't,
transition from UNINITIALIZED to something else before using it.  We
now implement those transitions so we can drop the hack.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4eecd534f0)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:04 +02:00
Jason Ekstrand
667ec77634 anv: Handle transitioning depth from UNDEFINED to other layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cc45c4bb80)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:04 +02:00
Jason Ekstrand
207dd2e9e6 anv: Handle color layout transitions from the UNINITIALIZED layout
This causes dEQP-VK.api.copy_and_blit.resolve_image.partial.* to start
failing due to test bugs.  See CL 1031 for a test fix.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 75edecf502)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:04 +02:00
Marek Olšák
c5b1b05e1c radeonsi/gfx9: compile shaders with +xnack
so that LLVM doesn't allocate SGPRs where XNACK is.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 2beb31bd7c)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:04 +02:00
Emil Velikov
cfa6a8bb50 travis: remove workarounds for the Vulkan target
Previously we required --enable-egl for the platform selection to work.
Additionally due to the broken DRI3 dependency tracking we needed
--enable-glx.

Since both of these are now sorted now we no longer need the
workarounds.

While we're here, explicitly enable dri3.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 552cd5cce5)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:04 +02:00
Emil Velikov
22ba39352d configure: error out if building XVMC w/o supported platform
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit b496fc2932)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:04 +02:00
Emil Velikov
f7b6b34aae configure: error out if building VDPAU w/o supported platform
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 037e9d37b4)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
112e732867 configure: error out if building OMX w/o supported platform
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 1914c814a6)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
637a5d58b0 configure: error out if building VA w/o supported platform
A bit pedantic patch to fool proof should someone start thinkering
without knowing what they do.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 63e11ac2b5)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
1f88c3c0cb gallium/targets: link against XCB only as needed
OMX and VA can optionally use the X11 DRI2/DRI3, thus we should link
only as required.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fcbedce310)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
3bcbf4d1d5 st/omx: fix building against X11-less setups
The vl_*_screen_create API properly falls back to a NOP when we're
building without specific platforms. So the only thing we need is to
handle the lack of X11/Xlib.h and provide a dummy Display define.

Cc: <mesa-stable@lists.freedesktop.org>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 115cb729d8)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
114a4a99bd st/omx: remove unneeded X11 include
En route to a X11-less builds

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit d71ce62e84)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
9b44f3fa95 st/va: fix misplaced closing bracket
It's been like this since the code was introduced.

Fixes: 86eb4131a9 (st/va: add headless support, i.e. VA_DISPLAY_DRM)
Cc: <mesa-stable@lists.freedesktop.org>
Cc: Julien Isorce <julien.isorce@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit aaea53c2c0)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
1e8e12e09c auxiliary/vl: use vl_*_screen_create stubs when building w/o platform
Provide a dummy stub when the user has opted w/o said platform, thus
we can build the binaries without unnecessarily requiring X11/other
headers.

In order to avoid build and link-time issues, we remove the HAVE_DRI3
guards in the VA and VDPAU state-trackers.

With this change st/va will return VA_STATUS_ERROR_ALLOCATION_FAILED
instead of VA_STATUS_ERROR_UNIMPLEMENTED. That is fine since upstream
users of libva such as vlc and mpv do little error checking, let
alone distinguish between the two.

Cc: Leo Liu <leo.liu@amd.com>
Cc: Guttula, Suresh <Suresh.Guttula@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 369e5dd939)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
023b1da8c0 configure: error out when building X11 Vulkan without DRI3
Vulkan supports only DRI3 enabled X11 platforms. Make it obvious,
should one consider building without it.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 05043e0e8e)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
15fc135cda loader: build libloader_dri3_helper.la only with HAVE_PLATFORM_X11
Pretty much every other place does the same.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit d80d6d662e)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
60297c97cd configure: check once for DRI3 dependencies
Currently we are having the XCB_DRI3 dependencies duplicated,
partially.

Just do a once-off check and add all of the respective CFLAGS/LIBS
where needed.

As a nice side effect this helps us solve a couple of FIXMEs.

DRI3 is not a thing w/o X11 so disable it in such cases.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit acf3d2afab)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

squashed with:

configure.ac: add xcb-fixes to the XCB DRI3 list

The XCB module is used by the VL targets. Thus omitting it can lead to
link-time errors due to unresolved symbols.

Other DRI3 users such as the Vulkan WSI and the dri3 loader helper do
not use an update region in their xcb_present_pixmap() call. We will
look into that at a later stage.

Fixes: acf3d2afab ("configure: check once for DRI3 dependencies")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101110
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 9a90d6a9d4)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

squashed with:

configure.ac: s/xcb-fixes/xcb-xfixes/

Former is not a thing, even if I have a hacked xcb-fixes.pc on my system.
Thanks for spotting it Mark!

Fixes: 9a90d6a9d4 ("configure.ac: add xcb-fixes to the XCB DRI3 list")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 48cd1919ff)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
4c0010e93c configure: error out when building GLX w/o the X11 platform
Building EGL/Vulkan/other without X11, while GLX is enabled is confusing
and misleading. In practise anyone aiming at the former will also
disable GLX.

The inverse (some examples below) should still work:
 ./configure --disable-glx --with-platforms=x11 --with-vulkan-drivers=intel
 ./configure --disable-glx --with-platforms=x11 --enable-egl

Keep in mind that the X11 platform is enabled, by default.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 8212fc95b5)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:03 +02:00
Emil Velikov
5e2a985a54 configure: set HAVE_foo_PLATFORM as applicable
Rather than having multiple places that define the macros, do it just
once in configure. Makes existing code a bit shorter and easier to
manage as we fix the VL targets with follow-up commits.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit f353f844a0)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:02 +02:00
Emil Velikov
a0f311a588 configure: enable the surfaceless platform by default
A simple platform that you want to use in a many usecases. See the
spec file details.

It has no special requirements plus it takes less than a second to
build.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 2d35773221)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:02 +02:00
Emil Velikov
1d89787a19 configure: loosen --with-platforms heuristics
Remove the enable-egl pre-requirement. Platform selection does not
depend on EGL.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit edb5a65f93)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:02 +02:00
Emil Velikov
29dc0b2d8c configure: update remaining --with-egl-platforms references
Rename the remaining references to omit the egl part.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 73682f82bc)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:02 +02:00
Emil Velikov
76362188cc configure: rename remaining HAVE_EGL_PLATFORM_* guards
Analogous to others earlier, these will be used to control the platform
for more than the EGL driver.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 27737e7e84)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:02 +02:00
Emil Velikov
817f9982d5 configure: move platform handling further up
We'll need it for the Vulkan drivers and the VL targets.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 3208fd2e46)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:32:02 +02:00
Daniel Stone
2ebf5e5aa5 egl/wayland: Use per-surface event queues
During display initialisation, we need a separate event queue to handle
the registry events, which is correctly handled. But we also need
separate per-surface event queues to handle swapchain-related events,
such as surface frame events and buffer release events. This avoids two
surfaces from the same EGLDisplay, both current on separate threads,
dispatching each other's events.

Create separate per-surface event queues, create wl_surface and wl_drm
proxy wrapper objects per surface, so we eliminate the race around
sending events to the wrong queue. swrast buffers do not need a
dedicated proxy wrapper, as the wl_shm_pool used to create the
wl_buffers, being transient, can itself be assigned to a queue.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 36b9976e1f ("egl/wayland: Avoid race conditions when on non-main thread")
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 03dd9a88b0)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Squashed with:

egl/wayland: verify event queue was allocated

We're already verified that 'window' wasn't NULL, I'm guessing this
allocation error is about the newly created queue.

CID: 1409754
Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 30dc56bb5b)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 22:31:42 +02:00
Daniel Stone
013433b3d9 egl/wayland: Don't open-code roundtrip
wl_display_roundtrip_queue() exists and can replace roundtrip(). The
API was introduced with wayland 1.6, while we currently require 1.11.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8118bc269f)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 21:53:24 +02:00
Daniel Stone
0dbab1e0f7 vulkan/wsi/wayland: Use proxy wrappers for swapchain
Though most swapchain operations used a queue, they were racy in that
the object was created with the queue only set later, meaning that its
event could potentially be dispatched from the default queue in between
these two steps.

Use proxy wrappers to avoid this race, also assigning wl_buffers created
for the swapchain to the event queue.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 5034c61558)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Squashed with:

vulkan/wsi/wayland: Fix proxy wrappers for swapchain recreation

Before the swapchain event queue is destroyed, all proxy objects that reference
it must be dropped. Otherwise we risk a use-after-free if a frame callback event
or buffer release events are received afterwards.
This happens when an application destroys and recreates a swapchain in FIFO
mode between two frames without using the VkSwapchainCreateInfoKHR::oldSwapchain
mechanism to keep the old swapchain until after the next redraw.

Fixes: 5034c61558 ("vulkan/wsi/wayland: Use proxy wrappers for swapchain")
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1586768e74)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 21:53:24 +02:00
Daniel Stone
7cd7f0bfe4 vulkan/wsi/wayland: Use per-display event queue
Calling random callbacks on the display's event queue is hostile, as
we may call into client code when it least expects it. Create our own
event queue, one per wsi_wl_display, and use that for the registry.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c902a1957d)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 21:53:24 +02:00
Daniel Stone
260dac8447 vulkan/wsi/wayland: Remove roundtrip when creating image
There's no need to call wl_display_roundtrip() after trying to create a
buffer through wl_drm; if it succeeds then everything is fine, and if it
fails, then we get a fatal protocol error so can't recover anyway.

Additionally, doing a roundtrip on the default / main application queue,
is destructive anyway, so would need to be its own queue.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit afe8c8a299)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 21:53:24 +02:00
Daniel Stone
36dd525569 vulkan: Fix Wayland uninitialised registry
Untangle the exit cleanup paths so we don't try to use the registry
variable before it's been initialised.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d9a8bba7f4)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-31 21:53:23 +02:00
Emil Velikov
092c485b8e docs: add sha256 checksums for 17.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-25 08:18:59 +01:00
Emil Velikov
ca0a148a4d docs: add release notes for 17.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-25 08:11:42 +01:00
Emil Velikov
0eaf422957 Update version to 17.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-25 08:06:23 +01:00
Ilia Mirkin
4da22e2b68 nvc0/ir: SHLADD's middle source must be an immediate
The instruction encodings only allow for immediates. Don't try to
replace a zero (which is dumb to have in that op in any case) with RZ.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 82e77d4e44)
2017-05-22 10:19:30 +01:00
Emil Velikov
88309a985a automake: add SWR LLVM gen_builder.hpp workaround
As gen_builder.hpp file is generated, it contains information that is
specific to the LLVM version it originates from.

As suggested by Tim, the file seems to be forwards compatible. So in
order to produce ship a file which will work everywhere we should be
using earlies supported LLVM - 3.9.

With this we're back on track and can build all of mesa without
python/mako/flex and friends.

In the long term we might want to see if the python generators can be
updated to produce LLVM version agnostic files. At least within the
range supported by SWR.

Cc: <mesa-stable@lists.freedesktop.org>
Cc: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
(cherry picked from commit 5233eaf9ee)
2017-05-22 10:19:04 +01:00
Johnson Lin
3922a43bf2 nir/lower_tex: Fix minor error in YUV color conversion matrix
The matrix used for YCbCr to RGB is listed in:

    https://en.wikipedia.org/wiki/YCbCr

There was an error in converting the offsets from integers to unorm
values: 0.0625=16/256 should be 16.0/255,and 0.5=128.0/256 should be
128.0/255.  With this fix, the CSC result is bit aligned with wikipedia's
conversion result and FFMPeg's result.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100854
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit a6fb943f3e)
2017-05-19 23:23:21 +01:00
Rob Herring
832f6b4543 virgl: fix virgl_bo_transfer_{put, get} box struct copy
Commit 3dfe61ed6e ("gallium: decrease the size of pipe_box - 24 -> 16
bytes") changed the size of pipe_box, but the virgl code was relying on
pipe_box and drm_virtgpu_3d_box structs having the same size/layout doing
a struct copy. Copy the fields one by one instead.

Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Fixes: 3dfe61ed6e ("gallium: decrease the size of pipe_box - 24 -> 16 bytes")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 5771ecc90e)
2017-05-19 23:17:29 +01:00
Emil Velikov
b6f92084bd egl: add g_egldispatchstubs.h to the release tarball
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e19ea928b9)
2017-05-19 23:17:19 +01:00
Nanley Chery
50cd4e37d9 i965/formats: Update the three-channel DXT1 mappings
The procedure for decompressing an opaque DXT1 OpenGL format is
dependant on the comparison of two colors stored in the first 32 bits of
the compressed block. Here's the specified OpenGL behavior for
reference:

   The RGB color for a texel at location (x,y) in the block is given by:

      RGB0,              if color0 > color1 and code(x,y) == 0
      RGB1,              if color0 > color1 and code(x,y) == 1
      (2*RGB0+RGB1)/3,   if color0 > color1 and code(x,y) == 2
      (RGB0+2*RGB1)/3,   if color0 > color1 and code(x,y) == 3

      RGB0,              if color0 <= color1 and code(x,y) == 0
      RGB1,              if color0 <= color1 and code(x,y) == 1
      (RGB0+RGB1)/2,     if color0 <= color1 and code(x,y) == 2
      BLACK,             if color0 <= color1 and code(x,y) == 3

The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1.
This means that the behavior is incompatible with OpenGL. This is stated
in the SKL PRM, Vol 5: Memory Views:

   Opaque Textures (DXT1_RGB)
      Texture format DXT1_RGB is identical to DXT1, with the exception that the
      One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
      the resulting texel color is derived strictly from the Opaque Color Encoding.
      The alpha channel defaults to 1.0.

      Programming Note
      Context: Opaque Textures (DXT1_RGB)
      The behavior of this format is not compliant with the OGL spec.

The opaque and non-opaque DXT1 OpenGL formats are specified to be
decoded in exactly the same way except the BLACK value must have a
transparent alpha channel in the latter. Use the four-channel BC1 Intel
formats with the alpha set to 1 to provide the behavior required by the
spec. Note that the alpha is already set to 1 for RGB formats in
brw_get_texture_swizzle().

v2: Provide a more detailed commit message (Kenneth Graunke).
v3: Ensure the alpha channel is set to 1 for DXT1 formats.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 688ddb85c8)
2017-05-19 23:14:39 +01:00
Nanley Chery
6e977a358f anv/formats: Update the three-channel BC1 mappings
The procedure for decompressing an opaque BC1 Vulkan format is dependant on the
comparison of two colors stored in the first 32 bits of the compressed block.
Here's the specified OpenGL (and Vulkan) behavior for reference:

   The RGB color for a texel at location (x,y) in the block is given by:

      RGB0,              if color0 > color1 and code(x,y) == 0
      RGB1,              if color0 > color1 and code(x,y) == 1
      (2*RGB0+RGB1)/3,   if color0 > color1 and code(x,y) == 2
      (RGB0+2*RGB1)/3,   if color0 > color1 and code(x,y) == 3

      RGB0,              if color0 <= color1 and code(x,y) == 0
      RGB1,              if color0 <= color1 and code(x,y) == 1
      (RGB0+RGB1)/2,     if color0 <= color1 and code(x,y) == 2
      BLACK,             if color0 <= color1 and code(x,y) == 3

The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1. This
means that the behavior is incompatible with OpenGL and Vulkan. This is stated
in the SKL PRM, Vol 5: Memory Views:

   Opaque Textures (DXT1_RGB)
      Texture format DXT1_RGB is identical to DXT1, with the exception that the
      One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
      the resulting texel color is derived strictly from the Opaque Color Encoding.
      The alpha channel defaults to 1.0.

      Programming Note
      Context: Opaque Textures (DXT1_RGB)
      The behavior of this format is not compliant with the OGL spec.

The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in
exactly the same way except the BLACK value must have a transparent alpha
channel in the latter. Use the four-channel BC1 Intel formats with the alpha
set to 1 to provide the behavior required by the spec.

v2 (Kenneth Graunke):
- Provide a more detailed commit message.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 56458cb168)
2017-05-19 23:14:33 +01:00
Tom Stellard
e0df523795 gallivm: Make sure module has the correct data layout when pass manager runs
The datalayout for modules was purposely not being set in order to work around
the fact that the ExecutionEngine requires that the module's datalayout
matches the datalayout of the TargetMachine that the ExecutionEngine is
using.

When the pass manager runs on a module with no datalayout, it uses
the default datalayout which is little-endian.  This causes problems
on big-endian targets, because some optimizations that are legal on
little-endian or illegal on big-endian.

To resolve this, we set the datalayout prior to running the pass
manager, and then clear it before creating the ExectionEngine.

This patch fixes a lot of piglit tests on big-endian ppc64.

Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 14e525a4d7)
2017-05-19 23:14:24 +01:00
Chad Versace
6199b3d485 egl: Partially revert 23c86c74, fix eglMakeCurrent
Fixes regressions in Android CtsVerifier.apk on Intel Chrome OS devices
due to incorrect error handling in eglMakeCurrent. See below on how to
confirm the regression is fixed.

This partially reverts

    commit 23c86c74cc
    Author:  Chad Versace <chadversary@chromium.org>
    Subject: egl: Emit error when EGLSurface is lost

The problem with commit 23c86c74 is that, once an EGLSurface became
lost, the app could never unbind the bad surface. Each attempt to unbind
the bad surface with eglMakeCurrent failed with EGL_BAD_CURRENT_SURFACE.

Specificaly, the bad commit added the error handling below. #2 and #3
were right, but #1 was wrong.

    1. eglMakeCurrent emits EGL_BAD_CURRENT_SURFACE if the calling
       thread has unflushed commands and either previous surface is no
       longer valid.

    2. eglMakeCurrent emits EGL_BAD_NATIVE_WINDOW if either new surface
       is no longer valid.

    3. eglSwapBuffers emits EGL_BAD_NATIVE_WINDOW if the swapped surface
       is no longer valid.

Whe I wrote the bad commit, I misunderstood the EGL spec language
for #1. The correct behavior is, if I understand correctly now, is
below. This patch doesn't implement the correct behavior, though, it
just reverts the broken behavior.

    - Assume a bound EGLSurface is no longer valid.
    - Assume the bound EGLContext has unflushed commands.
    - The app calls eglMakeCurrent. The spec requires eglMakeCurrent to
      implicitly flush. After flushing, eglMakeCurrent emits
      EGL_BAD_CURRENT_SURFACE and does *not* alter the thread's
      current bindings.
    - If the app calls eglMakeCurrent again, and the app inserts no
      commands into the GL command stream between the two eglMakeCurrent
      calls, then this second eglMakeCurrent succeeds without emitting an
      error.

How to confirm this fixes the regression:

    Download android-cts-verifier-7.1_r5-linux_x86-x86.zip from
    source.android.com, unpack, and `adb install CtsVerifier.apk`.
    Run test "Projection Cube". Click the Pass button (a
    green checkmark). Then run test "Projection Widget". Confirm that
    widgets are visible and that logcat does not complain about
    eglMakeCurrent failure.

    Then confirm there are no regressions in the cts-traded module that
    commit 263243b1 fixed:

        cts-tf > run cts --skip-preconditions --skip-device-info \
                 -m CtsCameraTestCases \
                 -t android.hardware.camera2.cts.RobustnessTest

    Tested with Chrome OS board "reef".

Fixes: 23c86c74 (egl: Emit error when EGLSurface is lost)
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 8f62d21bd7)
2017-05-19 23:14:12 +01:00
Rob Clark
6c5bcc6473 freedreno: fix crash when flush() but no rendering
If we haven't created a batch, just bail in pipe->flush(), since there
is nothing to do.

Fixes crash in warsow, which creates a whole bunch of contexts used for
nothing but texture uploads.

Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit e4ad86952a)
2017-05-18 18:45:20 +01:00
Daniel Stone
15338b0d19 gbm/dri: Fix sign-extension in modifier query
When we were assembling the unsigned 64-bit query return from its
two signed 32-bit component parts, the lower half was getting
sign-extended into the top half. Be more explicit about what we want to
do.

Fixes gbm_bo_get_modifier() returning ((1 << 64) - 1) rather than
((1 << 56) - 1), i.e. DRM_FORMAT_MOD_INVALID.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
(cherry picked from commit 80ac89a952)
Fixes: 8378c576ab ("gbm: Export a get modifiers")
2017-05-18 18:24:06 +01:00
Andres Gomez
700dcb9ab4 bin/get-fixes-pick-list.sh: bring back the warning
We warn again if there are more than one line with the "fixes:" tag.

The warning is silenced when the commit has already landed or each
fixes tag reference a commit that is in branch.

v2:
 - Warn if any of the fixes tags has not landed (Emil)

v3:
 - Remove unnecessary head command
 - Clarify commit message (Emil)
 - Skip already picked commits sooner (Emil)

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b7af0ddfef)
2017-05-18 18:03:30 +01:00
Andres Gomez
81bdf59610 bin/get-fixes-pick-list.sh: don't warn if more than one, go over them
If an identified commit was having more than one fix, we would warn
about that and only treat the first.

Now, we don't warn but treat all of them.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 77306e2afc)
2017-05-18 18:03:14 +01:00
Juan A. Suarez Romero
68e64d92bc bin/get-{extra,fixes}-pick-list.sh: improve output
Show the commit hash and the title in a way that it is easier to copy
and paste in the bin/.cherry-ignore-extra file if we want to ignore
those commits for the future.

v2:
- Use printf instead echo (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 3af7f8275b)
2017-05-18 18:03:06 +01:00
Juan A. Suarez Romero
364048cf42 bin/get-{extra,fixes}-pick-list.sh: add support for ignore list
Both scripts does not use a file with the commits to ignore. So if we
have handled one of the suggested commits and decided we won't pick it,
the scripts will continue suggesting them.

v2:
- Mark the candidates in bin/get-extra-pick-list.sh (Juan A. Suarez)
- Use bin/.cherry-ignore to store rejected patches (Emil)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 99b41631bb)
2017-05-18 18:02:56 +01:00
Nicolai Hähnle
72a8fd8d50 st/mesa: remove an incorrect assertion
There is really no reason why the current DrawBuffer needs to be complete
at this point. In particular, the assertion gets hit on the X server side
in libglx when running .../piglit/bin/glx-get-current-display-ext -auto
(which uses indirect GLX rendering).

Fixes: 19b61799e3 ("st/mesa: don't cast the incomplete framebufer to st_framebuffer")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 377877ff5f)
2017-05-18 18:01:53 +01:00
Chih-Wei Huang
5cace16ac6 Android: correct libz dependency
Commit 6facb0c0 ("android: fix libz dynamic library dependencies")
unconditionally adds libz as a dependency to all shared libraries.
That is unnecessary.

Commit 85a9b1b5 introduced libz as a dependency to libmesa_util.
So only the shared libraries that use libmesa_util need libz.

Fix Android Lollipop build by adding the include path of zlib to
libmesa_util explicitly instead of getting the path implicitly
from zlib since it doesn't export the include path in Lollipop.

Fixes: 6facb0c0 "android: fix libz dynamic library dependencies"

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
(cherry picked from commit bfc0c23843)
2017-05-18 18:01:44 +01:00
Emil Velikov
d6439cb297 configure: remove unneeded bits around libunwind handling
If libunwind is not found we'll fail at PKG_CHECK_MODULES, so the
follow-up check will be false. Additionally the AM_CONDITIONAL is not
used, so we can drop it.

Fixes: 3bcef6aa24 ("configure.ac: honour --disable-libunwind if the .pc file is present")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 709468a808)
2017-05-18 18:01:10 +01:00
Emil Velikov
0e4c34b347 radeon: automake: remove unneeded elf Cflags/Libs
No longer required as of commit d90bf4ef3e ("radeon: remove unused
radeon_elf_util.{c,h}")

v2: Add the required libelf link in src/amd/Makefile.common.am

Fixes: d90bf4ef3e ("radeon: remove unused  radeon_elf_util.{c,h}")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
(cherry picked from commit 88b8aaea3b)
2017-05-18 18:00:59 +01:00
Grazvydas Ignotas
f56fff79e7 anv: don't leak DRM devices
After successful drmGetDevices2() call, drmFreeDevices() needs to be
called.

Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # radv version
(cherry picked from commit 0ef302638f)
2017-05-18 18:00:41 +01:00
Grazvydas Ignotas
70cbcb2d39 anv: fix possible stack corruption
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.

Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit e0aee8b667)
2017-05-18 18:00:33 +01:00
Samuel Iglesias Gonsálvez
5755f40874 i965/vec4: load dvec3/4 uniforms first in the push constant buffer
Reorder the uniforms to load first the dvec4-aligned variables in the
push constant buffer and then push the vec4-aligned ones. It takes
into account that the relocated uniforms should be aligned to their
channel size.

This fixes a bug were the dvec3/4 might be loaded one part on a GRF and
the rest in next GRF, so the region parameters to read that could break
the HW rules.

v2:
- Fix broken logic.
- Add a comment to explain what should be needed to optimise the usage
  of the push constant buffer slots, as this patch does not pack the
  uniforms.

v3:
- Implemented the push constant buffer usage optimization.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit e69e5c7006)
2017-05-18 17:57:52 +01:00
Samuel Iglesias Gonsálvez
085efa0261 i965/vec4: fix swizzle and writemask when loading an uniform with constant offset
It was setting XYWZ swizzle and writemask to all uniforms, no matter if they
were a vector or scalar, so this can lead to problems when loading them
to the push constant buffer.

Moreover, 'shift' calculation was designed to calculate the offset in
DWORDS, but it doesn't take into account DFs, so the calculated swizzle
for the later ones was wrong.

The indirect case is not changed because MOV INDIRECT will write
to all components. Added an assert to verify that these uniforms
are aligned.

v2:
- Fix 'shift' calculation (Curro)
- Set both swizzle and writemask.
- Add assert(shift == 0) for the indirect case.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 8aa6ada838)
2017-05-18 17:57:52 +01:00
Samuel Iglesias Gonsálvez
2401051b16 i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution
We are going to add a packing feature to reduce the usage of the push
constant buffer. One of the consequences is that 'nr_params' would be
modified by vec4_visitor's run call, so we need to restore it if one of
them failed before executing the fallback ones. Same thing happens to the
uniforms values that would be reordered afterwards.

Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when
the dvec4 alignment and packing patch is applied.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 354f7f2cb9)
2017-05-18 17:57:52 +01:00
Eric Anholt
4b14ad64d0 vc4: Don't allocate new BOs to avoid synchronization when they're shared.
If X11 did a software fallback to the entire screen, we would throw out
the BO the screen is scanning out from and allocate a new one.

Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit e8ea42d245)
2017-05-18 17:57:52 +01:00
Hans de Goede
bcd09ef32e glxglvnddispatch: Add missing dispatch for GetDriverConfig
Together with some fixes to xdriinfo this fixes xdriinfo not working
with glvnd.

Since apps (xdriinfo) expect GetDriverConfig to work without going to
need through the dance to setup a glxcontext (which is a reasonable
expectation IMHO), the dispatch for this ends up significantly different
then any other dispatch function.

This patch gets the job done, but I'm not really happy with how this
patch turned out, suggestions for a better fix are welcome.

Cc: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 84f764a759)
2017-05-18 17:57:52 +01:00
Pohjolainen, Topi
6123a076d0 intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4
The reasoning Chad gave in the comment for choosing a valign of 4 is
entirely bunk.  The fact that you have to multiply pitch by 2 is
completely unrelated to the halign/valign parameters used for texture
layout.  (Not completely unrelated.  W-tiling is just Y-tiling with a
bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled
chunks so it makes everything easier if miplevels are always aligned to
8x8.)  The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignmet
doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you
can't do stencil texturing anyway.

v2 (Jason Ekstrand):
 - Delete most of Chad's comment and add a more descriptive commit
   message.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 236f17a9f7)
2017-05-18 17:57:52 +01:00
Rob Clark
691d42700b mesa/st: fix yuv EGLImage's
Don't reject YUV formats that the driver doesn't handle natively, since
mesa/st already knows how to lower this in shader.

Reported-by: Nicolas Dechesne <ndec@linaro.org>
Fixes: 83e9de2 ("st/mesa: EGLImageTarget* error handling")
Cc: 17.1 <mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@gmail.com>
Tested-by: Nicolas Dechesne <ndec@linaro.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 8b4588b090)
2017-05-18 17:57:52 +01:00
Lucas Stach
f36cd7fc57 etnaviv: allow R/B swapped surfaces to be cleared
Fixes: 7f62ffb68a ("etnaviv: add support for rb swap")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit 20ce6f1361)
2017-05-18 17:57:52 +01:00
Lucas Stach
c006ced2c9 etnaviv: stop oversizing buffer resources
PIPE_BUFFER is a target enum, not a binding. This caused the driver to
up-align the height of buffer resources, leading to largely oversizing
those resources. This is especially bad, as the buffer resources used
by the upload manager are already 1MB in size. Height alignment meant
that those would result in 4 to 8MB big BOs.

Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit 8173d7d9e8)
2017-05-18 17:57:51 +01:00
Nicolai Hähnle
5c9b59de8b radeonsi: fix gl_PrimitiveIDIn in geometry shader when using tessellation
This builds on commit 0549ea15ec ("radeonsi: fix primitive ID in
fragment shader when using tessellation").

Fixes piglit
arb_tessellation_shader/execution/gs-primitiveid-instanced.shader_test

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f4dbe2efb7)
2017-05-18 17:57:51 +01:00
Marek Olšák
b7d7458b9a radeonsi/gfx9: add support for Raven
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 7622181cad)
2017-05-18 17:57:51 +01:00
Marek Olšák
e39c07dbdf amd/addrlib: import Raven support
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit efdb378c36)
2017-05-18 17:57:51 +01:00
Eric Anholt
3b9b7a1342 renderonly: Initialize fields of struct winsys_handle.
vc4 was rejecting renderonly's import, because the offset field was
nonzero.

Fixes: 848b49b288 ("gallium: add renderonly library")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit c98f03c6eb)
2017-05-18 17:57:51 +01:00
Alex Deucher
054a27c508 radeonsi: add new vega10 pci ids
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2f0450c627)
2017-05-18 17:57:51 +01:00
Bruce Cherniak
1b3704c22d swr: move msaa resolve to generalized StoreTile
v3: list piglit tests fixed by this patch. Fixed typo Tim pointed out.
v2: Reword commit message to more closely adhere to community
guidelines.

This patch moves msaa resolve down into core/StoreTiles where the
surface format conversion routines are available.  The previous
"experimental" resolve was limited to 8-bit unsigned render targets.

This fixes a number of piglit msaa tests by adding resolve support for
all the render target formats we support.

Specifically:
layered-rendering/gl-layer-render: fail->pass
layered-rendering/gl-layer-render-storage: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg: crash->pass
multisample-formats *[2,4,8,16] gl_ext_texture_snorm: crash->pass
multisample-formats *[2,4,8,16] gl_arb_texture_float: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg-float: fail->pass

MSAA is still disabled by default, but can be enabled with
"export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
The default is 0, which is disabled.

This patch improves the number of multisample-formats supported by swr,
and fixes several crashes currently in the 17.1 branch.  Therefore, it
should be considered for inclusion in the 17.1 stable release.  Being
disabled by default, it poses no risk to most users of swr.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit f52e63069a)
2017-05-18 17:57:51 +01:00
Nicolai Hähnle
ad3b5b7f5f radeonsi: fix gl_PrimitiveID in tessellation with instanced draws on SI
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f16b755863)
2017-05-18 17:57:51 +01:00
Nicolai Hähnle
35ec26bb00 radeonsi: fix primitive ID in fragment shader when using tessellation
In a VS->TCS->TES->PS pipeline, the primitive ID is read from TES exports,
so it is as if TES were using the primitive ID.

Specifically, this fixes a bug where the primitive ID is not reset at
the start of a new instance.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 0549ea15ec)
2017-05-18 17:41:09 +01:00
Nicolai Hähnle
424bf46f27 radeonsi: mark fast-cleared textures as compressed when dirtying
There are a bunch of piglit fast clear tests that regressed on SI, for
example ./bin/ext_framebuffer_multisample-fast-clear single-sample.

The problem is that a texture is bound as a framebuffer, cleared, and
then rendered from in a loop that loops through different clear colors.
The texture is never rebound during all this, so the change to
tex->dirty_level_mask during fast clear was not taken into account
when checking for compressed textures.

I have considered simply reverting the problematic commit. However,
I think this solution is better. It does require looping through all
bound textures after a fast clear, but the alternative would require
visiting more textures needless on every draw. Draws are much more
common than clears.

Note that the rendering feedback loop rules do not apply here, because
the framebuffer binding is changed between the glClear and the draw
that samples from the texture that was cleared.

Fixes: bdd6449769 ("radeonsi: don't mark non-dirty textures with CMASK as compressed")
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 854ed47f3e)
2017-05-18 17:41:09 +01:00
Emil Velikov
806f802e7b docs: add sha256 checksums for 17.1.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-10 15:20:37 +01:00
Emil Velikov
15a38605fc docs: Update 17.1.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-10 12:11:03 +01:00
Emil Velikov
0831cc7c2f Update version to 17.1.0(final)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-10 12:02:52 +01:00
Dave Airlie
43678114c7 radv: don't advertise transfer props unless we can do anything else
There is no reason to advertise transfer ability for formats we can't
use for anything else. This stops some CTS tests hitting internal
error for 64-bit types when they see the transfer flags.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit efa19f5a54)
2017-05-10 11:29:28 +01:00
Dave Airlie
072b1f5270 radv/ac: canonicalize the output for 32-bit float min/max.
This fixes:
dEQP-VK.glsl.builtin.precision.min.*
dEQP-VK.glsl.builtin.precision.max.*
dEQP-VK.glsl.builtin.precision.clamp.*

The problem is the hw doesn't compare denorms properly,
so we have to flush them, even though the spec says
flushing is optional, if you don't flush the results
should be correct.

The -pro driver changes the shader float mode,
it would be nice if llvm could grow that perhaps.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3bf3f9866c)
2017-05-10 11:28:45 +01:00
Dave Airlie
bd79ce4356 radv: flush f32->f16 conversion denormals to zero. (v2)
SPIR-V defines the f32->f16 operation as flushing denormals to 0,
this compares the class using amd class opcode.

Thanks to Matt Arsenault for figuring it out.

This fix is VI+ only, add a TODO for SI/CIK.

This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 83e58b036e)
2017-05-10 11:28:07 +01:00
Dave Airlie
0640bae86c radv: flush more stages when semaphore are waiting.
This still doesn't give us complete pWaitDstStageMask support,
but it should provide enough to be correct if not as efficent as
possible.

If we have wait semaphores we must flush between submits and
flush the shaders as well.

This fixes the remaining fails in:
dEQP-VK.synchronization.op.single_queue.semaphore.*ssbo*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a524704025)
2017-05-10 11:26:38 +01:00
Dave Airlie
9b808c5748 radv: fix stencil only clears.
If we are clearing stencil only, we still need to provide a
a valid Z output from the vertex shader, we can't rely
on the depth clear value having any meaning, as we use this
for the position output, and it could get clipped, so we
don't end up clearing anything.

Fixes:
dEQP-VK.renderpass.simple.stencil
since I added S8 support.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3c73063974)
2017-05-10 11:26:17 +01:00
Dave Airlie
9105e36765 radv/wsi: report presentation error per image request
This ports
0fcb92c17d
anv: wsi: report presentation error per image request

This fixes:
dEQP-VK.wsi.xlib.incremental_present.scale_none.*

Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 09034aab64)
2017-05-10 11:25:33 +01:00
Daniel Stone
e1678159b1 i915: Fix build break with empty unreachable()
Actually put something in unreachable(), so as not to break the build on
a Friday evening.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit d4342b1398)
Fixes: 467332a0ab ("i965: Use helper function for modifier -> tiling")
Fixes: 8b8af19065 ("i965: Set modifier for imported and duplicated images")
2017-05-10 11:17:05 +01:00
Emil Velikov
da13cc7e4b Update version to 17.1.0-rc4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-08 11:40:34 +01:00
Dave Airlie
3db01cd4e7 radv: apply the tess+GS hang workaround to Polaris12 as well
As I pointed out for radeonsi, and AMD confirmed, so fix this
in radv as well.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2add79a732)
2017-05-08 11:37:10 +01:00
Fredrik Höglund
396f9ae52f radv/meta: fix restoring a push descriptor set
radv_bind_descriptor_set cannot be used to bind a push descriptor set
since a push descriptor set does not have a buffer list. However,
there is no need to add the buffers again when restoring a set, so
this fix is also an optimization.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 5ff4858111)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/vulkan/radv_meta.c
2017-05-08 11:36:26 +01:00
Kenneth Graunke
0eaab97f21 i965: Don't try to unmap NULL program cache BO.
When running shader-db with intel_stub and recent Mesa, context creation
fails when making a logical hardware context.  In this case, we call
intelDestroyContext(), which gets here and tries to unmap the cache BO.

But there isn't one - we haven't made it yet.  So we try to unmap a
NULL pointer, which used to be safe (it did nothing), but crashes
after commit 7c3b8ed878.

The result is that we crash rather than failing context creation with
a nice message.  Either way nothing works, but this is more polite.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit bc074a4518)
2017-05-08 11:33:00 +01:00
Kenneth Graunke
b0394dfe2f Revert "mesa: Require mipmap completeness for glCopyImageSubData(), sometimes."
This reverts commit c5bf7cb529.

This broke rendering in "Total War: WARHAMMER", which uses a single
level RGBA_UINT32 texture and the default filter modes of GL_LINEAR
and GL_NEAREST_MIPMAP_LINEAR.  However, the texture max level is 0,
so it is actually mipmap complete - it's the integer + linear rule
that causes the error.

I'm working with Khronos to find a real solution.  However it turns
out, this patch is not correct and breaks real programs, so let's
revert it for now.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100690
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16224
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1456da91c8)
2017-05-08 11:32:57 +01:00
Rob Clark
639481e340 freedreno/a3xx: fix hang w/ large render targets and small gmem
Possibly other gen's have a similar limit.  Fixes glmark2 -b shadow
with larger resolutions on devices with small gmem (for example,
fullscreen 1080p on 8x16/db410c).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 6050d5bf3d)
2017-05-08 11:32:49 +01:00
Daniel Stone
ee0254a12f i965: Set modifier for imported and duplicated images
When a buffer is being created from FD or GEM flink import, the current
API makes no provision for passing modifier information along with this.
Set the modifier for such images to DRM_FORMAT_MOD_INVALID.

Also preserve the modifier when duplicating an image, as will be done by
GBM when importing from a wl_buffer.

This doubly tripped up Wayland, as the images would first have been
created (as wl_buffers) with a 0 modifier, and then lost what modifier
they would've had when being duplicated into gbm_bos.

Fixes: d78a36ea62 ("i965/dri: Handle the linear fb modifier")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 8b8af19065)
2017-05-08 11:24:05 +01:00
Daniel Stone
5b7cc779d2 i965: Use helper function for modifier -> tiling
Use a helper function and struct to convert between a modifier and
tiling mode, so we can use it later for a tiling -> modifier lookup.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 467332a0ab)
2017-05-08 11:24:05 +01:00
Grazvydas Ignotas
ae9a2c1fc4 radv: don't leak DRM devices
After successful drmGetDevices2() call, drmFreeDevices() needs to be called.

Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 8aab792e92)
2017-05-08 11:24:05 +01:00
Grazvydas Ignotas
64b98a1e72 radv: fix possible stack corruption
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.

Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 898cbb491b)
2017-05-08 11:24:05 +01:00
Bas Nieuwenhuizen
7bbece985e radv: Don't set dynamic state for pipelines with rasterizer dicard.
All of the dynamic states apply to rasterization & fragment processing,
so we don't need to set them if we don't rasterize.

We don't clear the dirty flags for them though, so we don't miss any
updates for the next pipeline with rasterization.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
(cherry picked from commit 9e847eedd5)
2017-05-08 11:24:05 +01:00
Marek Olšák
929ae9581c radeonsi: apply the tess+GS hang workaround to Polaris12 as well
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit ee5908396e)
2017-05-08 11:24:05 +01:00
Dave Airlie
d4c08bc8c1 radv: enable POLARIS12 support.
This just adds the chip in the right places.

We don't set the partial_vs_wave workaround, as radeonsi
doesn't, but have to confirm it's not required.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a096d8d3f7)
2017-05-08 11:24:04 +01:00
Nicolas Boichat
cd284ce928 egl/android: Set EGLSurface.Lost to EGL_TRUE/EGL_FALSE
Lost is an EGLBoolean, so we should assign it to EGL_TRUE/EGL_FALSE,
not true/false.

Fixes: e5eace5868 ("egl/android: Mark surface as lost when dequeueBuffer fails")
Fixes: 0212db3504 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 63b12b0c77)
2017-05-08 11:24:04 +01:00
Chad Versace
e972294b8e egl/android: Mark surface as lost when dequeueBuffer fails
This ensures that future calls to eglSwapBuffers and eglMakeCurrent emit
an error.

This patch is part of a series for fixing
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface
on Chrome OS x86 devices.

Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e5eace5868)
2017-05-08 11:24:04 +01:00
Chad Versace
5ef17d6854 egl/android: Cancel any outstanding ANativeBuffer in surface destructor
That is, call ANativeWindow::cancelBuffer in droid_destroy_surface().

This should prevent application deadlock when the app destroys the
EGLSurface after EGL has acquired a buffer from SurfaceFlinger
(ANativeWindow::dequeueBuffer) but before EGL has released it
(ANativeWindow::enqueueBuffer).

This patch is part of a series for fixing
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface
on Chrome OS x86 devices.

Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 0212db3504)
2017-05-08 11:24:04 +01:00
Chad Versace
1a23aff6b7 egl: Emit error when EGLSurface is lost
Add a new bool, _EGLSurface::Lost, and check it in eglMakeCurrent and
eglSwapBuffers. The EGL 1.5 spec says that those functions emit errors
when the native surface is no longer valid.

This patch just updates core EGL. No driver sets _EGLSurface::Lost yet.

I discovered that Mesa failed to detect lost surfaces while debugging an
Android CTS camera test,
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface.
This patch doesn't fix the test though, though, because the test expects
EGL_BAD_SURFACE when the surface becomes lost, and this patch actually
complies with the EGL spec. If I interpreted the EGL spec correctly,
EGL_BAD_NATIVE_WINDOW or EGL_BAD_CURRENT_SURFACE is the correct error.

Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 23c86c74cc)
2017-05-08 11:24:04 +01:00
Marek Olšák
9a226fa669 winsys/amdgpu: fix Polaris12 (RX 550) breakage
reported by Greg White.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100892
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 69e6eab653)
2017-05-08 11:24:04 +01:00
Marek Olšák
41dfe1f275 radeonsi/gfx9: make some PA & DB registers match the closed Vulkan driver
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 283a1d1e27)
2017-05-08 11:24:04 +01:00
Emil Velikov
ba6cb5d97a glx: glX_proto_send.py: use correct compile guard GLX_INDIRECT_RENDERING
The code itself has nothing to do with shared glapi, thus having it
behind GLX_SHARED_GLAPI is misleading. Use GLX_INDIRECT_RENDERING
instead.

The latter macro is set at global scope by the Autotools and Scons build
systems.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 6177d60a37)
2017-05-08 11:24:04 +01:00
Emil Velikov
7e4b3aec9f mesa/dri: always link against shared glapi
Analogous to previous commit. Check with the extensive commit
description and bug report referenced.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 51accecce7)
2017-05-08 11:24:04 +01:00
Emil Velikov
124e7b3bc8 gallium/dri: always link against shared glapi
In the early days of Xorg and Mesa we had multiple providers of the
GLAPI. All of those were the ones responsible for dlopening the DRI
module. Hence it was perfectly fine, and actually expected, for the DRI
modules to have unresolved symbols.

Since then we've moved the API to a separate shared library and no other
libraries provide the symbols.

Here comes the picky part:
It's possible that one uses old Xorg (where libglx.so provides the
GLAPI) and new Mesa (with DRI modules linking against libglapi.so).

That should still work, since the the libglx.so symbols will take
precedence over the libglapi.so ones.

I've verified this while running 1.14 series Xorg alongside this (and
next) patch.

It may seem a bit fragile, but that's of reasonably OK since all of the
affected Xorg versions have been EOL for years.

The final one being the 1.14 series, which saw its final bug fix release
1.14.7 in June 2014.

To ensure that the binaries do not have unresolved symbols add
-no-undefined and $(LD_NO_UNDEFINED), just like we do everywhere else
throughout mesa.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98428
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 79a26b663a)
2017-05-08 11:24:04 +01:00
Ben Boeckel
5c43c3fc73 scons: update for LLVM 4.0
LLVMDemangle, LLVMGlobalISel, and LLVMDebugInfoMSF are new.

Also update the comment to add irreader to the list of components.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 58f51f0754)
2017-05-08 11:24:03 +01:00
Adam Jackson
0bd957be11 egl/platform/drm: Don't take display ownership until gbm is initialized
If the gbm_create_device() call here actually did fail, any subsequent
eglTerminate on the display would segfault.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit f258815c7d)
2017-05-08 11:24:03 +01:00
Samuel Iglesias Gonsálvez
4ad2c57c26 anv: anv_gem_mmap() returns MAP_FAILED as mapping error
Take it into account when checking if the mapping failed.

v2:
- Remove map == NULL and its related comment (Emil)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

Fixes: 6f3e3c715a ("vk/allocator: Add a BO pool")
Fixes: 9919a2d34d ("anv/image: Memset hiz surfaces to 0 when binding memory")
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b546c9d318)

Squashed with:

anv: vkBindImageMemory() should return VK_ERROR_OUT_OF_{HOST,DEVICE}_MEMORY on failure

According to the spec we get VK_ERROR_OUT_OF_HOST_MEMORY or
VK_ERROR_OUT_OF_DEVICE_MEMORY on vkBindImageMemory failure.

Fixes returned value changed by b546c9d.

Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 939b015736)

Squashed with:

anv: fix anv_gem_mmap comment to not mention NULL

The function cannot return NULL, update the comment accordingly.

Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
(cherry picked from commit 9d2aa6e506)
2017-05-08 11:23:40 +01:00
Marek Olšák
14bbc51e6d radeonsi/gfx9: fix gl_ViewportIndex
v2: remove unnecessary LLVMBuildAnd calls

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit f466683cb0)
2017-05-05 19:35:08 +01:00
Christian Gmeiner
caa6baa688 etnaviv: add L8A8_UNORM texture format
No piglit regressions.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
(cherry picked from commit a8007ed687)
2017-05-05 19:35:08 +01:00
Samuel Iglesias Gonsálvez
56922e8f33 i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions
The regioning parameters are now properly set by convert_to_hw_regs()
and we don't need to fix them in the generator. That latter fix
previously done in the generator was strictly speaking wrong for any
non-identity regions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit f57e234fdd)
2017-05-05 19:35:08 +01:00
Samuel Iglesias Gonsálvez
6c96e750f1 i965/vec4: fix register width for DF VGRF and UNIFORM
On gen7, the swizzles used in DF align16 instructions works for element
size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that
in the rest of the code and prepare the instructions for this (scalarize_df()),
we need to set it to two again.

However, for DF align1 instructions, a width of 2 is wrong as we are not
reading the data we want. For example, an uniform would have a region of
<0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access
to the first 4.

This patch sets the default one to 4 and then modifies the width of
align16 instruction's DF sources when we translate the logical swizzle
to the physical one.

v2:
- Remove conditional (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit aaeb1c99be)
2017-05-05 19:35:08 +01:00
Samuel Iglesias Gonsálvez
ca357be5aa i965/vec4: fix vertical stride to avoid breaking region parameter rule
From IVB PRM, vol4, part3, "General Restrictions on Regioning
Parameters":

  "If ExecSize = Width and HorzStride ≠ 0, VertStride must
   be set to Width * HorzStride."

In next patch, we are going to modify the region parameter for
uniforms and vgrf. For uniforms that are the source of
DF align1 instructions, they will have <0, 4, 1> regioning and
the execsize for those instructions will be 4, so they will break
the regioning rule. This will be the same for VGRF sources where
we use the vstride == 0 exploit.

As we know we are not going to cross the GRF boundary with that
execsize and parameters (not even with the exploit), we just fix
the vstride here.

v2:
- Move is_align1_df() (Curro)
- Refactor exec_size == width calculation (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 7f728bce81)
2017-05-05 19:35:08 +01:00
Philipp Zabel
e702379663 renderonly: use drmIoctl
To restart interrupted system calls, use drmIoctl.

Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit b539335e50)
2017-05-05 19:35:08 +01:00
Philipp Zabel
ff27a47807 renderonly: drop resources on destroy
The renderonly_scanout holds a reference on its prime pipe resource,
which should be released when it is destroyed. If it was created by
renderonly_create_kms_dumb_buffer_for_resource, the dumb BO also has
to be destroyed.

Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit cd8ee259c8)
2017-05-05 19:35:08 +01:00
Philipp Zabel
63d75fbfe3 renderonly: close transfer prime_fd
prime_fd is only used to transfer the scanout buffer to the GPU inside
renderonly_create_kms_dumb_buffer_for_resource. It should be closed
immediately to avoid leaking the DMA-BUF file handle.

Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit ab51cd2f26)
2017-05-05 19:35:08 +01:00
Eric Anholt
cabca7185b nir: Pick just the channels we want for bitmap and drawpixels lowering.
NIR now validates that SSA references use the same number of channels as
are in the SSA value.

v2: Reword commit message, since the commit didn't land before the
    validation change did.

Fixes: 370d68babc ("nir/validate: Validate that bit sizes and components always match")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit fba6559a1e)
2017-05-05 19:35:08 +01:00
Randy Xu
8020ce02fc i965: Solve Android native fence fd double close
The Android native fence in i965 has two fds: _EGLSync::SyncFd and
brw_fence::sync_fd.

The semantics of __DRI2fenceExtensionRec::create_fence_fd are unclear on
whether the DRI driver takes ownership of the incoming fd (which is the
same incoming fd from eglCreateSync).  i965 did take ownership, but all
other Mesa drivers do not; instead, they dup the incoming fd. As
a result, _EGLSync::SyncFd and brw_fence::sync_fd were the same fd, and
both egl_dri2 and i965 believed they owned it. On eglDestroySync, that
led to a double-close.

Fix the double-close by making brw_dri_create_fence_fd dup the incoming
fd, just like the other drivers do.

Signed-off-by: Randy Xu <randy.xu@intel.com>
Test: Run Vulkan and GLES stress test and no crash.
Fixes: 6403e37651 ("i965/sync: Implement fences based on Linux sync_file")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
[chadv: Polish the commit message]
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 6f21b5601c)
2017-05-05 19:35:08 +01:00
Eric Anholt
7b4b055a24 vc4: Only build the NEON code on arm32.
NEON is sufficiently different on arm64 that we can't just reuse this
code.  Disable it on arm64 for now.

v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build
    for a v8 CPU.

Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d884d1a654)
2017-05-05 19:35:08 +01:00
Emil Velikov
72e52fa7c8 Update version to 17.1.0-rc3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-30 09:46:32 +01:00
Emil Velikov
6ca6d53e1c travis: bump MAKEFLAGS to -j4
The instance should have 2 cores, yet bumping the jobs to 4 should give
us a minor speed improvement.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit b1d45c3366)
2017-04-30 09:46:32 +01:00
Emil Velikov
5f88ebaf5c travis: enable wayland support
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 27a0b383b9)
2017-04-30 09:46:32 +01:00
Emil Velikov
f068a360cd travis: add Gallium state-tracker targets
Split into OpenCL and others, since the former is quite time consuming.

v2:
 - explicitly enable/disable components
 - build libvdpau 1.1 requirement
 - enable st/vdpau
 - build libva 1.6.2 (API 0.38) requirement

v3: Drop ubuntu-toolchain-r-test from sources (Andres)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 0e6a36cd3f)
2017-04-30 09:46:32 +01:00
Emil Velikov
e7cafd09ba travis: model scons check target like the make one
Should make things a bit more consistent across the board.

Cc: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit b3f2076549)
2017-04-30 09:46:32 +01:00
Emil Velikov
f76068b879 travis: split the make target to three separate ones
Split the target to allow faster builds for each run.

The overall build time will be more, yet Travis runs multiple builds in
parallel so we're limited by the slowest one.

Things are split roughly as:
 - DRI loaders, classic DRI drivers, classic OSMesa, make check
 - All Gallium drivers (minus the SWR) alongside st/dri (mesa)
 - The Vulkan drivers - ANV and RADV, make check (anv)

v2:
 - rework RUN_CHECK to MAKE_CHECK_COMMAND
 - explicitly disable DRI loaders
 - generate linux/memfd.h locally and enable ANV
 - add libedit-dev

v3: Use printf to create the header (Andres).
v4: Really add the libedit + printf hunks.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 7e2af37474)
2017-04-30 09:46:32 +01:00
Emil Velikov
ef6da453f0 travis: add "make swr" to the build matrix
v2: Quote OVERRIDE variables.
v3: Add missplaced libedit-dev hunk (Andres).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 8479fd8a10)
2017-04-30 09:46:32 +01:00
Emil Velikov
5b4bff2ddb travis: add "scons swr" to the build matrix
Requires GCC 5.0 (due to the C++14 requirement) and LLVM 3.9.

v2: Enable the target, add libedit-dev, rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS, quote OVERRIDE
variables.
v4: Keep check target as-is (Andres)

Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit f55d98ac85)
2017-04-30 09:46:32 +01:00
Emil Velikov
667cb4bc9e travis: add separate "scons" and "scons llvm" targets
The former does not require any LLVM, while the latter uses LLVM 3.3.

This way we'll quickly catch any LLVM 3.3+ functionality that gets
introduced where it shouldn't.

Add the full list of addons for each build permutation.

v2: Keep libedit-dev, rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4:
 - Remove llvm-toolchain-trusty-3.3 source (Andres)
 - Keep check target as-is (Andres)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 85ee2c6cfc)
2017-04-30 09:46:32 +01:00
Emil Velikov
3f0b544745 travis: split out matrix from env
With next commits we'll add a couple of more options.

v2: Rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4: Keep check target as-is, will rework with later patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 56ba252e23)
2017-04-30 09:46:31 +01:00
Emil Velikov
27d4beb2e1 travis: rework "if test" blocks in the script section
Split the "if test" blocks so that we get more sensible output in case
of a failure.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit abcfea23ad)
2017-04-30 09:46:31 +01:00
Emil Velikov
5e8e015db3 travis: remove unused -dev packages
We effectively override libdrm-dev and libxcb-dri2-0-dev since we build
and install the package locally.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit ae713a7b79)
2017-04-30 09:46:31 +01:00
Emil Velikov
135615caa0 travis: automatically manage ccache caching
According to the manual

"If you are using ccache, use:

  language: c # or other C/C++ variants

  cache: ccache

to cache $HOME/.ccache and automatically add /usr/lib/ccache to your
$PATH."

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 6431b98c54)
2017-04-30 09:46:31 +01:00
Emil Velikov
2acd78cfab travis: enable apt cache
Provides a small, but consistent improvement.
Example numbers of the jobs added later in the series.

"make loaders/classic DRI" - 1s
"scons SWR" - 6s

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 486f28ba88)
2017-04-30 09:46:31 +01:00
Andres Gomez
642228ceaf travis: add the possibility of using the txc-dxtn library
The txc-dxtn library implements the patented S3 Texture Compression
algorithm.

By default it won't be used but we add the possibility of setting the
USE_TXC_DXTN variable to yes in the travis web UI so it will be
installed and used for the scons tests.

Cc: Eric Anholt <eric@anholt.net>
Cc: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov: keep the LIB prefix, drop the LD_LIBRARY_PATH, fold URL]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 29322daef2)
2017-04-30 09:46:31 +01:00
Andres Gomez
12e7ec2c05 travis: replace Trusty-based LLVM toolchain apt-get with apt addon
Trusty's LLVM toochain repository was whitelisted some time ago. See:
479067c5e7

Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov]
 - set sudo to false
 - reference the Trusty change (Rhys)
 - keep libedit-dev
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7819d265c7)
2017-04-30 09:46:31 +01:00
Emil Velikov
f2d91f2065 travis: explicitly LD_LIBRARY_PATH the local libraries
Some of the libraries may be dlopened, which may not always work due to
the non-standard prefix that we're using.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit cb820daa3f)
2017-04-30 09:46:31 +01:00
Jason Ekstrand
9a97d9081d anv/cmd_buffer: Use the device allocator for QueueSubmit
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>
(cherry picked from commit bd3a9813b9)
2017-04-30 09:46:31 +01:00
Ilia Mirkin
a720963140 gallium/targets: fix bool setting on BE architectures
val_bool and val_int are in a union. val_bool gets the first byte, which
happens to work on LE when setting via the int, but breaks on BE. By
setting the value properly, we are able to use DRI3 on BE architectures.
Tested by running glxgears with a NV34 in a G5 PPC.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: squash the vmwgfx hunk]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 6af14778a3)
2017-04-30 09:46:31 +01:00
Nicolai Hähnle
3f0740e87c st/mesa: don't cast the incomplete framebufer to st_framebuffer
The incomplete framebuffer is set for a surfaceless context. This leads to
the following error in piglit spec@egl_khr_surfaceless_context@viewport:

==26703==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7f6886e43240 at pc 0x7f68854db0fd bp 0x7ffca404b3b0 sp 0x7ffca404b3a0
READ of size 8 at 0x7f6886e43240 thread T0
    #0 0x7f68854db0fc in st_viewport ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57
    #1 0x556840176cdb in main tests/egl/spec/egl_khr_surfaceless_context/viewport.c:101
    #2 0x7f688edcf3f0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x203f0)
    #3 0x556840176e19 in _start (/home/nha/amd/piglit/bin/egl-surfaceless-context-viewport+0xe19)

0x7f6886e43240 is located 32 bytes to the left of global variable 'DummyRenderbuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:69:31' (0x7f6886e43260) of size 112
0x7f6886e43240 is located 8 bytes to the right of global variable 'IncompleteFramebuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:73:30' (0x7f6886e42de0) of size 1112
SUMMARY: AddressSanitizer: global-buffer-overflow ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57 in st_viewport

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek@olsak@amd.com>
(cherry picked from commit 19b61799e3)
2017-04-30 09:46:31 +01:00
Timothy Arceri
2609ac2b5a util/disk_cache: remove percentage based max cache limit
The more I think about it the more this seems like a bad idea.
When we were deleting old cache dirs this wasn't so bad as it
was unlikely we would ever hit the actual limit before things
were cleaned up. Now that we only start cleaning up old cache
items once the limit is reached the a percentage based max
cache limit is more risky.

For the inital release of shader cache I think its better to
stick to a more conservative cache limit, at least until we
have some way of cleaning up the cache more aggressively.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 22fa3d90a9)
2017-04-30 09:46:30 +01:00
Timothy Arceri
3597829605 disk_cache: use block size rather than file size
The majority of cache files are less than 1kb this resulted in us
greatly miscalculating the amount of disk space used by the cache.

Using the number of blocks allocated to the file is more
conservative and less likely to cause issues.

This change will result in cache sizes being miscalculated further
until old items added with the previous calculation have all been
removed. However I don't see anyway around that, the previous
patch should help limit that problem.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 4e1f3afea9)
2017-04-30 09:46:30 +01:00
Timothy Arceri
505e7cd232 disk_cache: reduce default cache size to 5% of filesystem
Modern disks are extremely large and are only going to get bigger.
Usage has shown frequent Mesa upgrades can result in the cache
growing very fast i.e. wasting a lot of disk space unnecessarily.

5% seems like a more reasonable default.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit ce41237151)
2017-04-30 09:46:30 +01:00
Jason Ekstrand
791f0fb429 anv: Don't place scratch buffers above the 32-bit boundary
This fixes rendering corruptions in DOOM.  Hopefully, it will also make
Jenkins a bit more stable as we've been seeing some random failures and
GPU hangs ever since turning on 48bit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620
Fixes: 651ec926fc "anv: Add support for 48-bit addresses"
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c43b4bc85e)
2017-04-30 09:46:30 +01:00
Marek Olšák
f1fe2b30b1 radeonsi: adjust ESGS ring buffer size computation on VI
Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 3f2a0649ab)
2017-04-30 09:46:30 +01:00
Marek Olšák
ee36cbe219 radeonsi/gfx9: don't set deprecated field PARTIAL_ES_WAVE_ON
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 80814819c2)
2017-04-30 09:46:30 +01:00
Marek Olšák
2cd07c39cc radeonsi/gfx9: set MAX_PRIMGRP_IN_WAVE in the correct register
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 60a20e6879)
2017-04-30 09:46:30 +01:00
Marek Olšák
76f046add3 radeonsi/gfx9: add a workaround for viewing a slice of 3D as a 2D image
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 8e8570a9e8)
2017-04-30 09:46:30 +01:00
Marek Olšák
328afc7e86 radeonsi/gfx9: fix 1D array shader images
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 482e6b07cc)
2017-04-30 09:46:30 +01:00
Marek Olšák
77345993ec radeonsi/gfx9: fix most things wrong with shader images
There are 2 major hw changes:
- The address must always point to the address of level 0. GFX9 tiling
  modes don't allow binding to a non-0 level.
- 3D must always be bound as 3D, because 2D and 3D use entirely different
  tiling modes, and the texture target determines which set of modes is
  used.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 5c94779585)
2017-04-30 09:46:30 +01:00
Marek Olšák
f2673a0f40 radeonsi/gfx9: fix texture buffer objects and image buffers with IDXEN==0
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 65e0c3fba7)
2017-04-30 09:46:30 +01:00
Francisco Jerez
b38423210e intel/fs: Take into account amount of data read in spilling cost heuristic.
Until now the spilling cost calculation was neglecting the amount of
data read from the register during the spilling cost calculation.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.

Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%.  In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 58324389be)
2017-04-30 09:46:29 +01:00
Francisco Jerez
ba6fd491a1 intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit ecc19e12dc)
2017-04-30 09:46:29 +01:00
Kenneth Graunke
36f6fc59cb i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().
opt_register_coalesce() was optimizing sequences such as:

   mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
   mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
   mov(8) m4.zw:F, vgrf5.xxxy:F

into:

   mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
   mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D

This doesn't work - if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well.  Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy.  But the MACH instruction expects
.z to contain attr18.x * attr19.x.  Bogus results ensue.

No change in shader-db on Haswell.  Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 2faf227ec2)

Squashed with commit:

i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.

Curro pointed out that I should not just check for MACH, but use
the reads_accumulator_implicitly() helper, which would also prevent
the same bug with MAC and SADA2 (if we ever decide to use them).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 6b10c37b9c)
2017-04-30 09:46:15 +01:00
Emil Velikov
2bf79cb2f1 Update version to 17.1.0-rc2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-24 15:23:07 +01:00
Emil Velikov
fb6379697b st/clover: add space between < and ::
As pointed out by compiler

./llvm/codegen.hpp:52:22: error: ‘<::’ cannot begin a template-argument list [-fpermissive]
./llvm/codegen.hpp:52:22: note: ‘<:’ is an alternate spelling for ‘[’. Insert whitespace between ‘<’ and ‘::’

Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
(cherry picked from commit dd6ec78b4f)
2017-04-24 13:27:37 +01:00
Vinson Lee
0948e113d2 configure.ac: Fix typos.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b81d85f175)
2017-04-24 13:14:07 +01:00
Timothy Arceri
f61c453cfc mesa: validate sampler type across the whole program
Currently we were only making sure types were the same within a
single stage. This looks to have regressed with 953a0af8e3.

Fixes: 953a0af8e3 ("mesa: validate sampler uniforms during gluniform calls")

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=97524
(cherry picked from commit d682f8aa8e)
2017-04-24 13:13:38 +01:00
Emil Velikov
612fc14aab mesa/glthread: correctly compare thread handles
As mentioned in the manual - comparing pthread_t handles via the C
comparison operator is incorrect and pthread_equal() should be used
instead.

Cc: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: d8d81fbc31 ("mesa: Add infrastructure for a worker thread to process GL commands.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 52df318d61)
2017-04-24 13:13:22 +01:00
Emil Velikov
8aa9aa6a5f st/mesa: automake: honour the vdpau header install location
If VDPAU is installed in the non-default location, we'll fail to find
the headers and error at build time.

../../src/gallium/include/state_tracker/vdpau_dmabuf.h:37:25: fatal error: vdpau/vdpau.h: No such file or directory
 #include <vdpau/vdpau.h>
                         ^

Fixes: faba96bc60 ("st/vdpau: add new interop interface")
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 51c0c213b7)
2017-04-24 13:13:07 +01:00
Emil Velikov
10ff4b49dc configure.ac: check require_basic_egl only if egl enabled
Fixes: 1ac40173c2 ("configure.ac: simplify EGL requirements for drivers dependent on EGL")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 4516bfbd30)
2017-04-24 13:12:59 +01:00
Emil Velikov
3d40db7892 configure.ac: manually expand PKG_CHECK_VAR
The macro is introduced with pkgconfig v0.28 which isn't universally
available. Thus it will error at configure stage.

Reported-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 179e21a720)
2017-04-24 13:12:51 +01:00
Rob Clark
99da9dfd95 util/queue: don't hang at exit
So atexit() is horrible and 4aea8fe7 is probably not a good idea.  But
add an extra layer of duct-tape to the problem.  Otherwise we hit a
situation where app using an atexit() handler that runs later than ours
doesn't hang when trying to tear down a context.

 (gdb) bt
 #0  util_queue_killall_and_wait (queue=queue@entry=0x52bc80) at ../../../src/util/u_queue.c:264
 #1  0x0000007fb6c380c0 in atexit_handler () at ../../../src/util/u_queue.c:51
 #2  0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
 #3  0x0000007fb7730e5c in exit () from /lib64/libc.so.6
 #4  0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
 #5  0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
 #6  0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
 #7  0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
 #8  0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
 #9  0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
 (gdb) c
 Continuing.
 [Thread 0x7fb67580c0 (LWP 3471) exited]
 ^C
 Thread 1 "drawbuffer-mode" received signal SIGINT, Interrupt.
 0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
 (gdb) bt
 #0  0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
 #1  0x0000007fb6c38304 in cnd_wait (mtx=0x5bdc90, cond=0x5bdcc0) at ../../../include/c11/threads_posix.h:159
 #2  util_queue_fence_wait (fence=0x5bdc90) at ../../../src/util/u_queue.c:106
 #3  0x0000007fb6daac70 in fd_batch_sync (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:233
 #4  batch_reset (batch=batch@entry=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:183
 #5  0x0000007fb6daa5e0 in batch_flush (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:290
 #6  fd_batch_flush (batch=0x5bdc70, sync=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:308
 #7  0x0000007fb6daba2c in fd_bc_flush (cache=0x461220, ctx=0x52b920) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch_cache.c:141
 #8  0x0000007fb6dac954 in fd_context_flush (pctx=0x52b920, fence=0x0, flags=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_context.c:54
 #9  0x0000007fb6b43294 in st_glFlush (ctx=<optimized out>) at ../../../src/mesa/state_tracker/st_cb_flush.c:121
 #10 0x0000007fb69a84e8 in _mesa_make_current (newCtx=newCtx@entry=0x0, drawBuffer=drawBuffer@entry=0x0, readBuffer=readBuffer@entry=0x0) at ../../../src/mesa/main/context.c:1654
 #11 0x0000007fb6b7ca58 in st_api_make_current (stapi=<optimized out>, stctxi=0x0, stdrawi=0x0, streadi=0x0) at ../../../src/mesa/state_tracker/st_manager.c:827
 #12 0x0000007fb6cc87e8 in dri_unbind_context (cPriv=<optimized out>) at ../../../../../src/gallium/state_trackers/dri/dri_context.c:217
 #13 0x0000007fb6cc80b0 in driUnbindContext (pcp=0x5271e0) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:591
 #14 0x0000007fb7d1da08 in MakeContextCurrent (dpy=0x433380, draw=0, read=0, gc_user=0x0) at ../../../src/glx/glxcurrent.c:214
 #15 0x0000007fb7a8d5e0 in glx_platform_make_current () from /lib64/libwaffle-1.so.0
 #16 0x0000007fb7a894e4 in waffle_make_current () from /lib64/libwaffle-1.so.0
 #17 0x0000007fb7ef8c60 in piglit_wfl_framework_teardown (wfl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_wfl_framework.c:628
 #18 0x0000007fb7ef939c in piglit_winsys_framework_teardown (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:238
 #19 0x0000007fb7ef9c30 in destroy (gl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:212
 #20 0x0000007fb7edb7c4 in destroy () at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:184
 #21 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
 #22 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
 #23 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
 #24 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
 #25 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
 #26 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
 #27 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
 #28 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
 (gdb) r

Fixes: 4aea8fe7 ("gallium/u_queue: fix random crashes when the app calls exit()")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 6fb7935ded)
2017-04-24 13:12:41 +01:00
Emil Velikov
fcbb263f8c configure.ac: print deprecation warning as needed
The warning should be printed only when one explicitly uses the
deprecated configure toggle.

Fixes: 7748c3f5eb ("configure.ac: deprecate --with-egl-platforms over
--with-platforms")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 9915753e63)
2017-04-24 13:12:26 +01:00
Marek Olšák
29fa5b6e1c st/mesa: invalidate the readpix cache in st_indirect_draw_vbo
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 7cd6e2df65)
2017-04-24 13:11:14 +01:00
Emil Velikov
4e7e903bb3 winsys/sw/dri: don't use GNU void pointer arithmetic
Resolves build issues like the following:

src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:31: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
        data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
                               ^
src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:62: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
        data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
                                                              ^

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 309f4067a7)
2017-04-24 13:11:14 +01:00
Nicolai Hähnle
26949e872b vbo: fix gl_DrawID handling in glMultiDrawArrays
Fixes a bug in
KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 51deba0eb3)
2017-04-24 13:11:14 +01:00
Nicolai Hähnle
2cc119c35a mesa: move glMultiDrawArrays to vbo and fix error handling
When any count[i] is negative, we must skip all draws.

Moving to vbo makes the subsequent change easier.

v2:
- provide the function in all contexts, including GLES
- adjust validation accordingly to include the xfb check
v3:
- fix mix-up of pre- and post-xfb prim count (Nils Wallménius)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 42d5465b9b)
2017-04-24 13:11:13 +01:00
Nicolai Hähnle
6abdbd8b10 mesa: extract need_xfb_remaining_prims_check
The same logic needs to be applied to glMultiDrawArrays.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 756e9ebbdd)
2017-04-24 13:11:13 +01:00
Nicolai Hähnle
24c05c57e4 mesa: fix remaining xfb prims check for GLES with multiple instances
Found by inspection.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit ea9a8940ca)
2017-04-24 13:11:13 +01:00
Nanley Chery
7ae90b4f65 anv/cmd_buffer: Disable CCS on BDW input attachments
The description under RENDER_SURFACE_STATE::RedClearColor says,

   For Sampling Engine Multisampled Surfaces and Render Targets:
    Specifies the clear value for the red channel.
   For Other Surfaces:
    This field is ignored.

This means that the sampler on BDW doesn't support CCS.

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit d9d793696b)
2017-04-24 13:11:13 +01:00
Lionel Landwerlin
0f2ac6ded8 anv: blorp: flush memory after copy
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d71efbe5f2)
2017-04-24 13:11:13 +01:00
Rob Clark
bea2c4b88f freedreno: fix crash if ctx torn down with no rendering
In this case, ctx->flush_queue would not have been initialized.

Fixes: 0b613c20 ("freedreno: enable draw/batch reordering by default")
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit d4601b0efc)
2017-04-24 13:11:13 +01:00
Emil Velikov
ed846b4c78 Update version to 17.1.0-rc1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 14:51:04 +01:00
Emil Velikov
8c69adf9a9 Revert "docs: add 17.2.0-devel release notes template, bump version"
This reverts commit 47dd2544e1.

Should have landed in the master branch
2017-04-17 14:30:44 +01:00
Emil Velikov
47dd2544e1 docs: add 17.2.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 14:28:27 +01:00
1413 changed files with 46409 additions and 177682 deletions

View File

@@ -45,8 +45,6 @@ matrix:
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libxdamage-dev
- libxfixes-dev
- env:
# NOTE: Building SWR is 2x (yes two) times slower than all the other
# gallium drivers combined.
@@ -91,7 +89,7 @@ matrix:
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS="i915,nouveau,pl111,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,etnaviv,imx"
- GALLIUM_DRIVERS="i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,etnaviv,imx"
- VULKAN_DRIVERS=""
addons:
apt:

View File

@@ -37,18 +37,11 @@ LOCAL_CFLAGS += \
-Wno-missing-field-initializers \
-Wno-initializer-overrides \
-Wno-mismatched-tags \
-DVERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"
# XXX: The following __STDC_*_MACROS defines should not be needed.
# It's likely due to a bug elsewhere, but let's temporarily add them
# here to fix the radeonsi build.
LOCAL_CFLAGS += \
-DANDROID_API_LEVEL=$(PLATFORM_SDK_VERSION) \
-DENABLE_SHADER_CACHE \
-D__STDC_CONSTANT_MACROS \
-D__STDC_LIMIT_MACROS \
-DHAVE___BUILTIN_EXPECT \
-DHAVE___BUILTIN_FFS \
-DHAVE___BUILTIN_FFSLL \
@@ -66,7 +59,6 @@ LOCAL_CFLAGS += \
-DHAVE_PTHREAD=1 \
-DHAVE_DLOPEN \
-DHAVE_DL_ITERATE_PHDR \
-DMAJOR_IN_SYSMACROS \
-fvisibility=hidden \
-Wno-sign-compare
@@ -89,10 +81,28 @@ LOCAL_CFLAGS += \
endif
endif
ifeq ($(MESA_ENABLE_LLVM),true)
ifeq ($(MESA_ANDROID_MAJOR_VERSION),5)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0305 -DMESA_LLVM_VERSION_PATCH=2
ELF_INCLUDES := external/elfutils/0.153/libelf
endif
ifeq ($(MESA_ANDROID_MAJOR_VERSION),6)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0
ELF_INCLUDES := external/elfutils/src/libelf
endif
ifeq ($(MESA_ANDROID_MAJOR_VERSION),7)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0
ELF_INCLUDES := external/elfutils/libelf
endif
endif
ifneq ($(LOCAL_IS_HOST_MODULE),true)
# add libdrm if there are hardware drivers
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SHARED_LIBRARIES += libdrm
endif
endif
LOCAL_CFLAGS_32 += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\"
LOCAL_CFLAGS_64 += -DDEFAULT_DRIVER_DIR=\"/system/lib64/$(MESA_DRI_MODULE_REL_PATH)\"

View File

@@ -24,7 +24,7 @@
# BOARD_GPU_DRIVERS should be defined. The valid values are
#
# classic drivers: i915 i965
# gallium drivers: swrast freedreno i915g nouveau pl111 r300g r600g radeonsi vc4 virgl vmwgfx
# gallium drivers: swrast freedreno i915g nouveau r300g r600g radeonsi vc4 virgl vmwgfx
#
# The main target is libGLES_mesa. For each classic driver enabled, a DRI
# module will also be built. DRI modules will be loaded by libGLES_mesa.
@@ -32,9 +32,6 @@
MESA_TOP := $(call my-dir)
MESA_ANDROID_MAJOR_VERSION := $(word 1, $(subst ., , $(PLATFORM_VERSION)))
ifneq ($(filter 2 4, $(MESA_ANDROID_MAJOR_VERSION)),)
$(error "Android 4.4 and earlier not supported")
endif
MESA_DRI_MODULE_REL_PATH := dri
MESA_DRI_MODULE_PATH := $(TARGET_OUT_SHARED_LIBRARIES)/$(MESA_DRI_MODULE_REL_PATH)
@@ -43,37 +40,19 @@ MESA_DRI_MODULE_UNSTRIPPED_PATH := $(TARGET_OUT_SHARED_LIBRARIES_UNSTRIPPED)/$(M
MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk
MESA_PYTHON2 := python
# Lists to convert driver names to boolean variables
# in form of <driver name>.<boolean make variable>
classic_drivers := i915.HAVE_I915_DRI i965.HAVE_I965_DRI
gallium_drivers := \
swrast.HAVE_GALLIUM_SOFTPIPE \
freedreno.HAVE_GALLIUM_FREEDRENO \
i915g.HAVE_GALLIUM_I915 \
nouveau.HAVE_GALLIUM_NOUVEAU \
pl111.HAVE_GALLIUM_PL111 \
r300g.HAVE_GALLIUM_R300 \
r600g.HAVE_GALLIUM_R600 \
radeonsi.HAVE_GALLIUM_RADEONSI \
vmwgfx.HAVE_GALLIUM_VMWGFX \
vc4.HAVE_GALLIUM_VC4 \
virgl.HAVE_GALLIUM_VIRGL
classic_drivers := i915 i965
gallium_drivers := swrast freedreno i915g nouveau r300g r600g radeonsi vmwgfx vc4 virgl
ifeq ($(BOARD_GPU_DRIVERS),all)
MESA_BUILD_CLASSIC := $(filter HAVE_%, $(subst ., , $(classic_drivers)))
MESA_BUILD_GALLIUM := $(filter HAVE_%, $(subst ., , $(gallium_drivers)))
else
# Warn if we have any invalid driver names
$(foreach d, $(BOARD_GPU_DRIVERS), \
$(if $(findstring $(d).,$(classic_drivers) $(gallium_drivers)), \
, \
$(warning invalid GPU driver: $(d)) \
) \
)
MESA_BUILD_CLASSIC := $(strip $(foreach d, $(BOARD_GPU_DRIVERS), $(patsubst $(d).%,%, $(filter $(d).%, $(classic_drivers)))))
MESA_BUILD_GALLIUM := $(strip $(foreach d, $(BOARD_GPU_DRIVERS), $(patsubst $(d).%,%, $(filter $(d).%, $(gallium_drivers)))))
MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS))
# warn about invalid drivers
invalid_drivers := $(filter-out \
$(classic_drivers) $(gallium_drivers), $(MESA_GPU_DRIVERS))
ifneq ($(invalid_drivers),)
$(warning invalid GPU drivers: $(invalid_drivers))
# tidy up
MESA_GPU_DRIVERS := $(filter-out $(invalid_drivers), $(MESA_GPU_DRIVERS))
endif
$(foreach d, $(MESA_BUILD_CLASSIC) $(MESA_BUILD_GALLIUM), $(eval $(d) := true))
# host and target must be the same arch to generate matypes.h
ifeq ($(TARGET_ARCH),$(HOST_ARCH))
@@ -82,27 +61,23 @@ else
MESA_ENABLE_ASM := false
endif
ifneq ($(filter true, $(HAVE_GALLIUM_RADEONSI)),)
MESA_ENABLE_LLVM := true
ifneq ($(filter $(classic_drivers), $(MESA_GPU_DRIVERS)),)
MESA_BUILD_CLASSIC := true
else
MESA_BUILD_CLASSIC := false
endif
define mesa-build-with-llvm
$(if $(filter $(MESA_ANDROID_MAJOR_VERSION), 4 5), \
$(warning Unsupported LLVM version in Android $(MESA_ANDROID_MAJOR_VERSION)),) \
$(if $(filter 6,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_STATIC_LIBRARIES += libLLVMCore) \
$(eval LOCAL_C_INCLUDES += external/llvm/include external/llvm/device/include),) \
$(if $(filter 7,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_STATIC_LIBRARIES += libLLVMCore) \
$(eval LOCAL_C_INCLUDES += external/llvm/include external/llvm/device/include),) \
$(if $(filter O,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_HEADER_LIBRARIES += llvm-headers),)
endef
ifneq ($(filter $(gallium_drivers), $(MESA_GPU_DRIVERS)),)
MESA_BUILD_GALLIUM := true
else
MESA_BUILD_GALLIUM := false
endif
MESA_ENABLE_LLVM := $(if $(filter radeonsi,$(MESA_GPU_DRIVERS)),true,false)
# add subdirectories
ifneq ($(strip $(MESA_GPU_DRIVERS)),)
SUBDIRS := \
src/gbm \
src/loader \
@@ -117,5 +92,11 @@ SUBDIRS := \
src/vulkan
INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))
ifeq ($(strip $(MESA_BUILD_GALLIUM)),true)
INC_DIRS += $(call all-named-subdir-makefiles,src/gallium)
endif
include $(INC_DIRS)
endif

View File

@@ -43,7 +43,7 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-llvm-shared-libs \
--with-platforms=x11,wayland,drm,surfaceless \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,nouveau,r300,pl111,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr,etnaviv,imx \
--with-gallium-drivers=i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr,etnaviv,imx \
--with-vulkan-drivers=intel,radeon
ACLOCAL_AMFLAGS = -I m4

View File

@@ -1 +1 @@
17.2.0-devel
17.1.3

4
bin/.cherry-ignore Normal file
View File

@@ -0,0 +1,4 @@
# This commit depends on 9fd9a7d0ba3 and 678d568c7b2, neither of which is in branch.
b84b631c6381d9b36bca5d0e7cc67dd23af188c1 radeonsi: load patch_id for TES-as-ES when exporting for PS
# This commit addressed an earlier commit 126d5ad which did not land in branch.
9da104593386f6e8ddec8f0d9d288aceb8908fe1 radv: fix regression in descriptor set freeing.

View File

@@ -1,2 +1,3 @@
[*.sh]
indent_style = tab
indent_style = space
indent_size = 2

View File

@@ -36,17 +36,14 @@ do
continue
fi
# Place every "fixes:" tag on its own line and join with the next word
# on its line or a later one.
fixes=`git show -s $sha | tr -d "\n" | sed -e 's/fixes:[[:space:]]*/\nfixes:/Ig' | grep "fixes:" | sed -e 's/\(fixes:[a-zA-Z0-9]*\).*$/\1/'`
# For each one try to extract the tag
fixes_count=`echo "$fixes" | wc -l`
fixes_count=`git show $sha | grep -i "fixes:" | wc -l`
warn=`(test $fixes_count -gt 1 && echo $fixes_count) || echo 0`
while [ $fixes_count -gt 0 ] ; do
# Treat only the current line
id=`echo "$fixes" | tail -n $fixes_count | head -n 1 | cut -d : -f 2`
fixes=`git show $sha | grep -i "fixes:" | tail -n $fixes_count`
fixes_count=$(($fixes_count-1))
# The following sed/cut combination is borrowed from GregKH
id=`echo ${fixes} | sed -e 's/^[ \t]*//' | cut -f 2 -d ':' | sed -e 's/^[ \t]*//' | cut -f 1 -d ' '`
# Bail out if we cannot find suitable id.
# Any specific validation the $id is valid and not some junk, is

View File

@@ -133,7 +133,7 @@ class PerfParser(LineParser):
def __init__(self, infile, symbol):
LineParser.__init__(self, infile)
self.symbol = symbol
self.symbol = symbol
def readline(self):
# Override LineParser.readline to ignore comment lines
@@ -155,7 +155,7 @@ class PerfParser(LineParser):
addresses.sort()
total_samples = 0
sys.stdout.write('%s:\n' % self.symbol)
sys.stdout.write('%s:\n' % self.symbol)
for address, instr in asm:
try:
sample = samples.pop(address)

View File

@@ -74,7 +74,7 @@ AC_SUBST([OPENCL_VERSION])
# in the first entry.
LIBDRM_REQUIRED=2.4.75
LIBDRM_RADEON_REQUIRED=2.4.71
LIBDRM_AMDGPU_REQUIRED=2.4.81
LIBDRM_AMDGPU_REQUIRED=2.4.79
LIBDRM_INTEL_REQUIRED=2.4.75
LIBDRM_NVVIEUX_REQUIRED=2.4.66
LIBDRM_NOUVEAU_REQUIRED=2.4.66
@@ -102,8 +102,8 @@ ZLIB_REQUIRED=1.2.3
dnl LLVM versions
LLVM_REQUIRED_GALLIUM=3.3.0
LLVM_REQUIRED_OPENCL=3.6.0
LLVM_REQUIRED_R600=3.9.0
LLVM_REQUIRED_RADEONSI=3.9.0
LLVM_REQUIRED_R600=3.8.0
LLVM_REQUIRED_RADEONSI=3.8.0
LLVM_REQUIRED_RADV=3.9.0
LLVM_REQUIRED_SWR=3.9.0
@@ -455,7 +455,7 @@ int main () {
CFLAGS=$save_CFLAGS
AC_ARG_ENABLE(pwr8,
[AS_HELP_STRING([--disable-pwr8],
[AS_HELP_STRING([--disable-pwr8-inst],
[disable POWER8-specific instructions])],
[enable_pwr8=$enableval], [enable_pwr8=auto])
@@ -766,13 +766,6 @@ if test "x$enable_asm" = xyes; then
;;
esac
;;
powerpc64le)
case "$host_os" in
linux*)
asm_arch=ppc64le
;;
esac
;;
esac
case "$asm_arch" in
@@ -788,10 +781,6 @@ if test "x$enable_asm" = xyes; then
DEFINES="$DEFINES -DUSE_SPARC_ASM"
AC_MSG_RESULT([yes, sparc])
;;
ppc64le)
DEFINES="$DEFINES -DUSE_PPC64LE_ASM"
AC_MSG_RESULT([yes, ppc64le])
;;
*)
AC_MSG_RESULT([no, platform not supported])
;;
@@ -848,11 +837,6 @@ dnl is not valid for that platform.
if test "x$android" = xno; then
test -z "$PTHREAD_LIBS" && PTHREAD_LIBS="-lpthread"
fi
dnl According to the manual when using pthreads, one should add -pthread to
dnl both compile and link-time arguments.
dnl In practise that should be sufficient for all platforms, since any
dnl platforms build with GCC and Clang support the flag.
PTHREAD_LIBS="$PTHREAD_LIBS -pthread"
dnl pthread-stubs is mandatory on BSD platforms, due to the nature of the
dnl project. Even then there's a notable issue as described in the project README
@@ -867,6 +851,8 @@ esac
if test "x$pthread_stubs_possible" = xyes; then
PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs >= 0.4)
AC_SUBST(PTHREADSTUBS_CFLAGS)
AC_SUBST(PTHREADSTUBS_LIBS)
fi
dnl SELinux awareness.
@@ -1080,12 +1066,16 @@ AC_SUBST([LLVM_INCLUDEDIR])
dnl
dnl libunwind
dnl
PKG_CHECK_EXISTS(libunwind, [HAVE_LIBUNWIND=yes], [HAVE_LIBUNWIND=no])
AC_ARG_ENABLE([libunwind],
[AS_HELP_STRING([--enable-libunwind],
[Use libunwind for backtracing (default: auto)])],
[LIBUNWIND="$enableval"],
[LIBUNWIND="$HAVE_LIBUNWIND"])
[LIBUNWIND="auto"])
PKG_CHECK_EXISTS(libunwind, [HAVE_LIBUNWIND=yes], [HAVE_LIBUNWIND=no])
if test "x$LIBUNWIND" = "xauto"; then
LIBUNWIND="$HAVE_LIBUNWIND"
fi
if test "x$LIBUNWIND" = "xyes"; then
PKG_CHECK_MODULES(LIBUNWIND, libunwind)
@@ -1250,7 +1240,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
AC_ARG_WITH([gallium-drivers],
[AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
[comma delimited Gallium drivers list, e.g.
"i915,nouveau,r300,r600,radeonsi,freedreno,pl111,svga,swrast,swr,vc4,virgl,etnaviv,imx"
"i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,swr,vc4,virgl,etnaviv,imx"
@<:@default=r300,r600,svga,swrast@:>@])],
[with_gallium_drivers="$withval"],
[with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
@@ -1726,7 +1716,7 @@ done
if test "x$enable_glx" != xno; then
if ! echo "$platforms" | grep -q 'x11'; then
AC_MSG_ERROR([Building GLX without the x11 platform is not supported])
AC_MSG_ERROR([Building without the x11 platform as GLX is enabled, is not supported])
fi
fi
@@ -1841,11 +1831,12 @@ if test -n "$with_dri_drivers"; then
xi915)
require_libdrm "i915"
HAVE_I915_DRI=yes
PKG_CHECK_MODULES([I915], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
;;
xi965)
require_libdrm "i965"
HAVE_I965_DRI=yes
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
;;
xnouveau)
require_libdrm "nouveau"
@@ -1957,6 +1948,7 @@ if test -n "$with_vulkan_drivers"; then
case "x$driver" in
xintel)
require_libdrm "ANV"
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
require_x11_dri3 "ANV"
HAVE_INTEL_VULKAN=yes
;;
@@ -2099,7 +2091,7 @@ dnl Gallium G3DVL configuration
dnl
if test -n "$with_gallium_drivers" -a "x$with_gallium_drivers" != xswrast; then
if test "x$enable_xvmc" = xauto -a "x$have_xvmc_platform" = xyes; then
PKG_CHECK_EXISTS([xvmc >= $XVMC_REQUIRED], [enable_xvmc=yes], [enable_xvmc=no])
PKG_CHECK_EXISTS([xvmc >= $XVMC_REQUIRED], [enable_xvmc=yes], [enable_xvmc=no])
fi
if test "x$enable_vdpau" = xauto -a "x$have_vdpau_platform" = xyes; then
@@ -2107,7 +2099,7 @@ if test -n "$with_gallium_drivers" -a "x$with_gallium_drivers" != xswrast; then
fi
if test "x$enable_omx" = xauto -a "x$have_omx_platform" = xyes; then
PKG_CHECK_EXISTS([libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED], [enable_omx=yes], [enable_omx=no])
PKG_CHECK_EXISTS([libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED], [enable_omx=yes], [enable_omx=no])
fi
if test "x$enable_va" = xauto -a "x$have_va_platform" = xyes; then
@@ -2427,7 +2419,7 @@ if test -n "$with_gallium_drivers"; then
;;
xi915)
HAVE_GALLIUM_I915=yes
PKG_CHECK_MODULES([I915], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
require_libdrm "Gallium i915"
;;
xr300)
@@ -2511,9 +2503,6 @@ if test -n "$with_gallium_drivers"; then
DEFINES="$DEFINES -DUSE_VC4_SIMULATOR"],
[USE_VC4_SIMULATOR=no])
;;
xpl111)
HAVE_GALLIUM_PL111=yes
;;
xvirgl)
HAVE_GALLIUM_VIRGL=yes
require_libdrm "virgl"
@@ -2543,10 +2532,6 @@ if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_IMX" = xyes ; then
AC_MSG_ERROR([Building with imx requires etnaviv])
fi
if test "x$HAVE_GALLIUM_VC4" != xyes -a "x$HAVE_GALLIUM_PL111" = xyes ; then
AC_MSG_ERROR([Building with pl111 requires vc4])
fi
dnl
dnl Set defines and buildtime variables only when using LLVM.
dnl
@@ -2611,7 +2596,6 @@ fi
AM_CONDITIONAL(HAVE_GALLIUM_SVGA, test "x$HAVE_GALLIUM_SVGA" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_I915, test "x$HAVE_GALLIUM_I915" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_PL111, test "x$HAVE_GALLIUM_PL111" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_R300, test "x$HAVE_GALLIUM_R300" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_R600, test "x$HAVE_GALLIUM_R600" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_RADEONSI, test "x$HAVE_GALLIUM_RADEONSI" = xyes)
@@ -2651,7 +2635,8 @@ AM_CONDITIONAL(HAVE_SWRAST_DRI, test x$HAVE_SWRAST_DRI = xyes)
AM_CONDITIONAL(HAVE_RADEON_VULKAN, test "x$HAVE_RADEON_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_INTEL_VULKAN, test "x$HAVE_INTEL_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_AMD_DRIVERS, test "x$HAVE_GALLIUM_RADEONSI" = xyes -o \
AM_CONDITIONAL(HAVE_AMD_DRIVERS, test "x$HAVE_GALLIUM_R600" = xyes -o \
"x$HAVE_GALLIUM_RADEONSI" = xyes -o \
"x$HAVE_RADEON_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_INTEL_DRIVERS, test "x$HAVE_INTEL_VULKAN" = xyes -o \
@@ -2674,7 +2659,6 @@ AM_CONDITIONAL(HAVE_COMMON_OSMESA, test "x$enable_osmesa" = xyes -o \
AM_CONDITIONAL(HAVE_X86_ASM, test "x$asm_arch" = xx86 -o "x$asm_arch" = xx86_64)
AM_CONDITIONAL(HAVE_X86_64_ASM, test "x$asm_arch" = xx86_64)
AM_CONDITIONAL(HAVE_SPARC_ASM, test "x$asm_arch" = xsparc)
AM_CONDITIONAL(HAVE_PPC64LE_ASM, test "x$asm_arch" = xppc64le)
AC_SUBST([NINE_MAJOR], 1)
AC_SUBST([NINE_MINOR], 0)
@@ -2761,7 +2745,6 @@ AC_CONFIG_FILES([Makefile
src/gallium/drivers/llvmpipe/Makefile
src/gallium/drivers/noop/Makefile
src/gallium/drivers/nouveau/Makefile
src/gallium/drivers/pl111/Makefile
src/gallium/drivers/r300/Makefile
src/gallium/drivers/r600/Makefile
src/gallium/drivers/radeon/Makefile
@@ -2807,7 +2790,6 @@ AC_CONFIG_FILES([Makefile
src/gallium/winsys/freedreno/drm/Makefile
src/gallium/winsys/i915/drm/Makefile
src/gallium/winsys/nouveau/drm/Makefile
src/gallium/winsys/pl111/drm/Makefile
src/gallium/winsys/radeon/drm/Makefile
src/gallium/winsys/amdgpu/drm/Makefile
src/gallium/winsys/svga/drm/Makefile
@@ -2982,17 +2964,15 @@ echo " Static libs: $enable_static"
echo " Shared-glapi: $enable_shared_glapi"
dnl Compiler options
# cleanup the CFLAGS/CXXFLAGS/LDFLAGS/DEFINES vars
# cleanup the CFLAGS/CXXFLAGS/DEFINES vars
cflags=`echo $CFLAGS | \
$SED 's/^ *//;s/ */ /;s/ *$//'`
cxxflags=`echo $CXXFLAGS | \
$SED 's/^ *//;s/ */ /;s/ *$//'`
ldflags=`echo $LDFLAGS | $SED 's/^ *//;s/ */ /;s/ *$//'`
defines=`echo $DEFINES | $SED 's/^ *//;s/ */ /;s/ *$//'`
echo ""
echo " CFLAGS: $cflags"
echo " CXXFLAGS: $cxxflags"
echo " LDFLAGS: $ldflags"
echo " Macros: $defines"
echo ""
if test "x$enable_llvm" = xyes; then

View File

@@ -84,7 +84,6 @@
<li><a href="codingstyle.html" target="_parent">Coding Style</a>
<li><a href="submittingpatches.html" target="_parent">Submitting patches</a>
<li><a href="releasing.html" target="_parent">Releasing process</a>
<li><a href="release-calendar.html" target="_parent">Release calendar</a>
<li><a href="sourcedocs.html" target="_parent">Source Documentation</a>
<li><a href="dispatch.html" target="_parent">GL Dispatch</a>
</ul>

View File

@@ -46,9 +46,6 @@ sometimes be useful for debugging end-user issues.
<li>MESA_NO_MMX - if set, disables Intel MMX optimizations
<li>MESA_NO_3DNOW - if set, disables AMD 3DNow! optimizations
<li>MESA_NO_SSE - if set, disables Intel SSE optimizations
<li>MESA_NO_ERROR - if set error checking is disabled as per KHR_no_error.
This will result in undefined behaviour for invalid use of the api, but
can reduce CPU use for apps that are known to be error free.</li>
<li>MESA_DEBUG - if set, error messages are printed to stderr. For example,
if the application generates a GL_INVALID_ENUM error, a corresponding error
message indicating where the error occurred, and possibly why, will be
@@ -163,47 +160,48 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
This is useful for debugging hangs, etc.</li>
<li>INTEL_DEBUG - a comma-separated list of named flags, which do various things:
<ul>
<li>ann - annotate IR in assembly dumps</li>
<li>aub - dump batches into an AUB trace for use with simulation tools</li>
<li>bat - emit batch information</li>
<li>blit - emit messages about blit operations</li>
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>buf - emit messages about buffer objects</li>
<li>clip - emit messages about the clip unit (for old gens, includes the CLIP program)</li>
<li>color - use color in output</li>
<li>cs - dump shader assembly for compute shaders</li>
<li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li>
<li>dri - emit messages about the DRI interface</li>
<li>tex - emit messages about textures.</li>
<li>state - emit messages about state flag tracking</li>
<li>blit - emit messages about blit operations</li>
<li>miptree - emit messages about miptrees</li>
<li>perf - emit messages about performance issues</li>
<li>perfmon - emit messages about AMD_performance_monitor</li>
<li>bat - emit batch information</li>
<li>pix - emit messages about pixel operations</li>
<li>buf - emit messages about buffer objects</li>
<li>fbo - emit messages about framebuffers</li>
<li>fs - dump shader assembly for fragment shaders</li>
<li>gs - dump shader assembly for geometry shaders</li>
<li>hex - print instruction hex dump with the disassembly</li>
<li>l3 - emit messages about the new L3 state during transitions</li>
<li>miptree - emit messages about miptrees</li>
<li>no8 - don't generate SIMD8 fragment shader</li>
<li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>
<li>nocompact - disable instruction compaction</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>norbc - disable single sampled render buffer compression</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
<li>perf - emit messages about performance issues</li>
<li>perfmon - emit messages about AMD_performance_monitor</li>
<li>pix - emit messages about pixel operations</li>
<li>sync - after sending each batch, emit a message and wait for that batch to finish rendering</li>
<li>prim - emit messages about drawing primitives</li>
<li>vert - emit messages about vertex assembly</li>
<li>dri - emit messages about the DRI interface</li>
<li>sf - emit messages about the strips &amp; fans unit (for old gens, includes the SF program)</li>
<li>stats - enable statistics counters. you probably actually want perfmon or intel_gpu_top instead.</li>
<li>urb - emit messages about URB setup</li>
<li>vs - dump shader assembly for vertex shaders</li>
<li>clip - emit messages about the clip unit (for old gens, includes the CLIP program)</li>
<li>aub - dump batches into an AUB trace for use with simulation tools</li>
<li>shader_time - record how much GPU time is spent in each shader</li>
<li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
<li>ann - annotate IR in assembly dumps</li>
<li>no8 - don't generate SIMD8 fragment shader</li>
<li>vec4 - force vec4 mode in vertex shader</li>
<li>spill_fs - force spilling of all registers in the scalar backend (useful to debug spilling code)</li>
<li>spill_vec4 - force spilling of all registers in the vec4 backend (useful to debug spilling code)</li>
<li>state - emit messages about state flag tracking</li>
<li>sync - after sending each batch, emit a message and wait for that batch to finish rendering</li>
<li>cs - dump shader assembly for compute shaders</li>
<li>hex - print instruction hex dump with the disassembly</li>
<li>nocompact - disable instruction compaction</li>
<li>tcs - dump shader assembly for tessellation control shaders</li>
<li>tes - dump shader assembly for tessellation evaluation shaders</li>
<li>tex - emit messages about textures.</li>
<li>urb - emit messages about URB setup</li>
<li>vert - emit messages about vertex assembly</li>
<li>vs - dump shader assembly for vertex shaders</li>
<li>l3 - emit messages about the new L3 state during transitions</li>
<li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li>
<li>norbc - disable single sampled render buffer compression</li>
</ul>
<li>INTEL_SCALAR_VS (or TCS, TES, GS) - force scalar/vec4 mode for a shader stage (Gen8-9 only)</li>
<li>INTEL_PRECISE_TRIG - if set to 1, true or yes, then the driver prefers
accuracy over performance in trig functions.</li>
</ul>
@@ -304,8 +302,6 @@ See src/mesa/state_tracker/st_debug.c for other options.
(will often result in incorrect rendering).
<li>SVGA_DEBUG - for dumping shaders, constant buffers, etc. See the code
for details.
<li>SVGA_EXTRA_LOGGING - if set, enables extra logging to the vmware.log file,
such as the OpenGL program's name and command line arguments.
<li>See the driver code for other, lesser-used variables.
</ul>

View File

@@ -277,7 +277,7 @@ GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+
Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES version:
GL_ARB_bindless_texture DONE (radeonsi)
GL_ARB_bindless_texture started (airlied)
GL_ARB_cl_event not started
GL_ARB_compute_variable_group_size DONE (nvc0, radeonsi)
GL_ARB_ES3_2_compatibility DONE (i965/gen8+)
@@ -297,7 +297,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (nvc0, radeonsi)
GL_ARB_shader_stencil_export DONE (i965/gen9+, radeonsi, softpipe, llvmpipe, swr)
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+, nvc0, radeonsi)
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+, radeonsi)
GL_ARB_sparse_buffer DONE (radeonsi/CIK+)
GL_ARB_sparse_texture not started
GL_ARB_sparse_texture2 not started
@@ -305,9 +305,9 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_texture_filter_minmax not started
GL_ARB_transform_feedback_overflow_query DONE (i965/gen6+)
GL_KHR_blend_equation_advanced_coherent DONE (i965/gen9+)
GL_KHR_no_error started (Timothy Arceri)
GL_KHR_texture_compression_astc_hdr DONE (i965/bxt)
GL_KHR_texture_compression_astc_sliced_3d DONE (i965/gen9+)
GL_KHR_no_error not started
GL_KHR_texture_compression_astc_hdr DONE (core only)
GL_KHR_texture_compression_astc_sliced_3d not started
GL_OES_depth_texture_cube_map DONE (all drivers that support GLSL 1.30+)
GL_OES_EGL_image DONE (all drivers)
GL_OES_EGL_image_external_essl3 not started

View File

@@ -16,59 +16,6 @@
<h1>News</h1>
<h2>June 19, 2017</h2>
<p>
<a href="relnotes/17.1.3.html">Mesa 17.1.3</a> is released.
This is a bug-fix release.
</p>
<h2>June 5, 2017</h2>
<p>
<a href="relnotes/17.1.2.html">Mesa 17.1.2</a> is released.
This is a bug-fix release.
</p>
<h2>June 1, 2017</h2>
<p>
<a href="relnotes/17.0.7.html">Mesa 17.0.7</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 17.0.7 will be the final release in the 17.0
series. Users of 17.0 are encouraged to migrate to the 17.1 series in order
to obtain future fixes.
</p>
<h2>May 25, 2017</h2>
<p>
<a href="relnotes/17.1.1.html">Mesa 17.1.1</a> is released.
This is a bug-fix release.
</p>
<h2>May 12, 2017</h2>
<p>
<a href="relnotes/17.0.6.html">Mesa 17.0.6</a> is released.
This is a bug-fix release.
</p>
<h2>May 10, 2017</h2>
<p>
<a href="relnotes/17.1.0.html">Mesa 17.1.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>April 28, 2017</h2>
<p>
<a href="relnotes/17.0.5.html">Mesa 17.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>April 17, 2017</h2>
<p>
<a href="relnotes/17.0.4.html">Mesa 17.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>April 1, 2017</h2>
<p>
<a href="relnotes/17.0.3.html">Mesa 17.0.3</a> is released.

View File

@@ -1,106 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Release calendar</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Overview</h1>
<p>
Mesa provides feature/development and stable releases.
</p>
<p>
The table below lists the date and release manager that is expected to do the
specific release.
<br>
Take a look <a href="submittingpatches.html#criteria" target="_parent">here</a>
if you'd like to nominate a patch in the next stable release.
</p>
<h1 id="calendar">Calendar</h1>
<table border="1">
<tr>
<th>Branch</th>
<th>Expected date</th>
<th>Release</th>
<th>Release manager</th>
<th>Notes</th>
</tr>
<tr>
<td rowspan="5">17.1</td>
<td>2017-06-30</td>
<td>17.1.4</td>
<td>Andres Gomez</td>
<td></td>
</tr>
<tr>
<td>2017-07-14</td>
<td>17.1.5</td>
<td>Andres Gomez</td>
<td></td>
</tr>
<tr>
<td>2017-07-28</td>
<td>17.1.6</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-08-11</td>
<td>17.1.7</td>
<td>Juan A. Suarez Romero</td>
<td></td>
</tr>
<tr>
<td>2017-08-25</td>
<td>17.1.8</td>
<td>Andres Gomez</td>
<td>Final planned release for the 17.1 series</td>
</tr>
<tr>
<td rowspan="5">17.2</td>
<td>2017-07-21</td>
<td>17.2.0-rc1</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-07-28</td>
<td>17.2.0-rc2</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-08-04</td>
<td>17.2.0-rc3</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-08-11</td>
<td>17.2.0-rc4</td>
<td>Emil Velikov</td>
<td>May be promoted to 17.2.0 final</td>
</tr>
<tr>
<td>2017-08-25</td>
<td>17.2.1</td>
<td>Emil Velikov</td>
<td></td>
</table>
</div>
</body>
</html>

View File

@@ -14,7 +14,6 @@
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Releasing process</h1>
<ul>
@@ -24,13 +23,11 @@
<li><a href="#branch">Making a branchpoint</a>
<li><a href="#prerelease">Pre-release announcement</a>
<li><a href="#release">Making a new release</a>
<li><a href="#calendar">Update the calendar</a>
<li><a href="#announce">Announce the release</a>
<li><a href="#website">Update the mesa3d.org website</a>
<li><a href="#bugzilla">Update Bugzilla</a>
</ul>
<h1 id="overview">Overview</h1>
<p>
@@ -51,15 +48,11 @@ For example:
Mesa 12.0.2 - 12.0 branch, bugfix
</pre>
<h1 id="schedule">Release schedule</h1>
<p>
Releases should happen on Fridays. Delays can occur although those should be keep
to a minimum.
<br>
See our <a href="release-calendar.html" target="_parent">calendar</a> for the
date and other details for individual releases.
</p>
<h2>Feature releases</h2>
@@ -86,24 +79,15 @@ The final release from the 12.0 series Mesa 12.0.5 will be out around the same
time (or shortly after) 13.0.1 is out.
</p>
<h1 id="pickntest">Cherry-picking and testing</h1>
<p>
Commits nominated for the active branch are picked as based on the
<a href="submittingpatches.html#criteria" target="_parent">criteria</a> as
described in the same section.
</p>
<p>
Nomination happens in the mesa-stable@ mailing list. However,
maintainer is resposible of checking for forgotten candidates in the
master branch. This is achieved by a combination of ad-hoc scripts and
a casual search for terms such as regression, fix, broken and similar.
</p>
<p>
Maintainer is also responsible for testing in various possible permutations of
Maintainer is responsible for testing in various possible permutations of
the autoconf and scons build.
</p>
@@ -117,57 +101,33 @@ release. This is made <strong>only</strong> with explicit permission/request,
and the patch <strong>must</strong> be very well contained. Thus it cannot
affect more than one driver/subsystem.
</p>
<p>
Currently Ilia Mirkin and AMD devs have requested "permanent" exception.
</p>
<ul>
<li>make distcheck, scons and scons check must pass
<li>Testing with different version of system components - LLVM and others is also
performed where possible.
<li>As a general rule, testing with various combinations of configure
switches, depending on the specific patchset.
</ul>
<p>
Achieved by combination of local ad-hoc scripts, mingw-w64 cross
compilation and AppVeyor plus Travis-CI, the latter as part of their
Github integration.
Achieved by combination of local ad-hoc scripts and AppVeyor plus Travis-CI,
the latter as part of their Github integration.
</p>
<p>
For Windows related changes, the main contact point is Brian
Paul. Jose Fonseca can also help as a fallback contact.
</p>
<p>
For Android related changes, the main contact is Tapani
P&auml;lli. Mauro Rossi is collaborating with android-x86 and may
provide feedback about the build status in that project.
</p>
<p>
For MacOSX related changes, Jeremy Huddleston Sequoia is currently a
good contact point.
</p>
<p>
<strong>Note:</strong> If a patch in the current queue needs any additional
fix(es), then they should be squashed together.
<br>
The commit messages and the <code>cherry picked from</code> tags must be preserved.
</p>
<p>
This should be noted in the <a href="#prerelease">pre-announce</a> email.
</p>
<pre>
git show b10859ec41d09c57663a258f43fe57c12332698e
commit b10859ec41d09c57663a258f43fe57c12332698e
Author: Jonas Pfeil &lt;pfeiljonas@gmx.de&gt;
Author: Jonas Pfeil &ltpfeiljonas@gmx.de&gt
Date: Wed Mar 1 18:11:10 2017 +0100
ralloc: Make sure ralloc() allocations match malloc()'s alignment.
@@ -186,6 +146,7 @@ This should be noted in the <a href="#prerelease">pre-announce</a> email.
(cherry picked from commit ff494fe999510ea40e3ed5827e7818550b6de126)
</pre>
</p>
<h2>Regression/functionality testing</h2>
@@ -193,23 +154,15 @@ This should be noted in the <a href="#prerelease">pre-announce</a> email.
Less often (once or twice), shortly before the pre-release announcement.
Ensure that testing is redone if Intel devs have requested an exception, as per above.
</p>
<ul>
<li><em>no regressions should be observed for Piglit/dEQP/CTS/Vulkan on Intel platforms</em>
<li><em>no regressions should be observed for Piglit using the swrast, softpipe
and llvmpipe drivers</em>
</ul>
<p>
Currently testing is performed courtesy of the Intel OTC team and their Jenkins CI setup. Check with the Intel team over IRC how to get things setup.
</p>
<p>
Installing the built driver from the pre-announced RC branch in the
system and making some every day's use until the release may be a good
idea too.
</p>
<h1 id="branch">Making a branchpoint</h1>
@@ -249,18 +202,15 @@ To setup the branchpoint:
Now go to
<a href="https://bugs.freedesktop.org/editversions.cgi?action=add&amp;product=Mesa" target="_parent">Bugzilla</a> and add the new Mesa version X.Y.
</p>
<p>
Check that there are no distribution breaking changes and revert them if needed.
For example: files being overwritten on install, etc. Happens extremely rarely -
we had only one case so far (see commit 2ced8eb136528914e1bf4e000dea06a9d53c7e04).
</p>
<p>
Proceed to <a href="#release">release</a> -rc1.
</p>
<h1 id="prerelease">Pre-release announcement</h1>
<p>
@@ -274,22 +224,18 @@ release is made.
</p>
<h2>Terminology used</h2>
<ul><li>Nominated</ul>
<p>
Patch that is nominated but yet to to merged in the patch queue/branch.
</p>
<ul><li>Queued</ul>
<p>
Patch is in the queue/branch and will feature in the next release.
Barring reported regressions or objections from developers.
</p>
<ul><li>Rejected</ul>
<p>
Patch does not fit the
<a href="submittingpatches.html#criteria" target="_parent">criteria</a> and
@@ -395,7 +341,6 @@ AUTHOR (NUMBER):
Reason: ...
</pre>
<h1 id="release">Making a new release</h1>
<p>
@@ -403,21 +348,18 @@ These are the instructions for making a new Mesa release.
</p>
<h3>Get latest source files</h3>
<p>
Ensure the latest code is available - both in your local master and the
relevant branch.
</p>
<h3>Perform basic testing</h3>
<p>
Most of the testing should already be done during the
<a href="#pickntest">cherry-pick</a> and
<a href="#prerelease">pre-announce</a> stages.
So we do a quick 'touch test'
</p>
So we do a quick 'touch test'
<ul>
<li>make distcheck (you can omit this if you're not using --dist below)
<li>scons (from release tarball)
@@ -510,7 +452,6 @@ be empty (TBD) at this point.
<p>
Two scripts are available to help generate portions of the release notes:
</p>
<pre>
./bin/bugzilla_mesa.sh
@@ -527,7 +468,6 @@ to be included in the release notes.
<p>
Commit these changes and push the branch.
</p>
<pre>
git push origin HEAD
</pre>
@@ -538,7 +478,6 @@ Commit these changes and push the branch.
<p>
Start the release process.
</p>
<pre>
../relative/path/to/release.sh . # append --dist if you've already done distcheck above
</pre>
@@ -576,15 +515,7 @@ docs/index.html to add a news entry. Then commit and push:
</pre>
<h1 id="calendar">Update the calendar</h1>
<p>
Remove the version from the <a href="release-calendar.html" target="_parent">calendar</a>.
</p>
<h1 id="announce">Announce the release</h1>
<p>
Use the generated template during the releasing process.
</p>
@@ -597,7 +528,6 @@ As the hosting was moved to freedesktop, git hooks are deployed to update the
website. Manually check that it is updated 5-10 minutes after the final <code>git push</code>
</p>
<h1 id="bugzilla">Update Bugzilla</h1>
<p>

View File

@@ -21,14 +21,6 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/17.1.3.html">17.1.3 release notes</a>
<li><a href="relnotes/17.1.2.html">17.1.2 release notes</a>
<li><a href="relnotes/17.0.7.html">17.0.7 release notes</a>
<li><a href="relnotes/17.1.1.html">17.1.1 release notes</a>
<li><a href="relnotes/17.0.6.html">17.0.6 release notes</a>
<li><a href="relnotes/17.1.0.html">17.1.0 release notes</a>
<li><a href="relnotes/17.0.5.html">17.0.5 release notes</a>
<li><a href="relnotes/17.0.4.html">17.0.4 release notes</a>
<li><a href="relnotes/17.0.3.html">17.0.3 release notes</a>
<li><a href="relnotes/17.0.2.html">17.0.2 release notes</a>
<li><a href="relnotes/13.0.6.html">13.0.6 release notes</a>

View File

@@ -1,156 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.4 Release Notes / April 17, 2017</h1>
<p>
Mesa 17.0.4 is a bug fix release which fixes bugs found since the 17.0.3 release.
</p>
<p>
Mesa 17.0.4 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
c4c34ba05d48f76b45bc05bc4b6e9242077f403d63c4f0c355c7b07786de233e mesa-17.0.4.tar.gz
1269dc8545a193932a0779b2db5bce9be4a5f6813b98c38b93b372be8362a346 mesa-17.0.4.tar.xz
</pre>
<h2>Next release</h2>
<p>
Mesa 17.0.5 is expected in approximatelly two weeks. See the release
<a href="../release-calendar.html#calendar" target="_parent">calendar</a>
for details.
</p>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99515">Bug 99515</a> - SIGSEGV MAPERR on Android nougat-x86 with mesa 17.0.0rc</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100391">Bug 100391</a> - SachaWillems deferredmultisampling asserts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100452">Bug 100452</a> - push_constants host memory leak when resetting command buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100582">Bug 100582</a> - [GEN8+] piglit.spec.arb_stencil_texturing.glblitframebuffer corrupts state.gl_texture* assertions</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: add new polaris10 pci id</li>
</ul>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Invalidate L2 for TRANSFER_WRITE barriers</li>
</ul>
<p>Andres Gomez (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.3</li>
</ul>
<p>Craig Stout (1):</p>
<ul>
<li>anv/cmd_buffer: fix host memory leak</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>Revert "cherry-ignore: add the Flush after unmap in gbm/dri fix"</li>
<li>Revert "freedreno: fix memory leak"</li>
<li>Update version to 17.0.4</li>
</ul>
<p>Fabio Estevam (1):</p>
<ul>
<li>loader: Move non-error message to debug level</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>nvc0/ir: fix LSB/BFE/BFI implementations</li>
<li>nvc0/ir: fix overwriting of offset register with interpolateAtOffset</li>
<li>nvc0: increase texture buffer object alignment to 256 for pre-GM107</li>
<li>nouveau: when mapping a persistent buffer, synchronize on former xfers</li>
</ul>
<p>Jason Ekstrand (5):</p>
<ul>
<li>i965/fs: Always provide a default LOD of 0 for TXS and TXL</li>
<li>anv/pipeline: Properly handle unset gl_Layer and gl_ViewportIndex</li>
<li>anv/blorp: Align vertex buffers to 64B</li>
<li>i965/blorp: Align vertex buffers to 64B</li>
<li>i965/blorp: Bump the batch space estimate</li>
</ul>
<p>Jerome Duval (2):</p>
<ul>
<li>haiku: build fixes around debug defines</li>
<li>haiku/winsys: fix dt prototype args</li>
</ul>
<p>Julien Isorce (4):</p>
<ul>
<li>winsys/radeon: check null in radeon_cs_create_fence</li>
<li>winsys/radeon: check null return from radeon_cs_create_fence in cs_flush</li>
<li>radeon: initialize hole variable before calling container_of</li>
<li>radeon_drm_bo: explicitly check return value of drmCommandWriteRead</li>
</ul>
<p>Kenneth Graunke (4):</p>
<ul>
<li>i965: Document the sad story of the kernel command parser.</li>
<li>i965: Set screen-&gt;cmd_parser_version to 0 if we can't write registers.</li>
<li>i965: Skip register write detection when possible.</li>
<li>i965: Set kernel features before computing max GL version.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>targets: export radeon winsys_create functions to silence LLVM warning</li>
</ul>
<p>Michal Srb (1):</p>
<ul>
<li>st: Add cubeMapFace parameter to st_finalize_texture.</li>
</ul>
<p>Thomas Hellstrom (1):</p>
<ul>
<li>gbm/dri: Flush after unmap</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,144 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.5 Release Notes / April 28, 2017</h1>
<p>
Mesa 17.0.5 is a bug fix release which fixes bugs found since the 17.0.4 release.
</p>
<p>
Mesa 17.0.5 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
7510eee0d0077860b250d30d73305048c2df4ba09ea8fc04e4f3eec7beece301 mesa-17.0.5.tar.gz
668efa445d2f57a26e5c096b1965a685733a3b57d9c736f9d6460263847f9bfe mesa-17.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (16):</p>
<ul>
<li>cherry-ignore: Add the pci_id into the shader cache UUID</li>
<li>cherry-ignore: fix crash if ctx torn down with no rendering</li>
<li>cherry-ignore: Fix typos.</li>
<li>cherry-ignore: Revert "etnaviv: Cannot render to rb-swapped formats"</li>
<li>cherry-ignore: Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."</li>
<li>cherry-ignore: fix typo in a2b10g10r10 fast clear calculation</li>
<li>cherry-ignore: remove unused anv_dispatch_table dtable</li>
<li>cherry-ignore: remove unused radv_dispatch_table dtable</li>
<li>cherry-ignore: make radv_resolve_entrypoint static</li>
<li>cherry-ignore: vulkan: add support for libmesa_vulkan_util</li>
<li>cherry-ignore: r600: fix libmesa_amd_common dependency</li>
<li>cherry-ignore: remove dead brw_new_shader() declaration</li>
<li>cherry-ignore: remove i965_symbols_test reference from .gitignore</li>
<li>cherry-ignore: automake: ensure that the destination directory is created</li>
<li>cherry-ignore: provide required gem stubs for the tests</li>
<li>Update version to 17.0.5</li>
</ul>
<p>Boyan Ding (2):</p>
<ul>
<li>nvc0/ir: Properly handle a "split form" of predicate destination</li>
<li>nir: Destination component count of shader_clock intrinsic is 2</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.4</li>
<li>winsys/sw/dri: don't use GNU void pointer arithmetic</li>
<li>st/clover: add space between &lt; and ::</li>
<li>configure.ac: check require_basic_egl only if egl enabled</li>
<li>st/mesa: automake: honour the vdpau header install location</li>
</ul>
<p>Francisco Jerez (2):</p>
<ul>
<li>intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.</li>
<li>intel/fs: Take into account amount of data read in spilling cost heuristic.</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>radv: report timestampPeriod correctly</li>
</ul>
<p>Jason Ekstrand (5):</p>
<ul>
<li>anv/blorp: Flush the texture cache in UpdateBuffer</li>
<li>anv/cmd_buffer: Flush the VF cache at the top of all primaries</li>
<li>anv/cmd_buffer: Always set up a null surface state</li>
<li>anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED</li>
<li>anv/blorp: Properly handle VK_ATTACHMENT_UNUSED</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: invalidate the readpix cache in st_indirect_draw_vbo</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>anv/cmd_buffer: Disable CCS on BDW input attachments</li>
</ul>
<p>Nicolai Hähnle (4):</p>
<ul>
<li>mesa: fix remaining xfb prims check for GLES with multiple instances</li>
<li>mesa: extract need_xfb_remaining_prims_check</li>
<li>mesa: move glMultiDrawArrays to vbo and fix error handling</li>
<li>vbo: fix gl_DrawID handling in glMultiDrawArrays</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>util/queue: don't hang at exit</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>mesa: validate sampler type across the whole program</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,186 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.6 Release Notes / May 12, 2017</h1>
<p>
Mesa 17.0.6 is a bug fix release which fixes bugs found since the 17.0.5 release.
</p>
<p>
Mesa 17.0.6 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
f1b2497d553e9a584f0caa3a2d9d310e27ead15fb0af170da69f6e70fb5031cd mesa-17.0.6.tar.gz
89ecf3bcd0f18dcca5aaa42bf36bb52a2df33be89889f94aaaad91f7a504a69d mesa-17.0.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98428">Bug 98428</a> - Undefined non-weak-symbol in dri-drivers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100854">Bug 100854</a> - YUV to RGB Color Space Conversion result is not precise</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>egl/platform/drm: Don't take display ownership until gbm is initialized</li>
</ul>
<p>Andres Gomez (7):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.5</li>
<li>travis: replace Trusty-based LLVM toolchain apt-get with apt addon</li>
<li>travis: add the possibility of using the txc-dxtn library</li>
<li>cherry-ignore: 17.1 nominations only</li>
<li>cherry-ignore: fix regression in descriptor set freeing.</li>
<li>cherry-ignore: rejected commits</li>
<li>Update version to 17.0.6</li>
</ul>
<p>Ben Boeckel (1):</p>
<ul>
<li>scons: update for LLVM 4.0</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: move duplicated st_ws_framebuffer() function into header file</li>
</ul>
<p>Chad Versace (3):</p>
<ul>
<li>egl: Emit error when EGLSurface is lost</li>
<li>egl/android: Cancel any outstanding ANativeBuffer in surface destructor</li>
<li>egl/android: Mark surface as lost when dequeueBuffer fails</li>
</ul>
<p>Christian Gmeiner (1):</p>
<ul>
<li>etnaviv: add L8A8_UNORM texture format</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>radv/wsi: report presentation error per image request</li>
<li>radv: enable POLARIS12 support.</li>
</ul>
<p>Emil Velikov (21):</p>
<ul>
<li>travis: correct libdrm required regex to also track libdrm itself</li>
<li>travis: add nearly all gallium drivers to the list</li>
<li>travis: use both cores for make/make check</li>
<li>travis: bring the scons build on par with AppVeyor</li>
<li>travis: explicitly LD_LIBRARY_PATH the local libraries</li>
<li>travis: enable apt cache</li>
<li>travis: automatically manage ccache caching</li>
<li>travis: remove unused -dev packages</li>
<li>travis: rework "if test" blocks in the script section</li>
<li>travis: split out matrix from env</li>
<li>travis: add separate "scons" and "scons llvm" targets</li>
<li>travis: add "scons swr" to the build matrix</li>
<li>travis: add "make swr" to the build matrix</li>
<li>travis: split the make target to three separate ones</li>
<li>travis: model scons check target like the make one</li>
<li>travis: add Gallium state-tracker targets</li>
<li>travis: enable wayland support</li>
<li>travis: bump MAKEFLAGS to -j4</li>
<li>gallium/dri: always link against shared glapi</li>
<li>mesa/dri: always link against shared glapi</li>
<li>glx: glX_proto_send.py: use correct compile guard GLX_INDIRECT_RENDERING</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>nir: Pick just the channels we want for bitmap and drawpixels lowering.</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>gallium/targets: fix bool setting on BE architectures</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>anv/cmd_buffer: Use the device allocator for QueueSubmit</li>
</ul>
<p>Johnson Lin (1):</p>
<ul>
<li>nir/lower_tex: Fix minor error in YUV color conversion matrix</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>radeonsi: adjust ESGS ring buffer size computation on VI</li>
<li>radeonsi: apply the tess+GS hang workaround to Polaris12 as well</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>radeonsi: fix gl_PrimitiveID in tessellation with instanced draws on SI</li>
</ul>
<p>Philipp Zabel (3):</p>
<ul>
<li>renderonly: close transfer prime_fd</li>
<li>renderonly: drop resources on destroy</li>
<li>renderonly: use drmIoctl</li>
</ul>
<p>Rhys Kidd (3):</p>
<ul>
<li>travis: Support LLVM 3.8+ on Trusty-based Travis-CI via apt-get not apt addon</li>
<li>travis: Add radv vulkan driver to continuous integration</li>
<li>travis: Add radeonsi to continuous integration</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno/a3xx: fix hang w/ large render targets and small gmem</li>
</ul>
<p>Samuel Iglesias Gonsálvez (5):</p>
<ul>
<li>i965/vec4: fix vertical stride to avoid breaking region parameter rule</li>
<li>i965/vec4: fix register width for DF VGRF and UNIFORM</li>
<li>i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions</li>
<li>anv: anv_gem_mmap() returns MAP_FAILED as mapping error</li>
<li>anv: vkBindImageMemory() should return VK_ERROR_OUT_OF_{HOST,DEVICE}_MEMORY on failure</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,145 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.7 Release Notes / June 1, 2017</h1>
<p>
Mesa 17.0.7 is a bug fix release which fixes bugs found since the 17.0.6 release.
</p>
<p>
Mesa 17.0.7 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
bc68d13c6b1a053b855ac453ebf7e62bd89511adf44bad6c613e09f7fa13390a mesa-17.0.7.tar.gz
f6d75304a229c8d10443e219d6b6c0c342567dbab5a879ebe7cfa3c9139c4492 mesa-17.0.7.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98833">Bug 98833</a> - [REGRESSION, bisected] Wayland revert commit breaks non-Vsync fullscreen frame updates</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100741">Bug 100741</a> - Chromium - Memory leak</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.6</li>
</ul>
<p>Bartosz Tomczyk (1):</p>
<ul>
<li>mesa: Avoid leaking surface in st_renderbuffer_delete</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>egl: Partially revert 23c86c74, fix eglMakeCurrent</li>
</ul>
<p>Daniel Stone (7):</p>
<ul>
<li>vulkan: Fix Wayland uninitialised registry</li>
<li>vulkan/wsi/wayland: Remove roundtrip when creating image</li>
<li>vulkan/wsi/wayland: Use per-display event queue</li>
<li>vulkan/wsi/wayland: Use proxy wrappers for swapchain</li>
<li>egl/wayland: Don't open-code roundtrip</li>
<li>egl/wayland: Use per-surface event queues</li>
<li>egl/wayland: Ensure we get a back buffer</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>st/va: fix misplaced closing bracket</li>
<li>anv: automake: list shared libraries after the static ones</li>
<li>radv: automake: list shared libraries after the static ones</li>
<li>egl/wayland: select the format based on the interface used</li>
<li>Update version to 17.0.7</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>renderonly: Initialize fields of struct winsys_handle.</li>
<li>vc4: Don't allocate new BOs to avoid synchronization when they're shared.</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>glxglvnddispatch: Add missing dispatch for GetDriverConfig</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nvc0/ir: SHLADD's middle source must be an immediate</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>i965/blorp: Do and end-of-pipe sync on both sides of fast-clear ops</li>
<li>i965: Round copy size to the nearest block in intel_miptree_copy</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>etnaviv: stop oversizing buffer resources</li>
</ul>
<p>Nanley Chery (2):</p>
<ul>
<li>anv/formats: Update the three-channel BC1 mappings</li>
<li>i965/formats: Update the three-channel DXT1 mappings</li>
</ul>
<p>Pohjolainen, Topi (1):</p>
<ul>
<li>intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4</li>
</ul>
<p>Samuel Iglesias Gonsálvez (3):</p>
<ul>
<li>i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution</li>
<li>i965/vec4: fix swizzle and writemask when loading an uniform with constant offset</li>
<li>i965/vec4: load dvec3/4 uniforms first in the push constant buffer</li>
</ul>
<p>Tom Stellard (1):</p>
<ul>
<li>gallivm: Make sure module has the correct data layout when pass manager runs</li>
</ul>
</div>
</body>
</html>

View File

@@ -19,8 +19,7 @@
<p>
Mesa 17.1.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for
<a href="../release-calendar.html#calendar" target="_parent">Mesa 17.1.1</a>.
with a previous release or wait for Mesa 17.1.1.
</p>
<p>
Mesa 17.1.0 implements the OpenGL 4.5 API, but the version reported by

View File

@@ -31,8 +31,7 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
81ae9127286ff8d631e466d258608d6dea9854fe7bee2e8521da44c7544f01e5 mesa-17.1.3.tar.gz
5f1ee9a8aea2880f887884df2dea0c16dd1b13eb42fd2e52265db0dc1b380e8c mesa-17.1.3.tar.xz
TBD
</pre>

View File

@@ -1,68 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.0 Release Notes / TBD</h1>
<p>
Mesa 17.2.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 17.2.1.
</p>
<p>
Mesa 17.2.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_bindless_texture on radeonsi</li>
<li>GL_ARB_post_depth_coverage on nvc0 (GM200+)</li>
<li>GL_ARB_shader_viewport_layer_array on nvc0 (GM200+)</li>
<li>GL_AMD_vertex_shader_layer on nvc0 (GM200+)</li>
<li>GL_AMD_vertex_shader_viewport_index on nvc0 (GM200+)</li>
</ul>
<h2>Bug fixes</h2>
<ul>
TBD
</ul>
<h2>Changes</h2>
<ul>
<li>GL_APPLE_vertex_array_object support removed.</li>
</ul>
</div>
</body>
</html>

View File

@@ -50,8 +50,6 @@ execution. These are generally used for debugging.
The filenames will be "shader_X.vert" or "shader_X.frag" where X
the shader ID.
<li><b>cache_info</b> - print debug information about shader cache
<li><b>cache_fb</b> - force cached shaders to be ignored and do a full
recompile via the fallback path</li>
<li><b>uniform</b> - print message to stdout when glUniform is called
<li><b>nopvert</b> - force vertex shaders to be a simple shader that just transforms
the vertex position with ftransform() and passes through the color and

View File

@@ -1,12 +0,0 @@
#!/bin/sh
# run git from the sources directory
cd "$(dirname "$0")"
# don't print anything if git fails
if ! git_sha1=$(git --git-dir=.git rev-parse --short=10 HEAD 2>/dev/null)
then
exit
fi
printf '#define MESA_GIT_SHA1 "git-%s"\n' "$git_sha1"

View File

@@ -702,7 +702,6 @@ struct __DRIuseInvalidateExtensionRec {
#define __DRI_ATTRIB_BIND_TO_TEXTURE_TARGETS 46
#define __DRI_ATTRIB_YINVERTED 47
#define __DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE 48
#define __DRI_ATTRIB_MAX (__DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE + 1)
/* __DRI_ATTRIB_RENDER_TYPE */
#define __DRI_ATTRIB_RGBA_BIT 0x01
@@ -1137,7 +1136,7 @@ struct __DRIdri2ExtensionRec {
* extensions.
*/
#define __DRI_IMAGE "DRI_IMAGE"
#define __DRI_IMAGE_VERSION 15
#define __DRI_IMAGE_VERSION 14
/**
* These formats correspond to the similarly named MESA_FORMAT_*
@@ -1494,67 +1493,6 @@ struct __DRIimageExtensionRec {
const uint64_t *modifiers,
const unsigned int modifier_count,
void *loaderPrivate);
/*
* Like createImageFromDmaBufs, but takes also format modifiers.
*
* For EGL_EXT_image_dma_buf_import_modifiers.
*
* \since 15
*/
__DRIimage *(*createImageFromDmaBufs2)(__DRIscreen *screen,
int width, int height, int fourcc,
uint64_t modifier,
int *fds, int num_fds,
int *strides, int *offsets,
enum __DRIYUVColorSpace color_space,
enum __DRISampleRange sample_range,
enum __DRIChromaSiting horiz_siting,
enum __DRIChromaSiting vert_siting,
unsigned *error,
void *loaderPrivate);
/*
* dmabuf format query to support EGL_EXT_image_dma_buf_import_modifiers.
*
* \param max Maximum number of formats that can be accomodated into
* \param formats. If zero, no formats are returned -
* instead, the driver returns the total number of
* supported dmabuf formats in \param count.
* \param formats Buffer to fill formats into.
* \param count Count of formats returned, or, total number of
* supported formats in case \param max is zero.
*
* Returns true on success.
*
* \since 15
*/
GLboolean (*queryDmaBufFormats)(__DRIscreen *screen, int max,
int *formats, int *count);
/*
* dmabuf format modifier query for a given format to support
* EGL_EXT_image_dma_buf_import_modifiers.
*
* \param fourcc The format to query modifiers for. If this format
* is not supported by the driver, return false.
* \param max Maximum number of modifiers that can be accomodated in
* \param modifiers. If zero, no modifiers are returned -
* instead, the driver returns the total number of
* modifiers for \param format in \param count.
* \param modifiers Buffer to fill modifiers into.
* \param count Count of the modifiers returned, or, total number of
* supported modifiers for \param fourcc in case
* \param max is zero.
*
* Returns true upon success.
*
* \since 15
*/
GLboolean (*queryDmaBufModifiers)(__DRIscreen *screen, int fourcc,
int max, uint64_t *modifiers,
unsigned int *external_only,
int *count);
};
@@ -1782,19 +1720,6 @@ struct __DRIbackgroundCallableExtensionRec {
* operations (e.g. it should just set a thread-local variable).
*/
void (*setBackgroundContext)(void *loaderPrivate);
/**
* Indicate that it is multithread safe to use glthread. For GLX/EGL
* platforms using Xlib, that involves calling XInitThreads, before
* opening an X display.
*
* Note: only supported if extension version is at least 2.
*
* \param loaderPrivate is the value that was passed to to the driver when
* the context was created. This can be used by the loader to identify
* which context any callbacks are associated with.
*/
GLboolean (*isThreadSafe)(void *loaderPrivate);
};
#endif

View File

@@ -502,13 +502,9 @@ thrd_current(void)
HANDLE hCurrentThread;
BOOL bRet;
/* GetCurrentThread() returns a pseudo-handle, which we need
* to pass to DuplicateHandle(). Only the resulting handle can be used
* from other threads.
*
* Note that neither handle can be compared to the one by thread_create.
* Only the thread IDs - as returned by GetThreadId() and GetCurrentThreadId()
* can be compared directly.
/* GetCurrentThread() returns a pseudo-handle, which is useless. We need
* to call DuplicateHandle to get a real handle. However the handle value
* will not match the one returned by thread_create.
*
* Other potential solutions would be:
* - define thrd_t as a thread Ids, but this would mean we'd need to OpenThread for many operations

View File

@@ -165,26 +165,3 @@ CHIPSET(0x5927, kbl_gt3, "Intel(R) Iris Plus Graphics 650 (Kaby Lake GT3)")
CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)")
CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)")
CHIPSET(0x3E90, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
CHIPSET(0x3E93, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
CHIPSET(0x3E91, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E92, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E96, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E9B, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E94, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3EA6, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x3EA7, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x3EA8, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x3EA5, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x5A49, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
CHIPSET(0x5A4A, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
CHIPSET(0x5A41, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
CHIPSET(0x5A42, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
CHIPSET(0x5A44, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
CHIPSET(0x5A59, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
CHIPSET(0x5A5A, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
CHIPSET(0x5A5C, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
CHIPSET(0x5A50, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
CHIPSET(0x5A51, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
CHIPSET(0x5A52, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
CHIPSET(0x5A54, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")

View File

@@ -213,7 +213,6 @@ CHIPSET(0x6985, POLARIS12_, POLARIS12)
CHIPSET(0x6986, POLARIS12_, POLARIS12)
CHIPSET(0x6987, POLARIS12_, POLARIS12)
CHIPSET(0x6995, POLARIS12_, POLARIS12)
CHIPSET(0x6997, POLARIS12_, POLARIS12)
CHIPSET(0x699F, POLARIS12_, POLARIS12)
CHIPSET(0x6860, VEGA10_, VEGA10)

View File

@@ -43,7 +43,7 @@ extern "C" {
#define VK_VERSION_MINOR(version) (((uint32_t)(version) >> 12) & 0x3ff)
#define VK_VERSION_PATCH(version) ((uint32_t)(version) & 0xfff)
// Version of this file
#define VK_HEADER_VERSION 49
#define VK_HEADER_VERSION 46
#define VK_NULL_HANDLE 0
@@ -261,6 +261,9 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_BUFFER_INFO_KHX = 1000071002,
VK_STRUCTURE_TYPE_EXTERNAL_BUFFER_PROPERTIES_KHX = 1000071003,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHX = 1000071004,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2_KHX = 1000071005,
VK_STRUCTURE_TYPE_IMAGE_FORMAT_PROPERTIES_2_KHX = 1000071006,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_FORMAT_INFO_2_KHX = 1000071007,
VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_BUFFER_CREATE_INFO_KHX = 1000072000,
VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO_KHX = 1000072001,
VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_KHX = 1000072002,
@@ -298,10 +301,6 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DISCARD_RECTANGLE_PROPERTIES_EXT = 1000099000,
VK_STRUCTURE_TYPE_PIPELINE_DISCARD_RECTANGLE_STATE_CREATE_INFO_EXT = 1000099001,
VK_STRUCTURE_TYPE_HDR_METADATA_EXT = 1000105000,
VK_STRUCTURE_TYPE_SHARED_PRESENT_SURFACE_CAPABILITIES_KHR = 1000111000,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SURFACE_INFO_2_KHR = 1000119000,
VK_STRUCTURE_TYPE_SURFACE_CAPABILITIES_2_KHR = 1000119001,
VK_STRUCTURE_TYPE_SURFACE_FORMAT_2_KHR = 1000119002,
VK_STRUCTURE_TYPE_IOS_SURFACE_CREATE_INFO_MVK = 1000122000,
VK_STRUCTURE_TYPE_MACOS_SURFACE_CREATE_INFO_MVK = 1000123000,
VK_STRUCTURE_TYPE_BEGIN_RANGE = VK_STRUCTURE_TYPE_APPLICATION_INFO,
@@ -591,7 +590,6 @@ typedef enum VkImageLayout {
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL = 7,
VK_IMAGE_LAYOUT_PREINITIALIZED = 8,
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR = 1000001002,
VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR = 1000111000,
VK_IMAGE_LAYOUT_BEGIN_RANGE = VK_IMAGE_LAYOUT_UNDEFINED,
VK_IMAGE_LAYOUT_END_RANGE = VK_IMAGE_LAYOUT_PREINITIALIZED,
VK_IMAGE_LAYOUT_RANGE_SIZE = (VK_IMAGE_LAYOUT_PREINITIALIZED - VK_IMAGE_LAYOUT_UNDEFINED + 1),
@@ -898,47 +896,6 @@ typedef enum VkSubpassContents {
VK_SUBPASS_CONTENTS_MAX_ENUM = 0x7FFFFFFF
} VkSubpassContents;
typedef enum VkObjectType {
VK_OBJECT_TYPE_UNKNOWN = 0,
VK_OBJECT_TYPE_INSTANCE = 1,
VK_OBJECT_TYPE_PHYSICAL_DEVICE = 2,
VK_OBJECT_TYPE_DEVICE = 3,
VK_OBJECT_TYPE_QUEUE = 4,
VK_OBJECT_TYPE_SEMAPHORE = 5,
VK_OBJECT_TYPE_COMMAND_BUFFER = 6,
VK_OBJECT_TYPE_FENCE = 7,
VK_OBJECT_TYPE_DEVICE_MEMORY = 8,
VK_OBJECT_TYPE_BUFFER = 9,
VK_OBJECT_TYPE_IMAGE = 10,
VK_OBJECT_TYPE_EVENT = 11,
VK_OBJECT_TYPE_QUERY_POOL = 12,
VK_OBJECT_TYPE_BUFFER_VIEW = 13,
VK_OBJECT_TYPE_IMAGE_VIEW = 14,
VK_OBJECT_TYPE_SHADER_MODULE = 15,
VK_OBJECT_TYPE_PIPELINE_CACHE = 16,
VK_OBJECT_TYPE_PIPELINE_LAYOUT = 17,
VK_OBJECT_TYPE_RENDER_PASS = 18,
VK_OBJECT_TYPE_PIPELINE = 19,
VK_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT = 20,
VK_OBJECT_TYPE_SAMPLER = 21,
VK_OBJECT_TYPE_DESCRIPTOR_POOL = 22,
VK_OBJECT_TYPE_DESCRIPTOR_SET = 23,
VK_OBJECT_TYPE_FRAMEBUFFER = 24,
VK_OBJECT_TYPE_COMMAND_POOL = 25,
VK_OBJECT_TYPE_SURFACE_KHR = 1000000000,
VK_OBJECT_TYPE_SWAPCHAIN_KHR = 1000001000,
VK_OBJECT_TYPE_DISPLAY_KHR = 1000002000,
VK_OBJECT_TYPE_DISPLAY_MODE_KHR = 1000002001,
VK_OBJECT_TYPE_DEBUG_REPORT_CALLBACK_EXT = 1000011000,
VK_OBJECT_TYPE_DESCRIPTOR_UPDATE_TEMPLATE_KHR = 1000085000,
VK_OBJECT_TYPE_OBJECT_TABLE_NVX = 1000086000,
VK_OBJECT_TYPE_INDIRECT_COMMANDS_LAYOUT_NVX = 1000086001,
VK_OBJECT_TYPE_BEGIN_RANGE = VK_OBJECT_TYPE_UNKNOWN,
VK_OBJECT_TYPE_END_RANGE = VK_OBJECT_TYPE_COMMAND_POOL,
VK_OBJECT_TYPE_RANGE_SIZE = (VK_OBJECT_TYPE_COMMAND_POOL - VK_OBJECT_TYPE_UNKNOWN + 1),
VK_OBJECT_TYPE_MAX_ENUM = 0x7FFFFFFF
} VkObjectType;
typedef VkFlags VkInstanceCreateFlags;
typedef enum VkFormatFeatureFlagBits {
@@ -3366,8 +3323,6 @@ typedef enum VkPresentModeKHR {
VK_PRESENT_MODE_MAILBOX_KHR = 1,
VK_PRESENT_MODE_FIFO_KHR = 2,
VK_PRESENT_MODE_FIFO_RELAXED_KHR = 3,
VK_PRESENT_MODE_SHARED_DEMAND_REFRESH_KHR = 1000111000,
VK_PRESENT_MODE_SHARED_CONTINUOUS_REFRESH_KHR = 1000111001,
VK_PRESENT_MODE_BEGIN_RANGE_KHR = VK_PRESENT_MODE_IMMEDIATE_KHR,
VK_PRESENT_MODE_END_RANGE_KHR = VK_PRESENT_MODE_FIFO_RELAXED_KHR,
VK_PRESENT_MODE_RANGE_SIZE_KHR = (VK_PRESENT_MODE_FIFO_RELAXED_KHR - VK_PRESENT_MODE_IMMEDIATE_KHR + 1),
@@ -4146,64 +4101,6 @@ VKAPI_ATTR void VKAPI_CALL vkCmdPushDescriptorSetWithTemplateKHR(
const void* pData);
#endif
#define VK_KHR_shared_presentable_image 1
#define VK_KHR_SHARED_PRESENTABLE_IMAGE_SPEC_VERSION 1
#define VK_KHR_SHARED_PRESENTABLE_IMAGE_EXTENSION_NAME "VK_KHR_shared_presentable_image"
typedef struct VkSharedPresentSurfaceCapabilitiesKHR {
VkStructureType sType;
void* pNext;
VkImageUsageFlags sharedPresentSupportedUsageFlags;
} VkSharedPresentSurfaceCapabilitiesKHR;
typedef VkResult (VKAPI_PTR *PFN_vkGetSwapchainStatusKHR)(VkDevice device, VkSwapchainKHR swapchain);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR VkResult VKAPI_CALL vkGetSwapchainStatusKHR(
VkDevice device,
VkSwapchainKHR swapchain);
#endif
#define VK_KHR_get_surface_capabilities2 1
#define VK_KHR_GET_SURFACE_CAPABILITIES_2_SPEC_VERSION 1
#define VK_KHR_GET_SURFACE_CAPABILITIES_2_EXTENSION_NAME "VK_KHR_get_surface_capabilities2"
typedef struct VkPhysicalDeviceSurfaceInfo2KHR {
VkStructureType sType;
const void* pNext;
VkSurfaceKHR surface;
} VkPhysicalDeviceSurfaceInfo2KHR;
typedef struct VkSurfaceCapabilities2KHR {
VkStructureType sType;
void* pNext;
VkSurfaceCapabilitiesKHR surfaceCapabilities;
} VkSurfaceCapabilities2KHR;
typedef struct VkSurfaceFormat2KHR {
VkStructureType sType;
void* pNext;
VkSurfaceFormatKHR surfaceFormat;
} VkSurfaceFormat2KHR;
typedef VkResult (VKAPI_PTR *PFN_vkGetPhysicalDeviceSurfaceCapabilities2KHR)(VkPhysicalDevice physicalDevice, const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo, VkSurfaceCapabilities2KHR* pSurfaceCapabilities);
typedef VkResult (VKAPI_PTR *PFN_vkGetPhysicalDeviceSurfaceFormats2KHR)(VkPhysicalDevice physicalDevice, const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo, uint32_t* pSurfaceFormatCount, VkSurfaceFormat2KHR* pSurfaceFormats);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceCapabilities2KHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo,
VkSurfaceCapabilities2KHR* pSurfaceCapabilities);
VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceFormats2KHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo,
uint32_t* pSurfaceFormatCount,
VkSurfaceFormat2KHR* pSurfaceFormats);
#endif
#define VK_EXT_debug_report 1
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkDebugReportCallbackEXT)

View File

@@ -21,7 +21,18 @@
.PHONY: git_sha1.h.tmp
git_sha1.h.tmp:
@sh $(top_srcdir)/git_sha1_gen.sh > $@
@# Don't assume that $(top_srcdir)/.git is a directory. It may be
@# a gitlink file if $(top_srcdir) is a submodule checkout or a linked
@# worktree.
@# If we are building from a release tarball copy the bundled header.
@touch git_sha1.h.tmp
@if test -e $(top_srcdir)/.git; then \
if which git > /dev/null; then \
git --git-dir=$(top_srcdir)/.git log -n 1 --oneline | \
sed 's/^\([^ ]*\) .*/#define MESA_GIT_SHA1 "git-\1"/' \
> git_sha1.h.tmp ; \
fi \
fi
git_sha1.h: git_sha1.h.tmp
@echo "updating git_sha1.h"

View File

@@ -22,15 +22,27 @@ def write_git_sha1_h_file(filename):
to retrieve the git hashid and write the header file. An empty file
will be created if anything goes wrong."""
tempfile = "git_sha1.h.tmp"
with open(tempfile, "w") as f:
args = [ 'sh', Dir('#').abspath + '/git_sha1_gen.sh' ]
try:
subprocess.Popen(args, stdout=f).wait()
except:
print "Warning: exception in write_git_sha1_h_file()"
return
args = [ 'git', 'rev-parse', '--short=10', 'HEAD' ]
try:
(commit, foo) = subprocess.Popen(args, stdout=subprocess.PIPE).communicate()
except:
print "Warning: exception in write_git_sha1_h_file()"
# git log command didn't work
if not os.path.exists(filename):
dirname = os.path.dirname(filename)
if dirname and not os.path.exists(dirname):
os.makedirs(dirname)
# create an empty file if none already exists
f = open(filename, "w")
f.close()
return
# note that commit[:-1] removes the trailing newline character
commit = '#define MESA_GIT_SHA1 "git-%s"\n' % commit[:-1]
tempfile = "git_sha1.h.tmp"
f = open(tempfile, "w")
f.write(commit)
f.close()
if not os.path.exists(filename) or not filecmp.cmp(tempfile, filename):
# The filename does not exist or it's different from the new file,
# so replace old file with new.

View File

@@ -42,11 +42,5 @@ LOCAL_C_INCLUDES := \
$(MESA_TOP)/src/amd/addrlib/gfx9/chip \
$(MESA_TOP)/src/amd/addrlib/r800/chip
LOCAL_EXPORT_C_INCLUDE_DIRS := \
$(LOCAL_PATH) \
$(LOCAL_PATH)/addrlib/core \
$(LOCAL_PATH)/addrlib/inc/chip/r800 \
$(LOCAL_PATH)/addrlib/r800/chip
include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)

View File

@@ -29,7 +29,6 @@ include $(CLEAR_VARS)
LOCAL_MODULE := libmesa_amd_common
LOCAL_SRC_FILES := \
$(AMD_COMMON_FILES) \
$(AMD_COMPILER_FILES) \
$(AMD_DEBUG_FILES)
@@ -50,27 +49,15 @@ LOCAL_C_INCLUDES := \
$(MESA_TOP)/include \
$(MESA_TOP)/src \
$(MESA_TOP)/src/amd/common \
$(MESA_TOP)/src/compiler \
$(call generated-sources-dir-for,STATIC_LIBRARIES,libmesa_nir,,)/nir \
$(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary \
$(intermediates)/common \
external/llvm/include \
external/llvm/device/include
external/llvm/device/include \
external/libcxx/include \
$(ELF_INCLUDES)
LOCAL_EXPORT_C_INCLUDE_DIRS := \
$(LOCAL_PATH)/common
LOCAL_SHARED_LIBRARIES := \
libdrm_amdgpu
LOCAL_STATIC_LIBRARIES := \
libmesa_nir
LOCAL_WHOLE_STATIC_LIBRARIES := \
libelf
$(call mesa-build-with-llvm)
LOCAL_STATIC_LIBRARIES := libLLVMCore
include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)

View File

@@ -25,7 +25,6 @@ COMMON_LIBS = common/libamd_common.la
# TODO cleanup these
common_libamd_common_la_CPPFLAGS = \
$(AMDGPU_CFLAGS) \
$(VALGRIND_CFLAGS) \
$(DEFINES) \
-I$(top_srcdir)/include \
@@ -55,7 +54,6 @@ common_libamd_common_la_CXXFLAGS = \
noinst_LTLIBRARIES += $(COMMON_LIBS)
common_libamd_common_la_SOURCES = \
$(AMD_COMMON_FILES) \
$(AMD_COMPILER_FILES) \
$(AMD_DEBUG_FILES) \
$(AMD_GENERATED_FILES)

View File

@@ -42,25 +42,16 @@ ADDRLIB_FILES = \
AMD_COMPILER_FILES = \
common/ac_binary.c \
common/ac_binary.h \
common/ac_exp_param.h \
common/ac_llvm_build.c \
common/ac_llvm_build.h \
common/ac_llvm_helper.cpp \
common/ac_llvm_util.c \
common/ac_llvm_util.h \
common/ac_shader_info.c \
common/ac_shader_info.h
common/ac_llvm_util.h
AMD_NIR_FILES = \
common/ac_nir_to_llvm.c \
common/ac_nir_to_llvm.h
AMD_COMMON_FILES = \
common/ac_gpu_info.c \
common/ac_gpu_info.h \
common/ac_surface.c \
common/ac_surface.h
AMD_DEBUG_FILES = \
common/ac_debug.c \
common/ac_debug.h

View File

@@ -132,15 +132,9 @@ void ac_dump_reg(FILE *file, unsigned offset, uint32_t value,
static void ac_parse_set_reg_packet(FILE *f, uint32_t *ib, unsigned count,
unsigned reg_offset)
{
unsigned reg = ((ib[1] & 0xFFFF) << 2) + reg_offset;
unsigned index = ib[1] >> 28;
unsigned reg = (ib[1] << 2) + reg_offset;
int i;
if (index != 0) {
print_spaces(f, INDENT_PKT);
fprintf(f, "INDEX = %u\n", index);
}
for (i = 0; i < count; i++)
ac_dump_reg(f, reg + i*4, ib[2+i], ~0);
}
@@ -220,52 +214,6 @@ static uint32_t *ac_parse_packet3(FILE *f, uint32_t *ib, int *num_dw,
print_named_value(f, "ADDRESS_HI", ib[3], 16);
}
break;
case PKT3_EVENT_WRITE_EOP:
ac_dump_reg(f, R_028A90_VGT_EVENT_INITIATOR, ib[1],
S_028A90_EVENT_TYPE(~0));
print_named_value(f, "EVENT_INDEX", (ib[1] >> 8) & 0xf, 4);
print_named_value(f, "TCL1_VOL_ACTION_ENA", (ib[1] >> 12) & 0x1, 1);
print_named_value(f, "TC_VOL_ACTION_ENA", (ib[1] >> 13) & 0x1, 1);
print_named_value(f, "TC_WB_ACTION_ENA", (ib[1] >> 15) & 0x1, 1);
print_named_value(f, "TCL1_ACTION_ENA", (ib[1] >> 16) & 0x1, 1);
print_named_value(f, "TC_ACTION_ENA", (ib[1] >> 17) & 0x1, 1);
print_named_value(f, "ADDRESS_LO", ib[2], 32);
print_named_value(f, "ADDRESS_HI", ib[3], 16);
print_named_value(f, "DST_SEL", (ib[3] >> 16) & 0x3, 2);
print_named_value(f, "INT_SEL", (ib[3] >> 24) & 0x7, 3);
print_named_value(f, "DATA_SEL", ib[3] >> 29, 3);
print_named_value(f, "DATA_LO", ib[4], 32);
print_named_value(f, "DATA_HI", ib[5], 32);
break;
case PKT3_RELEASE_MEM:
ac_dump_reg(f, R_028A90_VGT_EVENT_INITIATOR, ib[1],
S_028A90_EVENT_TYPE(~0));
print_named_value(f, "EVENT_INDEX", (ib[1] >> 8) & 0xf, 4);
print_named_value(f, "TCL1_VOL_ACTION_ENA", (ib[1] >> 12) & 0x1, 1);
print_named_value(f, "TC_VOL_ACTION_ENA", (ib[1] >> 13) & 0x1, 1);
print_named_value(f, "TC_WB_ACTION_ENA", (ib[1] >> 15) & 0x1, 1);
print_named_value(f, "TCL1_ACTION_ENA", (ib[1] >> 16) & 0x1, 1);
print_named_value(f, "TC_ACTION_ENA", (ib[1] >> 17) & 0x1, 1);
print_named_value(f, "TC_NC_ACTION_ENA", (ib[1] >> 19) & 0x1, 1);
print_named_value(f, "TC_WC_ACTION_ENA", (ib[1] >> 20) & 0x1, 1);
print_named_value(f, "TC_MD_ACTION_ENA", (ib[1] >> 21) & 0x1, 1);
print_named_value(f, "DST_SEL", (ib[2] >> 16) & 0x3, 2);
print_named_value(f, "INT_SEL", (ib[2] >> 24) & 0x7, 3);
print_named_value(f, "DATA_SEL", ib[2] >> 29, 3);
print_named_value(f, "ADDRESS_LO", ib[3], 32);
print_named_value(f, "ADDRESS_HI", ib[4], 32);
print_named_value(f, "DATA_LO", ib[5], 32);
print_named_value(f, "DATA_HI", ib[6], 32);
print_named_value(f, "CTXID", ib[7], 32);
break;
case PKT3_WAIT_REG_MEM:
print_named_value(f, "OP", ib[1], 32);
print_named_value(f, "ADDRESS_LO", ib[2], 32);
print_named_value(f, "ADDRESS_HI", ib[3], 32);
print_named_value(f, "REF", ib[4], 32);
print_named_value(f, "MASK", ib[5], 32);
print_named_value(f, "POLL_INTERVAL", ib[6], 16);
break;
case PKT3_DRAW_INDEX_AUTO:
ac_dump_reg(f, R_030930_VGT_NUM_INDICES, ib[1], ~0);
ac_dump_reg(f, R_0287F0_VGT_DRAW_INITIATOR, ib[2], ~0);

View File

@@ -1,40 +0,0 @@
/*
* Copyright 2014 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
*/
#ifndef AC_EXP_PARAM_H
#define AC_EXP_PARAM_H
enum {
/* SPI_PS_INPUT_CNTL_i.OFFSET[0:4] */
AC_EXP_PARAM_OFFSET_0 = 0,
AC_EXP_PARAM_OFFSET_31 = 31,
/* SPI_PS_INPUT_CNTL_i.DEFAULT_VAL[0:1] */
AC_EXP_PARAM_DEFAULT_VAL_0000 = 64,
AC_EXP_PARAM_DEFAULT_VAL_0001,
AC_EXP_PARAM_DEFAULT_VAL_1110,
AC_EXP_PARAM_DEFAULT_VAL_1111,
AC_EXP_PARAM_UNDEFINED = 255,
};
#endif

View File

@@ -1,303 +0,0 @@
/*
* Copyright © 2017 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
* AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*/
#include "ac_gpu_info.h"
#include "sid.h"
#include "gfx9d.h"
#include "util/u_math.h"
#include <stdio.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>
#include <amdgpu.h>
#define CIK_TILE_MODE_COLOR_2D 14
#define CIK__GB_TILE_MODE__PIPE_CONFIG(x) (((x) >> 6) & 0x1f)
#define CIK__PIPE_CONFIG__ADDR_SURF_P2 0
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_8x16 4
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_16x16 5
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_16x32 6
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_32x32 7
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x16_8x16 8
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_8x16 9
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_8x16 10
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_16x16 11
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x16 12
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x32 13
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x64_32x32 14
#define CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_8X16 16
#define CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_16X16 17
static unsigned cik_get_num_tile_pipes(struct amdgpu_gpu_info *info)
{
unsigned mode2d = info->gb_tile_mode[CIK_TILE_MODE_COLOR_2D];
switch (CIK__GB_TILE_MODE__PIPE_CONFIG(mode2d)) {
case CIK__PIPE_CONFIG__ADDR_SURF_P2:
return 2;
case CIK__PIPE_CONFIG__ADDR_SURF_P4_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_16x32:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_32x32:
return 4;
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x16_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x32:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x64_32x32:
return 8;
case CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_8X16:
case CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_16X16:
return 16;
default:
fprintf(stderr, "Invalid CIK pipe configuration, assuming P2\n");
assert(!"this should never occur");
return 2;
}
}
bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
struct radeon_info *info,
struct amdgpu_gpu_info *amdinfo)
{
struct amdgpu_buffer_size_alignments alignment_info = {};
struct amdgpu_heap_info vram, vram_vis, gtt;
struct drm_amdgpu_info_hw_ip dma = {}, compute = {}, uvd = {}, vce = {}, vcn_dec = {};
uint32_t vce_version = 0, vce_feature = 0, uvd_version = 0, uvd_feature = 0;
uint32_t unused_feature;
int r, i, j;
drmDevicePtr devinfo;
/* Get PCI info. */
r = drmGetDevice2(fd, 0, &devinfo);
if (r) {
fprintf(stderr, "amdgpu: drmGetDevice2 failed.\n");
return false;
}
info->pci_domain = devinfo->businfo.pci->domain;
info->pci_bus = devinfo->businfo.pci->bus;
info->pci_dev = devinfo->businfo.pci->dev;
info->pci_func = devinfo->businfo.pci->func;
drmFreeDevice(&devinfo);
/* Query hardware and driver information. */
r = amdgpu_query_gpu_info(dev, amdinfo);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_gpu_info failed.\n");
return false;
}
r = amdgpu_query_buffer_size_alignment(dev, &alignment_info);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_buffer_size_alignment failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM, 0, &vram);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM,
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED,
&vram_vis);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram_vis) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_GTT, 0, &gtt);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(gtt) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_DMA, 0, &dma);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(dma) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_COMPUTE, 0, &compute);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(compute) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_UVD, 0, &uvd);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(uvd) failed.\n");
return false;
}
if (info->drm_major == 3 && info->drm_minor >= 17) {
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_VCN_DEC, 0, &vcn_dec);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(vcn_dec) failed.\n");
return false;
}
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_ME, 0, 0,
&info->me_fw_version, &unused_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(me) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_PFP, 0, 0,
&info->pfp_fw_version, &unused_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(pfp) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_CE, 0, 0,
&info->ce_fw_version, &unused_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(ce) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_UVD, 0, 0,
&uvd_version, &uvd_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(uvd) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_VCE, 0, &vce);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(vce) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_VCE, 0, 0,
&vce_version, &vce_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(vce) failed.\n");
return false;
}
/* Set chip identification. */
info->pci_id = amdinfo->asic_id; /* TODO: is this correct? */
info->vce_harvest_config = amdinfo->vce_harvest_config;
switch (info->pci_id) {
#define CHIPSET(pci_id, name, cfamily) case pci_id: info->family = CHIP_##cfamily; break;
#include "pci_ids/radeonsi_pci_ids.h"
#undef CHIPSET
default:
fprintf(stderr, "amdgpu: Invalid PCI ID.\n");
return false;
}
if (info->family >= CHIP_VEGA10)
info->chip_class = GFX9;
else if (info->family >= CHIP_TONGA)
info->chip_class = VI;
else if (info->family >= CHIP_BONAIRE)
info->chip_class = CIK;
else if (info->family >= CHIP_TAHITI)
info->chip_class = SI;
else {
fprintf(stderr, "amdgpu: Unknown family.\n");
return false;
}
/* Set which chips have dedicated VRAM. */
info->has_dedicated_vram =
!(amdinfo->ids_flags & AMDGPU_IDS_FLAGS_FUSION);
/* Set hardware information. */
info->gart_size = gtt.heap_size;
info->vram_size = vram.heap_size;
info->vram_vis_size = vram_vis.heap_size;
/* The kernel can split large buffers in VRAM but not in GTT, so large
* allocations can fail or cause buffer movement failures in the kernel.
*/
info->max_alloc_size = MIN2(info->vram_size * 0.9, info->gart_size * 0.7);
/* convert the shader clock from KHz to MHz */
info->max_shader_clock = amdinfo->max_engine_clk / 1000;
info->max_se = amdinfo->num_shader_engines;
info->max_sh_per_se = amdinfo->num_shader_arrays_per_engine;
info->has_hw_decode =
(uvd.available_rings != 0) || (vcn_dec.available_rings != 0);
info->uvd_fw_version =
uvd.available_rings ? uvd_version : 0;
info->vce_fw_version =
vce.available_rings ? vce_version : 0;
info->has_userptr = true;
info->num_render_backends = amdinfo->rb_pipes;
info->clock_crystal_freq = amdinfo->gpu_counter_freq;
info->tcc_cache_line_size = 64; /* TC L2 line size on GCN */
if (info->chip_class == GFX9) {
info->num_tile_pipes = 1 << G_0098F8_NUM_PIPES(amdinfo->gb_addr_cfg);
info->pipe_interleave_bytes =
256 << G_0098F8_PIPE_INTERLEAVE_SIZE_GFX9(amdinfo->gb_addr_cfg);
} else {
info->num_tile_pipes = cik_get_num_tile_pipes(amdinfo);
info->pipe_interleave_bytes =
256 << G_0098F8_PIPE_INTERLEAVE_SIZE_GFX6(amdinfo->gb_addr_cfg);
}
info->has_virtual_memory = true;
assert(util_is_power_of_two(dma.available_rings + 1));
assert(util_is_power_of_two(compute.available_rings + 1));
info->num_sdma_rings = util_bitcount(dma.available_rings);
info->num_compute_rings = util_bitcount(compute.available_rings);
/* Get the number of good compute units. */
info->num_good_compute_units = 0;
for (i = 0; i < info->max_se; i++)
for (j = 0; j < info->max_sh_per_se; j++)
info->num_good_compute_units +=
util_bitcount(amdinfo->cu_bitmap[i][j]);
memcpy(info->si_tile_mode_array, amdinfo->gb_tile_mode,
sizeof(amdinfo->gb_tile_mode));
info->enabled_rb_mask = amdinfo->enabled_rb_pipes_mask;
memcpy(info->cik_macrotile_mode_array, amdinfo->gb_macro_tile_mode,
sizeof(amdinfo->gb_macro_tile_mode));
info->pte_fragment_size = alignment_info.size_local;
info->gart_page_size = alignment_info.size_remote;
if (info->chip_class == SI)
info->gfx_ib_pad_with_type2 = TRUE;
return true;
}

View File

@@ -1,111 +0,0 @@
/*
* Copyright © 2017 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
* AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*/
#ifndef AC_GPU_INFO_H
#define AC_GPU_INFO_H
#include <stdint.h>
#include <stdbool.h>
#include "amd_family.h"
#ifdef __cplusplus
extern "C" {
#endif
/* Prior to C11 the following may trigger a typedef redeclaration warning */
typedef struct amdgpu_device *amdgpu_device_handle;
struct amdgpu_gpu_info;
struct radeon_info {
/* PCI info: domain:bus:dev:func */
uint32_t pci_domain;
uint32_t pci_bus;
uint32_t pci_dev;
uint32_t pci_func;
/* Device info. */
uint32_t pci_id;
enum radeon_family family;
enum chip_class chip_class;
uint32_t pte_fragment_size;
uint32_t gart_page_size;
uint64_t gart_size;
uint64_t vram_size;
uint64_t vram_vis_size;
uint64_t max_alloc_size;
uint32_t min_alloc_size;
bool has_dedicated_vram;
bool has_virtual_memory;
bool gfx_ib_pad_with_type2;
bool has_hw_decode;
uint32_t num_sdma_rings;
uint32_t num_compute_rings;
uint32_t uvd_fw_version;
uint32_t vce_fw_version;
uint32_t me_fw_version;
uint32_t pfp_fw_version;
uint32_t ce_fw_version;
uint32_t vce_harvest_config;
uint32_t clock_crystal_freq;
uint32_t tcc_cache_line_size;
/* Kernel info. */
uint32_t drm_major; /* version */
uint32_t drm_minor;
uint32_t drm_patchlevel;
bool has_userptr;
/* Shader cores. */
uint32_t r600_max_quad_pipes; /* wave size / 16 */
uint32_t max_shader_clock;
uint32_t num_good_compute_units;
uint32_t max_se; /* shader engines */
uint32_t max_sh_per_se; /* shader arrays per shader engine */
/* Render backends (color + depth blocks). */
uint32_t r300_num_gb_pipes;
uint32_t r300_num_z_pipes;
uint32_t r600_gb_backend_map; /* R600 harvest config */
bool r600_gb_backend_map_valid;
uint32_t r600_num_banks;
uint32_t num_render_backends;
uint32_t num_tile_pipes; /* pipe count from PIPE_CONFIG */
uint32_t pipe_interleave_bytes;
uint32_t enabled_rb_mask; /* GCN harvest config */
/* Tile modes. */
uint32_t si_tile_mode_array[32];
uint32_t cik_macrotile_mode_array[16];
};
bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
struct radeon_info *info,
struct amdgpu_gpu_info *amdinfo);
#ifdef __cplusplus
}
#endif
#endif /* AC_GPU_INFO_H */

View File

@@ -33,13 +33,11 @@
#include <stdio.h>
#include "ac_llvm_util.h"
#include "ac_exp_param.h"
#include "util/bitscan.h"
#include "util/macros.h"
#include "sid.h"
#include "shader_enums.h"
/* Initialize module-independent parts of the context.
*
* The caller is responsible for initializing ctx::module and ctx::builder.
@@ -56,20 +54,11 @@ ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context)
ctx->voidt = LLVMVoidTypeInContext(ctx->context);
ctx->i1 = LLVMInt1TypeInContext(ctx->context);
ctx->i8 = LLVMInt8TypeInContext(ctx->context);
ctx->i16 = LLVMIntTypeInContext(ctx->context, 16);
ctx->i32 = LLVMIntTypeInContext(ctx->context, 32);
ctx->i64 = LLVMIntTypeInContext(ctx->context, 64);
ctx->f16 = LLVMHalfTypeInContext(ctx->context);
ctx->f32 = LLVMFloatTypeInContext(ctx->context);
ctx->f64 = LLVMDoubleTypeInContext(ctx->context);
ctx->v4i32 = LLVMVectorType(ctx->i32, 4);
ctx->v4f32 = LLVMVectorType(ctx->f32, 4);
ctx->v8i32 = LLVMVectorType(ctx->i32, 8);
ctx->i32_0 = LLVMConstInt(ctx->i32, 0, false);
ctx->i32_1 = LLVMConstInt(ctx->i32, 1, false);
ctx->f32_0 = LLVMConstReal(ctx->f32, 0.0);
ctx->f32_1 = LLVMConstReal(ctx->f32, 1.0);
ctx->v16i8 = LLVMVectorType(ctx->i8, 16);
ctx->range_md_kind = LLVMGetMDKindIDInContext(ctx->context,
"range", 5);
@@ -242,16 +231,42 @@ build_cube_intrinsic(struct ac_llvm_context *ctx,
LLVMValueRef in[3],
struct cube_selection_coords *out)
{
LLVMTypeRef f32 = ctx->f32;
LLVMBuilderRef builder = ctx->builder;
out->stc[1] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubetc",
f32, in, 3, AC_FUNC_ATTR_READNONE);
out->stc[0] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubesc",
f32, in, 3, AC_FUNC_ATTR_READNONE);
out->ma = ac_build_intrinsic(ctx, "llvm.amdgcn.cubema",
f32, in, 3, AC_FUNC_ATTR_READNONE);
out->id = ac_build_intrinsic(ctx, "llvm.amdgcn.cubeid",
f32, in, 3, AC_FUNC_ATTR_READNONE);
if (HAVE_LLVM >= 0x0309) {
LLVMTypeRef f32 = ctx->f32;
out->stc[1] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubetc",
f32, in, 3, AC_FUNC_ATTR_READNONE);
out->stc[0] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubesc",
f32, in, 3, AC_FUNC_ATTR_READNONE);
out->ma = ac_build_intrinsic(ctx, "llvm.amdgcn.cubema",
f32, in, 3, AC_FUNC_ATTR_READNONE);
out->id = ac_build_intrinsic(ctx, "llvm.amdgcn.cubeid",
f32, in, 3, AC_FUNC_ATTR_READNONE);
} else {
LLVMValueRef c[4] = {
in[0],
in[1],
in[2],
LLVMGetUndef(LLVMTypeOf(in[0]))
};
LLVMValueRef vec = ac_build_gather_values(ctx, c, 4);
LLVMValueRef tmp =
ac_build_intrinsic(ctx, "llvm.AMDGPU.cube",
LLVMTypeOf(vec), &vec, 1,
AC_FUNC_ATTR_READNONE);
out->stc[1] = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 0, 0), "");
out->stc[0] = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 1, 0), "");
out->ma = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 2, 0), "");
out->id = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 3, 0), "");
}
}
/**
@@ -541,7 +556,7 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
bool has_add_tid)
{
/* TODO: Fix stores with ADD_TID and remove the "has_add_tid" flag. */
if (!has_add_tid) {
if (HAVE_LLVM >= 0x0309 && !has_add_tid) {
/* Split 3 channel stores, becase LLVM doesn't support 3-channel
* intrinsics. */
if (num_channels == 3) {
@@ -642,89 +657,114 @@ ac_build_buffer_load(struct ac_llvm_context *ctx,
unsigned inst_offset,
unsigned glc,
unsigned slc,
bool can_speculate,
bool allow_smem)
bool readonly_memory)
{
LLVMValueRef offset = LLVMConstInt(ctx->i32, inst_offset, 0);
if (voffset)
offset = LLVMBuildAdd(ctx->builder, offset, voffset, "");
if (soffset)
offset = LLVMBuildAdd(ctx->builder, offset, soffset, "");
/* TODO: VI and later generations can use SMEM with GLC=1.*/
if (allow_smem && !glc && !slc) {
assert(vindex == NULL);
LLVMValueRef result[4];
for (int i = 0; i < num_channels; i++) {
if (i) {
offset = LLVMBuildAdd(ctx->builder, offset,
LLVMConstInt(ctx->i32, 4, 0), "");
}
LLVMValueRef args[2] = {rsrc, offset};
result[i] = ac_build_intrinsic(ctx, "llvm.SI.load.const.v4i32",
ctx->f32, args, 2,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_LEGACY);
}
if (num_channels == 1)
return result[0];
if (num_channels == 3)
result[num_channels++] = LLVMGetUndef(ctx->f32);
return ac_build_gather_values(ctx, result, num_channels);
}
unsigned func = CLAMP(num_channels, 1, 3) - 1;
LLVMValueRef args[] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
offset,
LLVMConstInt(ctx->i1, glc, 0),
LLVMConstInt(ctx->i1, slc, 0)
};
if (HAVE_LLVM >= 0x309) {
LLVMValueRef args[] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
LLVMConstInt(ctx->i32, inst_offset, 0),
LLVMConstInt(ctx->i1, glc, 0),
LLVMConstInt(ctx->i1, slc, 0)
};
LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2),
ctx->v4f32};
const char *type_names[] = {"f32", "v2f32", "v4f32"};
char name[256];
LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2),
ctx->v4f32};
const char *type_names[] = {"f32", "v2f32", "v4f32"};
char name[256];
snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
type_names[func]);
if (voffset) {
args[2] = LLVMBuildAdd(ctx->builder, args[2], voffset,
"");
}
return ac_build_intrinsic(ctx, name, types[func], args,
ARRAY_SIZE(args),
/* READNONE means writes can't affect it, while
* READONLY means that writes can affect it. */
can_speculate && HAVE_LLVM >= 0x0400 ?
AC_FUNC_ATTR_READNONE :
AC_FUNC_ATTR_READONLY);
if (soffset) {
args[2] = LLVMBuildAdd(ctx->builder, args[2], soffset,
"");
}
snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
type_names[func]);
return ac_build_intrinsic(ctx, name, types[func], args,
ARRAY_SIZE(args),
/* READNONE means writes can't
* affect it, while READONLY means
* that writes can affect it. */
readonly_memory && HAVE_LLVM >= 0x0400 ?
AC_FUNC_ATTR_READNONE :
AC_FUNC_ATTR_READONLY);
} else {
LLVMValueRef args[] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v16i8, ""),
voffset ? voffset : vindex,
soffset,
LLVMConstInt(ctx->i32, inst_offset, 0),
LLVMConstInt(ctx->i32, voffset ? 1 : 0, 0), // offen
LLVMConstInt(ctx->i32, vindex ? 1 : 0, 0), //idxen
LLVMConstInt(ctx->i32, glc, 0),
LLVMConstInt(ctx->i32, slc, 0),
LLVMConstInt(ctx->i32, 0, 0), // TFE
};
LLVMTypeRef types[] = {ctx->i32, LLVMVectorType(ctx->i32, 2),
ctx->v4i32};
const char *type_names[] = {"i32", "v2i32", "v4i32"};
const char *arg_type = "i32";
char name[256];
if (voffset && vindex) {
LLVMValueRef vaddr[] = {vindex, voffset};
arg_type = "v2i32";
args[1] = ac_build_gather_values(ctx, vaddr, 2);
}
snprintf(name, sizeof(name), "llvm.SI.buffer.load.dword.%s.%s",
type_names[func], arg_type);
return ac_build_intrinsic(ctx, name, types[func], args,
ARRAY_SIZE(args), AC_FUNC_ATTR_READONLY);
}
}
LLVMValueRef ac_build_buffer_load_format(struct ac_llvm_context *ctx,
LLVMValueRef rsrc,
LLVMValueRef vindex,
LLVMValueRef voffset,
bool can_speculate)
bool readonly_memory)
{
LLVMValueRef args [] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
vindex,
voffset,
LLVMConstInt(ctx->i1, 0, 0), /* glc */
LLVMConstInt(ctx->i1, 0, 0), /* slc */
};
if (HAVE_LLVM >= 0x0309) {
LLVMValueRef args [] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
vindex,
voffset,
LLVMConstInt(ctx->i1, 0, 0), /* glc */
LLVMConstInt(ctx->i1, 0, 0), /* slc */
};
return ac_build_intrinsic(ctx,
"llvm.amdgcn.buffer.load.format.v4f32",
ctx->v4f32, args, ARRAY_SIZE(args),
/* READNONE means writes can't affect it, while
* READONLY means that writes can affect it. */
can_speculate && HAVE_LLVM >= 0x0400 ?
AC_FUNC_ATTR_READNONE :
AC_FUNC_ATTR_READONLY);
return ac_build_intrinsic(ctx,
"llvm.amdgcn.buffer.load.format.v4f32",
ctx->v4f32, args, ARRAY_SIZE(args),
/* READNONE means writes can't
* affect it, while READONLY means
* that writes can affect it. */
readonly_memory && HAVE_LLVM >= 0x0400 ?
AC_FUNC_ATTR_READNONE :
AC_FUNC_ATTR_READONLY);
}
LLVMValueRef args[] = {
rsrc,
voffset,
vindex,
};
return ac_build_intrinsic(ctx, "llvm.SI.vs.load.input",
ctx->v4f32, args, 3,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_LEGACY);
}
/**
@@ -1204,265 +1244,3 @@ void ac_get_image_intr_name(const char *base_name,
data_type_name, coords_type_name, rsrc_type_name);
}
}
#define AC_EXP_TARGET (HAVE_LLVM >= 0x0500 ? 0 : 3)
#define AC_EXP_OUT0 (HAVE_LLVM >= 0x0500 ? 2 : 5)
enum ac_ir_type {
AC_IR_UNDEF,
AC_IR_CONST,
AC_IR_VALUE,
};
struct ac_vs_exp_chan
{
LLVMValueRef value;
float const_float;
enum ac_ir_type type;
};
struct ac_vs_exp_inst {
unsigned offset;
LLVMValueRef inst;
struct ac_vs_exp_chan chan[4];
};
struct ac_vs_exports {
unsigned num;
struct ac_vs_exp_inst exp[VARYING_SLOT_MAX];
};
/* Return true if the PARAM export has been eliminated. */
static bool ac_eliminate_const_output(uint8_t *vs_output_param_offset,
uint32_t num_outputs,
struct ac_vs_exp_inst *exp)
{
unsigned i, default_val; /* SPI_PS_INPUT_CNTL_i.DEFAULT_VAL */
bool is_zero[4] = {}, is_one[4] = {};
for (i = 0; i < 4; i++) {
/* It's a constant expression. Undef outputs are eliminated too. */
if (exp->chan[i].type == AC_IR_UNDEF) {
is_zero[i] = true;
is_one[i] = true;
} else if (exp->chan[i].type == AC_IR_CONST) {
if (exp->chan[i].const_float == 0)
is_zero[i] = true;
else if (exp->chan[i].const_float == 1)
is_one[i] = true;
else
return false; /* other constant */
} else
return false;
}
/* Only certain combinations of 0 and 1 can be eliminated. */
if (is_zero[0] && is_zero[1] && is_zero[2])
default_val = is_zero[3] ? 0 : 1;
else if (is_one[0] && is_one[1] && is_one[2])
default_val = is_zero[3] ? 2 : 3;
else
return false;
/* The PARAM export can be represented as DEFAULT_VAL. Kill it. */
LLVMInstructionEraseFromParent(exp->inst);
/* Change OFFSET to DEFAULT_VAL. */
for (i = 0; i < num_outputs; i++) {
if (vs_output_param_offset[i] == exp->offset) {
vs_output_param_offset[i] =
AC_EXP_PARAM_DEFAULT_VAL_0000 + default_val;
break;
}
}
return true;
}
static bool ac_eliminate_duplicated_output(uint8_t *vs_output_param_offset,
uint32_t num_outputs,
struct ac_vs_exports *processed,
struct ac_vs_exp_inst *exp)
{
unsigned p, copy_back_channels = 0;
/* See if the output is already in the list of processed outputs.
* The LLVMValueRef comparison relies on SSA.
*/
for (p = 0; p < processed->num; p++) {
bool different = false;
for (unsigned j = 0; j < 4; j++) {
struct ac_vs_exp_chan *c1 = &processed->exp[p].chan[j];
struct ac_vs_exp_chan *c2 = &exp->chan[j];
/* Treat undef as a match. */
if (c2->type == AC_IR_UNDEF)
continue;
/* If c1 is undef but c2 isn't, we can copy c2 to c1
* and consider the instruction duplicated.
*/
if (c1->type == AC_IR_UNDEF) {
copy_back_channels |= 1 << j;
continue;
}
/* Test whether the channels are not equal. */
if (c1->type != c2->type ||
(c1->type == AC_IR_CONST &&
c1->const_float != c2->const_float) ||
(c1->type == AC_IR_VALUE &&
c1->value != c2->value)) {
different = true;
break;
}
}
if (!different)
break;
copy_back_channels = 0;
}
if (p == processed->num)
return false;
/* If a match was found, but the matching export has undef where the new
* one has a normal value, copy the normal value to the undef channel.
*/
struct ac_vs_exp_inst *match = &processed->exp[p];
while (copy_back_channels) {
unsigned chan = u_bit_scan(&copy_back_channels);
assert(match->chan[chan].type == AC_IR_UNDEF);
LLVMSetOperand(match->inst, AC_EXP_OUT0 + chan,
exp->chan[chan].value);
match->chan[chan] = exp->chan[chan];
}
/* The PARAM export is duplicated. Kill it. */
LLVMInstructionEraseFromParent(exp->inst);
/* Change OFFSET to the matching export. */
for (unsigned i = 0; i < num_outputs; i++) {
if (vs_output_param_offset[i] == exp->offset) {
vs_output_param_offset[i] = match->offset;
break;
}
}
return true;
}
void ac_optimize_vs_outputs(struct ac_llvm_context *ctx,
LLVMValueRef main_fn,
uint8_t *vs_output_param_offset,
uint32_t num_outputs,
uint8_t *num_param_exports)
{
LLVMBasicBlockRef bb;
bool removed_any = false;
struct ac_vs_exports exports;
exports.num = 0;
/* Process all LLVM instructions. */
bb = LLVMGetFirstBasicBlock(main_fn);
while (bb) {
LLVMValueRef inst = LLVMGetFirstInstruction(bb);
while (inst) {
LLVMValueRef cur = inst;
inst = LLVMGetNextInstruction(inst);
struct ac_vs_exp_inst exp;
if (LLVMGetInstructionOpcode(cur) != LLVMCall)
continue;
LLVMValueRef callee = ac_llvm_get_called_value(cur);
if (!ac_llvm_is_function(callee))
continue;
const char *name = LLVMGetValueName(callee);
unsigned num_args = LLVMCountParams(callee);
/* Check if this is an export instruction. */
if ((num_args != 9 && num_args != 8) ||
(strcmp(name, "llvm.SI.export") &&
strcmp(name, "llvm.amdgcn.exp.f32")))
continue;
LLVMValueRef arg = LLVMGetOperand(cur, AC_EXP_TARGET);
unsigned target = LLVMConstIntGetZExtValue(arg);
if (target < V_008DFC_SQ_EXP_PARAM)
continue;
target -= V_008DFC_SQ_EXP_PARAM;
/* Parse the instruction. */
memset(&exp, 0, sizeof(exp));
exp.offset = target;
exp.inst = cur;
for (unsigned i = 0; i < 4; i++) {
LLVMValueRef v = LLVMGetOperand(cur, AC_EXP_OUT0 + i);
exp.chan[i].value = v;
if (LLVMIsUndef(v)) {
exp.chan[i].type = AC_IR_UNDEF;
} else if (LLVMIsAConstantFP(v)) {
LLVMBool loses_info;
exp.chan[i].type = AC_IR_CONST;
exp.chan[i].const_float =
LLVMConstRealGetDouble(v, &loses_info);
} else {
exp.chan[i].type = AC_IR_VALUE;
}
}
/* Eliminate constant and duplicated PARAM exports. */
if (ac_eliminate_const_output(vs_output_param_offset,
num_outputs, &exp) ||
ac_eliminate_duplicated_output(vs_output_param_offset,
num_outputs, &exports,
&exp)) {
removed_any = true;
} else {
exports.exp[exports.num++] = exp;
}
}
bb = LLVMGetNextBasicBlock(bb);
}
/* Remove holes in export memory due to removed PARAM exports.
* This is done by renumbering all PARAM exports.
*/
if (removed_any) {
uint8_t old_offset[VARYING_SLOT_MAX];
unsigned out, i;
/* Make a copy of the offsets. We need the old version while
* we are modifying some of them. */
memcpy(old_offset, vs_output_param_offset,
sizeof(old_offset));
for (i = 0; i < exports.num; i++) {
unsigned offset = exports.exp[i].offset;
/* Update vs_output_param_offset. Multiple outputs can
* have the same offset.
*/
for (out = 0; out < num_outputs; out++) {
if (old_offset[out] == offset)
vs_output_param_offset[out] = i;
}
/* Change the PARAM offset in the instruction. */
LLVMSetOperand(exports.exp[i].inst, AC_EXP_TARGET,
LLVMConstInt(ctx->i32,
V_008DFC_SQ_EXP_PARAM + i, 0));
}
*num_param_exports = exports.num;
}
}

View File

@@ -40,20 +40,11 @@ struct ac_llvm_context {
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
LLVMTypeRef i16;
LLVMTypeRef i32;
LLVMTypeRef i64;
LLVMTypeRef f16;
LLVMTypeRef f32;
LLVMTypeRef f64;
LLVMTypeRef v4i32;
LLVMTypeRef v4f32;
LLVMTypeRef v8i32;
LLVMValueRef i32_0;
LLVMValueRef i32_1;
LLVMValueRef f32_0;
LLVMValueRef f32_1;
LLVMTypeRef v16i8;
unsigned range_md_kind;
unsigned invariant_load_md_kind;
@@ -152,14 +143,13 @@ ac_build_buffer_load(struct ac_llvm_context *ctx,
unsigned inst_offset,
unsigned glc,
unsigned slc,
bool can_speculate,
bool allow_smem);
bool readonly_memory);
LLVMValueRef ac_build_buffer_load_format(struct ac_llvm_context *ctx,
LLVMValueRef rsrc,
LLVMValueRef vindex,
LLVMValueRef voffset,
bool can_speculate);
bool readonly_memory);
LLVMValueRef
ac_get_thread_id(struct ac_llvm_context *ctx);
@@ -249,12 +239,6 @@ void ac_get_image_intr_name(const char *base_name,
LLVMTypeRef coords_type,
LLVMTypeRef rsrc_type,
char *out_name, unsigned out_len);
void ac_optimize_vs_outputs(struct ac_llvm_context *ac,
LLVMValueRef main_fn,
uint8_t *vs_output_param_offset,
uint32_t num_outputs,
uint8_t *num_param_exports);
#ifdef __cplusplus
}
#endif

View File

@@ -34,7 +34,6 @@
#include <llvm/Target/TargetOptions.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/IR/Attributes.h>
#include <llvm/IR/CallSite.h>
#if HAVE_LLVM < 0x0500
namespace llvm {
@@ -45,13 +44,9 @@ typedef AttributeSet AttributeList;
void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes)
{
llvm::Argument *A = llvm::unwrap<llvm::Argument>(val);
#if HAVE_LLVM < 0x0500
llvm::AttrBuilder B;
B.addDereferenceableAttr(bytes);
A->addAttr(llvm::AttributeList::get(A->getContext(), A->getArgNo() + 1, B));
#else
A->addAttr(llvm::Attribute::getWithDereferenceableBytes(A->getContext(), bytes));
#endif
}
bool ac_is_sgpr_param(LLVMValueRef arg)
@@ -62,21 +57,3 @@ bool ac_is_sgpr_param(LLVMValueRef arg)
return AS.hasAttribute(ArgNo + 1, llvm::Attribute::ByVal) ||
AS.hasAttribute(ArgNo + 1, llvm::Attribute::InReg);
}
LLVMValueRef ac_llvm_get_called_value(LLVMValueRef call)
{
#if HAVE_LLVM >= 0x0309
return LLVMGetCalledValue(call);
#else
return llvm::wrap(llvm::CallSite(llvm::unwrap<llvm::Instruction>(call)).getCalledValue());
#endif
}
bool ac_llvm_is_function(LLVMValueRef v)
{
#if HAVE_LLVM >= 0x0309
return LLVMGetValueKind(v) == LLVMFunctionValueKind;
#else
return llvm::isa<llvm::Function>(llvm::unwrap(v));
#endif
}

View File

@@ -105,14 +105,18 @@ static const char *ac_get_llvm_processor_name(enum radeon_family family)
return "fiji";
case CHIP_STONEY:
return "stoney";
#if HAVE_LLVM == 0x0308
case CHIP_POLARIS10:
return "tonga";
case CHIP_POLARIS11:
return "tonga";
#else
case CHIP_POLARIS10:
return "polaris10";
case CHIP_POLARIS11:
case CHIP_POLARIS12:
return "polaris11";
case CHIP_VEGA10:
case CHIP_RAVEN:
return "gfx900";
#endif
default:
return "";
}
@@ -128,7 +132,7 @@ LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, bool su
target,
triple,
ac_get_llvm_processor_name(family),
"+DumpCode,+vgpr-spilling,-fp32-denormals,-xnack",
"+DumpCode,+vgpr-spilling",
LLVMCodeGenLevelDefault,
LLVMRelocDefault,
LLVMCodeModelDefault);
@@ -220,13 +224,3 @@ ac_dump_module(LLVMModuleRef module)
fprintf(stderr, "%s", str);
LLVMDisposeMessage(str);
}
void
ac_llvm_add_target_dep_function_attr(LLVMValueRef F,
const char *name, int value)
{
char str[16];
snprintf(str, sizeof(str), "%i", value);
LLVMAddTargetDependentFunctionAttr(F, name, str);
}

View File

@@ -64,13 +64,6 @@ void ac_add_func_attributes(LLVMContextRef ctx, LLVMValueRef function,
unsigned attrib_mask);
void ac_dump_module(LLVMModuleRef module);
LLVMValueRef ac_llvm_get_called_value(LLVMValueRef call);
bool ac_llvm_is_function(LLVMValueRef v);
void
ac_llvm_add_target_dep_function_attr(LLVMValueRef F,
const char *name, int value);
#ifdef __cplusplus
}
#endif

File diff suppressed because it is too large Load Diff

View File

@@ -29,7 +29,7 @@
#include "llvm-c/TargetMachine.h"
#include "amd_family.h"
#include "../vulkan/radv_descriptor_set.h"
#include "ac_shader_info.h"
#include "shader_enums.h"
struct ac_shader_binary;
struct ac_shader_config;
@@ -41,12 +41,10 @@ struct ac_vs_variant_key {
uint32_t instance_rate_inputs;
uint32_t as_es:1;
uint32_t as_ls:1;
uint32_t export_prim_id:1;
};
struct ac_tes_variant_key {
uint32_t as_es:1;
uint32_t export_prim_id:1;
};
struct ac_tcs_variant_key {
@@ -85,8 +83,7 @@ struct ac_userdata_info {
enum ac_ud_index {
AC_UD_SCRATCH_RING_OFFSETS = 0,
AC_UD_PUSH_CONSTANTS = 1,
AC_UD_INDIRECT_DESCRIPTOR_SETS = 2,
AC_UD_SHADER_START = 3,
AC_UD_SHADER_START = 2,
AC_UD_VS_VERTEX_BUFFERS = AC_UD_SHADER_START,
AC_UD_VS_BASE_VERTEX_START_INSTANCE,
AC_UD_VS_LS_TCS_IN_LAYOUT,
@@ -123,15 +120,15 @@ struct ac_userdata_locations {
};
struct ac_vs_output_info {
uint8_t vs_output_param_offset[VARYING_SLOT_MAX];
uint8_t clip_dist_mask;
uint8_t cull_dist_mask;
uint8_t param_exports;
bool writes_pointsize;
bool writes_layer;
bool writes_viewport_index;
bool export_prim_id;
uint32_t prim_id_output;
uint32_t layer_output;
uint32_t export_mask;
unsigned param_exports;
unsigned pos_exports;
};
@@ -141,11 +138,10 @@ struct ac_es_output_info {
struct ac_shader_variant_info {
struct ac_userdata_locations user_sgprs_locs;
struct ac_shader_info info;
unsigned num_user_sgprs;
unsigned num_input_sgprs;
unsigned num_input_vgprs;
bool need_indirect_descriptor_sets;
union {
struct {
struct ac_vs_output_info outinfo;
@@ -170,6 +166,7 @@ struct ac_shader_variant_info {
bool force_persample;
bool prim_id_input;
bool layer_input;
bool uses_sample_positions;
} fs;
struct {
unsigned block_size[3];
@@ -181,7 +178,6 @@ struct ac_shader_variant_info {
unsigned invocations;
unsigned gsvs_vertex_size;
unsigned max_gsvs_emit_size;
bool uses_prim_id;
} gs;
struct {
bool uses_prim_id;

View File

@@ -1,127 +0,0 @@
/*
* Copyright © 2017 Red Hat
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "nir/nir.h"
#include "ac_shader_info.h"
#include "ac_nir_to_llvm.h"
static void mark_sampler_desc(nir_variable *var, struct ac_shader_info *info)
{
info->desc_set_used_mask = (1 << var->data.descriptor_set);
}
static void
gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
{
switch (instr->intrinsic) {
case nir_intrinsic_interp_var_at_sample:
info->ps.needs_sample_positions = true;
break;
case nir_intrinsic_load_draw_id:
info->vs.needs_draw_id = true;
break;
case nir_intrinsic_load_num_work_groups:
info->cs.grid_components_used = instr->num_components;
break;
case nir_intrinsic_vulkan_resource_index:
info->desc_set_used_mask |= (1 << nir_intrinsic_desc_set(instr));
break;
case nir_intrinsic_image_load:
case nir_intrinsic_image_store:
case nir_intrinsic_image_atomic_add:
case nir_intrinsic_image_atomic_min:
case nir_intrinsic_image_atomic_max:
case nir_intrinsic_image_atomic_and:
case nir_intrinsic_image_atomic_or:
case nir_intrinsic_image_atomic_xor:
case nir_intrinsic_image_atomic_exchange:
case nir_intrinsic_image_atomic_comp_swap:
case nir_intrinsic_image_size:
mark_sampler_desc(instr->variables[0]->var, info);
break;
default:
break;
}
}
static void
gather_tex_info(nir_tex_instr *instr, struct ac_shader_info *info)
{
if (instr->sampler)
mark_sampler_desc(instr->sampler->var, info);
if (instr->texture)
mark_sampler_desc(instr->texture->var, info);
}
static void
gather_info_block(nir_block *block, struct ac_shader_info *info)
{
nir_foreach_instr(instr, block) {
switch (instr->type) {
case nir_instr_type_intrinsic:
gather_intrinsic_info(nir_instr_as_intrinsic(instr), info);
break;
case nir_instr_type_tex:
gather_tex_info(nir_instr_as_tex(instr), info);
break;
default:
break;
}
}
}
static void
gather_info_input_decl(nir_shader *nir,
const struct ac_nir_compiler_options *options,
nir_variable *var,
struct ac_shader_info *info)
{
switch (nir->stage) {
case MESA_SHADER_VERTEX:
info->vs.has_vertex_buffers = true;
break;
default:
break;
}
}
void
ac_nir_shader_info_pass(struct nir_shader *nir,
const struct ac_nir_compiler_options *options,
struct ac_shader_info *info)
{
struct nir_function *func = (struct nir_function *)exec_list_get_head(&nir->functions);
info->needs_push_constants = true;
if (!options->layout)
info->needs_push_constants = false;
else if (!options->layout->push_constant_size &&
!options->layout->dynamic_offset_count)
info->needs_push_constants = false;
nir_foreach_variable(variable, &nir->inputs)
gather_info_input_decl(nir, options, variable, info);
nir_foreach_block(block, func->impl) {
gather_info_block(block, info);
}
}

View File

@@ -1,53 +0,0 @@
/*
* Copyright © 2017 Red Hat
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#ifndef AC_SHADER_INFO_H
#define AC_SHADER_INFO_H
struct nir_shader;
struct ac_nir_compiler_options;
struct ac_shader_info {
bool needs_push_constants;
uint32_t desc_set_used_mask;
struct {
bool has_vertex_buffers; /* needs vertex buffers and base/start */
bool needs_draw_id;
} vs;
struct {
bool needs_sample_positions;
} ps;
struct {
uint8_t grid_components_used;
} cs;
};
/* A NIR pass to gather all the info needed to optimise the allocation patterns
* for the RADV user sgprs
*/
void
ac_nir_shader_info_pass(struct nir_shader *nir,
const struct ac_nir_compiler_options *options,
struct ac_shader_info *info);
#endif

File diff suppressed because it is too large Load Diff

View File

@@ -1,220 +0,0 @@
/*
* Copyright © 2017 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
* AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*/
#ifndef AC_SURFACE_H
#define AC_SURFACE_H
#include <stdint.h>
#include "amd_family.h"
#ifdef __cplusplus
extern "C" {
#endif
/* Forward declarations. */
typedef void* ADDR_HANDLE;
struct amdgpu_gpu_info;
struct radeon_info;
#define RADEON_SURF_MAX_LEVELS 15
enum radeon_surf_mode {
RADEON_SURF_MODE_LINEAR_ALIGNED = 1,
RADEON_SURF_MODE_1D = 2,
RADEON_SURF_MODE_2D = 3,
};
/* These are defined exactly like GB_TILE_MODEn.MICRO_TILE_MODE_NEW. */
enum radeon_micro_mode {
RADEON_MICRO_MODE_DISPLAY = 0,
RADEON_MICRO_MODE_THIN = 1,
RADEON_MICRO_MODE_DEPTH = 2,
RADEON_MICRO_MODE_ROTATED = 3,
};
/* the first 16 bits are reserved for libdrm_radeon, don't use them */
#define RADEON_SURF_SCANOUT (1 << 16)
#define RADEON_SURF_ZBUFFER (1 << 17)
#define RADEON_SURF_SBUFFER (1 << 18)
#define RADEON_SURF_Z_OR_SBUFFER (RADEON_SURF_ZBUFFER | RADEON_SURF_SBUFFER)
/* bits 19 and 20 are reserved for libdrm_radeon, don't use them */
#define RADEON_SURF_HAS_TILE_MODE_INDEX (1 << 20)
#define RADEON_SURF_FMASK (1 << 21)
#define RADEON_SURF_DISABLE_DCC (1 << 22)
#define RADEON_SURF_TC_COMPATIBLE_HTILE (1 << 23)
#define RADEON_SURF_IMPORTED (1 << 24)
#define RADEON_SURF_OPTIMIZE_FOR_SPACE (1 << 25)
struct legacy_surf_level {
uint64_t offset;
uint64_t slice_size;
uint64_t dcc_offset;
uint64_t dcc_fast_clear_size;
uint16_t nblk_x;
uint16_t nblk_y;
enum radeon_surf_mode mode;
};
struct legacy_surf_layout {
unsigned bankw:4; /* max 8 */
unsigned bankh:4; /* max 8 */
unsigned mtilea:4; /* max 8 */
unsigned tile_split:13; /* max 4K */
unsigned stencil_tile_split:13; /* max 4K */
unsigned pipe_config:5; /* max 17 */
unsigned num_banks:5; /* max 16 */
unsigned macro_tile_index:4; /* max 15 */
/* Whether the depth miptree or stencil miptree as used by the DB are
* adjusted from their TC compatible form to ensure depth/stencil
* compatibility. If either is true, the corresponding plane cannot be
* sampled from.
*/
unsigned depth_adjusted:1;
unsigned stencil_adjusted:1;
struct legacy_surf_level level[RADEON_SURF_MAX_LEVELS];
struct legacy_surf_level stencil_level[RADEON_SURF_MAX_LEVELS];
uint8_t tiling_index[RADEON_SURF_MAX_LEVELS];
uint8_t stencil_tiling_index[RADEON_SURF_MAX_LEVELS];
};
/* Same as addrlib - AddrResourceType. */
enum gfx9_resource_type {
RADEON_RESOURCE_1D = 0,
RADEON_RESOURCE_2D,
RADEON_RESOURCE_3D,
};
struct gfx9_surf_flags {
uint16_t swizzle_mode; /* tile mode */
uint16_t epitch; /* (pitch - 1) or (height - 1) */
};
struct gfx9_surf_meta_flags {
unsigned rb_aligned:1; /* optimal for RBs */
unsigned pipe_aligned:1; /* optimal for TC */
};
struct gfx9_surf_layout {
struct gfx9_surf_flags surf; /* color or depth surface */
struct gfx9_surf_flags fmask; /* not added to surf_size */
struct gfx9_surf_flags stencil; /* added to surf_size, use stencil_offset */
struct gfx9_surf_meta_flags dcc; /* metadata of color */
struct gfx9_surf_meta_flags htile; /* metadata of depth and stencil */
struct gfx9_surf_meta_flags cmask; /* metadata of fmask */
enum gfx9_resource_type resource_type; /* 1D, 2D or 3D */
uint64_t surf_offset; /* 0 unless imported with an offset */
/* The size of the 2D plane containing all mipmap levels. */
uint64_t surf_slice_size;
uint16_t surf_pitch; /* in blocks */
uint16_t surf_height;
/* Mipmap level offset within the slice in bytes. Only valid for LINEAR. */
uint32_t offset[RADEON_SURF_MAX_LEVELS];
uint16_t dcc_pitch_max; /* (mip chain pitch - 1) */
uint64_t stencil_offset; /* separate stencil */
uint64_t fmask_size;
uint64_t cmask_size;
uint32_t fmask_alignment;
uint32_t cmask_alignment;
};
struct radeon_surf {
/* Format properties. */
unsigned blk_w:4;
unsigned blk_h:4;
unsigned bpe:5;
/* Number of mipmap levels where DCC is enabled starting from level 0.
* Non-zero levels may be disabled due to alignment constraints, but not
* the first level.
*/
unsigned num_dcc_levels:4;
unsigned is_linear:1;
/* Displayable, thin, depth, rotated. AKA D,S,Z,R swizzle modes. */
unsigned micro_tile_mode:3;
uint32_t flags;
/* These are return values. Some of them can be set by the caller, but
* they will be treated as hints (e.g. bankw, bankh) and might be
* changed by the calculator.
*/
uint64_t surf_size;
uint64_t dcc_size;
uint64_t htile_size;
uint32_t htile_slice_size;
uint32_t surf_alignment;
uint32_t dcc_alignment;
uint32_t htile_alignment;
union {
/* R600-VI return values.
*
* Some of them can be set by the caller if certain parameters are
* desirable. The allocator will try to obey them.
*/
struct legacy_surf_layout legacy;
/* GFX9+ return values. */
struct gfx9_surf_layout gfx9;
} u;
};
struct ac_surf_info {
uint32_t width;
uint32_t height;
uint32_t depth;
uint8_t samples;
uint8_t levels;
uint16_t array_size;
};
struct ac_surf_config {
struct ac_surf_info info;
unsigned is_3d : 1;
unsigned is_cube : 1;
};
ADDR_HANDLE amdgpu_addr_create(const struct radeon_info *info,
const struct amdgpu_gpu_info *amdinfo);
int ac_compute_surface(ADDR_HANDLE addrlib, const struct radeon_info *info,
const struct ac_surf_config * config,
enum radeon_surf_mode mode,
struct radeon_surf *surf);
#ifdef __cplusplus
}
#endif
#endif /* AC_SURFACE_H */

View File

@@ -36,7 +36,7 @@
// Gets bits for specified mask from specified src packed instance.
#define AMD_HSA_BITS_GET(src, mask) \
((src & mask) >> mask ## _SHIFT)
((src & mask) >> mask ## _SHIFT) \
/* Every amd_*_code_t has the following properties, which are composed of
* a number of bit fields. Every bit field has a mask (AMD_CODE_PROPERTY_*),

View File

@@ -1345,8 +1345,8 @@
#define V_008F14_IMG_DATA_FORMAT_RESERVED_56 0x38
#define V_008F14_IMG_DATA_FORMAT_4_4 0x39
#define V_008F14_IMG_DATA_FORMAT_6_5_5 0x3A
#define V_008F14_IMG_DATA_FORMAT_S8_16 0x3B
#define V_008F14_IMG_DATA_FORMAT_S8_32 0x3C
#define V_008F14_IMG_DATA_S8_16 0x3B
#define V_008F14_IMG_DATA_S8_32 0x3C
#define V_008F14_IMG_DATA_FORMAT_8_AS_32 0x3D
#define V_008F14_IMG_DATA_FORMAT_8_AS_32_32 0x3E
#define V_008F14_IMG_DATA_FORMAT_32_AS_32_32_32_32 0x3F

View File

@@ -54,17 +54,6 @@
#define PKT3_WAIT_REG_MEM 0x3C
#define WAIT_REG_MEM_EQUAL 3
#define WAIT_REG_MEM_MEM_SPACE(x) (((unsigned)(x) & 0x3) << 4)
#define PKT3_COPY_DATA 0x40
#define COPY_DATA_SRC_SEL(x) ((x) & 0xf)
#define COPY_DATA_REG 0
#define COPY_DATA_MEM 1
#define COPY_DATA_PERF 4
#define COPY_DATA_IMM 5
#define COPY_DATA_TIMESTAMP 9
#define COPY_DATA_DST_SEL(x) (((unsigned)(x) & 0xf) << 8)
#define COPY_DATA_MEM_ASYNC 5
#define COPY_DATA_COUNT_SEL (1 << 16)
#define COPY_DATA_WR_CONFIRM (1 << 20)
#define PKT3_EVENT_WRITE 0x46
#define PKT3_EVENT_WRITE_EOP 0x47
#define EOP_DATA_SEL(x) ((x) << 29)

View File

@@ -154,7 +154,6 @@
#define COPY_DATA_MEM 1
#define COPY_DATA_PERF 4
#define COPY_DATA_IMM 5
#define COPY_DATA_TIMESTAMP 9
#define COPY_DATA_DST_SEL(x) (((unsigned)(x) & 0xf) << 8)
#define COPY_DATA_COUNT_SEL (1 << 16)
#define COPY_DATA_WR_CONFIRM (1 << 20)
@@ -170,7 +169,7 @@
*/
/* fix CP DMA before uncommenting: */
/*#define PKT3_EVENT_WRITE_EOS 0x48*/ /* not on GFX9 */
#define PKT3_RELEASE_MEM 0x49 /* GFX9+ [any ring] or GFX8 [compute ring only] */
#define PKT3_RELEASE_MEM 0x49 /* GFX9+ (any ring) or GFX8 (compute ring only) */
#define PKT3_ONE_REG_WRITE 0x57 /* not on CIK */
#define PKT3_ACQUIRE_MEM 0x58 /* new for CIK */
#define PKT3_SET_CONFIG_REG 0x68
@@ -280,7 +279,6 @@
#define S_500_DSL_SEL(x) (((unsigned)(x) & 0x3) << 20)
#define V_500_DST_ADDR 0
#define V_500_GDS 1 /* program DAS to 1 as well */
#define V_500_NOWHERE 2 /* new for GFX9 */
#define V_500_DST_ADDR_TC_L2 3 /* new for CIK */
#define S_500_ENGINE(x) ((x) & 0x1)
#define V_500_ME 0

View File

@@ -110,7 +110,7 @@ class IntTable:
[static] const typename name[] = { ... };
to filp.
"""
idxs = sorted(self.idxs) + [len(self.table)]
idxs = sorted(self.idxs) + [-1]
fragments = [
('\t/* %s */ %s' % (

View File

@@ -51,7 +51,6 @@ VULKAN_FILES := \
radv_meta_fast_clear.c \
radv_meta_resolve.c \
radv_meta_resolve_cs.c \
radv_meta_resolve_fs.c \
radv_pass.c \
radv_pipeline.c \
radv_pipeline_cache.c \

File diff suppressed because it is too large Load Diff

View File

@@ -37,7 +37,4 @@ enum {
RADV_DEBUG_NO_IBS = 0x200,
};
enum {
RADV_PERFTEST_BATCHCHAIN = 0x1,
};
#endif

View File

@@ -77,7 +77,6 @@ VkResult radv_CreateDescriptorSetLayout(
const VkDescriptorSetLayoutBinding *binding = &pCreateInfo->pBindings[j];
uint32_t b = binding->binding;
uint32_t alignment;
unsigned binding_buffer_count = 0;
switch (binding->descriptorType) {
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
@@ -86,7 +85,7 @@ VkResult radv_CreateDescriptorSetLayout(
set_layout->binding[b].dynamic_offset_count = 1;
set_layout->dynamic_shader_stages |= binding->stageFlags;
set_layout->binding[b].size = 0;
binding_buffer_count = 1;
set_layout->binding[b].buffer_count = 1;
alignment = 1;
break;
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
@@ -94,7 +93,7 @@ VkResult radv_CreateDescriptorSetLayout(
case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:
case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:
set_layout->binding[b].size = 16;
binding_buffer_count = 1;
set_layout->binding[b].buffer_count = 1;
alignment = 16;
break;
case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
@@ -102,13 +101,13 @@ VkResult radv_CreateDescriptorSetLayout(
case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:
/* main descriptor + fmask descriptor */
set_layout->binding[b].size = 64;
binding_buffer_count = 1;
set_layout->binding[b].buffer_count = 1;
alignment = 32;
break;
case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
/* main descriptor + fmask descriptor + sampler */
set_layout->binding[b].size = 96;
binding_buffer_count = 1;
set_layout->binding[b].buffer_count = 1;
alignment = 32;
break;
case VK_DESCRIPTOR_TYPE_SAMPLER:
@@ -151,7 +150,7 @@ VkResult radv_CreateDescriptorSetLayout(
}
set_layout->size += binding->descriptorCount * set_layout->binding[b].size;
buffer_count += binding->descriptorCount * binding_buffer_count;
buffer_count += binding->descriptorCount * set_layout->binding[b].buffer_count;
dynamic_offset_count += binding->descriptorCount *
set_layout->binding[b].dynamic_offset_count;
set_layout->shader_stages |= binding->stageFlags;
@@ -262,29 +261,26 @@ radv_descriptor_set_create(struct radv_device *device,
struct radv_descriptor_set **out_set)
{
struct radv_descriptor_set *set;
unsigned range_offset = sizeof(struct radv_descriptor_set) +
unsigned mem_size = sizeof(struct radv_descriptor_set) +
sizeof(struct radeon_winsys_bo *) * layout->buffer_count;
unsigned mem_size = range_offset +
sizeof(struct radv_descriptor_range) * layout->dynamic_offset_count;
set = vk_alloc2(&device->alloc, NULL, mem_size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (pool->host_memory_base) {
if (pool->host_memory_end - pool->host_memory_ptr < mem_size)
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
set = (struct radv_descriptor_set*)pool->host_memory_ptr;
pool->host_memory_ptr += mem_size;
} else {
set = vk_alloc2(&device->alloc, NULL, mem_size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!set)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
}
if (!set)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
memset(set, 0, mem_size);
if (layout->dynamic_offset_count) {
set->dynamic_descriptors = (struct radv_descriptor_range*)((uint8_t*)set + range_offset);
unsigned size = sizeof(struct radv_descriptor_range) *
layout->dynamic_offset_count;
set->dynamic_descriptors = vk_alloc2(&device->alloc, NULL, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!set->dynamic_descriptors) {
vk_free2(&device->alloc, NULL, set);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
}
}
set->layout = layout;
@@ -301,12 +297,10 @@ radv_descriptor_set_create(struct radv_device *device,
set->va = device->ws->buffer_get_va(set->bo) + pool->current_offset;
pool->current_offset += layout_size;
list_addtail(&set->vram_list, &pool->vram_list);
} else if (!pool->host_memory_base) {
} else {
uint64_t offset = 0;
struct list_head *prev = &pool->vram_list;
struct radv_descriptor_set *cur;
assert(!pool->host_memory_base);
LIST_FOR_EACH_ENTRY(cur, &pool->vram_list, vram_list) {
uint64_t start = (uint8_t*)cur->mapped_ptr - pool->mapped_ptr;
if (start - offset >= layout_size)
@@ -325,8 +319,7 @@ radv_descriptor_set_create(struct radv_device *device,
set->mapped_ptr = (uint32_t*)(pool->mapped_ptr + offset);
set->va = device->ws->buffer_get_va(set->bo) + offset;
list_add(&set->vram_list, prev);
} else
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
}
}
for (unsigned i = 0; i < layout->binding_count; ++i) {
@@ -355,10 +348,10 @@ radv_descriptor_set_destroy(struct radv_device *device,
struct radv_descriptor_set *set,
bool free_bo)
{
assert(!pool->host_memory_base);
if (free_bo && set->size)
list_del(&set->vram_list);
if (set->dynamic_descriptors)
vk_free2(&device->alloc, NULL, set->dynamic_descriptors);
vk_free2(&device->alloc, NULL, set);
}
@@ -371,17 +364,18 @@ VkResult radv_CreateDescriptorPool(
RADV_FROM_HANDLE(radv_device, device, _device);
struct radv_descriptor_pool *pool;
int size = sizeof(struct radv_descriptor_pool);
uint64_t bo_size = 0, bo_count = 0, range_count = 0;
uint64_t bo_size = 0;
pool = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!pool)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
memset(pool, 0, sizeof(*pool));
for (unsigned i = 0; i < pCreateInfo->poolSizeCount; ++i) {
if (pCreateInfo->pPoolSizes[i].type != VK_DESCRIPTOR_TYPE_SAMPLER)
bo_count += pCreateInfo->pPoolSizes[i].descriptorCount;
switch(pCreateInfo->pPoolSizes[i].type) {
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
range_count += pCreateInfo->pPoolSizes[i].descriptorCount;
break;
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
@@ -405,26 +399,6 @@ VkResult radv_CreateDescriptorPool(
}
}
if (!(pCreateInfo->flags & VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT)) {
uint64_t host_size = pCreateInfo->maxSets * sizeof(struct radv_descriptor_set);
host_size += sizeof(struct radeon_winsys_bo*) * bo_count;
host_size += sizeof(struct radv_descriptor_range) * range_count;
size += host_size;
}
pool = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!pool)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
memset(pool, 0, sizeof(*pool));
if (!(pCreateInfo->flags & VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT)) {
pool->host_memory_base = (uint8_t*)pool + sizeof(struct radv_descriptor_pool);
pool->host_memory_ptr = pool->host_memory_base;
pool->host_memory_end = (uint8_t*)pool + size;
}
if (bo_size) {
pool->bo = device->ws->buffer_create(device->ws, bo_size,
32, RADEON_DOMAIN_VRAM, 0);
@@ -448,11 +422,9 @@ void radv_DestroyDescriptorPool(
if (!pool)
return;
if (!pool->host_memory_base) {
list_for_each_entry_safe(struct radv_descriptor_set, set,
&pool->vram_list, vram_list) {
radv_descriptor_set_destroy(device, pool, set, false);
}
list_for_each_entry_safe(struct radv_descriptor_set, set,
&pool->vram_list, vram_list) {
radv_descriptor_set_destroy(device, pool, set, false);
}
if (pool->bo)
@@ -468,17 +440,14 @@ VkResult radv_ResetDescriptorPool(
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_descriptor_pool, pool, descriptorPool);
if (!pool->host_memory_base) {
list_for_each_entry_safe(struct radv_descriptor_set, set,
&pool->vram_list, vram_list) {
radv_descriptor_set_destroy(device, pool, set, false);
}
list_for_each_entry_safe(struct radv_descriptor_set, set,
&pool->vram_list, vram_list) {
radv_descriptor_set_destroy(device, pool, set, false);
}
list_inithead(&pool->vram_list);
pool->current_offset = 0;
pool->host_memory_ptr = pool->host_memory_base;
return VK_SUCCESS;
}
@@ -527,7 +496,7 @@ VkResult radv_FreeDescriptorSets(
for (uint32_t i = 0; i < count; i++) {
RADV_FROM_HANDLE(radv_descriptor_set, set, pDescriptorSets[i]);
if (set && !pool->host_memory_base)
if (set)
radv_descriptor_set_destroy(device, pool, set, true);
}
return VK_SUCCESS;
@@ -670,7 +639,7 @@ void radv_update_descriptor_sets(
ptr += binding_layout->offset / 4;
ptr += binding_layout->size * writeset->dstArrayElement / 4;
buffer_list += binding_layout->buffer_offset;
buffer_list += writeset->dstArrayElement;
buffer_list += binding_layout->buffer_count * writeset->dstArrayElement;
for (j = 0; j < writeset->descriptorCount; ++j) {
switch(writeset->descriptorType) {
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
@@ -721,7 +690,7 @@ void radv_update_descriptor_sets(
break;
}
ptr += binding_layout->size / 4;
++buffer_list;
buffer_list += binding_layout->buffer_count;
}
}
@@ -765,7 +734,8 @@ VkResult radv_CreateDescriptorUpdateTemplateKHR(VkDevice _device,
const VkDescriptorUpdateTemplateEntryKHR *entry = &pCreateInfo->pDescriptorUpdateEntries[i];
const struct radv_descriptor_set_binding_layout *binding_layout =
set_layout->binding + entry->dstBinding;
const uint32_t buffer_offset = binding_layout->buffer_offset + entry->dstArrayElement;
const uint32_t buffer_offset = binding_layout->buffer_offset +
binding_layout->buffer_count * entry->dstArrayElement;
const uint32_t *immutable_samplers = NULL;
uint32_t dst_offset;
uint32_t dst_stride;
@@ -805,6 +775,7 @@ VkResult radv_CreateDescriptorUpdateTemplateKHR(VkDevice _device,
.dst_offset = dst_offset,
.dst_stride = dst_stride,
.buffer_offset = buffer_offset,
.buffer_count = binding_layout->buffer_count,
.has_sampler = !binding_layout->immutable_samplers_offset,
.immutable_samplers = immutable_samplers
};
@@ -888,7 +859,7 @@ void radv_update_descriptor_set_with_template(struct radv_device *device,
}
pSrc += templ->entry[i].src_stride;
pDst += templ->entry[i].dst_stride;
++buffer_list;
buffer_list += templ->entry[i].buffer_count;
}
}
}

View File

@@ -26,7 +26,7 @@
#include <vulkan/vulkan.h>
#define MAX_SETS 32
#define MAX_SETS 8
struct radv_descriptor_set_binding_layout {
VkDescriptorType type;
@@ -38,9 +38,10 @@ struct radv_descriptor_set_binding_layout {
uint32_t buffer_offset;
uint16_t dynamic_offset_offset;
uint16_t dynamic_offset_count;
/* redundant with the type, each for a single array element */
uint32_t size;
uint32_t buffer_count;
uint16_t dynamic_offset_count;
/* Offset in the radv_descriptor_set_layout of the immutable samplers, or 0
* if there are no immutable samplers. */

View File

@@ -33,7 +33,7 @@
#include "radv_cs.h"
#include "util/disk_cache.h"
#include "util/strtod.h"
#include "vk_util.h"
#include "util/vk_util.h"
#include <xf86drm.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>
@@ -42,7 +42,6 @@
#include "ac_llvm_util.h"
#include "vk_format.h"
#include "sid.h"
#include "gfx9d.h"
#include "util/debug.h"
static int
@@ -62,15 +61,6 @@ radv_device_get_cache_uuid(enum radeon_family family, void *uuid)
return 0;
}
static void
radv_get_device_uuid(drmDevicePtr device, void *uuid) {
memset(uuid, 0, VK_UUID_SIZE);
memcpy((char*)uuid + 0, &device->businfo.pci->domain, 2);
memcpy((char*)uuid + 2, &device->businfo.pci->bus, 1);
memcpy((char*)uuid + 3, &device->businfo.pci->dev, 1);
memcpy((char*)uuid + 4, &device->businfo.pci->func, 1);
}
static const VkExtensionProperties instance_extensions[] = {
{
.extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
@@ -98,10 +88,6 @@ static const VkExtensionProperties instance_extensions[] = {
.extensionName = VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
.specVersion = 1,
},
};
static const VkExtensionProperties common_device_extensions[] = {
@@ -141,14 +127,6 @@ static const VkExtensionProperties common_device_extensions[] = {
.extensionName = VK_NV_DEDICATED_ALLOCATION_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
.specVersion = 1,
},
};
static VkResult
@@ -209,40 +187,11 @@ is_extension_enabled(const VkExtensionProperties *extensions,
return false;
}
static const char *
get_chip_name(enum radeon_family family)
{
switch (family) {
case CHIP_TAHITI: return "AMD RADV TAHITI";
case CHIP_PITCAIRN: return "AMD RADV PITCAIRN";
case CHIP_VERDE: return "AMD RADV CAPE VERDE";
case CHIP_OLAND: return "AMD RADV OLAND";
case CHIP_HAINAN: return "AMD RADV HAINAN";
case CHIP_BONAIRE: return "AMD RADV BONAIRE";
case CHIP_KAVERI: return "AMD RADV KAVERI";
case CHIP_KABINI: return "AMD RADV KABINI";
case CHIP_HAWAII: return "AMD RADV HAWAII";
case CHIP_MULLINS: return "AMD RADV MULLINS";
case CHIP_TONGA: return "AMD RADV TONGA";
case CHIP_ICELAND: return "AMD RADV ICELAND";
case CHIP_CARRIZO: return "AMD RADV CARRIZO";
case CHIP_FIJI: return "AMD RADV FIJI";
case CHIP_POLARIS10: return "AMD RADV POLARIS10";
case CHIP_POLARIS11: return "AMD RADV POLARIS11";
case CHIP_POLARIS12: return "AMD RADV POLARIS12";
case CHIP_STONEY: return "AMD RADV STONEY";
case CHIP_VEGA10: return "AMD RADV VEGA";
case CHIP_RAVEN: return "AMD RADV RAVEN";
default: return "AMD RADV unknown";
}
}
static VkResult
radv_physical_device_init(struct radv_physical_device *device,
struct radv_instance *instance,
drmDevicePtr drm_device)
const char *path)
{
const char *path = drm_device->nodes[DRM_NODE_RENDER];
VkResult result;
drmVersionPtr version;
int fd;
@@ -270,8 +219,7 @@ radv_physical_device_init(struct radv_physical_device *device,
assert(strlen(path) < ARRAY_SIZE(device->path));
strncpy(device->path, path, ARRAY_SIZE(device->path));
device->ws = radv_amdgpu_winsys_create(fd, instance->debug_flags,
instance->perftest_flags);
device->ws = radv_amdgpu_winsys_create(fd, instance->debug_flags);
if (!device->ws) {
result = VK_ERROR_INCOMPATIBLE_DRIVER;
goto fail;
@@ -301,15 +249,7 @@ radv_physical_device_init(struct radv_physical_device *device,
goto fail;
fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n");
device->name = get_chip_name(device->rad_info.family);
radv_get_device_uuid(drm_device, device->device_uuid);
if (device->rad_info.family == CHIP_STONEY ||
device->rad_info.chip_class >= GFX9) {
device->has_rbplus = true;
device->rbplus_allowed = device->rad_info.family == CHIP_STONEY;
}
device->name = device->rad_info.name;
return VK_SUCCESS;
@@ -327,6 +267,7 @@ radv_physical_device_finish(struct radv_physical_device *device)
close(device->local_fd);
}
static void *
default_alloc_func(void *pUserData, size_t size, size_t align,
VkSystemAllocationScope allocationScope)
@@ -368,11 +309,6 @@ static const struct debug_control radv_debug_options[] = {
{NULL, 0}
};
static const struct debug_control radv_perftest_options[] = {
{"batchchain", RADV_PERFTEST_BATCHCHAIN},
{NULL, 0}
};
VkResult radv_CreateInstance(
const VkInstanceCreateInfo* pCreateInfo,
const VkAllocationCallbacks* pAllocator,
@@ -430,9 +366,6 @@ VkResult radv_CreateInstance(
instance->debug_flags = parse_debug_string(getenv("RADV_DEBUG"),
radv_debug_options);
instance->perftest_flags = parse_debug_string(getenv("RADV_PERFTEST"),
radv_perftest_options);
*pInstance = radv_instance_to_handle(instance);
return VK_SUCCESS;
@@ -480,7 +413,7 @@ radv_enumerate_devices(struct radv_instance *instance)
result = radv_physical_device_init(instance->physicalDevices +
instance->physicalDeviceCount,
instance,
devices[i]);
devices[i]->nodes[DRM_NODE_RENDER]);
if (result == VK_SUCCESS)
++instance->physicalDeviceCount;
else if (result != VK_ERROR_INCOMPATIBLE_DRIVER)
@@ -523,8 +456,8 @@ void radv_GetPhysicalDeviceFeatures(
VkPhysicalDevice physicalDevice,
VkPhysicalDeviceFeatures* pFeatures)
{
RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice);
bool is_gfx9 = pdevice->rad_info.chip_class >= GFX9;
// RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice);
memset(pFeatures, 0, sizeof(*pFeatures));
*pFeatures = (VkPhysicalDeviceFeatures) {
@@ -532,8 +465,8 @@ void radv_GetPhysicalDeviceFeatures(
.fullDrawIndexUint32 = true,
.imageCubeArray = true,
.independentBlend = true,
.geometryShader = !is_gfx9,
.tessellationShader = !is_gfx9,
.geometryShader = true,
.tessellationShader = true,
.sampleRateShading = false,
.dualSrcBlend = true,
.logicOp = true,
@@ -583,6 +516,28 @@ void radv_GetPhysicalDeviceFeatures2KHR(
return radv_GetPhysicalDeviceFeatures(physicalDevice, &pFeatures->features);
}
static uint32_t radv_get_driver_version()
{
const char *minor_string = strchr(VERSION, '.');
const char *patch_string = minor_string ? strchr(minor_string + 1, ','): NULL;
int major = atoi(VERSION);
int minor = minor_string ? atoi(minor_string + 1) : 0;
int patch = patch_string ? atoi(patch_string + 1) : 0;
if (strstr(VERSION, "devel")) {
if (patch == 0) {
patch = 99;
if (minor == 0) {
minor = 99;
--major;
} else
--minor;
} else
--patch;
}
uint32_t version = VK_MAKE_VERSION(major, minor, patch);
return version;
}
void radv_GetPhysicalDeviceProperties(
VkPhysicalDevice physicalDevice,
VkPhysicalDeviceProperties* pProperties)
@@ -699,7 +654,7 @@ void radv_GetPhysicalDeviceProperties(
.sampledImageStencilSampleCounts = sample_counts,
.storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT,
.maxSampleMaskWords = 1,
.timestampComputeAndGraphics = true,
.timestampComputeAndGraphics = false,
.timestampPeriod = 1000000.0 / pdevice->rad_info.clock_crystal_freq,
.maxClipDistances = 8,
.maxCullDistances = 8,
@@ -718,7 +673,7 @@ void radv_GetPhysicalDeviceProperties(
*pProperties = (VkPhysicalDeviceProperties) {
.apiVersion = VK_MAKE_VERSION(1, 0, 42),
.driverVersion = vk_get_driver_version(),
.driverVersion = radv_get_driver_version(),
.vendorID = 0x1002,
.deviceID = pdevice->rad_info.pci_id,
.deviceType = pdevice->rad_info.has_dedicated_vram ? VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU : VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU,
@@ -734,7 +689,6 @@ void radv_GetPhysicalDeviceProperties2KHR(
VkPhysicalDevice physicalDevice,
VkPhysicalDeviceProperties2KHR *pProperties)
{
RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice);
radv_GetPhysicalDeviceProperties(physicalDevice, &pProperties->properties);
vk_foreach_struct(ext, pProperties->pNext) {
@@ -745,13 +699,6 @@ void radv_GetPhysicalDeviceProperties2KHR(
properties->maxPushDescriptors = MAX_PUSH_DESCRIPTORS;
break;
}
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHX: {
VkPhysicalDeviceIDPropertiesKHX *properties = (VkPhysicalDeviceIDPropertiesKHX*)ext;
radv_device_get_cache_uuid(0, properties->driverUUID);
memcpy(properties->deviceUUID, pdevice->device_uuid, VK_UUID_SIZE);
properties->deviceLUIDValid = false;
break;
}
default:
break;
}
@@ -765,7 +712,7 @@ static void radv_get_physical_device_queue_family_properties(
{
int num_queue_families = 1;
int idx;
if (pdevice->rad_info.num_compute_rings > 0 &&
if (pdevice->rad_info.compute_rings > 0 &&
pdevice->rad_info.chip_class >= CIK &&
!(pdevice->instance->debug_flags & RADV_DEBUG_NO_COMPUTE_QUEUE))
num_queue_families++;
@@ -792,7 +739,7 @@ static void radv_get_physical_device_queue_family_properties(
idx++;
}
if (pdevice->rad_info.num_compute_rings > 0 &&
if (pdevice->rad_info.compute_rings > 0 &&
pdevice->rad_info.chip_class >= CIK &&
!(pdevice->instance->debug_flags & RADV_DEBUG_NO_COMPUTE_QUEUE)) {
if (*pCount > idx) {
@@ -800,7 +747,7 @@ static void radv_get_physical_device_queue_family_properties(
.queueFlags = VK_QUEUE_COMPUTE_BIT |
VK_QUEUE_TRANSFER_BIT |
VK_QUEUE_SPARSE_BINDING_BIT,
.queueCount = pdevice->rad_info.num_compute_rings,
.queueCount = pdevice->rad_info.compute_rings,
.timestampValidBits = 64,
.minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 },
};
@@ -884,11 +831,11 @@ void radv_GetPhysicalDeviceMemoryProperties(
pMemoryProperties->memoryHeapCount = RADV_MEM_HEAP_COUNT;
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM] = (VkMemoryHeap) {
.size = physical_device->rad_info.vram_size -
physical_device->rad_info.vram_vis_size,
physical_device->rad_info.visible_vram_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM_CPU_ACCESS] = (VkMemoryHeap) {
.size = physical_device->rad_info.vram_vis_size,
.size = physical_device->rad_info.visible_vram_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_GTT] = (VkMemoryHeap) {
@@ -971,8 +918,6 @@ radv_device_init_gs_info(struct radv_device *device)
case CHIP_POLARIS10:
case CHIP_POLARIS11:
case CHIP_POLARIS12:
case CHIP_VEGA10:
case CHIP_RAVEN:
device->gs_table_depth = 32;
return;
default:
@@ -1096,7 +1041,6 @@ VkResult radv_CreateDevice(
case RADV_QUEUE_COMPUTE:
si_cs_emit_cache_flush(device->flush_cs[family],
device->physical_device->rad_info.chip_class,
NULL, 0,
family == RADV_QUEUE_COMPUTE && device->physical_device->rad_info.chip_class >= CIK,
RADV_CMD_FLAG_INV_ICACHE |
RADV_CMD_FLAG_INV_SMEM_L1 |
@@ -1112,7 +1056,6 @@ VkResult radv_CreateDevice(
case RADV_QUEUE_COMPUTE:
si_cs_emit_cache_flush(device->flush_shader_cs[family],
device->physical_device->rad_info.chip_class,
NULL, 0,
family == RADV_QUEUE_COMPUTE && device->physical_device->rad_info.chip_class >= CIK,
family == RADV_QUEUE_COMPUTE ? RADV_CMD_FLAG_CS_PARTIAL_FLUSH : (RADV_CMD_FLAG_CS_PARTIAL_FLUSH | RADV_CMD_FLAG_PS_PARTIAL_FLUSH) |
RADV_CMD_FLAG_INV_ICACHE |
@@ -1475,11 +1418,12 @@ radv_get_hs_offchip_param(struct radv_device *device, uint32_t *max_offchip_buff
max_offchip_buffers = MIN2(max_offchip_buffers, 126);
break;
case CIK:
case VI:
case GFX9:
default:
max_offchip_buffers = MIN2(max_offchip_buffers, 508);
break;
case VI:
default:
max_offchip_buffers = MIN2(max_offchip_buffers, 512);
break;
}
*max_offchip_buffers_p = max_offchip_buffers;
@@ -1715,10 +1659,6 @@ radv_get_preamble_cs(struct radv_queue *queue,
S_030938_SIZE(tess_factor_ring_size / 4));
radeon_set_uconfig_reg(cs, R_030940_VGT_TF_MEMORY_BASE,
tf_va >> 8);
if (queue->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_set_uconfig_reg(cs, R_030944_VGT_TF_MEMORY_BASE_HI,
tf_va >> 40);
}
radeon_set_uconfig_reg(cs, R_03093C_VGT_HS_OFFCHIP_PARAM, hs_offchip_param);
} else {
radeon_set_config_reg(cs, R_008988_VGT_TF_RING_SIZE,
@@ -1762,7 +1702,6 @@ radv_get_preamble_cs(struct radv_queue *queue,
if (!i) {
si_cs_emit_cache_flush(cs,
queue->device->physical_device->rad_info.chip_class,
NULL, 0,
queue->queue_family_index == RING_COMPUTE &&
queue->device->physical_device->rad_info.chip_class >= CIK,
RADV_CMD_FLAG_INV_ICACHE |
@@ -2076,7 +2015,7 @@ VkResult radv_AllocateMemory(
VkResult result;
enum radeon_bo_domain domain;
uint32_t flags = 0;
const VkDedicatedAllocationMemoryAllocateInfoNV *dedicate_info = NULL;
assert(pAllocateInfo->sType == VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO);
if (pAllocateInfo->allocationSize == 0) {
@@ -2085,10 +2024,15 @@ VkResult radv_AllocateMemory(
return VK_SUCCESS;
}
const VkImportMemoryFdInfoKHX *import_info =
vk_find_struct_const(pAllocateInfo->pNext, IMPORT_MEMORY_FD_INFO_KHX);
const VkDedicatedAllocationMemoryAllocateInfoNV *dedicate_info =
vk_find_struct_const(pAllocateInfo->pNext, DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV);
vk_foreach_struct(ext, pAllocateInfo->pNext) {
switch (ext->sType) {
case VK_STRUCTURE_TYPE_DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV:
dedicate_info = (const VkDedicatedAllocationMemoryAllocateInfoNV *)ext;
break;
default:
break;
}
}
mem = vk_alloc2(&device->alloc, pAllocator, sizeof(*mem), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
@@ -2103,18 +2047,6 @@ VkResult radv_AllocateMemory(
mem->buffer = NULL;
}
if (import_info) {
assert(import_info->handleType ==
VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
mem->bo = device->ws->buffer_from_fd(device->ws, import_info->fd,
NULL, NULL);
if (!mem->bo) {
result = VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX;
goto fail;
} else
goto out_success;
}
uint64_t alloc_size = align_u64(pAllocateInfo->allocationSize, 4096);
if (pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_WRITE_COMBINE ||
pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_CACHED)
@@ -2138,7 +2070,7 @@ VkResult radv_AllocateMemory(
goto fail;
}
mem->type_index = pAllocateInfo->memoryTypeIndex;
out_success:
*pMem = radv_device_memory_to_handle(mem);
return VK_SUCCESS;
@@ -2674,9 +2606,9 @@ static inline unsigned
si_tile_mode_index(const struct radv_image *image, unsigned level, bool stencil)
{
if (stencil)
return image->surface.u.legacy.stencil_tiling_index[level];
return image->surface.stencil_tiling_index[level];
else
return image->surface.u.legacy.tiling_index[level];
return image->surface.tiling_index[level];
}
static uint32_t radv_surface_layer_count(struct radv_image_view *iview)
@@ -2692,68 +2624,24 @@ radv_initialise_color_surface(struct radv_device *device,
const struct vk_format_description *desc;
unsigned ntype, format, swap, endian;
unsigned blend_clamp = 0, blend_bypass = 0;
unsigned pitch_tile_max, slice_tile_max, tile_mode_index;
uint64_t va;
const struct radeon_surf *surf = &iview->image->surface;
const struct radeon_surf_level *level_info = &surf->level[iview->base_mip];
desc = vk_format_description(iview->vk_format);
memset(cb, 0, sizeof(*cb));
/* Intensity is implemented as Red, so treat it that way. */
cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1);
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
if (device->physical_device->rad_info.chip_class >= GFX9) {
struct gfx9_surf_meta_flags meta;
if (iview->image->dcc_offset)
meta = iview->image->surface.u.gfx9.dcc;
else
meta = iview->image->surface.u.gfx9.cmask;
cb->cb_color_attrib |= S_028C74_COLOR_SW_MODE(iview->image->surface.u.gfx9.surf.swizzle_mode) |
S_028C74_FMASK_SW_MODE(iview->image->surface.u.gfx9.fmask.swizzle_mode) |
S_028C74_RB_ALIGNED(meta.rb_aligned) |
S_028C74_PIPE_ALIGNED(meta.pipe_aligned);
va += iview->image->surface.u.gfx9.surf_offset >> 8;
} else {
const struct legacy_surf_level *level_info = &surf->u.legacy.level[iview->base_mip];
unsigned pitch_tile_max, slice_tile_max, tile_mode_index;
va += level_info->offset;
pitch_tile_max = level_info->nblk_x / 8 - 1;
slice_tile_max = (level_info->nblk_x * level_info->nblk_y) / 64 - 1;
tile_mode_index = si_tile_mode_index(iview->image, iview->base_mip, false);
cb->cb_color_pitch = S_028C64_TILE_MAX(pitch_tile_max);
cb->cb_color_slice = S_028C68_TILE_MAX(slice_tile_max);
cb->cb_color_cmask_slice = iview->image->cmask.slice_tile_max;
cb->cb_color_attrib |= S_028C74_TILE_MODE_INDEX(tile_mode_index);
cb->micro_tile_mode = iview->image->surface.micro_tile_mode;
if (iview->image->fmask.size) {
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(iview->image->fmask.pitch_in_pixels / 8 - 1);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(iview->image->fmask.tile_mode_index);
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(iview->image->fmask.slice_tile_max);
} else {
/* This must be set for fast clear to work without FMASK. */
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(pitch_tile_max);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(tile_mode_index);
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(slice_tile_max);
}
}
va += level_info->offset;
cb->cb_color_base = va >> 8;
/* CMASK variables */
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->cmask.offset;
cb->cb_color_cmask = va >> 8;
cb->cb_color_cmask_slice = iview->image->cmask.slice_tile_max;
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->dcc_offset;
@@ -2763,8 +2651,20 @@ radv_initialise_color_surface(struct radv_device *device,
cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) |
S_028C6C_SLICE_MAX(iview->base_layer + max_slice - 1);
if (iview->image->info.samples > 1) {
unsigned log_samples = util_logbase2(iview->image->info.samples);
cb->micro_tile_mode = iview->image->surface.micro_tile_mode;
pitch_tile_max = level_info->nblk_x / 8 - 1;
slice_tile_max = (level_info->nblk_x * level_info->nblk_y) / 64 - 1;
tile_mode_index = si_tile_mode_index(iview->image, iview->base_mip, false);
cb->cb_color_pitch = S_028C64_TILE_MAX(pitch_tile_max);
cb->cb_color_slice = S_028C68_TILE_MAX(slice_tile_max);
/* Intensity is implemented as Red, so treat it that way. */
cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1) |
S_028C74_TILE_MODE_INDEX(tile_mode_index);
if (iview->image->samples > 1) {
unsigned log_samples = util_logbase2(iview->image->samples);
cb->cb_color_attrib |= S_028C74_NUM_SAMPLES(log_samples) |
S_028C74_NUM_FRAGMENTS(log_samples);
@@ -2772,9 +2672,18 @@ radv_initialise_color_surface(struct radv_device *device,
if (iview->image->fmask.size) {
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(iview->image->fmask.pitch_in_pixels / 8 - 1);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(iview->image->fmask.tile_mode_index);
cb->cb_color_fmask = va >> 8;
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(iview->image->fmask.slice_tile_max);
} else {
/* This must be set for fast clear to work without FMASK. */
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(pitch_tile_max);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(tile_mode_index);
cb->cb_color_fmask = cb->cb_color_base;
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(slice_tile_max);
}
ntype = radv_translate_color_numformat(iview->vk_format,
@@ -2819,7 +2728,7 @@ radv_initialise_color_surface(struct radv_device *device,
format != V_028C70_COLOR_24_8) |
S_028C70_NUMBER_TYPE(ntype) |
S_028C70_ENDIAN(endian);
if (iview->image->info.samples > 1)
if (iview->image->samples > 1)
if (iview->image->fmask.size)
cb->cb_color_info |= S_028C70_COMPRESSION(1);
@@ -2827,12 +2736,12 @@ radv_initialise_color_surface(struct radv_device *device,
!(device->debug_flags & RADV_DEBUG_NO_FAST_CLEARS))
cb->cb_color_info |= S_028C70_FAST_CLEAR(1);
if (iview->image->surface.dcc_size && iview->base_mip < surf->num_dcc_levels)
if (iview->image->surface.dcc_size && level_info->dcc_enabled)
cb->cb_color_info |= S_028C70_DCC_ENABLE(1);
if (device->physical_device->rad_info.chip_class >= VI) {
unsigned max_uncompressed_block_size = 2;
if (iview->image->info.samples > 1) {
if (iview->image->samples > 1) {
if (iview->image->surface.bpe == 1)
max_uncompressed_block_size = 0;
else if (iview->image->surface.bpe == 2)
@@ -2846,24 +2755,9 @@ radv_initialise_color_surface(struct radv_device *device,
/* This must be set for fast clear to work without FMASK. */
if (!iview->image->fmask.size &&
device->physical_device->rad_info.chip_class == SI) {
unsigned bankh = util_logbase2(iview->image->surface.u.legacy.bankh);
unsigned bankh = util_logbase2(iview->image->surface.bankh);
cb->cb_color_attrib |= S_028C74_FMASK_BANK_HEIGHT(bankh);
}
if (device->physical_device->rad_info.chip_class >= GFX9) {
uint32_t max_slice = radv_surface_layer_count(iview);
unsigned mip0_depth = iview->base_layer + max_slice - 1;
cb->cb_color_view |= S_028C6C_MIP_LEVEL(iview->base_mip);
cb->cb_color_attrib |= S_028C74_MIP0_DEPTH(mip0_depth) |
S_028C74_RESOURCE_TYPE(iview->image->surface.u.gfx9.resource_type);
cb->cb_color_attrib2 = S_028C68_MIP0_WIDTH(iview->image->info.width - 1) |
S_028C68_MIP0_HEIGHT(iview->image->info.height - 1) |
S_028C68_MAX_MIP(iview->image->info.levels);
cb->gfx9_epitch = S_0287A0_EPITCH(iview->image->surface.u.gfx9.surf.epitch);
}
}
static void
@@ -2872,8 +2766,9 @@ radv_initialise_ds_surface(struct radv_device *device,
struct radv_image_view *iview)
{
unsigned level = iview->base_mip;
unsigned format, stencil_format;
unsigned format;
uint64_t va, s_offs, z_offs;
const struct radeon_surf_level *level_info = &iview->image->surface.level[level];
bool stencil_only = false;
memset(ds, 0, sizeof(*ds));
switch (iview->vk_format) {
@@ -2895,121 +2790,98 @@ radv_initialise_ds_surface(struct radv_device *device,
break;
case VK_FORMAT_S8_UINT:
stencil_only = true;
level_info = &iview->image->surface.stencil_level[level];
break;
default:
break;
}
format = radv_translate_dbformat(iview->vk_format);
stencil_format = iview->image->surface.flags & RADEON_SURF_SBUFFER ?
V_028044_STENCIL_8 : V_028044_STENCIL_INVALID;
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
s_offs = z_offs = va;
z_offs += iview->image->surface.level[level].offset;
s_offs += iview->image->surface.stencil_level[level].offset;
uint32_t max_slice = radv_surface_layer_count(iview);
ds->db_depth_view = S_028008_SLICE_START(iview->base_layer) |
S_028008_SLICE_MAX(iview->base_layer + max_slice - 1);
ds->db_depth_info = S_02803C_ADDR5_SWIZZLE_MASK(1);
ds->db_z_info = S_028040_FORMAT(format) | S_028040_ZRANGE_PRECISION(1);
ds->db_htile_data_base = 0;
ds->db_htile_surface = 0;
if (iview->image->samples > 1)
ds->db_z_info |= S_028040_NUM_SAMPLES(util_logbase2(iview->image->samples));
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
s_offs = z_offs = va;
if (iview->image->surface.flags & RADEON_SURF_SBUFFER)
ds->db_stencil_info = S_028044_FORMAT(V_028044_STENCIL_8);
else
ds->db_stencil_info = S_028044_FORMAT(V_028044_STENCIL_INVALID);
if (device->physical_device->rad_info.chip_class >= GFX9) {
assert(iview->image->surface.u.gfx9.surf_offset == 0);
s_offs += iview->image->surface.u.gfx9.stencil_offset;
ds->db_z_info = S_028038_FORMAT(format) |
S_028038_NUM_SAMPLES(util_logbase2(iview->image->info.samples)) |
S_028038_SW_MODE(iview->image->surface.u.gfx9.surf.swizzle_mode) |
S_028038_MAXMIP(iview->image->info.levels - 1);
ds->db_stencil_info = S_02803C_FORMAT(stencil_format) |
S_02803C_SW_MODE(iview->image->surface.u.gfx9.stencil.swizzle_mode);
ds->db_z_info2 = S_028068_EPITCH(iview->image->surface.u.gfx9.surf.epitch);
ds->db_stencil_info2 = S_02806C_EPITCH(iview->image->surface.u.gfx9.stencil.epitch);
ds->db_depth_view |= S_028008_MIPID(level);
ds->db_depth_size = S_02801C_X_MAX(iview->image->info.width - 1) |
S_02801C_Y_MAX(iview->image->info.height - 1);
/* Only use HTILE for the first level. */
if (iview->image->surface.htile_size && !level) {
ds->db_z_info |= S_028038_TILE_SURFACE_ENABLE(1);
if (!(iview->image->surface.flags & RADEON_SURF_SBUFFER))
/* Use all of the htile_buffer for depth if there's no stencil. */
ds->db_stencil_info |= S_02803C_TILE_STENCIL_DISABLE(1);
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset +
iview->image->htile_offset;
ds->db_htile_data_base = va >> 8;
ds->db_htile_surface = S_028ABC_FULL_CACHE(1) |
S_028ABC_PIPE_ALIGNED(iview->image->surface.u.gfx9.htile.pipe_aligned) |
S_028ABC_RB_ALIGNED(iview->image->surface.u.gfx9.htile.rb_aligned);
}
} else {
const struct legacy_surf_level *level_info = &iview->image->surface.u.legacy.level[level];
if (device->physical_device->rad_info.chip_class >= CIK) {
struct radeon_info *info = &device->physical_device->rad_info;
unsigned tiling_index = iview->image->surface.tiling_index[level];
unsigned stencil_index = iview->image->surface.stencil_tiling_index[level];
unsigned macro_index = iview->image->surface.macro_tile_index;
unsigned tile_mode = info->si_tile_mode_array[tiling_index];
unsigned stencil_tile_mode = info->si_tile_mode_array[stencil_index];
unsigned macro_mode = info->cik_macrotile_mode_array[macro_index];
if (stencil_only)
level_info = &iview->image->surface.u.legacy.stencil_level[level];
tile_mode = stencil_tile_mode;
z_offs += iview->image->surface.u.legacy.level[level].offset;
s_offs += iview->image->surface.u.legacy.stencil_level[level].offset;
ds->db_depth_info |=
S_02803C_ARRAY_MODE(G_009910_ARRAY_MODE(tile_mode)) |
S_02803C_PIPE_CONFIG(G_009910_PIPE_CONFIG(tile_mode)) |
S_02803C_BANK_WIDTH(G_009990_BANK_WIDTH(macro_mode)) |
S_02803C_BANK_HEIGHT(G_009990_BANK_HEIGHT(macro_mode)) |
S_02803C_MACRO_TILE_ASPECT(G_009990_MACRO_TILE_ASPECT(macro_mode)) |
S_02803C_NUM_BANKS(G_009990_NUM_BANKS(macro_mode));
ds->db_z_info |= S_028040_TILE_SPLIT(G_009910_TILE_SPLIT(tile_mode));
ds->db_stencil_info |= S_028044_TILE_SPLIT(G_009910_TILE_SPLIT(stencil_tile_mode));
} else {
unsigned tile_mode_index = si_tile_mode_index(iview->image, level, false);
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
tile_mode_index = si_tile_mode_index(iview->image, level, true);
ds->db_stencil_info |= S_028044_TILE_MODE_INDEX(tile_mode_index);
}
ds->db_depth_info = S_02803C_ADDR5_SWIZZLE_MASK(1);
ds->db_z_info = S_028040_FORMAT(format) | S_028040_ZRANGE_PRECISION(1);
ds->db_stencil_info = S_028044_FORMAT(stencil_format);
if (iview->image->surface.htile_size && !level) {
ds->db_z_info |= S_028040_TILE_SURFACE_ENABLE(1) |
S_028040_ALLOW_EXPCLEAR(1);
if (iview->image->info.samples > 1)
ds->db_z_info |= S_028040_NUM_SAMPLES(util_logbase2(iview->image->info.samples));
if (iview->image->surface.flags & RADEON_SURF_SBUFFER) {
/* Workaround: For a not yet understood reason, the
* combination of MSAA, fast stencil clear and stencil
* decompress messes with subsequent stencil buffer
* uses. Problem was reproduced on Verde, Bonaire,
* Tonga, and Carrizo.
*
* Disabling EXPCLEAR works around the problem.
*
* Check piglit's arb_texture_multisample-stencil-clear
* test if you want to try changing this.
*/
if (iview->image->samples <= 1)
ds->db_stencil_info |= S_028044_ALLOW_EXPCLEAR(1);
} else
/* Use all of the htile_buffer for depth if there's no stencil. */
ds->db_stencil_info |= S_028044_TILE_STENCIL_DISABLE(1);
if (device->physical_device->rad_info.chip_class >= CIK) {
struct radeon_info *info = &device->physical_device->rad_info;
unsigned tiling_index = iview->image->surface.u.legacy.tiling_index[level];
unsigned stencil_index = iview->image->surface.u.legacy.stencil_tiling_index[level];
unsigned macro_index = iview->image->surface.u.legacy.macro_tile_index;
unsigned tile_mode = info->si_tile_mode_array[tiling_index];
unsigned stencil_tile_mode = info->si_tile_mode_array[stencil_index];
unsigned macro_mode = info->cik_macrotile_mode_array[macro_index];
if (stencil_only)
tile_mode = stencil_tile_mode;
ds->db_depth_info |=
S_02803C_ARRAY_MODE(G_009910_ARRAY_MODE(tile_mode)) |
S_02803C_PIPE_CONFIG(G_009910_PIPE_CONFIG(tile_mode)) |
S_02803C_BANK_WIDTH(G_009990_BANK_WIDTH(macro_mode)) |
S_02803C_BANK_HEIGHT(G_009990_BANK_HEIGHT(macro_mode)) |
S_02803C_MACRO_TILE_ASPECT(G_009990_MACRO_TILE_ASPECT(macro_mode)) |
S_02803C_NUM_BANKS(G_009990_NUM_BANKS(macro_mode));
ds->db_z_info |= S_028040_TILE_SPLIT(G_009910_TILE_SPLIT(tile_mode));
ds->db_stencil_info |= S_028044_TILE_SPLIT(G_009910_TILE_SPLIT(stencil_tile_mode));
} else {
unsigned tile_mode_index = si_tile_mode_index(iview->image, level, false);
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
tile_mode_index = si_tile_mode_index(iview->image, level, true);
ds->db_stencil_info |= S_028044_TILE_MODE_INDEX(tile_mode_index);
}
ds->db_depth_size = S_028058_PITCH_TILE_MAX((level_info->nblk_x / 8) - 1) |
S_028058_HEIGHT_TILE_MAX((level_info->nblk_y / 8) - 1);
ds->db_depth_slice = S_02805C_SLICE_TILE_MAX((level_info->nblk_x * level_info->nblk_y) / 64 - 1);
if (iview->image->surface.htile_size && !level) {
ds->db_z_info |= S_028040_TILE_SURFACE_ENABLE(1);
if (!(iview->image->surface.flags & RADEON_SURF_SBUFFER))
/* Use all of the htile_buffer for depth if there's no stencil. */
ds->db_stencil_info |= S_028044_TILE_STENCIL_DISABLE(1);
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset +
iview->image->htile_offset;
ds->db_htile_data_base = va >> 8;
ds->db_htile_surface = S_028ABC_FULL_CACHE(1);
}
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset +
iview->image->htile_offset;
ds->db_htile_data_base = va >> 8;
ds->db_htile_surface = S_028ABC_FULL_CACHE(1);
} else {
ds->db_htile_data_base = 0;
ds->db_htile_surface = 0;
}
ds->db_z_read_base = ds->db_z_write_base = z_offs >> 8;
ds->db_stencil_read_base = ds->db_stencil_write_base = s_offs >> 8;
ds->db_depth_size = S_028058_PITCH_TILE_MAX((level_info->nblk_x / 8) - 1) |
S_028058_HEIGHT_TILE_MAX((level_info->nblk_y / 8) - 1);
ds->db_depth_slice = S_02805C_SLICE_TILE_MAX((level_info->nblk_x * level_info->nblk_y) / 64 - 1);
}
VkResult radv_CreateFramebuffer(
@@ -3243,6 +3115,7 @@ void radv_DestroySampler(
vk_free2(&device->alloc, pAllocator, sampler);
}
/* vk_icd.h does not declare this function, so we declare it here to
* suppress Wmissing-prototypes.
*/
@@ -3286,34 +3159,3 @@ vk_icdNegotiateLoaderICDInterfaceVersion(uint32_t *pSupportedVersion)
*pSupportedVersion = MIN2(*pSupportedVersion, 3u);
return VK_SUCCESS;
}
VkResult radv_GetMemoryFdKHX(VkDevice _device,
VkDeviceMemory _memory,
VkExternalMemoryHandleTypeFlagsKHX handleType,
int *pFD)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_device_memory, memory, _memory);
/* We support only one handle type. */
assert(handleType == VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
bool ret = radv_get_memory_fd(device, memory, pFD);
if (ret == false)
return VK_ERROR_OUT_OF_DEVICE_MEMORY;
return VK_SUCCESS;
}
VkResult radv_GetMemoryFdPropertiesKHX(VkDevice _device,
VkExternalMemoryHandleTypeFlagBitsKHX handleType,
int fd,
VkMemoryFdPropertiesKHX *pMemoryFdProperties)
{
/* The valid usage section for this function says:
*
* "handleType must not be one of the handle types defined as opaque."
*
* Since we only handle opaque handles for now, there are no FD properties.
*/
return VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX;
}

View File

@@ -42,9 +42,6 @@ supported_extensions = [
'VK_KHR_wayland_surface',
'VK_KHR_xcb_surface',
'VK_KHR_xlib_surface',
'VK_KHX_external_memory_capabilities',
'VK_KHX_external_memory',
'VK_KHX_external_memory_fd',
]
# We generate a static hash table for entry point lookup

View File

@@ -28,8 +28,6 @@
#include "sid.h"
#include "r600d_common.h"
#include "vk_util.h"
#include "util/u_half.h"
#include "util/format_srgb.h"
#include "util/format_r11g11b10f.h"
@@ -1008,11 +1006,16 @@ void radv_GetPhysicalDeviceFormatProperties2KHR(
&pFormatProperties->formatProperties);
}
static VkResult radv_get_image_format_properties(struct radv_physical_device *physical_device,
const VkPhysicalDeviceImageFormatInfo2KHR *info,
VkImageFormatProperties *pImageFormatProperties)
VkResult radv_GetPhysicalDeviceImageFormatProperties(
VkPhysicalDevice physicalDevice,
VkFormat format,
VkImageType type,
VkImageTiling tiling,
VkImageUsageFlags usage,
VkImageCreateFlags createFlags,
VkImageFormatProperties* pImageFormatProperties)
{
RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
VkFormatProperties format_props;
VkFormatFeatureFlags format_feature_flags;
VkExtent3D maxExtent;
@@ -1020,11 +1023,11 @@ static VkResult radv_get_image_format_properties(struct radv_physical_device *ph
uint32_t maxArraySize;
VkSampleCountFlags sampleCounts = VK_SAMPLE_COUNT_1_BIT;
radv_physical_device_get_format_properties(physical_device, info->format,
radv_physical_device_get_format_properties(physical_device, format,
&format_props);
if (info->tiling == VK_IMAGE_TILING_LINEAR) {
if (tiling == VK_IMAGE_TILING_LINEAR) {
format_feature_flags = format_props.linearTilingFeatures;
} else if (info->tiling == VK_IMAGE_TILING_OPTIMAL) {
} else if (tiling == VK_IMAGE_TILING_OPTIMAL) {
format_feature_flags = format_props.optimalTilingFeatures;
} else {
unreachable("bad VkImageTiling");
@@ -1033,7 +1036,7 @@ static VkResult radv_get_image_format_properties(struct radv_physical_device *ph
if (format_feature_flags == 0)
goto unsupported;
switch (info->type) {
switch (type) {
default:
unreachable("bad vkimage type\n");
case VK_IMAGE_TYPE_1D:
@@ -1059,34 +1062,34 @@ static VkResult radv_get_image_format_properties(struct radv_physical_device *ph
break;
}
if (info->tiling == VK_IMAGE_TILING_OPTIMAL &&
info->type == VK_IMAGE_TYPE_2D &&
if (tiling == VK_IMAGE_TILING_OPTIMAL &&
type == VK_IMAGE_TYPE_2D &&
(format_feature_flags & (VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT |
VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT)) &&
!(info->flags & VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT) &&
!(info->usage & VK_IMAGE_USAGE_STORAGE_BIT)) {
!(createFlags & VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT) &&
!(usage & VK_IMAGE_USAGE_STORAGE_BIT)) {
sampleCounts |= VK_SAMPLE_COUNT_2_BIT | VK_SAMPLE_COUNT_4_BIT | VK_SAMPLE_COUNT_8_BIT;
}
if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
if (usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)) {
goto unsupported;
}
}
if (info->usage & VK_IMAGE_USAGE_STORAGE_BIT) {
if (usage & VK_IMAGE_USAGE_STORAGE_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT)) {
goto unsupported;
}
}
if (info->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) {
if (usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT)) {
goto unsupported;
}
}
if (info->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
if (usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT)) {
goto unsupported;
}
@@ -1117,132 +1120,18 @@ unsupported:
return VK_ERROR_FORMAT_NOT_SUPPORTED;
}
VkResult radv_GetPhysicalDeviceImageFormatProperties(
VkPhysicalDevice physicalDevice,
VkFormat format,
VkImageType type,
VkImageTiling tiling,
VkImageUsageFlags usage,
VkImageCreateFlags createFlags,
VkImageFormatProperties* pImageFormatProperties)
{
RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
const VkPhysicalDeviceImageFormatInfo2KHR info = {
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_FORMAT_INFO_2_KHR,
.pNext = NULL,
.format = format,
.type = type,
.tiling = tiling,
.usage = usage,
.flags = createFlags,
};
return radv_get_image_format_properties(physical_device, &info,
pImageFormatProperties);
}
static void
get_external_image_format_properties(const VkPhysicalDeviceImageFormatInfo2KHR *pImageFormatInfo,
VkExternalMemoryPropertiesKHX *external_properties)
{
VkExternalMemoryFeatureFlagBitsKHX flags = 0;
VkExternalMemoryHandleTypeFlagsKHX export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHX compat_flags = 0;
switch (pImageFormatInfo->type) {
case VK_IMAGE_TYPE_2D:
flags = VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT_KHX|VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHX|VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHX;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX;
break;
default:
break;
}
*external_properties = (VkExternalMemoryPropertiesKHX) {
.externalMemoryFeatures = flags,
.exportFromImportedHandleTypes = export_flags,
.compatibleHandleTypes = compat_flags,
};
}
VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceImageFormatInfo2KHR *base_info,
VkImageFormatProperties2KHR *base_props)
const VkPhysicalDeviceImageFormatInfo2KHR* pImageFormatInfo,
VkImageFormatProperties2KHR *pImageFormatProperties)
{
RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
const VkPhysicalDeviceExternalImageFormatInfoKHX *external_info = NULL;
VkExternalImageFormatPropertiesKHX *external_props = NULL;
VkResult result;
result = radv_get_image_format_properties(physical_device, base_info,
&base_props->imageFormatProperties);
if (result != VK_SUCCESS)
return result;
/* Extract input structs */
vk_foreach_struct_const(s, base_info->pNext) {
switch (s->sType) {
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_IMAGE_FORMAT_INFO_KHX:
external_info = (const void *) s;
break;
default:
break;
}
}
/* Extract output structs */
vk_foreach_struct(s, base_props->pNext) {
switch (s->sType) {
case VK_STRUCTURE_TYPE_EXTERNAL_IMAGE_FORMAT_PROPERTIES_KHX:
external_props = (void *) s;
break;
default:
break;
}
}
/* From the Vulkan 1.0.42 spec:
*
* If handleType is 0, vkGetPhysicalDeviceImageFormatProperties2KHR will
* behave as if VkPhysicalDeviceExternalImageFormatInfoKHX was not
* present and VkExternalImageFormatPropertiesKHX will be ignored.
*/
if (external_info && external_info->handleType != 0) {
switch (external_info->handleType) {
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX:
get_external_image_format_properties(base_info, &external_props->externalMemoryProperties);
break;
default:
/* From the Vulkan 1.0.42 spec:
*
* If handleType is not compatible with the [parameters] specified
* in VkPhysicalDeviceImageFormatInfo2KHR, then
* vkGetPhysicalDeviceImageFormatProperties2KHR returns
* VK_ERROR_FORMAT_NOT_SUPPORTED.
*/
result = vk_errorf(VK_ERROR_FORMAT_NOT_SUPPORTED,
"unsupported VkExternalMemoryTypeFlagBitsKHX 0x%x",
external_info->handleType);
goto fail;
}
}
return VK_SUCCESS;
fail:
if (result == VK_ERROR_FORMAT_NOT_SUPPORTED) {
/* From the Vulkan 1.0.42 spec:
*
* If the combination of parameters to
* vkGetPhysicalDeviceImageFormatProperties2KHR is not supported by
* the implementation for use in vkCreateImage, then all members of
* imageFormatProperties will be filled with zero.
*/
base_props->imageFormatProperties = (VkImageFormatProperties) {0};
}
return result;
return radv_GetPhysicalDeviceImageFormatProperties(physicalDevice,
pImageFormatInfo->format,
pImageFormatInfo->type,
pImageFormatInfo->tiling,
pImageFormatInfo->usage,
pImageFormatInfo->flags,
&pImageFormatProperties->imageFormatProperties);
}
void radv_GetPhysicalDeviceSparseImageFormatProperties(
@@ -1268,28 +1157,3 @@ void radv_GetPhysicalDeviceSparseImageFormatProperties2KHR(
/* Sparse images are not yet supported. */
*pPropertyCount = 0;
}
void radv_GetPhysicalDeviceExternalBufferPropertiesKHX(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceExternalBufferInfoKHX *pExternalBufferInfo,
VkExternalBufferPropertiesKHX *pExternalBufferProperties)
{
VkExternalMemoryFeatureFlagBitsKHX flags = 0;
VkExternalMemoryHandleTypeFlagsKHX export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHX compat_flags = 0;
switch(pExternalBufferInfo->handleType) {
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX:
flags = VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT_KHX |
VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHX |
VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHX;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX;
break;
default:
break;
}
pExternalBufferProperties->externalMemoryProperties = (VkExternalMemoryPropertiesKHX) {
.externalMemoryFeatures = flags,
.exportFromImportedHandleTypes = export_flags,
.compatibleHandleTypes = compat_flags,
};
}

View File

@@ -29,7 +29,6 @@
#include "vk_format.h"
#include "radv_radeon_winsys.h"
#include "sid.h"
#include "gfx9d.h"
#include "util/debug.h"
static unsigned
radv_choose_tiling(struct radv_device *Device,
@@ -68,15 +67,22 @@ radv_init_surface(struct radv_device *device,
is_depth = vk_format_has_depth(desc);
is_stencil = vk_format_has_stencil(desc);
surface->npix_x = pCreateInfo->extent.width;
surface->npix_y = pCreateInfo->extent.height;
surface->npix_z = pCreateInfo->extent.depth;
surface->blk_w = vk_format_get_blockwidth(pCreateInfo->format);
surface->blk_h = vk_format_get_blockheight(pCreateInfo->format);
surface->blk_d = 1;
surface->array_size = pCreateInfo->arrayLayers;
surface->last_level = pCreateInfo->mipLevels - 1;
surface->bpe = vk_format_get_blocksize(pCreateInfo->format);
/* align byte per element on dword */
if (surface->bpe == 3) {
surface->bpe = 4;
}
surface->nsamples = pCreateInfo->samples ? pCreateInfo->samples : 1;
surface->flags = RADEON_SURF_SET(array_mode, MODE);
switch (pCreateInfo->imageType){
@@ -104,7 +110,8 @@ radv_init_surface(struct radv_device *device,
}
if (is_stencil)
surface->flags |= RADEON_SURF_SBUFFER;
surface->flags |= RADEON_SURF_SBUFFER |
RADEON_SURF_HAS_SBUFFER_MIPTREE;
surface->flags |= RADEON_SURF_HAS_TILE_MODE_INDEX;
@@ -130,9 +137,9 @@ static inline unsigned
si_tile_mode_index(const struct radv_image *image, unsigned level, bool stencil)
{
if (stencil)
return image->surface.u.legacy.stencil_tiling_index[level];
return image->surface.stencil_tiling_index[level];
else
return image->surface.u.legacy.tiling_index[level];
return image->surface.tiling_index[level];
}
static unsigned radv_map_swizzle(unsigned swizzle)
@@ -190,80 +197,33 @@ radv_make_buffer_descriptor(struct radv_device *device,
static void
si_set_mutable_tex_desc_fields(struct radv_device *device,
struct radv_image *image,
const struct legacy_surf_level *base_level_info,
const struct radeon_surf_level *base_level_info,
unsigned base_level, unsigned first_level,
unsigned block_width, bool is_stencil,
uint32_t *state)
{
uint64_t gpu_address = device->ws->buffer_get_va(image->bo) + image->offset;
uint64_t va = gpu_address;
uint64_t va = gpu_address + base_level_info->offset;
unsigned pitch = base_level_info->nblk_x * block_width;
enum chip_class chip_class = device->physical_device->rad_info.chip_class;
uint64_t meta_va = 0;
if (chip_class >= GFX9) {
if (is_stencil)
va += image->surface.u.gfx9.stencil_offset;
else
va += image->surface.u.gfx9.surf_offset;
} else
va += base_level_info->offset;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[3] &= C_008F1C_TILING_INDEX;
state[4] &= C_008F20_PITCH_GFX6;
state[6] &= C_008F28_COMPRESSION_EN;
assert(!(va & 255));
state[0] = va >> 8;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
state[3] |= S_008F1C_TILING_INDEX(si_tile_mode_index(image, base_level,
is_stencil));
state[4] |= S_008F20_PITCH_GFX6(pitch - 1);
if (chip_class >= VI) {
state[6] &= C_008F28_COMPRESSION_EN;
state[7] = 0;
if (image->surface.dcc_size && first_level < image->surface.num_dcc_levels) {
uint64_t meta_va = gpu_address + image->dcc_offset;
if (chip_class <= VI)
meta_va += base_level_info->dcc_offset;
state[6] |= S_008F28_COMPRESSION_EN(1);
state[7] = meta_va >> 8;
}
}
if (chip_class >= GFX9) {
state[3] &= C_008F1C_SW_MODE;
state[4] &= C_008F20_PITCH_GFX9;
if (is_stencil) {
state[3] |= S_008F1C_SW_MODE(image->surface.u.gfx9.stencil.swizzle_mode);
state[4] |= S_008F20_PITCH_GFX9(image->surface.u.gfx9.stencil.epitch);
} else {
state[3] |= S_008F1C_SW_MODE(image->surface.u.gfx9.surf.swizzle_mode);
state[4] |= S_008F20_PITCH_GFX9(image->surface.u.gfx9.surf.epitch);
}
state[5] &= C_008F24_META_DATA_ADDRESS &
C_008F24_META_PIPE_ALIGNED &
C_008F24_META_RB_ALIGNED;
if (meta_va) {
struct gfx9_surf_meta_flags meta;
if (image->dcc_offset)
meta = image->surface.u.gfx9.dcc;
else
meta = image->surface.u.gfx9.htile;
state[5] |= S_008F24_META_DATA_ADDRESS(meta_va >> 40) |
S_008F24_META_PIPE_ALIGNED(meta.pipe_aligned) |
S_008F24_META_RB_ALIGNED(meta.rb_aligned);
}
} else {
/* SI-CI-VI */
unsigned pitch = base_level_info->nblk_x * block_width;
unsigned index = si_tile_mode_index(image, base_level, is_stencil);
state[3] &= C_008F1C_TILING_INDEX;
state[3] |= S_008F1C_TILING_INDEX(index);
state[4] &= C_008F20_PITCH_GFX6;
state[4] |= S_008F20_PITCH_GFX6(pitch - 1);
if (image->surface.dcc_size && image->surface.level[first_level].dcc_enabled) {
state[6] |= S_008F28_COMPRESSION_EN(1);
state[7] = (gpu_address +
image->dcc_offset +
base_level_info->dcc_offset) >> 8;
}
}
@@ -289,36 +249,6 @@ static unsigned radv_tex_dim(VkImageType image_type, VkImageViewType view_type,
unreachable("illegale image type");
}
}
static unsigned gfx9_border_color_swizzle(const unsigned char swizzle[4])
{
unsigned bc_swizzle = V_008F20_BC_SWIZZLE_XYZW;
if (swizzle[3] == VK_SWIZZLE_X) {
/* For the pre-defined border color values (white, opaque
* black, transparent black), the only thing that matters is
* that the alpha channel winds up in the correct place
* (because the RGB channels are all the same) so either of
* these enumerations will work.
*/
if (swizzle[2] == VK_SWIZZLE_Y)
bc_swizzle = V_008F20_BC_SWIZZLE_WZYX;
else
bc_swizzle = V_008F20_BC_SWIZZLE_WXYZ;
} else if (swizzle[0] == VK_SWIZZLE_X) {
if (swizzle[1] == VK_SWIZZLE_Y)
bc_swizzle = V_008F20_BC_SWIZZLE_XYZW;
else
bc_swizzle = V_008F20_BC_SWIZZLE_XWYZ;
} else if (swizzle[1] == VK_SWIZZLE_X) {
bc_swizzle = V_008F20_BC_SWIZZLE_YXWZ;
} else if (swizzle[2] == VK_SWIZZLE_X) {
bc_swizzle = V_008F20_BC_SWIZZLE_ZYXW;
}
return bc_swizzle;
}
/**
* Build the sampler view descriptor for a texture.
*/
@@ -361,59 +291,40 @@ si_make_texture_descriptor(struct radv_device *device,
data_format = 0;
}
type = radv_tex_dim(image->type, view_type, image->info.array_size, image->info.samples,
type = radv_tex_dim(image->type, view_type, image->array_size, image->samples,
(image->usage & VK_IMAGE_USAGE_STORAGE_BIT));
if (type == V_008F1C_SQ_RSRC_IMG_1D_ARRAY) {
height = 1;
depth = image->info.array_size;
depth = image->array_size;
} else if (type == V_008F1C_SQ_RSRC_IMG_2D_ARRAY ||
type == V_008F1C_SQ_RSRC_IMG_2D_MSAA_ARRAY) {
if (view_type != VK_IMAGE_VIEW_TYPE_3D)
depth = image->info.array_size;
depth = image->array_size;
} else if (type == V_008F1C_SQ_RSRC_IMG_CUBE)
depth = image->info.array_size / 6;
depth = image->array_size / 6;
state[0] = 0;
state[1] = (S_008F14_DATA_FORMAT_GFX6(data_format) |
S_008F14_NUM_FORMAT_GFX6(num_format));
state[2] = (S_008F18_WIDTH(width - 1) |
S_008F18_HEIGHT(height - 1) |
S_008F18_PERF_MOD(4));
S_008F18_HEIGHT(height - 1));
state[3] = (S_008F1C_DST_SEL_X(radv_map_swizzle(swizzle[0])) |
S_008F1C_DST_SEL_Y(radv_map_swizzle(swizzle[1])) |
S_008F1C_DST_SEL_Z(radv_map_swizzle(swizzle[2])) |
S_008F1C_DST_SEL_W(radv_map_swizzle(swizzle[3])) |
S_008F1C_BASE_LEVEL(image->info.samples > 1 ?
S_008F1C_BASE_LEVEL(image->samples > 1 ?
0 : first_level) |
S_008F1C_LAST_LEVEL(image->info.samples > 1 ?
util_logbase2(image->info.samples) :
S_008F1C_LAST_LEVEL(image->samples > 1 ?
util_logbase2(image->samples) :
last_level) |
S_008F1C_POW2_PAD(image->levels > 1) |
S_008F1C_TYPE(type));
state[4] = 0;
state[5] = S_008F24_BASE_ARRAY(first_layer);
state[4] = S_008F20_DEPTH(depth - 1);
state[5] = (S_008F24_BASE_ARRAY(first_layer) |
S_008F24_LAST_ARRAY(last_layer));
state[6] = 0;
state[7] = 0;
if (device->physical_device->rad_info.chip_class >= GFX9) {
unsigned bc_swizzle = gfx9_border_color_swizzle(desc->swizzle);
/* Depth is the the last accessible layer on Gfx9.
* The hw doesn't need to know the total number of layers.
*/
if (type == V_008F1C_SQ_RSRC_IMG_3D)
state[4] |= S_008F20_DEPTH(depth - 1);
else
state[4] |= S_008F20_DEPTH(last_layer);
state[4] |= S_008F20_BC_SWIZZLE(bc_swizzle);
state[5] |= S_008F24_MAX_MIP(image->info.samples > 1 ?
util_logbase2(image->info.samples) :
last_level);
} else {
state[3] |= S_008F1C_POW2_PAD(image->info.levels > 1);
state[4] |= S_008F20_DEPTH(depth - 1);
state[5] |= S_008F24_LAST_ARRAY(last_layer);
}
if (image->dcc_offset) {
unsigned swap = radv_translate_colorswap(vk_format, FALSE);
@@ -422,7 +333,7 @@ si_make_texture_descriptor(struct radv_device *device,
/* The last dword is unused by hw. The shader uses it to clear
* bits in the first dword of sampler state.
*/
if (device->physical_device->rad_info.chip_class <= CIK && image->info.samples <= 1) {
if (device->physical_device->rad_info.chip_class <= CIK && image->samples <= 1) {
if (first_level == last_level)
state[7] = C_008F30_MAX_ANISO_RATIO;
else
@@ -432,73 +343,45 @@ si_make_texture_descriptor(struct radv_device *device,
/* Initialize the sampler view for FMASK. */
if (image->fmask.size) {
uint32_t fmask_format, num_format;
uint32_t fmask_format;
uint64_t gpu_address = device->ws->buffer_get_va(image->bo);
uint64_t va;
va = gpu_address + image->offset + image->fmask.offset;
if (device->physical_device->rad_info.chip_class >= GFX9) {
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK;
switch (image->info.samples) {
case 2:
num_format = V_008F14_IMG_FMASK_8_2_2;
break;
case 4:
num_format = V_008F14_IMG_FMASK_8_4_4;
break;
case 8:
num_format = V_008F14_IMG_FMASK_32_8_8;
break;
default:
unreachable("invalid nr_samples");
}
} else {
switch (image->info.samples) {
case 2:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S2_F2;
break;
case 4:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S4_F4;
break;
case 8:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK32_S8_F8;
break;
default:
assert(0);
fmask_format = V_008F14_IMG_DATA_FORMAT_INVALID;
}
num_format = V_008F14_IMG_NUM_FORMAT_UINT;
switch (image->samples) {
case 2:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S2_F2;
break;
case 4:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S4_F4;
break;
case 8:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK32_S8_F8;
break;
default:
assert(0);
fmask_format = V_008F14_IMG_DATA_FORMAT_INVALID;
}
fmask_state[0] = va >> 8;
fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >> 40) |
S_008F14_DATA_FORMAT_GFX6(fmask_format) |
S_008F14_NUM_FORMAT_GFX6(num_format);
S_008F14_NUM_FORMAT_GFX6(V_008F14_IMG_NUM_FORMAT_UINT);
fmask_state[2] = S_008F18_WIDTH(width - 1) |
S_008F18_HEIGHT(height - 1);
fmask_state[3] = S_008F1C_DST_SEL_X(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_Z(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_W(V_008F1C_SQ_SEL_X) |
S_008F1C_TILING_INDEX(image->fmask.tile_mode_index) |
S_008F1C_TYPE(radv_tex_dim(image->type, view_type, 1, 0, false));
fmask_state[4] = 0;
fmask_state[5] = S_008F24_BASE_ARRAY(first_layer);
fmask_state[4] = S_008F20_DEPTH(depth - 1) |
S_008F20_PITCH_GFX6(image->fmask.pitch_in_pixels - 1);
fmask_state[5] = S_008F24_BASE_ARRAY(first_layer) |
S_008F24_LAST_ARRAY(last_layer);
fmask_state[6] = 0;
fmask_state[7] = 0;
if (device->physical_device->rad_info.chip_class >= GFX9) {
fmask_state[3] |= S_008F1C_SW_MODE(image->surface.u.gfx9.fmask.swizzle_mode);
fmask_state[4] |= S_008F20_DEPTH(last_layer) |
S_008F20_PITCH_GFX9(image->surface.u.gfx9.fmask.epitch);
fmask_state[5] |= S_008F24_META_PIPE_ALIGNED(image->surface.u.gfx9.cmask.pipe_aligned) |
S_008F24_META_RB_ALIGNED(image->surface.u.gfx9.cmask.rb_aligned);
} else {
fmask_state[3] |= S_008F1C_TILING_INDEX(image->fmask.tile_mode_index);
fmask_state[4] |= S_008F20_DEPTH(depth - 1) |
S_008F20_PITCH_GFX6(image->fmask.pitch_in_pixels - 1);
fmask_state[5] |= S_008F24_LAST_ARRAY(last_layer);
}
} else if (fmask_state)
memset(fmask_state, 0, 8 * 4);
}
@@ -528,13 +411,13 @@ radv_query_opaque_metadata(struct radv_device *device,
si_make_texture_descriptor(device, image, true,
(VkImageViewType)image->type, image->vk_format,
&fixedmapping, 0, image->info.levels - 1, 0,
image->info.array_size,
image->info.width, image->info.height,
image->info.depth,
&fixedmapping, 0, image->levels - 1, 0,
image->array_size,
image->extent.width, image->extent.height,
image->extent.depth,
desc, NULL);
si_set_mutable_tex_desc_fields(device, image, &image->surface.u.legacy.level[0], 0, 0,
si_set_mutable_tex_desc_fields(device, image, &image->surface.level[0], 0, 0,
image->surface.blk_w, false, desc);
/* Clear the base address and set the relative DCC offset. */
@@ -546,10 +429,10 @@ radv_query_opaque_metadata(struct radv_device *device,
memcpy(&md->metadata[2], desc, sizeof(desc));
/* Dwords [10:..] contain the mipmap level offsets. */
for (i = 0; i <= image->info.levels - 1; i++)
md->metadata[10+i] = image->surface.u.legacy.level[i].offset >> 8;
for (i = 0; i <= image->levels - 1; i++)
md->metadata[10+i] = image->surface.level[i].offset >> 8;
md->size_metadata = (11 + image->info.levels - 1) * 4;
md->size_metadata = (11 + image->levels - 1) * 4;
}
void
@@ -560,23 +443,19 @@ radv_init_metadata(struct radv_device *device,
struct radeon_surf *surface = &image->surface;
memset(metadata, 0, sizeof(*metadata));
metadata->microtile = surface->level[0].mode >= RADEON_SURF_MODE_1D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->macrotile = surface->level[0].mode >= RADEON_SURF_MODE_2D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->pipe_config = surface->pipe_config;
metadata->bankw = surface->bankw;
metadata->bankh = surface->bankh;
metadata->tile_split = surface->tile_split;
metadata->mtilea = surface->mtilea;
metadata->num_banks = surface->num_banks;
metadata->stride = surface->level[0].pitch_bytes;
metadata->scanout = (surface->flags & RADEON_SURF_SCANOUT) != 0;
if (device->physical_device->rad_info.chip_class >= GFX9) {
metadata->u.gfx9.swizzle_mode = surface->u.gfx9.surf.swizzle_mode;
} else {
metadata->u.legacy.microtile = surface->u.legacy.level[0].mode >= RADEON_SURF_MODE_1D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->u.legacy.macrotile = surface->u.legacy.level[0].mode >= RADEON_SURF_MODE_2D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->u.legacy.pipe_config = surface->u.legacy.pipe_config;
metadata->u.legacy.bankw = surface->u.legacy.bankw;
metadata->u.legacy.bankh = surface->u.legacy.bankh;
metadata->u.legacy.tile_split = surface->u.legacy.tile_split;
metadata->u.legacy.mtilea = surface->u.legacy.mtilea;
metadata->u.legacy.num_banks = surface->u.legacy.num_banks;
metadata->u.legacy.stride = surface->u.legacy.level[0].nblk_x * surface->bpe;
metadata->u.legacy.scanout = (surface->flags & RADEON_SURF_SCANOUT) != 0;
}
radv_query_opaque_metadata(device, image, metadata);
}
@@ -588,20 +467,14 @@ radv_image_get_fmask_info(struct radv_device *device,
struct radv_fmask_info *out)
{
/* FMASK is allocated like an ordinary texture. */
struct radeon_surf fmask = {};
struct ac_surf_info info = image->info;
struct radeon_surf fmask = image->surface;
memset(out, 0, sizeof(*out));
if (device->physical_device->rad_info.chip_class >= GFX9) {
out->alignment = image->surface.u.gfx9.fmask_alignment;
out->size = image->surface.u.gfx9.fmask_size;
return;
}
fmask.blk_w = image->surface.blk_w;
fmask.blk_h = image->surface.blk_h;
info.samples = 1;
fmask.flags = image->surface.flags | RADEON_SURF_FMASK;
fmask.bo_alignment = 0;
fmask.bo_size = 0;
fmask.nsamples = 1;
fmask.flags |= RADEON_SURF_FMASK;
/* Force 2D tiling if it wasn't set. This may occur when creating
* FMASK for MSAA resolve on R6xx. On R6xx, the single-sample
@@ -609,6 +482,8 @@ radv_image_get_fmask_info(struct radv_device *device,
fmask.flags = RADEON_SURF_CLR(fmask.flags, MODE);
fmask.flags |= RADEON_SURF_SET(RADEON_SURF_MODE_2D, MODE);
fmask.flags |= RADEON_SURF_HAS_TILE_MODE_INDEX;
switch (nr_samples) {
case 2:
case 4:
@@ -621,25 +496,25 @@ radv_image_get_fmask_info(struct radv_device *device,
return;
}
device->ws->surface_init(device->ws, &info, &fmask);
assert(fmask.u.legacy.level[0].mode == RADEON_SURF_MODE_2D);
device->ws->surface_init(device->ws, &fmask);
assert(fmask.level[0].mode == RADEON_SURF_MODE_2D);
out->slice_tile_max = (fmask.u.legacy.level[0].nblk_x * fmask.u.legacy.level[0].nblk_y) / 64;
out->slice_tile_max = (fmask.level[0].nblk_x * fmask.level[0].nblk_y) / 64;
if (out->slice_tile_max)
out->slice_tile_max -= 1;
out->tile_mode_index = fmask.u.legacy.tiling_index[0];
out->pitch_in_pixels = fmask.u.legacy.level[0].nblk_x;
out->bank_height = fmask.u.legacy.bankh;
out->alignment = MAX2(256, fmask.surf_alignment);
out->size = fmask.surf_size;
out->tile_mode_index = fmask.tiling_index[0];
out->pitch_in_pixels = fmask.level[0].nblk_x;
out->bank_height = fmask.bankh;
out->alignment = MAX2(256, fmask.bo_alignment);
out->size = fmask.bo_size;
}
static void
radv_image_alloc_fmask(struct radv_device *device,
struct radv_image *image)
{
radv_image_get_fmask_info(device, image, image->info.samples, &image->fmask);
radv_image_get_fmask_info(device, image, image->samples, &image->fmask);
image->fmask.offset = align64(image->size, image->fmask.alignment);
image->size = image->fmask.offset + image->fmask.size;
@@ -655,12 +530,6 @@ radv_image_get_cmask_info(struct radv_device *device,
unsigned num_pipes = device->physical_device->rad_info.num_tile_pipes;
unsigned cl_width, cl_height;
if (device->physical_device->rad_info.chip_class >= GFX9) {
out->alignment = image->surface.u.gfx9.cmask_alignment;
out->size = image->surface.u.gfx9.cmask_size;
return;
}
switch (num_pipes) {
case 2:
cl_width = 32;
@@ -685,8 +554,8 @@ radv_image_get_cmask_info(struct radv_device *device,
unsigned base_align = num_pipes * pipe_interleave_bytes;
unsigned width = align(image->info.width, cl_width*8);
unsigned height = align(image->info.height, cl_height*8);
unsigned width = align(image->surface.npix_x, cl_width*8);
unsigned height = align(image->surface.npix_y, cl_height*8);
unsigned slice_elements = (width * height) / (8*8);
/* Each element of CMASK is a nibble. */
@@ -697,7 +566,7 @@ radv_image_get_cmask_info(struct radv_device *device,
out->slice_tile_max -= 1;
out->alignment = MAX2(256, base_align);
out->size = (image->type == VK_IMAGE_TYPE_3D ? image->info.depth : image->info.array_size) *
out->size = (image->type == VK_IMAGE_TYPE_3D ? image->extent.depth : image->array_size) *
align(slice_bytes, base_align);
}
@@ -729,7 +598,7 @@ static void
radv_image_alloc_htile(struct radv_device *device,
struct radv_image *image)
{
if ((device->debug_flags & RADV_DEBUG_NO_HIZ) || image->info.levels > 1) {
if ((device->debug_flags & RADV_DEBUG_NO_HIZ) || image->levels > 1) {
image->surface.htile_size = 0;
return;
}
@@ -768,14 +637,11 @@ radv_image_create(VkDevice _device,
memset(image, 0, sizeof(*image));
image->type = pCreateInfo->imageType;
image->info.width = pCreateInfo->extent.width;
image->info.height = pCreateInfo->extent.height;
image->info.depth = pCreateInfo->extent.depth;
image->info.samples = pCreateInfo->samples;
image->info.array_size = pCreateInfo->arrayLayers;
image->info.levels = pCreateInfo->mipLevels;
image->extent = pCreateInfo->extent;
image->vk_format = pCreateInfo->format;
image->levels = pCreateInfo->mipLevels;
image->array_size = pCreateInfo->arrayLayers;
image->samples = pCreateInfo->samples;
image->tiling = pCreateInfo->tiling;
image->usage = pCreateInfo->usage;
image->flags = pCreateInfo->flags;
@@ -783,18 +649,15 @@ radv_image_create(VkDevice _device,
image->exclusive = pCreateInfo->sharingMode == VK_SHARING_MODE_EXCLUSIVE;
if (pCreateInfo->sharingMode == VK_SHARING_MODE_CONCURRENT) {
for (uint32_t i = 0; i < pCreateInfo->queueFamilyIndexCount; ++i)
if (pCreateInfo->pQueueFamilyIndices[i] == VK_QUEUE_FAMILY_EXTERNAL_KHX)
image->queue_family_mask |= (1u << RADV_MAX_QUEUE_FAMILIES) - 1u;
else
image->queue_family_mask |= 1u << pCreateInfo->pQueueFamilyIndices[i];
image->queue_family_mask |= 1u << pCreateInfo->pQueueFamilyIndices[i];
}
radv_init_surface(device, &image->surface, create_info);
device->ws->surface_init(device->ws, &image->info, &image->surface);
device->ws->surface_init(device->ws, &image->surface);
image->size = image->surface.surf_size;
image->alignment = image->surface.surf_alignment;
image->size = image->surface.bo_size;
image->alignment = image->surface.bo_alignment;
if (image->exclusive || image->queue_family_mask == 1)
can_cmask_dcc = true;
@@ -807,15 +670,22 @@ radv_image_create(VkDevice _device,
if ((pCreateInfo->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) &&
pCreateInfo->mipLevels == 1 &&
!image->surface.dcc_size && image->info.depth == 1 && can_cmask_dcc)
!image->surface.dcc_size && image->extent.depth == 1 && can_cmask_dcc)
radv_image_alloc_cmask(device, image);
if (image->info.samples > 1 && vk_format_is_color(pCreateInfo->format)) {
if (image->samples > 1 && vk_format_is_color(pCreateInfo->format)) {
radv_image_alloc_fmask(device, image);
} else if (vk_format_is_depth(pCreateInfo->format)) {
radv_image_alloc_htile(device, image);
}
if (create_info->stride && create_info->stride != image->surface.level[0].pitch_bytes) {
image->surface.level[0].nblk_x = create_info->stride / image->surface.bpe;
image->surface.level[0].pitch_bytes = create_info->stride;
image->surface.level[0].slice_size = create_info->stride * image->surface.level[0].nblk_y;
}
if (pCreateInfo->flags & VK_IMAGE_CREATE_SPARSE_BINDING_BIT) {
image->alignment = MAX2(image->alignment, 4096);
image->size = align64(image->size, image->alignment);
@@ -837,7 +707,9 @@ radv_image_create(VkDevice _device,
void
radv_image_view_init(struct radv_image_view *iview,
struct radv_device *device,
const VkImageViewCreateInfo* pCreateInfo)
const VkImageViewCreateInfo* pCreateInfo,
struct radv_cmd_buffer *cmd_buffer,
VkImageUsageFlags usage_mask)
{
RADV_FROM_HANDLE(radv_image, image, pCreateInfo->image);
const VkImageSubresourceRange *range = &pCreateInfo->subresourceRange;
@@ -846,11 +718,11 @@ radv_image_view_init(struct radv_image_view *iview,
switch (image->type) {
case VK_IMAGE_TYPE_1D:
case VK_IMAGE_TYPE_2D:
assert(range->baseArrayLayer + radv_get_layerCount(image, range) - 1 <= image->info.array_size);
assert(range->baseArrayLayer + radv_get_layerCount(image, range) - 1 <= image->array_size);
break;
case VK_IMAGE_TYPE_3D:
assert(range->baseArrayLayer + radv_get_layerCount(image, range) - 1
<= radv_minify(image->info.depth, range->baseMipLevel));
<= radv_minify(image->extent.depth, range->baseMipLevel));
break;
default:
unreachable("bad VkImageType");
@@ -869,9 +741,9 @@ radv_image_view_init(struct radv_image_view *iview,
}
iview->extent = (VkExtent3D) {
.width = radv_minify(image->info.width , range->baseMipLevel),
.height = radv_minify(image->info.height, range->baseMipLevel),
.depth = radv_minify(image->info.depth , range->baseMipLevel),
.width = radv_minify(image->extent.width , range->baseMipLevel),
.height = radv_minify(image->extent.height, range->baseMipLevel),
.depth = radv_minify(image->extent.depth , range->baseMipLevel),
};
iview->extent.width = round_up_u32(iview->extent.width * vk_format_get_blockwidth(iview->vk_format),
@@ -898,31 +770,91 @@ radv_image_view_init(struct radv_image_view *iview,
iview->descriptor,
iview->fmask_descriptor);
si_set_mutable_tex_desc_fields(device, image,
is_stencil ? &image->surface.u.legacy.stencil_level[range->baseMipLevel]
: &image->surface.u.legacy.level[range->baseMipLevel],
range->baseMipLevel,
is_stencil ? &image->surface.stencil_level[range->baseMipLevel] : &image->surface.level[range->baseMipLevel], range->baseMipLevel,
range->baseMipLevel,
blk_w, is_stencil, iview->descriptor);
}
bool radv_layout_has_htile(const struct radv_image *image,
VkImageLayout layout,
unsigned queue_mask)
void radv_image_set_optimal_micro_tile_mode(struct radv_device *device,
struct radv_image *image, uint32_t micro_tile_mode)
{
return image->surface.htile_size &&
(layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) &&
queue_mask == (1u << RADV_QUEUE_GENERAL);
/* These magic numbers were copied from addrlib. It doesn't use any
* definitions for them either. They are all 2D_TILED_THIN1 modes with
* different bpp and micro tile mode.
*/
if (device->physical_device->rad_info.chip_class >= CIK) {
switch (micro_tile_mode) {
case 0: /* displayable */
image->surface.tiling_index[0] = 10;
break;
case 1: /* thin */
image->surface.tiling_index[0] = 14;
break;
case 3: /* rotated */
image->surface.tiling_index[0] = 28;
break;
default: /* depth, thick */
assert(!"unexpected micro mode");
return;
}
} else { /* SI */
switch (micro_tile_mode) {
case 0: /* displayable */
switch (image->surface.bpe) {
case 1:
image->surface.tiling_index[0] = 10;
break;
case 2:
image->surface.tiling_index[0] = 11;
break;
default: /* 4, 8 */
image->surface.tiling_index[0] = 12;
break;
}
break;
case 1: /* thin */
switch (image->surface.bpe) {
case 1:
image->surface.tiling_index[0] = 14;
break;
case 2:
image->surface.tiling_index[0] = 15;
break;
case 4:
image->surface.tiling_index[0] = 16;
break;
default: /* 8, 16 */
image->surface.tiling_index[0] = 17;
break;
}
break;
default: /* depth, thick */
assert(!"unexpected micro mode");
return;
}
}
image->surface.micro_tile_mode = micro_tile_mode;
}
bool radv_layout_has_htile(const struct radv_image *image,
VkImageLayout layout)
{
return (layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
}
bool radv_layout_is_htile_compressed(const struct radv_image *image,
VkImageLayout layout,
unsigned queue_mask)
VkImageLayout layout)
{
return image->surface.htile_size &&
(layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) &&
queue_mask == (1u << RADV_QUEUE_GENERAL);
return layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
}
bool radv_layout_can_expclear(const struct radv_image *image,
VkImageLayout layout)
{
return (layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
}
bool radv_layout_can_fast_clear(const struct radv_image *image,
@@ -938,8 +870,6 @@ unsigned radv_image_queue_family_mask(const struct radv_image *image, uint32_t f
{
if (!image->exclusive)
return image->queue_family_mask;
if (family == VK_QUEUE_FAMILY_EXTERNAL_KHX)
return (1u << RADV_MAX_QUEUE_FAMILIES) - 1u;
if (family == VK_QUEUE_FAMILY_IGNORED)
return 1u << queue_family;
return 1u << family;
@@ -985,15 +915,14 @@ void radv_GetImageSubresourceLayout(
RADV_FROM_HANDLE(radv_image, image, _image);
int level = pSubresource->mipLevel;
int layer = pSubresource->arrayLayer;
struct radeon_surf *surface = &image->surface;
pLayout->offset = surface->u.legacy.level[level].offset + surface->u.legacy.level[level].slice_size * layer;
pLayout->rowPitch = surface->u.legacy.level[level].nblk_x * surface->bpe;
pLayout->arrayPitch = surface->u.legacy.level[level].slice_size;
pLayout->depthPitch = surface->u.legacy.level[level].slice_size;
pLayout->size = surface->u.legacy.level[level].slice_size;
pLayout->offset = image->surface.level[level].offset + image->surface.level[level].slice_size * layer;
pLayout->rowPitch = image->surface.level[level].pitch_bytes;
pLayout->arrayPitch = image->surface.level[level].slice_size;
pLayout->depthPitch = image->surface.level[level].slice_size;
pLayout->size = image->surface.level[level].slice_size;
if (image->type == VK_IMAGE_TYPE_3D)
pLayout->size *= u_minify(image->info.depth, level);
pLayout->size *= image->surface.level[level].nblk_z;
}
@@ -1011,7 +940,7 @@ radv_CreateImageView(VkDevice _device,
if (view == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
radv_image_view_init(view, device, pCreateInfo);
radv_image_view_init(view, device, pCreateInfo, NULL, ~0);
*pView = radv_image_view_to_handle(view);

View File

@@ -30,20 +30,21 @@
#include <pwd.h>
#include <sys/stat.h>
static void
radv_meta_save_novertex(struct radv_meta_saved_state *state,
void
radv_meta_save(struct radv_meta_saved_state *state,
const struct radv_cmd_buffer *cmd_buffer,
uint32_t dynamic_mask)
{
state->old_pipeline = cmd_buffer->state.pipeline;
state->old_descriptor_set0 = cmd_buffer->state.descriptors[0];
memcpy(state->old_vertex_bindings, cmd_buffer->state.vertex_bindings,
sizeof(state->old_vertex_bindings));
state->dynamic_mask = dynamic_mask;
radv_dynamic_state_copy(&state->dynamic, &cmd_buffer->state.dynamic,
dynamic_mask);
memcpy(state->push_constants, cmd_buffer->push_constants, MAX_PUSH_CONSTANTS_SIZE);
state->vertex_saved = false;
}
void
@@ -52,13 +53,12 @@ radv_meta_restore(const struct radv_meta_saved_state *state,
{
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer), VK_PIPELINE_BIND_POINT_GRAPHICS,
radv_pipeline_to_handle(state->old_pipeline));
cmd_buffer->state.descriptors[0] = state->old_descriptor_set0;
if (state->vertex_saved) {
memcpy(cmd_buffer->state.vertex_bindings, state->old_vertex_bindings,
sizeof(state->old_vertex_bindings));
cmd_buffer->state.vb_dirty |= (1 << RADV_META_VERTEX_BINDING_COUNT) - 1;
}
cmd_buffer->state.descriptors[0] = state->old_descriptor_set0;
memcpy(cmd_buffer->state.vertex_bindings, state->old_vertex_bindings,
sizeof(state->old_vertex_bindings));
cmd_buffer->state.vb_dirty |= (1 << RADV_META_VERTEX_BINDING_COUNT) - 1;
cmd_buffer->state.dirty |= RADV_CMD_DIRTY_PIPELINE;
radv_dynamic_state_copy(&cmd_buffer->state.dynamic, &state->dynamic,
@@ -338,14 +338,8 @@ radv_device_init_meta(struct radv_device *device)
result = radv_device_init_meta_resolve_compute_state(device);
if (result != VK_SUCCESS)
goto fail_resolve_compute;
result = radv_device_init_meta_resolve_fragment_state(device);
if (result != VK_SUCCESS)
goto fail_resolve_fragment;
return VK_SUCCESS;
fail_resolve_fragment:
radv_device_finish_meta_resolve_compute_state(device);
fail_resolve_compute:
radv_device_finish_meta_fast_clear_flush_state(device);
fail_fast_clear:
@@ -382,7 +376,6 @@ radv_device_finish_meta(struct radv_device *device)
radv_device_finish_meta_buffer_state(device);
radv_device_finish_meta_fast_clear_flush_state(device);
radv_device_finish_meta_resolve_compute_state(device);
radv_device_finish_meta_resolve_fragment_state(device);
radv_store_meta_pipeline(device);
radv_pipeline_cache_finish(&device->meta_state.cache);
@@ -394,212 +387,12 @@ radv_device_finish_meta(struct radv_device *device)
* should have no effect.
*/
void
radv_meta_save_graphics_reset_vport_scissor_novertex(struct radv_meta_saved_state *saved_state,
struct radv_cmd_buffer *cmd_buffer)
radv_meta_save_graphics_reset_vport_scissor(struct radv_meta_saved_state *saved_state,
struct radv_cmd_buffer *cmd_buffer)
{
uint32_t dirty_state = (1 << VK_DYNAMIC_STATE_VIEWPORT) | (1 << VK_DYNAMIC_STATE_SCISSOR);
radv_meta_save_novertex(saved_state, cmd_buffer, dirty_state);
radv_meta_save(saved_state, cmd_buffer, dirty_state);
cmd_buffer->state.dynamic.viewport.count = 0;
cmd_buffer->state.dynamic.scissor.count = 0;
cmd_buffer->state.dirty |= dirty_state;
}
nir_ssa_def *radv_meta_gen_rect_vertices_comp2(nir_builder *vs_b, nir_ssa_def *comp2)
{
nir_intrinsic_instr *vertex_id = nir_intrinsic_instr_create(vs_b->shader, nir_intrinsic_load_vertex_id_zero_base);
nir_ssa_dest_init(&vertex_id->instr, &vertex_id->dest, 1, 32, "vertexid");
nir_builder_instr_insert(vs_b, &vertex_id->instr);
/* vertex 0 - -1.0, -1.0 */
/* vertex 1 - -1.0, 1.0 */
/* vertex 2 - 1.0, -1.0 */
/* so channel 0 is vertex_id != 2 ? -1.0 : 1.0
channel 1 is vertex id != 1 ? -1.0 : 1.0 */
nir_ssa_def *c0cmp = nir_ine(vs_b, &vertex_id->dest.ssa,
nir_imm_int(vs_b, 2));
nir_ssa_def *c1cmp = nir_ine(vs_b, &vertex_id->dest.ssa,
nir_imm_int(vs_b, 1));
nir_ssa_def *comp[4];
comp[0] = nir_bcsel(vs_b, c0cmp,
nir_imm_float(vs_b, -1.0),
nir_imm_float(vs_b, 1.0));
comp[1] = nir_bcsel(vs_b, c1cmp,
nir_imm_float(vs_b, -1.0),
nir_imm_float(vs_b, 1.0));
comp[2] = comp2;
comp[3] = nir_imm_float(vs_b, 1.0);
nir_ssa_def *outvec = nir_vec(vs_b, comp, 4);
return outvec;
}
nir_ssa_def *radv_meta_gen_rect_vertices(nir_builder *vs_b)
{
return radv_meta_gen_rect_vertices_comp2(vs_b, nir_imm_float(vs_b, 0.0));
}
/* vertex shader that generates vertices */
nir_shader *
radv_meta_build_nir_vs_generate_vertices(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_vs_gen_verts");
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_store_var(&b, v_position, outvec, 0xf);
return b.shader;
}
nir_shader *
radv_meta_build_nir_fs_noop(void)
{
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_asprintf(b.shader,
"meta_noop_fs");
return b.shader;
}
static nir_ssa_def *radv_meta_build_resolve_srgb_conversion(nir_builder *b,
nir_ssa_def *input)
{
nir_const_value v;
unsigned i;
v.u32[0] = 0x3b4d2e1c; // 0.00313080009
nir_ssa_def *cmp[3];
for (i = 0; i < 3; i++)
cmp[i] = nir_flt(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *ltvals[3];
v.f32[0] = 12.92;
for (i = 0; i < 3; i++)
ltvals[i] = nir_fmul(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *gtvals[3];
for (i = 0; i < 3; i++) {
v.f32[0] = 1.0/2.4;
gtvals[i] = nir_fpow(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
v.f32[0] = 1.055;
gtvals[i] = nir_fmul(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
v.f32[0] = 0.055;
gtvals[i] = nir_fsub(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
}
nir_ssa_def *comp[4];
for (i = 0; i < 3; i++)
comp[i] = nir_bcsel(b, cmp[i], ltvals[i], gtvals[i]);
comp[3] = nir_channels(b, input, 3);
return nir_vec(b, comp, 4);
}
void radv_meta_build_resolve_shader_core(nir_builder *b,
bool is_integer,
bool is_srgb,
int samples,
nir_variable *input_img,
nir_variable *color,
nir_ssa_def *img_coord)
{
/* do a txf_ms on each sample */
nir_ssa_def *tmp;
nir_if *outer_if = NULL;
nir_tex_instr *tex = nir_tex_instr_create(b->shader, 2);
tex->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex->op = nir_texop_txf_ms;
tex->src[0].src_type = nir_tex_src_coord;
tex->src[0].src = nir_src_for_ssa(img_coord);
tex->src[1].src_type = nir_tex_src_ms_index;
tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, 0));
tex->dest_type = nir_type_float;
tex->is_array = false;
tex->coord_components = 2;
tex->texture = nir_deref_var_create(tex, input_img);
tex->sampler = NULL;
nir_ssa_dest_init(&tex->instr, &tex->dest, 4, 32, "tex");
nir_builder_instr_insert(b, &tex->instr);
tmp = &tex->dest.ssa;
if (!is_integer && samples > 1) {
nir_tex_instr *tex_all_same = nir_tex_instr_create(b->shader, 1);
tex_all_same->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_all_same->op = nir_texop_samples_identical;
tex_all_same->src[0].src_type = nir_tex_src_coord;
tex_all_same->src[0].src = nir_src_for_ssa(img_coord);
tex_all_same->dest_type = nir_type_float;
tex_all_same->is_array = false;
tex_all_same->coord_components = 2;
tex_all_same->texture = nir_deref_var_create(tex_all_same, input_img);
tex_all_same->sampler = NULL;
nir_ssa_dest_init(&tex_all_same->instr, &tex_all_same->dest, 1, 32, "tex");
nir_builder_instr_insert(b, &tex_all_same->instr);
nir_ssa_def *all_same = nir_ine(b, &tex_all_same->dest.ssa, nir_imm_int(b, 0));
nir_if *if_stmt = nir_if_create(b->shader);
if_stmt->condition = nir_src_for_ssa(all_same);
nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
b->cursor = nir_after_cf_list(&if_stmt->then_list);
for (int i = 1; i < samples; i++) {
nir_tex_instr *tex_add = nir_tex_instr_create(b->shader, 2);
tex_add->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_add->op = nir_texop_txf_ms;
tex_add->src[0].src_type = nir_tex_src_coord;
tex_add->src[0].src = nir_src_for_ssa(img_coord);
tex_add->src[1].src_type = nir_tex_src_ms_index;
tex_add->src[1].src = nir_src_for_ssa(nir_imm_int(b, i));
tex_add->dest_type = nir_type_float;
tex_add->is_array = false;
tex_add->coord_components = 2;
tex_add->texture = nir_deref_var_create(tex_add, input_img);
tex_add->sampler = NULL;
nir_ssa_dest_init(&tex_add->instr, &tex_add->dest, 4, 32, "tex");
nir_builder_instr_insert(b, &tex_add->instr);
tmp = nir_fadd(b, tmp, &tex_add->dest.ssa);
}
tmp = nir_fdiv(b, tmp, nir_imm_float(b, samples));
nir_store_var(b, color, tmp, 0xf);
b->cursor = nir_after_cf_list(&if_stmt->else_list);
outer_if = if_stmt;
}
nir_store_var(b, color, &tex->dest.ssa, 0xf);
if (outer_if)
b->cursor = nir_after_cf_node(&outer_if->cf_node);
if (is_srgb) {
nir_ssa_def *newv = nir_load_var(b, color);
newv = radv_meta_build_resolve_srgb_conversion(b, newv);
nir_store_var(b, color, newv, 0xf);
}
}

View File

@@ -35,7 +35,6 @@ extern "C" {
#define RADV_META_VERTEX_BINDING_COUNT 2
struct radv_meta_saved_state {
bool vertex_saved;
struct radv_vertex_binding old_vertex_bindings[RADV_META_VERTEX_BINDING_COUNT];
struct radv_descriptor_set *old_descriptor_set0;
struct radv_pipeline *old_pipeline;
@@ -91,9 +90,9 @@ void radv_device_finish_meta_query_state(struct radv_device *device);
VkResult radv_device_init_meta_resolve_compute_state(struct radv_device *device);
void radv_device_finish_meta_resolve_compute_state(struct radv_device *device);
VkResult radv_device_init_meta_resolve_fragment_state(struct radv_device *device);
void radv_device_finish_meta_resolve_fragment_state(struct radv_device *device);
void radv_meta_save(struct radv_meta_saved_state *state,
const struct radv_cmd_buffer *cmd_buffer,
uint32_t dynamic_mask);
void radv_meta_restore(const struct radv_meta_saved_state *state,
struct radv_cmd_buffer *cmd_buffer);
@@ -201,8 +200,8 @@ void radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
const VkImageSubresourceRange *subresourceRange);
void radv_meta_save_graphics_reset_vport_scissor_novertex(struct radv_meta_saved_state *saved_state,
struct radv_cmd_buffer *cmd_buffer);
void radv_meta_save_graphics_reset_vport_scissor(struct radv_meta_saved_state *saved_state,
struct radv_cmd_buffer *cmd_buffer);
void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image,
@@ -212,33 +211,9 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
uint32_t region_count,
const VkImageResolve *regions);
void radv_meta_resolve_fragment_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image,
VkImageLayout src_image_layout,
struct radv_image *dest_image,
VkImageLayout dest_image_layout,
uint32_t region_count,
const VkImageResolve *regions);
void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
struct radv_image *linear_image);
/* common nir builder helpers */
#include "nir/nir_builder.h"
nir_ssa_def *radv_meta_gen_rect_vertices(nir_builder *vs_b);
nir_ssa_def *radv_meta_gen_rect_vertices_comp2(nir_builder *vs_b, nir_ssa_def *comp2);
nir_shader *radv_meta_build_nir_vs_generate_vertices(void);
nir_shader *radv_meta_build_nir_fs_noop(void);
void radv_meta_build_resolve_shader_core(nir_builder *b,
bool is_integer,
bool is_srgb,
int samples,
nir_variable *input_img,
nir_variable *color,
nir_ssa_def *img_coord);
#ifdef __cplusplus
}
#endif

View File

@@ -38,64 +38,25 @@ build_nir_vertex_shader(void)
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_blit_vs");
b.shader->info->name = ralloc_strdup(b.shader, "meta_blit_vs");
nir_variable *pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "a_pos");
pos_in->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "gl_Position");
pos_out->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, pos_out, pos_in);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "a_tex_pos");
tex_pos_in->data.location = VERT_ATTRIB_GENERIC1;
nir_variable *tex_pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "v_tex_pos");
tex_pos_out->data.location = VARYING_SLOT_VAR0;
tex_pos_out->data.interpolation = INTERP_MODE_SMOOTH;
nir_copy_var(&b, tex_pos_out, tex_pos_in);
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
nir_store_var(&b, pos_out, outvec, 0xf);
nir_intrinsic_instr *src_box = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
src_box->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
nir_intrinsic_set_base(src_box, 0);
nir_intrinsic_set_range(src_box, 16);
src_box->num_components = 4;
nir_ssa_dest_init(&src_box->instr, &src_box->dest, 4, 32, "src_box");
nir_builder_instr_insert(&b, &src_box->instr);
nir_intrinsic_instr *src0_z = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
src0_z->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
nir_intrinsic_set_base(src0_z, 16);
nir_intrinsic_set_range(src0_z, 4);
src0_z->num_components = 1;
nir_ssa_dest_init(&src0_z->instr, &src0_z->dest, 1, 32, "src0_z");
nir_builder_instr_insert(&b, &src0_z->instr);
nir_intrinsic_instr *vertex_id = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_vertex_id_zero_base);
nir_ssa_dest_init(&vertex_id->instr, &vertex_id->dest, 1, 32, "vertexid");
nir_builder_instr_insert(&b, &vertex_id->instr);
/* vertex 0 - src0_x, src0_y, src0_z */
/* vertex 1 - src0_x, src1_y, src0_z*/
/* vertex 2 - src1_x, src0_y, src0_z */
/* so channel 0 is vertex_id != 2 ? src_x : src_x + w
channel 1 is vertex id != 1 ? src_y : src_y + w */
nir_ssa_def *c0cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 2));
nir_ssa_def *c1cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 1));
nir_ssa_def *comp[4];
comp[0] = nir_bcsel(&b, c0cmp,
nir_channel(&b, &src_box->dest.ssa, 0),
nir_channel(&b, &src_box->dest.ssa, 2));
comp[1] = nir_bcsel(&b, c1cmp,
nir_channel(&b, &src_box->dest.ssa, 1),
nir_channel(&b, &src_box->dest.ssa, 3));
comp[2] = &src0_z->dest.ssa;
comp[3] = nir_imm_float(&b, 1.0);
nir_ssa_def *out_tex_vec = nir_vec(&b, comp, 4);
nir_store_var(&b, tex_pos_out, out_tex_vec, 0xf);
return b.shader;
}
@@ -109,7 +70,7 @@ build_nir_copy_fragment_shader(enum glsl_sampler_dim tex_dim)
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
sprintf(shader_name, "meta_blit_fs.%d", tex_dim);
b.shader->info.name = ralloc_strdup(b.shader, shader_name);
b.shader->info->name = ralloc_strdup(b.shader, shader_name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "v_tex_pos");
@@ -163,7 +124,7 @@ build_nir_copy_fragment_shader_depth(enum glsl_sampler_dim tex_dim)
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
sprintf(shader_name, "meta_blit_depth_fs.%d", tex_dim);
b.shader->info.name = ralloc_strdup(b.shader, shader_name);
b.shader->info->name = ralloc_strdup(b.shader, shader_name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "v_tex_pos");
@@ -217,7 +178,7 @@ build_nir_copy_fragment_shader_stencil(enum glsl_sampler_dim tex_dim)
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
sprintf(shader_name, "meta_blit_stencil_fs.%d", tex_dim);
b.shader->info.name = ralloc_strdup(b.shader, shader_name);
b.shader->info->name = ralloc_strdup(b.shader, shader_name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "v_tex_pos");
@@ -275,21 +236,65 @@ meta_emit_blit(struct radv_cmd_buffer *cmd_buffer,
VkFilter blit_filter)
{
struct radv_device *device = cmd_buffer->device;
unsigned offset = 0;
struct blit_vb_data {
float pos[2];
float tex_coord[3];
} vb_data[3];
assert(src_image->info.samples == dest_image->info.samples);
float vertex_push_constants[5] = {
(float)src_offset_0.x / (float)src_iview->extent.width,
(float)src_offset_0.y / (float)src_iview->extent.height,
(float)src_offset_1.x / (float)src_iview->extent.width,
(float)src_offset_1.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
assert(src_image->samples == dest_image->samples);
unsigned vb_size = 3 * sizeof(*vb_data);
vb_data[0] = (struct blit_vb_data) {
.pos = {
-1.0,
-1.0,
},
.tex_coord = {
(float)src_offset_0.x / (float)src_iview->extent.width,
(float)src_offset_0.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
},
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit.pipeline_layout,
VK_SHADER_STAGE_VERTEX_BIT, 0, 20,
vertex_push_constants);
vb_data[1] = (struct blit_vb_data) {
.pos = {
-1.0,
1.0,
},
.tex_coord = {
(float)src_offset_0.x / (float)src_iview->extent.width,
(float)src_offset_1.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
},
};
vb_data[2] = (struct blit_vb_data) {
.pos = {
1.0,
-1.0,
},
.tex_coord = {
(float)src_offset_1.x / (float)src_iview->extent.width,
(float)src_offset_0.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
},
};
radv_cmd_buffer_upload_data(cmd_buffer, vb_size, 16, vb_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = vb_size,
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
radv_CmdBindVertexBuffers(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1,
(VkBuffer[]) {
radv_buffer_to_handle(&vertex_buffer)
},
(VkDeviceSize[]) {
0,
});
VkSampler sampler;
radv_CreateSampler(radv_device_to_handle(device),
@@ -504,10 +509,10 @@ void radv_CmdBlitImage(
* vkCmdBlitImage must not be used for multisampled source or
* destination images. Use vkCmdResolveImage for this purpose.
*/
assert(src_image->info.samples == 1);
assert(dest_image->info.samples == 1);
assert(src_image->samples == 1);
assert(dest_image->samples == 1);
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
for (unsigned r = 0; r < regionCount; r++) {
const VkImageSubresourceLayers *src_res = &pRegions[r].srcSubresource;
@@ -526,7 +531,8 @@ void radv_CmdBlitImage(
.baseArrayLayer = src_res->baseArrayLayer,
.layerCount = 1
},
});
},
cmd_buffer, VK_IMAGE_USAGE_SAMPLED_BIT);
unsigned dst_start, dst_end;
if (dest_image->type == VK_IMAGE_TYPE_3D) {
@@ -574,6 +580,12 @@ void radv_CmdBlitImage(
dest_box.extent.height = abs(dst_y1 - dst_y0);
struct radv_image_view dest_iview;
unsigned usage;
if (dst_res->aspectMask == VK_IMAGE_ASPECT_COLOR_BIT)
usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
else
usage = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
const unsigned num_layers = dst_end - dst_start;
for (unsigned i = 0; i < num_layers; i++) {
const VkOffset3D dest_offset_0 = {
@@ -613,7 +625,8 @@ void radv_CmdBlitImage(
.baseArrayLayer = dest_array_slice,
.layerCount = 1
},
});
},
cmd_buffer, usage);
meta_emit_blit(cmd_buffer,
src_image, &src_iview,
src_offset_0, src_offset_1,
@@ -752,8 +765,31 @@ radv_device_init_meta_blit_color(struct radv_device *device,
VkPipelineVertexInputStateCreateInfo vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = 5 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = 8
}
}
};
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
@@ -908,8 +944,31 @@ radv_device_init_meta_blit_depth(struct radv_device *device,
VkPipelineVertexInputStateCreateInfo vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = 5 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = 8
}
}
};
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
@@ -1066,8 +1125,31 @@ radv_device_init_meta_blit_stencil(struct radv_device *device,
VkPipelineVertexInputStateCreateInfo vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = 5 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = 8
}
}
};
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
@@ -1226,15 +1308,11 @@ radv_device_init_meta_blit_state(struct radv_device *device)
if (result != VK_SUCCESS)
goto fail;
const VkPushConstantRange push_constant_range = {VK_SHADER_STAGE_VERTEX_BIT, 0, 20};
result = radv_CreatePipelineLayout(radv_device_to_handle(device),
&(VkPipelineLayoutCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.blit.ds_layout,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &push_constant_range,
},
&device->meta_state.alloc, &device->meta_state.blit.pipeline_layout);
if (result != VK_SUCCESS)
@@ -1251,10 +1329,12 @@ radv_device_init_meta_blit_state(struct radv_device *device)
goto fail;
result = radv_device_init_meta_blit_stencil(device, &vs);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
fail:
ralloc_free(vs.nir);
if (result != VK_SUCCESS)
radv_device_finish_meta_blit_state(device);
radv_device_finish_meta_blit_state(device);
return result;
}

View File

@@ -53,6 +53,7 @@ enum blit2d_src_type {
static void
create_iview(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_surf *surf,
VkImageUsageFlags usage,
struct radv_image_view *iview, VkFormat depth_format)
{
VkFormat format;
@@ -75,7 +76,7 @@ create_iview(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = surf->layer,
.layerCount = 1
},
});
}, cmd_buffer, usage);
}
static void
@@ -135,10 +136,11 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type],
VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
VK_SHADER_STAGE_FRAGMENT_BIT, 0, 4,
&src_buf->pitch);
} else {
create_iview(cmd_buffer, src_img, &tmp->iview, depth_format);
create_iview(cmd_buffer, src_img, VK_IMAGE_USAGE_SAMPLED_BIT, &tmp->iview,
depth_format);
radv_meta_push_descriptor_set(cmd_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
device->meta_state.blit2d.p_layouts[src_type],
@@ -177,7 +179,15 @@ blit2d_bind_dst(struct radv_cmd_buffer *cmd_buffer,
VkFormat depth_format,
struct blit2d_dst_temps *tmp)
{
create_iview(cmd_buffer, dst, &tmp->iview, depth_format);
VkImageUsageFlagBits bits;
if (dst->aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT)
bits = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
else
bits = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
create_iview(cmd_buffer, dst, bits,
&tmp->iview, depth_format);
radv_CreateFramebuffer(radv_device_to_handle(cmd_buffer->device),
&(VkFramebufferCreateInfo) {
@@ -258,21 +268,69 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
struct blit2d_src_temps src_temps;
blit2d_bind_src(cmd_buffer, src_img, src_buf, &src_temps, src_type, depth_format);
uint32_t offset = 0;
struct blit2d_dst_temps dst_temps;
blit2d_bind_dst(cmd_buffer, dst, rects[r].dst_x + rects[r].width,
rects[r].dst_y + rects[r].height, depth_format, &dst_temps);
float vertex_push_constants[4] = {
rects[r].src_x,
rects[r].src_y,
rects[r].src_x + rects[r].width,
rects[r].src_y + rects[r].height,
struct blit_vb_data {
float pos[2];
float tex_coord[2];
} vb_data[3];
unsigned vb_size = 3 * sizeof(*vb_data);
vb_data[0] = (struct blit_vb_data) {
.pos = {
-1.0,
-1.0,
},
.tex_coord = {
rects[r].src_x,
rects[r].src_y,
},
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type],
VK_SHADER_STAGE_VERTEX_BIT, 0, 16,
vertex_push_constants);
vb_data[1] = (struct blit_vb_data) {
.pos = {
-1.0,
1.0,
},
.tex_coord = {
rects[r].src_x,
rects[r].src_y + rects[r].height,
},
};
vb_data[2] = (struct blit_vb_data) {
.pos = {
1.0,
-1.0,
},
.tex_coord = {
rects[r].src_x + rects[r].width,
rects[r].src_y,
},
};
radv_cmd_buffer_upload_data(cmd_buffer, vb_size, 16, vb_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = vb_size,
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
radv_CmdBindVertexBuffers(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1,
(VkBuffer[]) {
radv_buffer_to_handle(&vertex_buffer),
},
(VkDeviceSize[]) {
0,
});
if (dst->aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT) {
unsigned fs_key = radv_format_meta_fs_key(dst_temps.iview.vk_format);
@@ -375,53 +433,25 @@ build_nir_vertex_shader(void)
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_blit2d_vs");
b.shader->info->name = ralloc_strdup(b.shader, "meta_blit_vs");
nir_variable *pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "a_pos");
pos_in->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "gl_Position");
pos_out->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, pos_out, pos_in);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "a_tex_pos");
tex_pos_in->data.location = VERT_ATTRIB_GENERIC1;
nir_variable *tex_pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec2, "v_tex_pos");
tex_pos_out->data.location = VARYING_SLOT_VAR0;
tex_pos_out->data.interpolation = INTERP_MODE_SMOOTH;
nir_copy_var(&b, tex_pos_out, tex_pos_in);
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
nir_store_var(&b, pos_out, outvec, 0xf);
nir_intrinsic_instr *src_box = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
src_box->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
nir_intrinsic_set_base(src_box, 0);
nir_intrinsic_set_range(src_box, 16);
src_box->num_components = 4;
nir_ssa_dest_init(&src_box->instr, &src_box->dest, 4, 32, "src_box");
nir_builder_instr_insert(&b, &src_box->instr);
nir_intrinsic_instr *vertex_id = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_vertex_id_zero_base);
nir_ssa_dest_init(&vertex_id->instr, &vertex_id->dest, 1, 32, "vertexid");
nir_builder_instr_insert(&b, &vertex_id->instr);
/* vertex 0 - src_x, src_y */
/* vertex 1 - src_x, src_y+h */
/* vertex 2 - src_x+w, src_y */
/* so channel 0 is vertex_id != 2 ? src_x : src_x + w
channel 1 is vertex id != 1 ? src_y : src_y + w */
nir_ssa_def *c0cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 2));
nir_ssa_def *c1cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 1));
nir_ssa_def *comp[2];
comp[0] = nir_bcsel(&b, c0cmp,
nir_channel(&b, &src_box->dest.ssa, 0),
nir_channel(&b, &src_box->dest.ssa, 2));
comp[1] = nir_bcsel(&b, c1cmp,
nir_channel(&b, &src_box->dest.ssa, 1),
nir_channel(&b, &src_box->dest.ssa, 3));
nir_ssa_def *out_tex_vec = nir_vec(&b, comp, 2);
nir_store_var(&b, tex_pos_out, out_tex_vec, 0x3);
return b.shader;
}
@@ -472,8 +502,6 @@ build_nir_buffer_fetch(struct nir_builder *b, struct radv_device *device,
sampler->data.binding = 0;
nir_intrinsic_instr *width = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(width, 16);
nir_intrinsic_set_range(width, 4);
width->src[0] = nir_src_for_ssa(nir_imm_int(b, 0));
width->num_components = 1;
nir_ssa_dest_init(&width->instr, &width->dest, 1, 32, "width");
@@ -504,8 +532,31 @@ build_nir_buffer_fetch(struct nir_builder *b, struct radv_device *device,
static const VkPipelineVertexInputStateCreateInfo normal_vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = 4 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 8
},
},
};
static nir_shader *
@@ -517,7 +568,7 @@ build_nir_copy_fragment_shader(struct radv_device *device,
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_strdup(b.shader, name);
b.shader->info->name = ralloc_strdup(b.shader, name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "v_tex_pos");
@@ -546,7 +597,7 @@ build_nir_copy_fragment_shader_depth(struct radv_device *device,
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_strdup(b.shader, name);
b.shader->info->name = ralloc_strdup(b.shader, name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "v_tex_pos");
@@ -575,7 +626,7 @@ build_nir_copy_fragment_shader_stencil(struct radv_device *device,
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_strdup(b.shader, name);
b.shader->info->name = ralloc_strdup(b.shader, name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "v_tex_pos");
@@ -703,8 +754,8 @@ blit2d_init_color_pipeline(struct radv_device *device,
.format = format,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.finalLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL,
},
.subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) {
@@ -713,12 +764,12 @@ blit2d_init_color_pipeline(struct radv_device *device,
.colorAttachmentCount = 1,
.pColorAttachments = &(VkAttachmentReference) {
.attachment = 0,
.layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = VK_ATTACHMENT_UNUSED,
.layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
@@ -861,8 +912,8 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
.format = 0,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.finalLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL,
},
.subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) {
@@ -873,7 +924,7 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
.pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = 0,
.layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
@@ -1016,8 +1067,8 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
.format = 0,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.finalLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL,
},
.subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) {
@@ -1028,7 +1079,7 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
.pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = 0,
.layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
@@ -1150,10 +1201,6 @@ radv_device_init_meta_blit2d_state(struct radv_device *device)
zero(device->meta_state.blit2d);
const VkPushConstantRange push_constant_ranges[] = {
{VK_SHADER_STAGE_VERTEX_BIT, 0, 16},
{VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4},
};
result = radv_CreateDescriptorSetLayout(radv_device_to_handle(device),
&(VkDescriptorSetLayoutCreateInfo) {
.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
@@ -1177,8 +1224,6 @@ radv_device_init_meta_blit2d_state(struct radv_device *device)
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.blit2d.ds_layouts[BLIT2D_SRC_TYPE_IMAGE],
.pushConstantRangeCount = 1,
.pPushConstantRanges = push_constant_ranges,
},
&device->meta_state.alloc, &device->meta_state.blit2d.p_layouts[BLIT2D_SRC_TYPE_IMAGE]);
if (result != VK_SUCCESS)
@@ -1202,14 +1247,14 @@ radv_device_init_meta_blit2d_state(struct radv_device *device)
if (result != VK_SUCCESS)
goto fail;
const VkPushConstantRange push_constant_range = {VK_SHADER_STAGE_FRAGMENT_BIT, 0, 4};
result = radv_CreatePipelineLayout(radv_device_to_handle(device),
&(VkPipelineLayoutCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.blit2d.ds_layouts[BLIT2D_SRC_TYPE_BUFFER],
.pushConstantRangeCount = 2,
.pPushConstantRanges = push_constant_ranges,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &push_constant_range,
},
&device->meta_state.alloc, &device->meta_state.blit2d.p_layouts[BLIT2D_SRC_TYPE_BUFFER]);
if (result != VK_SUCCESS)

View File

@@ -10,17 +10,17 @@ build_buffer_fill_shader(struct radv_device *dev)
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_buffer_fill");
b.shader->info.cs.local_size[0] = 64;
b.shader->info.cs.local_size[1] = 1;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "meta_buffer_fill");
b.shader->info->cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1;
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
@@ -36,8 +36,6 @@ build_buffer_fill_shader(struct radv_device *dev)
nir_builder_instr_insert(&b, &dst_buf->instr);
nir_intrinsic_instr *load = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(load, 0);
nir_intrinsic_set_range(load, 4);
load->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
load->num_components = 1;
nir_ssa_dest_init(&load->instr, &load->dest, 1, 32, "fill_value");
@@ -62,17 +60,17 @@ build_buffer_copy_shader(struct radv_device *dev)
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_buffer_copy");
b.shader->info.cs.local_size[0] = 64;
b.shader->info.cs.local_size[1] = 1;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "meta_buffer_copy");
b.shader->info->cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1;
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);

View File

@@ -42,10 +42,10 @@ build_nir_itob_compute_shader(struct radv_device *dev)
false,
GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_itob_cs");
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "meta_itob_cs");
b.shader->info->cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
sampler_type, "s_tex");
input_img->data.descriptor_set = 0;
@@ -59,25 +59,21 @@ build_nir_itob_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(offset, 0);
nir_intrinsic_set_range(offset, 12);
offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
offset->num_components = 2;
nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset");
nir_builder_instr_insert(&b, &offset->instr);
nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(stride, 0);
nir_intrinsic_set_range(stride, 12);
stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
stride->num_components = 1;
nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
@@ -244,10 +240,10 @@ build_nir_btoi_compute_shader(struct radv_device *dev)
false,
GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_btoi_cs");
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "meta_btoi_cs");
b.shader->info->cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
buf_type, "s_tex");
input_img->data.descriptor_set = 0;
@@ -261,23 +257,19 @@ build_nir_btoi_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(offset, 0);
nir_intrinsic_set_range(offset, 12);
offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
offset->num_components = 2;
nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset");
nir_builder_instr_insert(&b, &offset->instr);
nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(stride, 0);
nir_intrinsic_set_range(stride, 12);
stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
stride->num_components = 1;
nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
@@ -444,10 +436,10 @@ build_nir_itoi_compute_shader(struct radv_device *dev)
false,
GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_itoi_cs");
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "meta_itoi_cs");
b.shader->info->cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
buf_type, "s_tex");
input_img->data.descriptor_set = 0;
@@ -461,23 +453,19 @@ build_nir_itoi_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(src_offset, 0);
nir_intrinsic_set_range(src_offset, 16);
src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
src_offset->num_components = 2;
nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset");
nir_builder_instr_insert(&b, &src_offset->instr);
nir_intrinsic_instr *dst_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(dst_offset, 0);
nir_intrinsic_set_range(dst_offset, 16);
dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
dst_offset->num_components = 2;
nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, 2, 32, "dst_offset");
@@ -634,10 +622,10 @@ build_nir_cleari_compute_shader(struct radv_device *dev)
false,
GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_cleari_cs");
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "meta_cleari_cs");
b.shader->info->cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1;
nir_variable *output_img = nir_variable_create(b.shader, nir_var_uniform,
img_type, "out_img");
@@ -647,15 +635,13 @@ build_nir_cleari_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *clear_val = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(clear_val, 0);
nir_intrinsic_set_range(clear_val, 16);
clear_val->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
clear_val->num_components = 4;
nir_ssa_dest_init(&clear_val->instr, &clear_val->dest, 4, 32, "clear_value");
@@ -859,6 +845,7 @@ radv_meta_end_cleari(struct radv_cmd_buffer *cmd_buffer,
static void
create_iview(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_surf *surf,
VkImageUsageFlags usage,
struct radv_image_view *iview)
{
@@ -875,7 +862,7 @@ create_iview(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = surf->layer,
.layerCount = 1
},
});
}, cmd_buffer, usage);
}
static void
@@ -961,7 +948,7 @@ radv_meta_image_to_buffer(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device;
struct itob_temps temps;
create_iview(cmd_buffer, src, &temps.src_iview);
create_iview(cmd_buffer, src, VK_IMAGE_USAGE_SAMPLED_BIT, &temps.src_iview);
create_bview(cmd_buffer, dst->buffer, dst->offset, dst->format, &temps.dst_bview);
itob_bind_descriptors(cmd_buffer, &temps);
@@ -1047,7 +1034,7 @@ radv_meta_buffer_to_image_cs(struct radv_cmd_buffer *cmd_buffer,
struct btoi_temps temps;
create_bview(cmd_buffer, src->buffer, src->offset, src->format, &temps.src_bview);
create_iview(cmd_buffer, dst, &temps.dst_iview);
create_iview(cmd_buffer, dst, VK_IMAGE_USAGE_STORAGE_BIT, &temps.dst_iview);
btoi_bind_descriptors(cmd_buffer, &temps);
btoi_bind_pipeline(cmd_buffer);
@@ -1137,8 +1124,8 @@ radv_meta_image_to_image_cs(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device;
struct itoi_temps temps;
create_iview(cmd_buffer, src, &temps.src_iview);
create_iview(cmd_buffer, dst, &temps.dst_iview);
create_iview(cmd_buffer, src, VK_IMAGE_USAGE_SAMPLED_BIT, &temps.src_iview);
create_iview(cmd_buffer, dst, VK_IMAGE_USAGE_STORAGE_BIT, &temps.dst_iview);
itoi_bind_descriptors(cmd_buffer, &temps);
@@ -1209,7 +1196,7 @@ radv_meta_clear_image_cs(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device;
struct radv_image_view dst_iview;
create_iview(cmd_buffer, dst, &dst_iview);
create_iview(cmd_buffer, dst, VK_IMAGE_USAGE_STORAGE_BIT, &dst_iview);
cleari_bind_descriptors(cmd_buffer, &dst_iview);
cleari_bind_pipeline(cmd_buffer);
@@ -1226,5 +1213,5 @@ radv_meta_clear_image_cs(struct radv_cmd_buffer *cmd_buffer,
VK_SHADER_STAGE_COMPUTE_BIT, 0, 16,
push_constants);
radv_unaligned_dispatch(cmd_buffer, dst->image->info.width, dst->image->info.height, 1);
radv_unaligned_dispatch(cmd_buffer, dst->image->extent.width, dst->image->extent.height, 1);
}

View File

@@ -27,6 +27,17 @@
#include "util/format_rgb9e5.h"
#include "vk_format.h"
/** Vertex attributes for color clears. */
struct color_clear_vattrs {
float position[2];
VkClearColorValue color;
};
/** Vertex attributes for depthstencil clears. */
struct depthstencil_clear_vattrs {
float position[2];
float depth_clear;
};
enum {
DEPTH_CLEAR_SLOW,
@@ -45,34 +56,47 @@ build_color_shaders(struct nir_shader **out_vs,
nir_builder_init_simple_shader(&vs_b, NULL, MESA_SHADER_VERTEX, NULL);
nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL);
vs_b.shader->info.name = ralloc_strdup(vs_b.shader, "meta_clear_color_vs");
fs_b.shader->info.name = ralloc_strdup(fs_b.shader, "meta_clear_color_fs");
vs_b.shader->info->name = ralloc_strdup(vs_b.shader, "meta_clear_color_vs");
fs_b.shader->info->name = ralloc_strdup(fs_b.shader, "meta_clear_color_fs");
const struct glsl_type *position_type = glsl_vec4_type();
const struct glsl_type *color_type = glsl_vec4_type();
nir_variable *vs_in_pos =
nir_variable_create(vs_b.shader, nir_var_shader_in, position_type,
"a_position");
vs_in_pos->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *vs_out_pos =
nir_variable_create(vs_b.shader, nir_var_shader_out, position_type,
"gl_Position");
vs_out_pos->data.location = VARYING_SLOT_POS;
nir_intrinsic_instr *in_color_load = nir_intrinsic_instr_create(fs_b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(in_color_load, 0);
nir_intrinsic_set_range(in_color_load, 16);
in_color_load->src[0] = nir_src_for_ssa(nir_imm_int(&fs_b, 0));
in_color_load->num_components = 4;
nir_ssa_dest_init(&in_color_load->instr, &in_color_load->dest, 4, 32, "clear color");
nir_builder_instr_insert(&fs_b, &in_color_load->instr);
nir_variable *vs_in_color =
nir_variable_create(vs_b.shader, nir_var_shader_in, color_type,
"a_color");
vs_in_color->data.location = VERT_ATTRIB_GENERIC1;
nir_variable *vs_out_color =
nir_variable_create(vs_b.shader, nir_var_shader_out, color_type,
"v_color");
vs_out_color->data.location = VARYING_SLOT_VAR0;
vs_out_color->data.interpolation = INTERP_MODE_FLAT;
nir_variable *fs_in_color =
nir_variable_create(fs_b.shader, nir_var_shader_in, color_type,
"v_color");
fs_in_color->data.location = vs_out_color->data.location;
fs_in_color->data.interpolation = vs_out_color->data.interpolation;
nir_variable *fs_out_color =
nir_variable_create(fs_b.shader, nir_var_shader_out, color_type,
"f_color");
fs_out_color->data.location = FRAG_RESULT_DATA0 + frag_output;
nir_store_var(&fs_b, fs_out_color, &in_color_load->dest.ssa, 0xf);
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&vs_b);
nir_store_var(&vs_b, vs_out_pos, outvec, 0xf);
nir_copy_var(&vs_b, vs_out_pos, vs_in_pos);
nir_copy_var(&vs_b, vs_out_color, vs_in_color);
nir_copy_var(&fs_b, fs_out_color, fs_in_color);
const struct glsl_type *layer_type = glsl_int_type();
nir_variable *vs_out_layer =
@@ -97,7 +121,6 @@ create_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo *vi_state,
const VkPipelineDepthStencilStateCreateInfo *ds_state,
const VkPipelineColorBlendStateCreateInfo *cb_state,
const VkPipelineLayout layout,
const struct radv_graphics_pipeline_create_info *extra,
const VkAllocationCallbacks *alloc,
struct radv_pipeline **pipeline)
@@ -177,11 +200,10 @@ create_pipeline(struct radv_device *device,
VK_DYNAMIC_STATE_STENCIL_REFERENCE,
},
},
.layout = layout,
.flags = 0,
.renderPass = radv_render_pass_to_handle(render_pass),
.subpass = 0,
},
.flags = 0,
.renderPass = radv_render_pass_to_handle(render_pass),
.subpass = 0,
},
extra,
alloc,
&pipeline_h);
@@ -247,8 +269,31 @@ create_color_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo vi_state = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = sizeof(struct color_clear_vattrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct color_clear_vattrs, position),
},
{
/* Color */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32A32_SFLOAT,
.offset = offsetof(struct color_clear_vattrs, color),
},
},
};
const VkPipelineDepthStencilStateCreateInfo ds_state = {
@@ -281,7 +326,6 @@ create_color_pipeline(struct radv_device *device,
};
result = create_pipeline(device, radv_render_pass_from_handle(pass),
samples, vs_nir, fs_nir, &vi_state, &ds_state, &cb_state,
device->meta_state.clear_color_p_layout,
&extra, &device->meta_state.alloc, pipeline);
return result;
@@ -324,12 +368,7 @@ radv_device_finish_meta_clear_state(struct radv_device *device)
}
destroy_render_pass(device, state->clear[i].depthstencil_rp);
}
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->clear_color_p_layout,
&state->alloc);
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->clear_depth_p_layout,
&state->alloc);
}
static void
@@ -343,13 +382,14 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
const uint32_t subpass_att = clear_att->colorAttachment;
const uint32_t pass_att = subpass->color_attachments[subpass_att].attachment;
const struct radv_image_view *iview = fb->attachments[pass_att].attachment;
const uint32_t samples = iview->image->info.samples;
const uint32_t samples = iview->image->samples;
const uint32_t samples_log2 = ffs(samples) - 1;
unsigned fs_key = radv_format_meta_fs_key(iview->vk_format);
struct radv_pipeline *pipeline;
VkClearColorValue clear_value = clear_att->clearValue.color;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
VkPipeline pipeline_h;
uint32_t offset;
if (fs_key == -1) {
radv_finishme("color clears incomplete");
@@ -367,10 +407,29 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
assert(clear_att->aspectMask == VK_IMAGE_ASPECT_COLOR_BIT);
assert(clear_att->colorAttachment < subpass->color_count);
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.clear_color_p_layout,
VK_SHADER_STAGE_FRAGMENT_BIT, 0, 16,
&clear_value);
const struct color_clear_vattrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0,
},
.color = clear_value,
},
{
.position = {
-1.0,
1.0,
},
.color = clear_value,
},
{
.position = {
1.0,
-1.0,
},
.color = clear_value,
},
};
struct radv_subpass clear_subpass = {
.color_count = 1,
@@ -382,6 +441,19 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
radv_cmd_buffer_set_subpass(cmd_buffer, &clear_subpass, false);
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
radv_CmdBindVertexBuffers(cmd_buffer_h, 0, 1,
(VkBuffer[]) { radv_buffer_to_handle(&vertex_buffer) },
(VkDeviceSize[]) { 0 });
if (cmd_buffer->state.pipeline != pipeline) {
radv_CmdBindPipeline(cmd_buffer_h, VK_PIPELINE_BIND_POINT_GRAPHICS,
pipeline_h);
@@ -412,25 +484,21 @@ build_depthstencil_shader(struct nir_shader **out_vs, struct nir_shader **out_fs
nir_builder_init_simple_shader(&vs_b, NULL, MESA_SHADER_VERTEX, NULL);
nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL);
vs_b.shader->info.name = ralloc_strdup(vs_b.shader, "meta_clear_depthstencil_vs");
fs_b.shader->info.name = ralloc_strdup(fs_b.shader, "meta_clear_depthstencil_fs");
const struct glsl_type *position_out_type = glsl_vec4_type();
vs_b.shader->info->name = ralloc_strdup(vs_b.shader, "meta_clear_depthstencil_vs");
fs_b.shader->info->name = ralloc_strdup(fs_b.shader, "meta_clear_depthstencil_fs");
const struct glsl_type *position_type = glsl_vec4_type();
nir_variable *vs_in_pos =
nir_variable_create(vs_b.shader, nir_var_shader_in, position_type,
"a_position");
vs_in_pos->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *vs_out_pos =
nir_variable_create(vs_b.shader, nir_var_shader_out, position_out_type,
nir_variable_create(vs_b.shader, nir_var_shader_out, position_type,
"gl_Position");
vs_out_pos->data.location = VARYING_SLOT_POS;
nir_intrinsic_instr *in_color_load = nir_intrinsic_instr_create(vs_b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(in_color_load, 0);
nir_intrinsic_set_range(in_color_load, 4);
in_color_load->src[0] = nir_src_for_ssa(nir_imm_int(&vs_b, 0));
in_color_load->num_components = 1;
nir_ssa_dest_init(&in_color_load->instr, &in_color_load->dest, 1, 32, "depth value");
nir_builder_instr_insert(&vs_b, &in_color_load->instr);
nir_ssa_def *outvec = radv_meta_gen_rect_vertices_comp2(&vs_b, &in_color_load->dest.ssa);
nir_store_var(&vs_b, vs_out_pos, outvec, 0xf);
nir_copy_var(&vs_b, vs_out_pos, vs_in_pos);
const struct glsl_type *layer_type = glsl_int_type();
nir_variable *vs_out_layer =
@@ -494,8 +562,24 @@ create_depthstencil_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo vi_state = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = sizeof(struct depthstencil_clear_vattrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = offsetof(struct depthstencil_clear_vattrs, position),
},
},
};
const VkPipelineDepthStencilStateCreateInfo ds_state = {
@@ -535,19 +619,14 @@ create_depthstencil_pipeline(struct radv_device *device,
}
result = create_pipeline(device, radv_render_pass_from_handle(render_pass),
samples, vs_nir, fs_nir, &vi_state, &ds_state, &cb_state,
device->meta_state.clear_depth_p_layout,
&extra, &device->meta_state.alloc, pipeline);
return result;
}
static bool depth_view_can_fast_clear(struct radv_cmd_buffer *cmd_buffer,
const struct radv_image_view *iview,
static bool depth_view_can_fast_clear(const struct radv_image_view *iview,
VkImageLayout layout,
const VkClearRect *clear_rect)
{
uint32_t queue_mask = radv_image_queue_family_mask(iview->image,
cmd_buffer->queue_family_index,
cmd_buffer->queue_family_index);
if (clear_rect->rect.offset.x || clear_rect->rect.offset.y ||
clear_rect->rect.extent.width != iview->extent.width ||
clear_rect->rect.extent.height != iview->extent.height)
@@ -555,15 +634,14 @@ static bool depth_view_can_fast_clear(struct radv_cmd_buffer *cmd_buffer,
if (iview->image->surface.htile_size &&
iview->base_mip == 0 &&
iview->base_layer == 0 &&
radv_layout_is_htile_compressed(iview->image, layout, queue_mask) &&
!radv_image_extent_compare(iview->image, &iview->extent))
radv_layout_can_expclear(iview->image, layout) &&
memcmp(&iview->extent, &iview->image->extent, sizeof(iview->extent)) == 0)
return true;
return false;
}
static struct radv_pipeline *
pick_depthstencil_pipeline(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_state *meta_state,
pick_depthstencil_pipeline(struct radv_meta_state *meta_state,
const struct radv_image_view *iview,
int samples_log2,
VkImageAspectFlags aspects,
@@ -571,7 +649,7 @@ pick_depthstencil_pipeline(struct radv_cmd_buffer *cmd_buffer,
const VkClearRect *clear_rect,
VkClearDepthStencilValue clear_value)
{
bool fast = depth_view_can_fast_clear(cmd_buffer, iview, layout, clear_rect);
bool fast = depth_view_can_fast_clear(iview, layout, clear_rect);
int index = DEPTH_CLEAR_SLOW;
if (fast) {
@@ -604,9 +682,10 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
VkClearDepthStencilValue clear_value = clear_att->clearValue.depthStencil;
VkImageAspectFlags aspects = clear_att->aspectMask;
const struct radv_image_view *iview = fb->attachments[pass_att].attachment;
const uint32_t samples = iview->image->info.samples;
const uint32_t samples = iview->image->samples;
const uint32_t samples_log2 = ffs(samples) - 1;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
assert(aspects == VK_IMAGE_ASPECT_DEPTH_BIT ||
aspects == VK_IMAGE_ASPECT_STENCIL_BIT ||
@@ -617,18 +696,48 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
if (!(aspects & VK_IMAGE_ASPECT_DEPTH_BIT))
clear_value.depth = 1.0f;
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.clear_depth_p_layout,
VK_SHADER_STAGE_VERTEX_BIT, 0, 4,
&clear_value.depth);
const struct depthstencil_clear_vattrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0
},
.depth_clear = clear_value.depth,
},
{
.position = {
-1.0,
1.0,
},
.depth_clear = clear_value.depth,
},
{
.position = {
1.0,
-1.0,
},
.depth_clear = clear_value.depth,
},
};
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT) {
radv_CmdSetStencilReference(cmd_buffer_h, VK_STENCIL_FACE_FRONT_BIT,
clear_value.stencil);
}
struct radv_pipeline *pipeline = pick_depthstencil_pipeline(cmd_buffer,
meta_state,
radv_CmdBindVertexBuffers(cmd_buffer_h, 0, 1,
(VkBuffer[]) { radv_buffer_to_handle(&vertex_buffer) },
(VkDeviceSize[]) { 0 });
struct radv_pipeline *pipeline = pick_depthstencil_pipeline(meta_state,
iview,
samples_log2,
aspects,
@@ -640,7 +749,7 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
radv_pipeline_to_handle(pipeline));
}
if (depth_view_can_fast_clear(cmd_buffer, iview, subpass->depth_stencil_attachment.layout, clear_rect))
if (depth_view_can_fast_clear(iview, subpass->depth_stencil_attachment.layout, clear_rect))
radv_set_depth_clear_regs(cmd_buffer, iview->image, clear_value, aspects);
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) {
@@ -657,95 +766,6 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, 0);
}
static bool
emit_fast_htile_clear(struct radv_cmd_buffer *cmd_buffer,
const VkClearAttachment *clear_att,
const VkClearRect *clear_rect,
enum radv_cmd_flush_bits *pre_flush,
enum radv_cmd_flush_bits *post_flush)
{
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
const uint32_t pass_att = subpass->depth_stencil_attachment.attachment;
VkImageLayout image_layout = subpass->depth_stencil_attachment.layout;
const struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_image_view *iview = fb->attachments[pass_att].attachment;
VkClearDepthStencilValue clear_value = clear_att->clearValue.depthStencil;
VkImageAspectFlags aspects = clear_att->aspectMask;
uint32_t clear_word;
if (!iview->image->surface.htile_size)
return false;
if (cmd_buffer->device->debug_flags & RADV_DEBUG_NO_FAST_CLEARS)
return false;
if (!radv_layout_is_htile_compressed(iview->image, image_layout, radv_image_queue_family_mask(iview->image, cmd_buffer->queue_family_index, cmd_buffer->queue_family_index)))
goto fail;
/* don't fast clear 3D */
if (iview->image->type == VK_IMAGE_TYPE_3D)
goto fail;
/* all layers are bound */
if (iview->base_layer > 0)
goto fail;
if (iview->image->info.array_size != iview->layer_count)
goto fail;
if (iview->image->info.levels > 1)
goto fail;
if (!radv_image_extent_compare(iview->image, &iview->extent))
goto fail;
if (clear_rect->rect.offset.x || clear_rect->rect.offset.y ||
clear_rect->rect.extent.width != iview->image->info.width ||
clear_rect->rect.extent.height != iview->image->info.height)
goto fail;
if (clear_rect->baseArrayLayer != 0)
goto fail;
if (clear_rect->layerCount != iview->image->info.array_size)
goto fail;
/* Don't do stencil clears till we have figured out if the clear words are
* correct. */
if (vk_format_aspects(iview->image->vk_format) & VK_IMAGE_ASPECT_STENCIL_BIT)
goto fail;
if (clear_value.depth == 1.0)
clear_word = 0xfffffff0;
else if (clear_value.depth == 0.0)
clear_word = 0;
else
goto fail;
if (pre_flush) {
cmd_buffer->state.flush_bits |= (RADV_CMD_FLAG_FLUSH_AND_INV_DB |
RADV_CMD_FLAG_FLUSH_AND_INV_DB_META) & ~ *pre_flush;
*pre_flush |= cmd_buffer->state.flush_bits;
} else
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_DB |
RADV_CMD_FLAG_FLUSH_AND_INV_DB_META;
radv_fill_buffer(cmd_buffer, iview->image->bo,
iview->image->offset + iview->image->htile_offset,
iview->image->surface.htile_size, clear_word);
radv_set_depth_clear_regs(cmd_buffer, iview->image, clear_value, aspects);
if (post_flush)
*post_flush |= RADV_CMD_FLAG_CS_PARTIAL_FLUSH |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2;
else
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_CS_PARTIAL_FLUSH |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2;
return true;
fail:
return false;
}
static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
@@ -768,34 +788,6 @@ radv_device_init_meta_clear_state(struct radv_device *device)
memset(&device->meta_state.clear, 0, sizeof(device->meta_state.clear));
VkPipelineLayoutCreateInfo pl_color_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 0,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_FRAGMENT_BIT, 0, 16},
};
res = radv_CreatePipelineLayout(radv_device_to_handle(device),
&pl_color_create_info,
&device->meta_state.alloc,
&device->meta_state.clear_color_p_layout);
if (res != VK_SUCCESS)
goto fail;
VkPipelineLayoutCreateInfo pl_depth_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 0,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_VERTEX_BIT, 0, 4},
};
res = radv_CreatePipelineLayout(radv_device_to_handle(device),
&pl_depth_create_info,
&device->meta_state.alloc,
&device->meta_state.clear_depth_p_layout);
if (res != VK_SUCCESS)
goto fail;
for (uint32_t i = 0; i < ARRAY_SIZE(state->clear); ++i) {
uint32_t samples = 1 << i;
for (uint32_t j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) {
@@ -893,30 +885,26 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
/* all layers are bound */
if (iview->base_layer > 0)
goto fail;
if (iview->image->info.array_size != iview->layer_count)
if (iview->image->array_size != iview->layer_count)
goto fail;
if (iview->image->info.levels > 1)
if (iview->image->levels > 1)
goto fail;
if (iview->image->surface.u.legacy.level[0].mode < RADEON_SURF_MODE_1D)
if (iview->image->surface.level[0].mode < RADEON_SURF_MODE_1D)
goto fail;
if (!radv_image_extent_compare(iview->image, &iview->extent))
if (memcmp(&iview->extent, &iview->image->extent, sizeof(iview->extent)))
goto fail;
if (clear_rect->rect.offset.x || clear_rect->rect.offset.y ||
clear_rect->rect.extent.width != iview->image->info.width ||
clear_rect->rect.extent.height != iview->image->info.height)
clear_rect->rect.extent.width != iview->image->extent.width ||
clear_rect->rect.extent.height != iview->image->extent.height)
goto fail;
if (clear_rect->baseArrayLayer != 0)
goto fail;
if (clear_rect->layerCount != iview->image->info.array_size)
goto fail;
/* RB+ doesn't work with CMASK fast clear on Stoney. */
if (!iview->image->surface.dcc_size &&
cmd_buffer->device->physical_device->rad_info.family == CHIP_STONEY)
if (clear_rect->layerCount != iview->image->array_size)
goto fail;
/* DCC */
@@ -977,9 +965,7 @@ emit_clear(struct radv_cmd_buffer *cmd_buffer,
} else {
assert(clear_att->aspectMask & (VK_IMAGE_ASPECT_DEPTH_BIT |
VK_IMAGE_ASPECT_STENCIL_BIT));
if (!emit_fast_htile_clear(cmd_buffer, clear_att, clear_rect,
pre_flush, post_flush))
emit_depthstencil_clear(cmd_buffer, clear_att, clear_rect);
emit_depthstencil_clear(cmd_buffer, clear_att, clear_rect);
}
}
@@ -1023,7 +1009,7 @@ radv_cmd_buffer_clear_subpass(struct radv_cmd_buffer *cmd_buffer)
if (!subpass_needs_clear(cmd_buffer))
return;
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
VkClearRect clear_rect = {
.rect = cmd_state->render_area,
@@ -1094,7 +1080,8 @@ radv_clear_image_layer(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = range->baseArrayLayer + layer,
.layerCount = 1
},
});
},
cmd_buffer, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT);
VkFramebuffer fb;
radv_CreateFramebuffer(device_h,
@@ -1227,7 +1214,7 @@ radv_cmd_clear_image(struct radv_cmd_buffer *cmd_buffer,
const VkImageSubresourceRange *range = &ranges[r];
for (uint32_t l = 0; l < radv_get_levelCount(image, range); ++l) {
const uint32_t layer_count = image->type == VK_IMAGE_TYPE_3D ?
radv_minify(image->info.depth, range->baseMipLevel + l) :
radv_minify(image->extent.depth, range->baseMipLevel + l) :
radv_get_layerCount(image, range);
for (uint32_t s = 0; s < layer_count; ++s) {
@@ -1270,7 +1257,7 @@ void radv_CmdClearColorImage(
if (cs)
radv_meta_begin_cleari(cmd_buffer, &saved_state.compute);
else
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state.gfx, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state.gfx, cmd_buffer);
radv_cmd_clear_image(cmd_buffer, image, imageLayout,
(const VkClearValue *) pColor,
@@ -1294,7 +1281,7 @@ void radv_CmdClearDepthStencilImage(
RADV_FROM_HANDLE(radv_image, image, image_h);
struct radv_meta_saved_state saved_state;
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
radv_cmd_clear_image(cmd_buffer, image, imageLayout,
(const VkClearValue *) pDepthStencil,
@@ -1318,7 +1305,7 @@ void radv_CmdClearAttachments(
if (!cmd_buffer->state.subpass)
return;
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
/* FINISHME: We can do better than this dumb loop. It thrashes too much
* state.

View File

@@ -118,12 +118,12 @@ meta_copy_buffer_to_image(struct radv_cmd_buffer *cmd_buffer,
/* The Vulkan 1.0 spec says "dstImage must have a sample count equal to
* VK_SAMPLE_COUNT_1_BIT."
*/
assert(image->info.samples == 1);
assert(image->samples == 1);
if (cs)
radv_meta_begin_bufimage(cmd_buffer, &saved_state.compute);
else
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state.gfx, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state.gfx, cmd_buffer);
for (unsigned r = 0; r < regionCount; r++) {
@@ -337,11 +337,11 @@ meta_copy_image(struct radv_cmd_buffer *cmd_buffer,
* vkCmdCopyImage can be used to copy image data between multisample
* images, but both images must have the same number of samples.
*/
assert(src_image->info.samples == dest_image->info.samples);
assert(src_image->samples == dest_image->samples);
if (cs)
radv_meta_begin_itoi(cmd_buffer, &saved_state.compute);
else
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state.gfx, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state.gfx, cmd_buffer);
for (unsigned r = 0; r < regionCount; r++) {
assert(pRegions[r].srcSubresource.aspectMask ==
@@ -447,8 +447,8 @@ void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer,
image_copy.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
image_copy.dstSubresource.layerCount = 1;
image_copy.extent.width = image->info.width;
image_copy.extent.height = image->info.height;
image_copy.extent.width = image->extent.width;
image_copy.extent.height = image->extent.height;
image_copy.extent.depth = 1;
meta_copy_image(cmd_buffer, image, linear_image,

View File

@@ -26,7 +26,53 @@
#include "radv_meta.h"
#include "radv_private.h"
#include "nir/nir_builder.h"
#include "sid.h"
/**
* Vertex attributes used by all pipelines.
*/
struct vertex_attrs {
float position[2]; /**< 3DPRIM_RECTLIST */
};
/* passthrough vertex shader */
static nir_shader *
build_nir_vs(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *a_position;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_depth_decomp_vs");
a_position = nir_variable_create(b.shader, nir_var_shader_in, vec4,
"a_position");
a_position->data.location = VERT_ATTRIB_GENERIC0;
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, v_position, a_position);
return b.shader;
}
/* simple passthrough shader */
static nir_shader *
build_nir_fs(void)
{
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_asprintf(b.shader,
"meta_depth_decomp_noop_fs");
return b.shader;
}
static VkResult
create_pass(struct radv_device *device)
@@ -78,7 +124,7 @@ create_pipeline(struct radv_device *device,
VkDevice device_h = radv_device_to_handle(device);
struct radv_shader_module fs_module = {
.nir = radv_meta_build_nir_fs_noop(),
.nir = build_nir_fs(),
};
if (!fs_module.nir) {
@@ -106,8 +152,24 @@ create_pipeline(struct radv_device *device,
},
.pVertexInputState = &(VkPipelineVertexInputStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = sizeof(struct vertex_attrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct vertex_attrs, position),
},
},
},
.pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
@@ -223,7 +285,7 @@ radv_device_init_meta_depth_decomp_state(struct radv_device *device)
zero(device->meta_state.depth_decomp);
struct radv_shader_module vs_module = { .nir = radv_meta_build_nir_vs_generate_vertices() };
struct radv_shader_module vs_module = { .nir = build_nir_vs() };
if (!vs_module.nir) {
/* XXX: Need more accurate error */
res = VK_ERROR_OUT_OF_HOST_MEMORY;
@@ -256,7 +318,45 @@ emit_depth_decomp(struct radv_cmd_buffer *cmd_buffer,
const VkExtent2D *depth_decomp_extent,
VkPipeline pipeline_h)
{
struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
const struct vertex_attrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0,
},
},
{
.position = {
-1.0,
1.0,
},
},
{
.position = {
1.0,
-1.0,
},
},
};
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
VkBuffer vertex_buffer_h = radv_buffer_to_handle(&vertex_buffer);
radv_CmdBindVertexBuffers(cmd_buffer_h,
/*firstBinding*/ 0,
/*bindingCount*/ 1,
(VkBuffer[]) { vertex_buffer_h },
(VkDeviceSize[]) { 0 });
RADV_FROM_HANDLE(radv_pipeline, pipeline, pipeline_h);
@@ -292,16 +392,16 @@ static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_saved_pass_state saved_pass_state;
VkDevice device_h = radv_device_to_handle(cmd_buffer->device);
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t width = radv_minify(image->info.width,
uint32_t width = radv_minify(image->extent.width,
subresourceRange->baseMipLevel);
uint32_t height = radv_minify(image->info.height,
uint32_t height = radv_minify(image->extent.height,
subresourceRange->baseMipLevel);
if (!image->surface.htile_size)
return;
radv_meta_save_pass(&saved_pass_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
for (uint32_t layer = 0; layer < radv_get_layerCount(image, subresourceRange); layer++) {
struct radv_image_view iview;
@@ -318,7 +418,8 @@ static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = subresourceRange->baseArrayLayer + layer,
.layerCount = 1,
},
});
},
cmd_buffer, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT);
VkFramebuffer fb_h;

View File

@@ -26,7 +26,53 @@
#include "radv_meta.h"
#include "radv_private.h"
#include "nir/nir_builder.h"
#include "sid.h"
/**
* Vertex attributes used by all pipelines.
*/
struct vertex_attrs {
float position[2]; /**< 3DPRIM_RECTLIST */
};
/* passthrough vertex shader */
static nir_shader *
build_nir_vs(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *a_position;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_fast_clear_vs");
a_position = nir_variable_create(b.shader, nir_var_shader_in, vec4,
"a_position");
a_position->data.location = VERT_ATTRIB_GENERIC0;
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, v_position, a_position);
return b.shader;
}
/* simple passthrough shader */
static nir_shader *
build_nir_fs(void)
{
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_asprintf(b.shader,
"meta_fast_clear_noop_fs");
return b.shader;
}
static VkResult
create_pass(struct radv_device *device)
@@ -82,7 +128,7 @@ create_pipeline(struct radv_device *device,
VkDevice device_h = radv_device_to_handle(device);
struct radv_shader_module fs_module = {
.nir = radv_meta_build_nir_fs_noop(),
.nir = build_nir_fs(),
};
if (!fs_module.nir) {
@@ -108,8 +154,24 @@ create_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo vi_state = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = sizeof(struct vertex_attrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct vertex_attrs, position),
},
}
};
const VkPipelineInputAssemblyStateCreateInfo ia_state = {
@@ -268,7 +330,7 @@ radv_device_init_meta_fast_clear_flush_state(struct radv_device *device)
zero(device->meta_state.fast_clear_flush);
struct radv_shader_module vs_module = { .nir = radv_meta_build_nir_vs_generate_vertices() };
struct radv_shader_module vs_module = { .nir = build_nir_vs() };
if (!vs_module.nir) {
/* XXX: Need more accurate error */
res = VK_ERROR_OUT_OF_HOST_MEMORY;
@@ -302,6 +364,43 @@ emit_fast_clear_flush(struct radv_cmd_buffer *cmd_buffer,
{
struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
const struct vertex_attrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0,
},
},
{
.position = {
-1.0,
1.0,
},
},
{
.position = {
1.0,
-1.0,
},
},
};
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
VkBuffer vertex_buffer_h = radv_buffer_to_handle(&vertex_buffer);
radv_CmdBindVertexBuffers(cmd_buffer_h,
/*firstBinding*/ 0,
/*bindingCount*/ 1,
(VkBuffer[]) { vertex_buffer_h },
(VkDeviceSize[]) { 0 });
VkPipeline pipeline_h;
if (fmask_decompress)
@@ -349,7 +448,7 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
assert(cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL);
radv_meta_save_pass(&saved_pass_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
for (uint32_t layer = 0; layer < layer_count; ++layer) {
struct radv_image_view iview;
@@ -367,7 +466,8 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = subresourceRange->baseArrayLayer + layer,
.layerCount = 1,
},
});
},
cmd_buffer, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT);
VkFramebuffer fb_h;
radv_CreateFramebuffer(device_h,
@@ -377,8 +477,8 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
.pAttachments = (VkImageView[]) {
radv_image_view_to_handle(&iview)
},
.width = image->info.width,
.height = image->info.height,
.width = image->extent.width,
.height = image->extent.height,
.layers = 1
},
&cmd_buffer->pool->alloc,
@@ -395,8 +495,8 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
0,
},
.extent = {
image->info.width,
image->info.height,
image->extent.width,
image->extent.height,
}
},
.clearValueCount = 0,
@@ -405,7 +505,7 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
VK_SUBPASS_CONTENTS_INLINE);
emit_fast_clear_flush(cmd_buffer,
&(VkExtent2D) { image->info.width, image->info.height },
&(VkExtent2D) { image->extent.width, image->extent.height },
image->fmask.size > 0);
radv_CmdEndRenderPass(cmd_buffer_h);

View File

@@ -28,8 +28,40 @@
#include "radv_private.h"
#include "nir/nir_builder.h"
#include "sid.h"
/**
* Vertex attributes used by all pipelines.
*/
struct vertex_attrs {
float position[2]; /**< 3DPRIM_RECTLIST */
};
/* emit 0, 0, 0, 1 */
/* passthrough vertex shader */
static nir_shader *
build_nir_vs(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *a_position;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_resolve_vs");
a_position = nir_variable_create(b.shader, nir_var_shader_in, vec4,
"a_position");
a_position->data.location = VERT_ATTRIB_GENERIC0;
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, v_position, a_position);
return b.shader;
}
/* simple passthrough shader */
static nir_shader *
build_nir_fs(void)
{
@@ -38,7 +70,7 @@ build_nir_fs(void)
nir_variable *f_color; /* vec4, fragment output color */
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_asprintf(b.shader,
b.shader->info->name = ralloc_asprintf(b.shader,
"meta_resolve_fs");
f_color = nir_variable_create(b.shader, nir_var_shader_out, vec4,
@@ -142,8 +174,24 @@ create_pipeline(struct radv_device *device,
},
.pVertexInputState = &(VkPipelineVertexInputStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
.vertexBindingDescriptionCount = 1,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) {
{
.binding = 0,
.stride = sizeof(struct vertex_attrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct vertex_attrs, position),
},
},
},
.pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
@@ -240,7 +288,7 @@ radv_device_init_meta_resolve_state(struct radv_device *device)
zero(device->meta_state.resolve);
struct radv_shader_module vs_module = { .nir = radv_meta_build_nir_vs_generate_vertices() };
struct radv_shader_module vs_module = { .nir = build_nir_vs() };
if (!vs_module.nir) {
/* XXX: Need more accurate error */
res = VK_ERROR_OUT_OF_HOST_MEMORY;
@@ -274,8 +322,44 @@ emit_resolve(struct radv_cmd_buffer *cmd_buffer,
{
struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
const struct vertex_attrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0,
},
},
{
.position = {
-1.0,
1.0,
},
},
{
.position = {
1.0,
-1.0,
},
},
};
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
VkBuffer vertex_buffer_h = radv_buffer_to_handle(&vertex_buffer);
radv_CmdBindVertexBuffers(cmd_buffer_h,
/*firstBinding*/ 0,
/*bindingCount*/ 1,
(VkBuffer[]) { vertex_buffer_h },
(VkDeviceSize[]) { 0 });
VkPipeline pipeline_h = device->meta_state.resolve.pipeline;
RADV_FROM_HANDLE(radv_pipeline, pipeline, pipeline_h);
@@ -303,25 +387,6 @@ emit_resolve(struct radv_cmd_buffer *cmd_buffer,
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
}
enum radv_resolve_method {
RESOLVE_HW,
RESOLVE_COMPUTE,
RESOLVE_FRAGMENT,
};
static void radv_pick_resolve_method_images(struct radv_image *src_image,
struct radv_image *dest_image,
enum radv_resolve_method *method)
{
if (dest_image->surface.micro_tile_mode != src_image->surface.micro_tile_mode) {
if (dest_image->surface.num_dcc_levels > 0)
*method = RESOLVE_FRAGMENT;
else
*method = RESOLVE_COMPUTE;
}
}
void radv_CmdResolveImage(
VkCommandBuffer cmd_buffer_h,
VkImage src_image_h,
@@ -337,39 +402,28 @@ void radv_CmdResolveImage(
struct radv_device *device = cmd_buffer->device;
struct radv_meta_saved_state saved_state;
VkDevice device_h = radv_device_to_handle(device);
enum radv_resolve_method resolve_method = RESOLVE_HW;
bool use_compute_resolve = false;
/* we can use the hw resolve only for single full resolves */
if (region_count == 1) {
if (regions[0].srcOffset.x ||
regions[0].srcOffset.y ||
regions[0].srcOffset.z)
resolve_method = RESOLVE_COMPUTE;
use_compute_resolve = true;
if (regions[0].dstOffset.x ||
regions[0].dstOffset.y ||
regions[0].dstOffset.z)
resolve_method = RESOLVE_COMPUTE;
use_compute_resolve = true;
if (regions[0].extent.width != src_image->info.width ||
regions[0].extent.height != src_image->info.height ||
regions[0].extent.depth != src_image->info.depth)
resolve_method = RESOLVE_COMPUTE;
if (regions[0].extent.width != src_image->extent.width ||
regions[0].extent.height != src_image->extent.height ||
regions[0].extent.depth != src_image->extent.depth)
use_compute_resolve = true;
} else
resolve_method = RESOLVE_COMPUTE;
use_compute_resolve = true;
radv_pick_resolve_method_images(src_image, dest_image,
&resolve_method);
if (use_compute_resolve) {
if (resolve_method == RESOLVE_FRAGMENT) {
radv_meta_resolve_fragment_image(cmd_buffer,
src_image,
src_image_layout,
dest_image,
dest_image_layout,
region_count, regions);
return;
}
if (resolve_method == RESOLVE_COMPUTE) {
radv_meta_resolve_compute_image(cmd_buffer,
src_image,
src_image_layout,
@@ -379,12 +433,12 @@ void radv_CmdResolveImage(
return;
}
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
assert(src_image->info.samples > 1);
assert(dest_image->info.samples == 1);
assert(src_image->samples > 1);
assert(dest_image->samples == 1);
if (src_image->info.samples >= 16) {
if (src_image->samples >= 16) {
/* See commit aa3f9aaf31e9056a255f9e0472ebdfdaa60abe54 for the
* glBlitFramebuffer workaround for samples >= 16.
*/
@@ -392,7 +446,7 @@ void radv_CmdResolveImage(
"samples >= 16");
}
if (src_image->info.array_size > 1)
if (src_image->array_size > 1)
radv_finishme("vkCmdResolveImage: multisample array images");
if (dest_image->surface.dcc_size) {
@@ -458,7 +512,8 @@ void radv_CmdResolveImage(
.baseArrayLayer = src_base_layer + layer,
.layerCount = 1,
},
});
},
cmd_buffer, VK_IMAGE_USAGE_SAMPLED_BIT);
struct radv_image_view dest_iview;
radv_image_view_init(&dest_iview, cmd_buffer->device,
@@ -474,7 +529,8 @@ void radv_CmdResolveImage(
.baseArrayLayer = dest_base_layer + layer,
.layerCount = 1,
},
});
},
cmd_buffer, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT);
VkFramebuffer fb_h;
radv_CreateFramebuffer(device_h,
@@ -485,9 +541,9 @@ void radv_CmdResolveImage(
radv_image_view_to_handle(&src_iview),
radv_image_view_to_handle(&dest_iview),
},
.width = radv_minify(dest_image->info.width,
.width = radv_minify(dest_image->extent.width,
region->dstSubresource.mipLevel),
.height = radv_minify(dest_image->info.height,
.height = radv_minify(dest_image->extent.height,
region->dstSubresource.mipLevel),
.layers = 1
},
@@ -543,7 +599,6 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radv_meta_saved_state saved_state;
enum radv_resolve_method resolve_method = RESOLVE_HW;
/* FINISHME(perf): Skip clears for resolve attachments.
*
@@ -557,27 +612,7 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
if (!subpass->has_resolve)
return;
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
struct radv_image *src_img = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment->image;
radv_pick_resolve_method_images(dst_img, src_img, &resolve_method);
if (resolve_method == RESOLVE_FRAGMENT) {
break;
}
}
if (resolve_method == RESOLVE_COMPUTE) {
radv_cmd_buffer_resolve_subpass_cs(cmd_buffer);
return;
} else if (resolve_method == RESOLVE_FRAGMENT) {
radv_cmd_buffer_resolve_subpass_fs(cmd_buffer);
return;
}
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer);
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];

View File

@@ -32,10 +32,11 @@
#include "vk_format.h"
static nir_shader *
build_resolve_compute_shader(struct radv_device *dev, bool is_integer, bool is_srgb, int samples)
build_resolve_compute_shader(struct radv_device *dev, bool is_integer, int samples)
{
nir_builder b;
char name[64];
nir_if *outer_if = NULL;
const struct glsl_type *sampler_type = glsl_sampler_type(GLSL_SAMPLER_DIM_MS,
false,
false,
@@ -44,12 +45,12 @@ build_resolve_compute_shader(struct radv_device *dev, bool is_integer, bool is_s
false,
false,
GLSL_TYPE_FLOAT);
snprintf(name, 64, "meta_resolve_cs-%d-%s", samples, is_integer ? "int" : (is_srgb ? "srgb" : "float"));
snprintf(name, 64, "meta_resolve_cs-%d-%s", samples, is_integer ? "int" : "float");
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, name);
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, name);
b.shader->info->cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
sampler_type, "s_tex");
@@ -63,40 +64,105 @@ build_resolve_compute_shader(struct radv_device *dev, bool is_integer, bool is_s
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(src_offset, 0);
nir_intrinsic_set_range(src_offset, 16);
src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
src_offset->num_components = 2;
nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset");
nir_builder_instr_insert(&b, &src_offset->instr);
nir_intrinsic_instr *dst_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(dst_offset, 0);
nir_intrinsic_set_range(dst_offset, 16);
dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
dst_offset->num_components = 2;
nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, 2, 32, "dst_offset");
nir_builder_instr_insert(&b, &dst_offset->instr);
nir_ssa_def *img_coord = nir_channels(&b, nir_iadd(&b, global_id, &src_offset->dest.ssa), 0x3);
nir_variable *color = nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
/* do a txf_ms on each sample */
nir_ssa_def *tmp;
radv_meta_build_resolve_shader_core(&b, is_integer, is_srgb, samples,
input_img, color, img_coord);
nir_tex_instr *tex = nir_tex_instr_create(b.shader, 2);
tex->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex->op = nir_texop_txf_ms;
tex->src[0].src_type = nir_tex_src_coord;
tex->src[0].src = nir_src_for_ssa(img_coord);
tex->src[1].src_type = nir_tex_src_ms_index;
tex->src[1].src = nir_src_for_ssa(nir_imm_int(&b, 0));
tex->dest_type = nir_type_float;
tex->is_array = false;
tex->coord_components = 2;
tex->texture = nir_deref_var_create(tex, input_img);
tex->sampler = NULL;
nir_ssa_def *outval = nir_load_var(&b, color);
nir_ssa_dest_init(&tex->instr, &tex->dest, 4, 32, "tex");
nir_builder_instr_insert(&b, &tex->instr);
tmp = &tex->dest.ssa;
nir_variable *color =
nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
if (!is_integer && samples > 1) {
nir_tex_instr *tex_all_same = nir_tex_instr_create(b.shader, 1);
tex_all_same->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_all_same->op = nir_texop_samples_identical;
tex_all_same->src[0].src_type = nir_tex_src_coord;
tex_all_same->src[0].src = nir_src_for_ssa(img_coord);
tex_all_same->dest_type = nir_type_float;
tex_all_same->is_array = false;
tex_all_same->coord_components = 2;
tex_all_same->texture = nir_deref_var_create(tex_all_same, input_img);
tex_all_same->sampler = NULL;
nir_ssa_dest_init(&tex_all_same->instr, &tex_all_same->dest, 1, 32, "tex");
nir_builder_instr_insert(&b, &tex_all_same->instr);
nir_ssa_def *all_same = nir_ine(&b, &tex_all_same->dest.ssa, nir_imm_int(&b, 0));
nir_if *if_stmt = nir_if_create(b.shader);
if_stmt->condition = nir_src_for_ssa(all_same);
nir_cf_node_insert(b.cursor, &if_stmt->cf_node);
b.cursor = nir_after_cf_list(&if_stmt->then_list);
for (int i = 1; i < samples; i++) {
nir_tex_instr *tex_add = nir_tex_instr_create(b.shader, 2);
tex_add->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_add->op = nir_texop_txf_ms;
tex_add->src[0].src_type = nir_tex_src_coord;
tex_add->src[0].src = nir_src_for_ssa(img_coord);
tex_add->src[1].src_type = nir_tex_src_ms_index;
tex_add->src[1].src = nir_src_for_ssa(nir_imm_int(&b, i));
tex_add->dest_type = nir_type_float;
tex_add->is_array = false;
tex_add->coord_components = 2;
tex_add->texture = nir_deref_var_create(tex_add, input_img);
tex_add->sampler = NULL;
nir_ssa_dest_init(&tex_add->instr, &tex_add->dest, 4, 32, "tex");
nir_builder_instr_insert(&b, &tex_add->instr);
tmp = nir_fadd(&b, tmp, &tex_add->dest.ssa);
}
tmp = nir_fdiv(&b, tmp, nir_imm_float(&b, samples));
nir_store_var(&b, color, tmp, 0xf);
b.cursor = nir_after_cf_list(&if_stmt->else_list);
outer_if = if_stmt;
}
nir_store_var(&b, color, &tex->dest.ssa, 0xf);
if (outer_if)
b.cursor = nir_after_cf_node(&outer_if->cf_node);
nir_ssa_def *newv = nir_load_var(&b, color);
nir_ssa_def *coord = nir_iadd(&b, global_id, &dst_offset->dest.ssa);
nir_intrinsic_instr *store = nir_intrinsic_instr_create(b.shader, nir_intrinsic_image_store);
store->src[0] = nir_src_for_ssa(coord);
store->src[1] = nir_src_for_ssa(nir_ssa_undef(&b, 1, 32));
store->src[2] = nir_src_for_ssa(outval);
store->src[2] = nir_src_for_ssa(newv);
store->variables[0] = nir_deref_var_create(store, output_img);
nir_builder_instr_insert(&b, &store->instr);
return b.shader;
@@ -164,13 +230,12 @@ static VkResult
create_resolve_pipeline(struct radv_device *device,
int samples,
bool is_integer,
bool is_srgb,
VkPipeline *pipeline)
{
VkResult result;
struct radv_shader_module cs = { .nir = NULL };
cs.nir = build_resolve_compute_shader(device, is_integer, is_srgb, samples);
cs.nir = build_resolve_compute_shader(device, is_integer, samples);
/* compute shader */
@@ -217,15 +282,12 @@ radv_device_init_meta_resolve_compute_state(struct radv_device *device)
for (uint32_t i = 0; i < MAX_SAMPLES_LOG2; ++i) {
uint32_t samples = 1 << i;
res = create_resolve_pipeline(device, samples, false, false,
res = create_resolve_pipeline(device, samples, false,
&state->resolve_compute.rc[i].pipeline);
res = create_resolve_pipeline(device, samples, true, false,
res = create_resolve_pipeline(device, samples, true,
&state->resolve_compute.rc[i].i_pipeline);
res = create_resolve_pipeline(device, samples, false, true,
&state->resolve_compute.rc[i].srgb_pipeline);
}
return res;
@@ -243,10 +305,6 @@ radv_device_finish_meta_resolve_compute_state(struct radv_device *device)
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_compute.rc[i].i_pipeline,
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_compute.rc[i].srgb_pipeline,
&state->alloc);
}
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
@@ -257,78 +315,6 @@ radv_device_finish_meta_resolve_compute_state(struct radv_device *device)
&state->alloc);
}
static void
emit_resolve(struct radv_cmd_buffer *cmd_buffer,
struct radv_image_view *src_iview,
struct radv_image_view *dest_iview,
const VkOffset2D *src_offset,
const VkOffset2D *dest_offset,
const VkExtent2D *resolve_extent)
{
struct radv_device *device = cmd_buffer->device;
const uint32_t samples = src_iview->image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1;
radv_meta_push_descriptor_set(cmd_buffer,
VK_PIPELINE_BIND_POINT_COMPUTE,
device->meta_state.resolve_compute.p_layout,
0, /* set */
2, /* descriptorWriteCount */
(VkWriteDescriptorSet[]) {
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 0,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(src_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL },
}
},
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 1,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(dest_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
}
});
VkPipeline pipeline;
if (vk_format_is_int(src_iview->image->vk_format))
pipeline = device->meta_state.resolve_compute.rc[samples_log2].i_pipeline;
else if (vk_format_is_srgb(src_iview->image->vk_format))
pipeline = device->meta_state.resolve_compute.rc[samples_log2].srgb_pipeline;
else
pipeline = device->meta_state.resolve_compute.rc[samples_log2].pipeline;
if (cmd_buffer->state.compute_pipeline != radv_pipeline_from_handle(pipeline)) {
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
}
unsigned push_constants[4] = {
src_offset->x,
src_offset->y,
dest_offset->x,
dest_offset->y,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.resolve_compute.p_layout,
VK_SHADER_STAGE_COMPUTE_BIT, 0, 16,
push_constants);
radv_unaligned_dispatch(cmd_buffer, resolve_extent->width, resolve_extent->height, 1);
}
void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image,
VkImageLayout src_image_layout,
@@ -337,7 +323,10 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
uint32_t region_count,
const VkImageResolve *regions)
{
struct radv_device *device = cmd_buffer->device;
struct radv_meta_saved_compute_state saved_state;
const uint32_t samples = src_image->samples;
const uint32_t samples_log2 = ffs(samples) - 1;
for (uint32_t r = 0; r < region_count; ++r) {
const VkImageResolve *region = &regions[r];
@@ -394,7 +383,8 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = src_base_layer + layer,
.layerCount = 1,
},
});
},
cmd_buffer, VK_IMAGE_USAGE_SAMPLED_BIT);
struct radv_image_view dest_iview;
radv_image_view_init(&dest_iview, cmd_buffer->device,
@@ -410,108 +400,68 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = dest_base_layer + layer,
.layerCount = 1,
},
});
},
cmd_buffer, VK_IMAGE_USAGE_STORAGE_BIT);
emit_resolve(cmd_buffer,
&src_iview,
&dest_iview,
&(VkOffset2D) {srcOffset.x, srcOffset.y },
&(VkOffset2D) {dstOffset.x, dstOffset.y },
&(VkExtent2D) {extent.width, extent.height });
radv_meta_push_descriptor_set(cmd_buffer,
VK_PIPELINE_BIND_POINT_COMPUTE,
device->meta_state.resolve_compute.p_layout,
0, /* set */
2, /* descriptorWriteCount */
(VkWriteDescriptorSet[]) {
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 0,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(&src_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
},
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 1,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(&dest_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
}
});
VkPipeline pipeline;
if (vk_format_is_int(src_image->vk_format))
pipeline = device->meta_state.resolve_compute.rc[samples_log2].i_pipeline;
else
pipeline = device->meta_state.resolve_compute.rc[samples_log2].pipeline;
if (cmd_buffer->state.compute_pipeline != radv_pipeline_from_handle(pipeline)) {
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
}
unsigned push_constants[4] = {
srcOffset.x,
srcOffset.y,
dstOffset.x,
dstOffset.y,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.resolve_compute.p_layout,
VK_SHADER_STAGE_COMPUTE_BIT, 0, 16,
push_constants);
radv_unaligned_dispatch(cmd_buffer, extent.width, extent.height, 1);
}
}
radv_meta_restore_compute(&saved_state, cmd_buffer, 16);
}
/**
* Emit any needed resolves for the current subpass.
*/
void
radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer *cmd_buffer)
{
struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radv_meta_saved_compute_state saved_state;
/* FINISHME(perf): Skip clears for resolve attachments.
*
* From the Vulkan 1.0 spec:
*
* If the first use of an attachment in a render pass is as a resolve
* attachment, then the loadOp is effectively ignored as the resolve is
* guaranteed to overwrite all pixels in the render area.
*/
if (!subpass->has_resolve)
return;
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
if (dst_img->surface.dcc_size) {
radv_initialize_dcc(cmd_buffer, dst_img, 0xffffffff);
cmd_buffer->state.attachments[dest_att.attachment].current_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
}
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = 0;
range.levelCount = 1;
range.baseArrayLayer = 0;
range.layerCount = 1;
radv_fast_clear_flush_image_inplace(cmd_buffer, src_iview->image, &range);
}
radv_meta_save_compute(&saved_state, cmd_buffer, 16);
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
struct radv_image_view *dst_iview = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_subpass resolve_subpass = {
.color_count = 1,
.color_attachments = (VkAttachmentReference[]) { dest_att },
.depth_stencil_attachment = { .attachment = VK_ATTACHMENT_UNUSED },
};
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
/* Subpass resolves must respect the render area. We can ignore the
* render area here because vkCmdBeginRenderPass set the render area
* with 3DSTATE_DRAWING_RECTANGLE.
*
* XXX(chadv): Does the hardware really respect
* 3DSTATE_DRAWING_RECTANGLE when draing a 3DPRIM_RECTLIST?
*/
emit_resolve(cmd_buffer,
src_iview,
dst_iview,
&(VkOffset2D) { 0, 0 },
&(VkOffset2D) { 0, 0 },
&(VkExtent2D) { fb->width, fb->height });
}
radv_meta_restore_compute(&saved_state, cmd_buffer, 16);
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = 0;
range.levelCount = 1;
range.baseArrayLayer = 0;
range.layerCount = 1;
radv_fast_clear_flush_image_inplace(cmd_buffer, dst_img, &range);
}
}

View File

@@ -1,666 +0,0 @@
/*
* Copyright © 2016 Dave Airlie
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include <assert.h>
#include <stdbool.h>
#include "radv_meta.h"
#include "radv_private.h"
#include "nir/nir_builder.h"
#include "sid.h"
#include "vk_format.h"
static nir_shader *
build_nir_vertex_shader(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_resolve_vs");
nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "gl_Position");
pos_out->data.location = VARYING_SLOT_POS;
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
nir_store_var(&b, pos_out, outvec, 0xf);
return b.shader;
}
static nir_shader *
build_resolve_fragment_shader(struct radv_device *dev, bool is_integer, bool is_srgb, int samples)
{
nir_builder b;
char name[64];
const struct glsl_type *vec2 = glsl_vector_type(GLSL_TYPE_FLOAT, 2);
const struct glsl_type *vec4 = glsl_vec4_type();
const struct glsl_type *sampler_type = glsl_sampler_type(GLSL_SAMPLER_DIM_MS,
false,
false,
GLSL_TYPE_FLOAT);
snprintf(name, 64, "meta_resolve_fs-%d-%s", samples, is_integer ? "int" : (is_srgb ? "srgb" : "float"));
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_strdup(b.shader, name);
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
sampler_type, "s_tex");
input_img->data.descriptor_set = 0;
input_img->data.binding = 0;
nir_variable *fs_pos_in = nir_variable_create(b.shader, nir_var_shader_in, vec2, "fs_pos_in");
fs_pos_in->data.location = VARYING_SLOT_POS;
nir_variable *color_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "f_color");
color_out->data.location = FRAG_RESULT_DATA0;
nir_ssa_def *pos_in = nir_load_var(&b, fs_pos_in);
nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(src_offset, 0);
nir_intrinsic_set_range(src_offset, 8);
src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
src_offset->num_components = 2;
nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset");
nir_builder_instr_insert(&b, &src_offset->instr);
nir_ssa_def *pos_int = nir_f2i32(&b, pos_in);
nir_ssa_def *img_coord = nir_channels(&b, nir_iadd(&b, pos_int, &src_offset->dest.ssa), 0x3);
nir_variable *color = nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
radv_meta_build_resolve_shader_core(&b, is_integer, is_srgb,samples,
input_img, color, img_coord);
nir_ssa_def *outval = nir_load_var(&b, color);
nir_store_var(&b, color_out, outval, 0xf);
return b.shader;
}
static VkResult
create_layout(struct radv_device *device)
{
VkResult result;
/*
* one descriptors for the image being sampled
*/
VkDescriptorSetLayoutCreateInfo ds_create_info = {
.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
.flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR,
.bindingCount = 1,
.pBindings = (VkDescriptorSetLayoutBinding[]) {
{
.binding = 0,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.descriptorCount = 1,
.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT,
.pImmutableSamplers = NULL
},
}
};
result = radv_CreateDescriptorSetLayout(radv_device_to_handle(device),
&ds_create_info,
&device->meta_state.alloc,
&device->meta_state.resolve_fragment.ds_layout);
if (result != VK_SUCCESS)
goto fail;
VkPipelineLayoutCreateInfo pl_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.resolve_fragment.ds_layout,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_FRAGMENT_BIT, 0, 8},
};
result = radv_CreatePipelineLayout(radv_device_to_handle(device),
&pl_create_info,
&device->meta_state.alloc,
&device->meta_state.resolve_fragment.p_layout);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
fail:
return result;
}
static const VkPipelineVertexInputStateCreateInfo normal_vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
};
static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,
VK_FORMAT_R16G16B16A16_SINT,
VK_FORMAT_R32_SFLOAT,
VK_FORMAT_R32G32_SFLOAT,
VK_FORMAT_R32G32B32A32_SFLOAT
};
static VkResult
create_resolve_pipeline(struct radv_device *device,
int samples_log2,
VkFormat format)
{
VkResult result;
bool is_integer = false, is_srgb = false;
uint32_t samples = 1 << samples_log2;
unsigned fs_key = radv_format_meta_fs_key(format);
const VkPipelineVertexInputStateCreateInfo *vi_create_info;
vi_create_info = &normal_vi_create_info;
if (vk_format_is_int(format))
is_integer = true;
else if (vk_format_is_srgb(format))
is_srgb = true;
struct radv_shader_module fs = { .nir = NULL };
fs.nir = build_resolve_fragment_shader(device, is_integer, is_srgb, samples);
struct radv_shader_module vs = {
.nir = build_nir_vertex_shader(),
};
VkRenderPass *rp = is_srgb ?
&device->meta_state.resolve_fragment.rc[samples_log2].srgb_render_pass :
&device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
assert(!*rp);
VkPipeline *pipeline = is_srgb ?
&device->meta_state.resolve_fragment.rc[samples_log2].srgb_pipeline :
&device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
assert(!*pipeline);
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
{
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.stage = VK_SHADER_STAGE_VERTEX_BIT,
.module = radv_shader_module_to_handle(&vs),
.pName = "main",
.pSpecializationInfo = NULL
}, {
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.stage = VK_SHADER_STAGE_FRAGMENT_BIT,
.module = radv_shader_module_to_handle(&fs),
.pName = "main",
.pSpecializationInfo = NULL
},
};
result = radv_CreateRenderPass(radv_device_to_handle(device),
&(VkRenderPassCreateInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = &(VkAttachmentDescription) {
.format = format,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL,
},
.subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) {
.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
.inputAttachmentCount = 0,
.colorAttachmentCount = 1,
.pColorAttachments = &(VkAttachmentReference) {
.attachment = 0,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = VK_ATTACHMENT_UNUSED,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
},
.dependencyCount = 0,
}, &device->meta_state.alloc, rp);
const VkGraphicsPipelineCreateInfo vk_pipeline_info = {
.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
.stageCount = ARRAY_SIZE(pipeline_shader_stages),
.pStages = pipeline_shader_stages,
.pVertexInputState = vi_create_info,
.pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP,
.primitiveRestartEnable = false,
},
.pViewportState = &(VkPipelineViewportStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,
.viewportCount = 1,
.scissorCount = 1,
},
.pRasterizationState = &(VkPipelineRasterizationStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,
.rasterizerDiscardEnable = false,
.polygonMode = VK_POLYGON_MODE_FILL,
.cullMode = VK_CULL_MODE_NONE,
.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE
},
.pMultisampleState = &(VkPipelineMultisampleStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
.rasterizationSamples = 1,
.sampleShadingEnable = false,
.pSampleMask = (VkSampleMask[]) { UINT32_MAX },
},
.pColorBlendState = &(VkPipelineColorBlendStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = (VkPipelineColorBlendAttachmentState []) {
{ .colorWriteMask =
VK_COLOR_COMPONENT_A_BIT |
VK_COLOR_COMPONENT_R_BIT |
VK_COLOR_COMPONENT_G_BIT |
VK_COLOR_COMPONENT_B_BIT },
}
},
.pDynamicState = &(VkPipelineDynamicStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
.dynamicStateCount = 9,
.pDynamicStates = (VkDynamicState[]) {
VK_DYNAMIC_STATE_VIEWPORT,
VK_DYNAMIC_STATE_SCISSOR,
VK_DYNAMIC_STATE_LINE_WIDTH,
VK_DYNAMIC_STATE_DEPTH_BIAS,
VK_DYNAMIC_STATE_BLEND_CONSTANTS,
VK_DYNAMIC_STATE_DEPTH_BOUNDS,
VK_DYNAMIC_STATE_STENCIL_COMPARE_MASK,
VK_DYNAMIC_STATE_STENCIL_WRITE_MASK,
VK_DYNAMIC_STATE_STENCIL_REFERENCE,
},
},
.flags = 0,
.layout = device->meta_state.resolve_fragment.p_layout,
.renderPass = *rp,
.subpass = 0,
};
const struct radv_graphics_pipeline_create_info radv_pipeline_info = {
.use_rectlist = true
};
result = radv_graphics_pipeline_create(radv_device_to_handle(device),
radv_pipeline_cache_to_handle(&device->meta_state.cache),
&vk_pipeline_info, &radv_pipeline_info,
&device->meta_state.alloc,
pipeline);
ralloc_free(vs.nir);
ralloc_free(fs.nir);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
fail:
ralloc_free(vs.nir);
ralloc_free(fs.nir);
return result;
}
VkResult
radv_device_init_meta_resolve_fragment_state(struct radv_device *device)
{
struct radv_meta_state *state = &device->meta_state;
VkResult res;
memset(&state->resolve_fragment, 0, sizeof(state->resolve_fragment));
res = create_layout(device);
if (res != VK_SUCCESS)
return res;
for (uint32_t i = 0; i < MAX_SAMPLES_LOG2; ++i) {
for (unsigned j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) {
res = create_resolve_pipeline(device, i, pipeline_formats[j]);
}
res = create_resolve_pipeline(device, i, VK_FORMAT_R8G8B8A8_SRGB);
}
return res;
}
void
radv_device_finish_meta_resolve_fragment_state(struct radv_device *device)
{
struct radv_meta_state *state = &device->meta_state;
for (uint32_t i = 0; i < MAX_SAMPLES_LOG2; ++i) {
for (unsigned j = 0; j < NUM_META_FS_KEYS; ++j) {
radv_DestroyRenderPass(radv_device_to_handle(device),
state->resolve_fragment.rc[i].render_pass[j],
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_fragment.rc[i].pipeline[j],
&state->alloc);
}
radv_DestroyRenderPass(radv_device_to_handle(device),
state->resolve_fragment.rc[i].srgb_render_pass,
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_fragment.rc[i].srgb_pipeline,
&state->alloc);
}
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
state->resolve_fragment.ds_layout,
&state->alloc);
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->resolve_fragment.p_layout,
&state->alloc);
}
static void
emit_resolve(struct radv_cmd_buffer *cmd_buffer,
struct radv_image_view *src_iview,
struct radv_image_view *dest_iview,
const VkOffset2D *src_offset,
const VkOffset2D *dest_offset,
const VkExtent2D *resolve_extent)
{
struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
const uint32_t samples = src_iview->image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1;
radv_meta_push_descriptor_set(cmd_buffer,
VK_PIPELINE_BIND_POINT_GRAPHICS,
cmd_buffer->device->meta_state.resolve_fragment.p_layout,
0, /* set */
1, /* descriptorWriteCount */
(VkWriteDescriptorSet[]) {
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 0,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(src_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
},
});
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
unsigned push_constants[2] = {
src_offset->x,
src_offset->y,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.resolve_fragment.p_layout,
VK_SHADER_STAGE_FRAGMENT_BIT, 0, 8,
push_constants);
unsigned fs_key = radv_format_meta_fs_key(dest_iview->vk_format);
VkPipeline pipeline_h = vk_format_is_srgb(dest_iview->vk_format) ?
device->meta_state.resolve_fragment.rc[samples_log2].srgb_pipeline :
device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
radv_CmdBindPipeline(cmd_buffer_h, VK_PIPELINE_BIND_POINT_GRAPHICS,
pipeline_h);
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) {
.x = dest_offset->x,
.y = dest_offset->y,
.width = resolve_extent->width,
.height = resolve_extent->height,
.minDepth = 0.0f,
.maxDepth = 1.0f
});
radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkRect2D) {
.offset = *dest_offset,
.extent = *resolve_extent,
});
radv_CmdDraw(cmd_buffer_h, 3, 1, 0, 0);
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
}
void radv_meta_resolve_fragment_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image,
VkImageLayout src_image_layout,
struct radv_image *dest_image,
VkImageLayout dest_image_layout,
uint32_t region_count,
const VkImageResolve *regions)
{
struct radv_device *device = cmd_buffer->device;
struct radv_meta_saved_state saved_state;
const uint32_t samples = src_image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1;
unsigned fs_key = radv_format_meta_fs_key(dest_image->vk_format);
VkRenderPass rp;
for (uint32_t r = 0; r < region_count; ++r) {
const VkImageResolve *region = &regions[r];
const uint32_t src_base_layer =
radv_meta_get_iview_layer(src_image, &region->srcSubresource,
&region->srcOffset);
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = region->srcSubresource.mipLevel;
range.levelCount = 1;
range.baseArrayLayer = src_base_layer;
range.layerCount = region->srcSubresource.layerCount;
radv_fast_clear_flush_image_inplace(cmd_buffer, src_image, &range);
}
rp = vk_format_is_srgb(dest_image->vk_format) ?
device->meta_state.resolve_fragment.rc[samples_log2].srgb_render_pass :
device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t r = 0; r < region_count; ++r) {
const VkImageResolve *region = &regions[r];
assert(region->srcSubresource.aspectMask == VK_IMAGE_ASPECT_COLOR_BIT);
assert(region->dstSubresource.aspectMask == VK_IMAGE_ASPECT_COLOR_BIT);
assert(region->srcSubresource.layerCount == region->dstSubresource.layerCount);
const uint32_t src_base_layer =
radv_meta_get_iview_layer(src_image, &region->srcSubresource,
&region->srcOffset);
const uint32_t dest_base_layer =
radv_meta_get_iview_layer(dest_image, &region->dstSubresource,
&region->dstOffset);
const struct VkExtent3D extent =
radv_sanitize_image_extent(src_image->type, region->extent);
const struct VkOffset3D srcOffset =
radv_sanitize_image_offset(src_image->type, region->srcOffset);
const struct VkOffset3D dstOffset =
radv_sanitize_image_offset(dest_image->type, region->dstOffset);
for (uint32_t layer = 0; layer < region->srcSubresource.layerCount;
++layer) {
struct radv_image_view src_iview;
radv_image_view_init(&src_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(src_image),
.viewType = radv_meta_get_view_type(src_image),
.format = src_image->vk_format,
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = region->srcSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = src_base_layer + layer,
.layerCount = 1,
},
});
struct radv_image_view dest_iview;
radv_image_view_init(&dest_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(dest_image),
.viewType = radv_meta_get_view_type(dest_image),
.format = dest_image->vk_format,
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = region->dstSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = dest_base_layer + layer,
.layerCount = 1,
},
});
VkFramebuffer fb;
radv_CreateFramebuffer(radv_device_to_handle(cmd_buffer->device),
&(VkFramebufferCreateInfo) {
.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = (VkImageView[]) {
radv_image_view_to_handle(&dest_iview),
},
.width = extent.width,
.height = extent.height,
.layers = 1
}, &cmd_buffer->pool->alloc, &fb);
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = rp,
.framebuffer = fb,
.renderArea = {
.offset = { dstOffset.x, dstOffset.y, },
.extent = { extent.width, extent.height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
emit_resolve(cmd_buffer,
&src_iview,
&dest_iview,
&(VkOffset2D) { srcOffset.x, srcOffset.y },
&(VkOffset2D) { dstOffset.x, dstOffset.y },
&(VkExtent2D) { extent.width, extent.height });
radv_CmdEndRenderPass(radv_cmd_buffer_to_handle(cmd_buffer));
radv_DestroyFramebuffer(radv_device_to_handle(cmd_buffer->device), fb, &cmd_buffer->pool->alloc);
}
}
radv_meta_restore(&saved_state, cmd_buffer);
}
/**
* Emit any needed resolves for the current subpass.
*/
void
radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer)
{
struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radv_meta_saved_state saved_state;
/* FINISHME(perf): Skip clears for resolve attachments.
*
* From the Vulkan 1.0 spec:
*
* If the first use of an attachment in a render pass is as a resolve
* attachment, then the loadOp is effectively ignored as the resolve is
* guaranteed to overwrite all pixels in the render area.
*/
if (!subpass->has_resolve)
return;
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image_view *dest_iview = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment;
struct radv_image *dst_img = dest_iview->image;
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
if (dst_img->surface.dcc_size) {
radv_initialize_dcc(cmd_buffer, dst_img, 0xffffffff);
cmd_buffer->state.attachments[dest_att.attachment].current_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
}
{
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = 0;
range.levelCount = 1;
range.baseArrayLayer = 0;
range.layerCount = 1;
radv_fast_clear_flush_image_inplace(cmd_buffer, src_iview->image, &range);
}
struct radv_subpass resolve_subpass = {
.color_count = 1,
.color_attachments = (VkAttachmentReference[]) { dest_att },
.depth_stencil_attachment = { .attachment = VK_ATTACHMENT_UNUSED },
};
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
/* Subpass resolves must respect the render area. We can ignore the
* render area here because vkCmdBeginRenderPass set the render area
* with 3DSTATE_DRAWING_RECTANGLE.
*
* XXX(chadv): Does the hardware really respect
* 3DSTATE_DRAWING_RECTANGLE when draing a 3DPRIM_RECTLIST?
*/
emit_resolve(cmd_buffer,
src_iview,
dest_iview,
&(VkOffset2D) { 0, 0 },
&(VkOffset2D) { 0, 0 },
&(VkExtent2D) { fb->width, fb->height });
}
cmd_buffer->state.subpass = subpass;
radv_meta_restore(&saved_state, cmd_buffer);
}

View File

@@ -26,7 +26,6 @@
*/
#include "util/mesa-sha1.h"
#include "util/u_atomic.h"
#include "radv_private.h"
#include "nir/nir.h"
#include "nir/nir_builder.h"
@@ -36,14 +35,12 @@
#include <llvm-c/TargetMachine.h>
#include "sid.h"
#include "gfx9d.h"
#include "r600d_common.h"
#include "ac_binary.h"
#include "ac_llvm_util.h"
#include "ac_nir_to_llvm.h"
#include "vk_format.h"
#include "util/debug.h"
#include "ac_exp_param.h"
void radv_shader_variant_destroy(struct radv_device *device,
struct radv_shader_variant *variant);
@@ -53,8 +50,6 @@ static const struct nir_shader_compiler_options nir_options = {
.lower_scmp = true,
.lower_flrp32 = true,
.lower_fsat = true,
.lower_fdiv = true,
.lower_sub = true,
.lower_pack_snorm_2x16 = true,
.lower_pack_snorm_4x8 = true,
.lower_pack_unorm_2x16 = true,
@@ -65,7 +60,6 @@ static const struct nir_shader_compiler_options nir_options = {
.lower_unpack_unorm_4x8 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
.max_unroll_iterations = 32
};
VkResult radv_CreateShaderModule(
@@ -157,12 +151,6 @@ radv_optimize_nir(struct nir_shader *shader)
NIR_PASS(progress, shader, nir_copy_prop);
NIR_PASS(progress, shader, nir_opt_remove_phis);
NIR_PASS(progress, shader, nir_opt_dce);
if (nir_opt_trivial_continues(shader)) {
progress = true;
NIR_PASS(progress, shader, nir_copy_prop);
NIR_PASS(progress, shader, nir_opt_dce);
}
NIR_PASS(progress, shader, nir_opt_if);
NIR_PASS(progress, shader, nir_opt_dead_cf);
NIR_PASS(progress, shader, nir_opt_cse);
NIR_PASS(progress, shader, nir_opt_peephole_select, 8);
@@ -170,9 +158,6 @@ radv_optimize_nir(struct nir_shader *shader)
NIR_PASS(progress, shader, nir_opt_constant_folding);
NIR_PASS(progress, shader, nir_opt_undef);
NIR_PASS(progress, shader, nir_opt_conditional_discard);
if (shader->options->max_unroll_iterations) {
NIR_PASS(progress, shader, nir_opt_loop_unroll, 0);
}
} while (progress);
}
@@ -266,7 +251,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
}
/* Vulkan uses the separate-shader linking model */
nir->info.separate_shader = true;
nir->info->separate_shader = true;
nir_shader_gather_info(nir, entry_point->impl);
@@ -375,7 +360,7 @@ static void radv_dump_pipeline_stats(struct radv_device *device, struct radv_pip
void radv_shader_variant_destroy(struct radv_device *device,
struct radv_shader_variant *variant)
{
if (!p_atomic_dec_zero(&variant->ref_count))
if (__sync_fetch_and_sub(&variant->ref_count, 1) != 1)
return;
device->ws->buffer_destroy(variant->bo);
@@ -541,8 +526,8 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS);
if (module->nir)
_mesa_sha1_compute(module->nir->info.name,
strlen(module->nir->info.name),
_mesa_sha1_compute(module->nir->info->name,
strlen(module->nir->info->name),
module->sha1);
radv_hash_shader(sha1, module, entrypoint, spec_info, layout, key, 0);
@@ -606,14 +591,11 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
}
static union ac_shader_variant_key
radv_compute_tes_key(bool as_es, bool export_prim_id)
radv_compute_tes_key(bool as_es)
{
union ac_shader_variant_key key;
memset(&key, 0, sizeof(key));
key.tes.as_es = as_es;
/* export prim id only happens when no geom shader */
if (!as_es)
key.tes.export_prim_id = export_prim_id;
return key;
}
@@ -644,15 +626,13 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
nir_shader *tes_nir, *tcs_nir;
void *tes_code = NULL, *tcs_code = NULL;
unsigned tes_code_size = 0, tcs_code_size = 0;
union ac_shader_variant_key tes_key;
union ac_shader_variant_key tes_key = radv_compute_tes_key(radv_pipeline_has_gs(pipeline));
union ac_shader_variant_key tcs_key;
bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS);
tes_key = radv_compute_tes_key(radv_pipeline_has_gs(pipeline),
pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.prim_id_input);
if (tes_module->nir)
_mesa_sha1_compute(tes_module->nir->info.name,
strlen(tes_module->nir->info.name),
_mesa_sha1_compute(tes_module->nir->info->name,
strlen(tes_module->nir->info->name),
tes_module->sha1);
radv_hash_shader(tes_sha1, tes_module, tes_entrypoint, tes_spec_info, layout, &tes_key, 0);
@@ -664,8 +644,8 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
tcs_key = radv_compute_tcs_key(tes_variant->info.tes.primitive_mode, input_vertices);
if (tcs_module->nir)
_mesa_sha1_compute(tcs_module->nir->info.name,
strlen(tcs_module->nir->info.name),
_mesa_sha1_compute(tcs_module->nir->info->name,
strlen(tcs_module->nir->info->name),
tcs_module->sha1);
radv_hash_shader(tcs_sha1, tcs_module, tcs_entrypoint, tcs_spec_info, layout, &tcs_key, 0);
@@ -694,16 +674,16 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
return;
nir_lower_tes_patch_vertices(tes_nir,
tcs_nir->info.tess.tcs_vertices_out);
tcs_nir->info->tess.tcs_vertices_out);
tes_variant = radv_shader_variant_create(pipeline->device, tes_nir,
layout, &tes_key, &tes_code,
&tes_code_size, dump);
tcs_key = radv_compute_tcs_key(tes_nir->info.tess.primitive_mode, input_vertices);
tcs_key = radv_compute_tcs_key(tes_nir->info->tess.primitive_mode, input_vertices);
if (tcs_module->nir)
_mesa_sha1_compute(tcs_module->nir->info.name,
strlen(tcs_module->nir->info.name),
_mesa_sha1_compute(tcs_module->nir->info->name,
strlen(tcs_module->nir->info->name),
tcs_module->sha1);
radv_hash_shader(tcs_sha1, tcs_module, tcs_entrypoint, tcs_spec_info, layout, &tcs_key, 0);
@@ -1338,12 +1318,11 @@ radv_pipeline_init_multisample_state(struct radv_pipeline *pipeline,
S_028A4C_MULTI_SHADER_ENGINE_PRIM_DISCARD_ENABLE(1) |
EG_S_028A4C_FORCE_EOV_CNTDWN_ENABLE(1) |
EG_S_028A4C_FORCE_EOV_REZ_ENABLE(1);
ms->pa_sc_mode_cntl_0 = S_028A48_ALTERNATE_RBS_PER_TILE(pipeline->device->physical_device->rad_info.chip_class >= GFX9);
if (ms->num_samples > 1) {
unsigned log_samples = util_logbase2(ms->num_samples);
unsigned log_ps_iter_samples = util_logbase2(util_next_power_of_two(ps_iter_samples));
ms->pa_sc_mode_cntl_0 |= S_028A48_MSAA_ENABLE(1);
ms->pa_sc_mode_cntl_0 = S_028A48_MSAA_ENABLE(1);
ms->pa_sc_line_cntl |= S_028BDC_EXPAND_LINE_WIDTH(1); /* CM_R_028BDC_PA_SC_LINE_CNTL */
ms->db_eqaa |= S_028804_MAX_ANCHOR_SAMPLES(log_samples) |
S_028804_PS_ITER_SAMPLES(log_ps_iter_samples) |
@@ -1612,7 +1591,7 @@ radv_pipeline_init_dynamic_state(struct radv_pipeline *pipeline,
}
static union ac_shader_variant_key
radv_compute_vs_key(const VkGraphicsPipelineCreateInfo *pCreateInfo, bool as_es, bool as_ls, bool export_prim_id)
radv_compute_vs_key(const VkGraphicsPipelineCreateInfo *pCreateInfo, bool as_es, bool as_ls)
{
union ac_shader_variant_key key;
const VkPipelineVertexInputStateCreateInfo *input_state =
@@ -1622,7 +1601,6 @@ radv_compute_vs_key(const VkGraphicsPipelineCreateInfo *pCreateInfo, bool as_es,
key.vs.instance_rate_inputs = 0;
key.vs.as_es = as_es;
key.vs.as_ls = as_ls;
key.vs.export_prim_id = export_prim_id;
for (unsigned i = 0; i < input_state->vertexAttributeDescriptionCount; ++i) {
unsigned binding;
@@ -1864,24 +1842,6 @@ static uint32_t si_vgt_gs_mode(struct radv_shader_variant *gs)
S_028A40_GS_WRITE_OPTIMIZE(1);
}
static void calculate_vgt_gs_mode(struct radv_pipeline *pipeline)
{
struct radv_shader_variant *vs;
vs = radv_pipeline_has_gs(pipeline) ? pipeline->gs_copy_shader : (radv_pipeline_has_tess(pipeline) ? pipeline->shaders[MESA_SHADER_TESS_EVAL] : pipeline->shaders[MESA_SHADER_VERTEX]);
struct ac_vs_output_info *outinfo = &vs->info.vs.outinfo;
pipeline->graphics.vgt_primitiveid_en = false;
pipeline->graphics.vgt_gs_mode = 0;
if (radv_pipeline_has_gs(pipeline)) {
pipeline->graphics.vgt_gs_mode = si_vgt_gs_mode(pipeline->shaders[MESA_SHADER_GEOMETRY]);
} else if (outinfo->export_prim_id) {
pipeline->graphics.vgt_gs_mode = S_028A40_MODE(V_028A40_GS_SCENARIO_A);
pipeline->graphics.vgt_primitiveid_en = true;
}
}
static void calculate_pa_cl_vs_out_cntl(struct radv_pipeline *pipeline)
{
struct radv_shader_variant *vs;
@@ -1909,25 +1869,6 @@ static void calculate_pa_cl_vs_out_cntl(struct radv_pipeline *pipeline)
clip_dist_mask;
}
static uint32_t offset_to_ps_input(uint32_t offset, bool flat_shade)
{
uint32_t ps_input_cntl;
if (offset <= AC_EXP_PARAM_OFFSET_31) {
ps_input_cntl = S_028644_OFFSET(offset);
if (flat_shade)
ps_input_cntl |= S_028644_FLAT_SHADE(1);
} else {
/* The input is a DEFAULT_VAL constant. */
assert(offset >= AC_EXP_PARAM_DEFAULT_VAL_0000 &&
offset <= AC_EXP_PARAM_DEFAULT_VAL_1111);
offset -= AC_EXP_PARAM_DEFAULT_VAL_0000;
ps_input_cntl = S_028644_OFFSET(0x20) |
S_028644_DEFAULT_VAL(offset);
}
return ps_input_cntl;
}
static void calculate_ps_inputs(struct radv_pipeline *pipeline)
{
struct radv_shader_variant *ps, *vs;
@@ -1939,23 +1880,6 @@ static void calculate_ps_inputs(struct radv_pipeline *pipeline)
outinfo = &vs->info.vs.outinfo;
unsigned ps_offset = 0;
if (ps->info.fs.prim_id_input) {
unsigned vs_offset = outinfo->vs_output_param_offset[VARYING_SLOT_PRIMITIVE_ID];
if (vs_offset != AC_EXP_PARAM_UNDEFINED) {
pipeline->graphics.ps_input_cntl[ps_offset] = offset_to_ps_input(vs_offset, true);
++ps_offset;
}
}
if (ps->info.fs.layer_input) {
unsigned vs_offset = outinfo->vs_output_param_offset[VARYING_SLOT_LAYER];
if (vs_offset != AC_EXP_PARAM_UNDEFINED) {
pipeline->graphics.ps_input_cntl[ps_offset] = offset_to_ps_input(vs_offset, true);
++ps_offset;
}
}
if (ps->info.fs.has_pcoord) {
unsigned val;
val = S_028644_PT_SPRITE_TEX(1) | S_028644_OFFSET(0x20);
@@ -1963,22 +1887,52 @@ static void calculate_ps_inputs(struct radv_pipeline *pipeline)
ps_offset++;
}
if (ps->info.fs.prim_id_input && (outinfo->prim_id_output != 0xffffffff)) {
unsigned vs_offset, flat_shade;
unsigned val;
vs_offset = outinfo->prim_id_output;
flat_shade = true;
val = S_028644_OFFSET(vs_offset) | S_028644_FLAT_SHADE(flat_shade);
pipeline->graphics.ps_input_cntl[ps_offset] = val;
++ps_offset;
}
if (ps->info.fs.layer_input && (outinfo->layer_output != 0xffffffff)) {
unsigned vs_offset, flat_shade;
unsigned val;
vs_offset = outinfo->layer_output;
flat_shade = true;
val = S_028644_OFFSET(vs_offset) | S_028644_FLAT_SHADE(flat_shade);
pipeline->graphics.ps_input_cntl[ps_offset] = val;
++ps_offset;
}
for (unsigned i = 0; i < 32 && (1u << i) <= ps->info.fs.input_mask; ++i) {
unsigned vs_offset;
bool flat_shade;
unsigned vs_offset, flat_shade;
unsigned val;
if (!(ps->info.fs.input_mask & (1u << i)))
continue;
vs_offset = outinfo->vs_output_param_offset[VARYING_SLOT_VAR0 + i];
if (vs_offset == AC_EXP_PARAM_UNDEFINED) {
if (!(outinfo->export_mask & (1u << i))) {
pipeline->graphics.ps_input_cntl[ps_offset] = S_028644_OFFSET(0x20);
++ps_offset;
continue;
}
vs_offset = util_bitcount(outinfo->export_mask & ((1u << i) - 1));
if (outinfo->prim_id_output != 0xffffffff) {
if (vs_offset >= outinfo->prim_id_output)
vs_offset++;
}
if (outinfo->layer_output != 0xffffffff) {
if (vs_offset >= outinfo->layer_output)
vs_offset++;
}
flat_shade = !!(ps->info.fs.flat_shaded_mask & (1u << ps_offset));
pipeline->graphics.ps_input_cntl[ps_offset] = offset_to_ps_input(vs_offset, flat_shade);
val = S_028644_OFFSET(vs_offset) | S_028644_FLAT_SHADE(flat_shade);
pipeline->graphics.ps_input_cntl[ps_offset] = val;
++ps_offset;
}
@@ -2013,10 +1967,62 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
radv_pipeline_init_blend_state(pipeline, pCreateInfo, extra);
if (modules[MESA_SHADER_VERTEX]) {
bool as_es = false;
bool as_ls = false;
if (modules[MESA_SHADER_TESS_CTRL])
as_ls = true;
else if (modules[MESA_SHADER_GEOMETRY])
as_es = true;
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, as_es, as_ls);
pipeline->shaders[MESA_SHADER_VERTEX] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_VERTEX],
pStages[MESA_SHADER_VERTEX]->pName,
MESA_SHADER_VERTEX,
pStages[MESA_SHADER_VERTEX]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_VERTEX);
}
if (modules[MESA_SHADER_GEOMETRY]) {
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, false, false);
pipeline->shaders[MESA_SHADER_GEOMETRY] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_GEOMETRY],
pStages[MESA_SHADER_GEOMETRY]->pName,
MESA_SHADER_GEOMETRY,
pStages[MESA_SHADER_GEOMETRY]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_GEOMETRY);
pipeline->graphics.vgt_gs_mode = si_vgt_gs_mode(pipeline->shaders[MESA_SHADER_GEOMETRY]);
} else
pipeline->graphics.vgt_gs_mode = 0;
if (modules[MESA_SHADER_TESS_EVAL]) {
assert(modules[MESA_SHADER_TESS_CTRL]);
radv_tess_pipeline_compile(pipeline,
cache,
modules[MESA_SHADER_TESS_CTRL],
modules[MESA_SHADER_TESS_EVAL],
pStages[MESA_SHADER_TESS_CTRL]->pName,
pStages[MESA_SHADER_TESS_EVAL]->pName,
pStages[MESA_SHADER_TESS_CTRL]->pSpecializationInfo,
pStages[MESA_SHADER_TESS_EVAL]->pSpecializationInfo,
pipeline->layout,
pCreateInfo->pTessellationState->patchControlPoints);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_TESS_EVAL) |
mesa_to_vk_shader_stage(MESA_SHADER_TESS_CTRL);
}
if (!modules[MESA_SHADER_FRAGMENT]) {
nir_builder fs_b;
nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL);
fs_b.shader->info.name = ralloc_strdup(fs_b.shader, "noop_fs");
fs_b.shader->info->name = ralloc_strdup(fs_b.shader, "noop_fs");
fs_m.nir = fs_b.shader;
modules[MESA_SHADER_FRAGMENT] = &fs_m;
}
@@ -2040,58 +2046,6 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
if (fs_m.nir)
ralloc_free(fs_m.nir);
if (modules[MESA_SHADER_VERTEX]) {
bool as_es = false;
bool as_ls = false;
bool export_prim_id = false;
if (modules[MESA_SHADER_TESS_CTRL])
as_ls = true;
else if (modules[MESA_SHADER_GEOMETRY])
as_es = true;
else if (pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.prim_id_input)
export_prim_id = true;
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, as_es, as_ls, export_prim_id);
pipeline->shaders[MESA_SHADER_VERTEX] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_VERTEX],
pStages[MESA_SHADER_VERTEX]->pName,
MESA_SHADER_VERTEX,
pStages[MESA_SHADER_VERTEX]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_VERTEX);
}
if (modules[MESA_SHADER_GEOMETRY]) {
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, false, false, false);
pipeline->shaders[MESA_SHADER_GEOMETRY] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_GEOMETRY],
pStages[MESA_SHADER_GEOMETRY]->pName,
MESA_SHADER_GEOMETRY,
pStages[MESA_SHADER_GEOMETRY]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_GEOMETRY);
}
if (modules[MESA_SHADER_TESS_EVAL]) {
assert(modules[MESA_SHADER_TESS_CTRL]);
radv_tess_pipeline_compile(pipeline,
cache,
modules[MESA_SHADER_TESS_CTRL],
modules[MESA_SHADER_TESS_EVAL],
pStages[MESA_SHADER_TESS_CTRL]->pName,
pStages[MESA_SHADER_TESS_EVAL]->pName,
pStages[MESA_SHADER_TESS_CTRL]->pSpecializationInfo,
pStages[MESA_SHADER_TESS_EVAL]->pSpecializationInfo,
pipeline->layout,
pCreateInfo->pTessellationState->patchControlPoints);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_TESS_EVAL) |
mesa_to_vk_shader_stage(MESA_SHADER_TESS_CTRL);
}
radv_pipeline_init_depth_stencil_state(pipeline, pCreateInfo, extra);
radv_pipeline_init_raster_state(pipeline, pCreateInfo);
radv_pipeline_init_multisample_state(pipeline, pCreateInfo);
@@ -2155,16 +2109,9 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
ps->info.fs.writes_z ? V_028710_SPI_SHADER_32_R :
V_028710_SPI_SHADER_ZERO;
calculate_vgt_gs_mode(pipeline);
calculate_pa_cl_vs_out_cntl(pipeline);
calculate_ps_inputs(pipeline);
for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
if (pipeline->shaders[i]) {
pipeline->need_indirect_descriptor_sets |= pipeline->shaders[i]->info.need_indirect_descriptor_sets;
}
}
uint32_t stages = 0;
if (radv_pipeline_has_tess(pipeline)) {
stages |= S_028B54_LS_EN(V_028B54_LS_STAGE_ON) |
@@ -2176,15 +2123,10 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER);
else
stages |= S_028B54_VS_EN(V_028B54_VS_STAGE_DS);
} else if (radv_pipeline_has_gs(pipeline))
stages |= S_028B54_ES_EN(V_028B54_ES_STAGE_REAL) |
S_028B54_GS_EN(1) |
S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER);
if (device->physical_device->rad_info.chip_class >= GFX9)
stages |= S_028B54_MAX_PRIMGRP_IN_WAVE(2);
pipeline->graphics.vgt_shader_stages_en = stages;
if (radv_pipeline_has_gs(pipeline))
@@ -2232,16 +2174,6 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
pipeline->binding_stride[desc->binding] = desc->stride;
}
struct ac_userdata_info *loc = radv_lookup_user_sgpr(pipeline, MESA_SHADER_VERTEX,
AC_UD_VS_BASE_VERTEX_START_INSTANCE);
if (loc->sgpr_idx != -1) {
pipeline->graphics.vtx_base_sgpr = radv_shader_stage_to_user_data_0(MESA_SHADER_VERTEX, radv_pipeline_has_gs(pipeline), radv_pipeline_has_tess(pipeline));
pipeline->graphics.vtx_base_sgpr += loc->sgpr_idx * 4;
if (pipeline->shaders[MESA_SHADER_VERTEX]->info.info.vs.needs_draw_id)
pipeline->graphics.vtx_emit_num = 3;
else
pipeline->graphics.vtx_emit_num = 2;
}
if (device->debug_flags & RADV_DEBUG_DUMP_SHADER_STATS) {
radv_dump_pipeline_stats(device, pipeline);
}
@@ -2338,7 +2270,6 @@ static VkResult radv_compute_pipeline_create(
pipeline->layout, NULL);
pipeline->need_indirect_descriptor_sets |= pipeline->shaders[MESA_SHADER_COMPUTE]->info.need_indirect_descriptor_sets;
result = radv_pipeline_scratch_init(device, pipeline);
if (result != VK_SUCCESS) {
radv_pipeline_destroy(device, pipeline, pAllocator);

View File

@@ -23,7 +23,6 @@
#include "util/mesa-sha1.h"
#include "util/debug.h"
#include "util/u_atomic.h"
#include "radv_private.h"
#include "ac_nir_to_llvm.h"
@@ -172,7 +171,6 @@ radv_create_shader_variant_from_pipeline_cache(struct radv_device *device,
variant->info = entry->variant_info;
variant->rsrc1 = entry->rsrc1;
variant->rsrc2 = entry->rsrc2;
variant->code_size = entry->code_size;
variant->ref_count = 1;
variant->bo = device->ws->buffer_create(device->ws, entry->code_size, 256,
@@ -185,7 +183,7 @@ radv_create_shader_variant_from_pipeline_cache(struct radv_device *device,
entry->variant = variant;
}
p_atomic_inc(&entry->variant->ref_count);
__sync_fetch_and_add(&entry->variant->ref_count, 1);
return entry->variant;
}
@@ -277,7 +275,7 @@ radv_pipeline_cache_insert_shader(struct radv_pipeline_cache *cache,
} else {
entry->variant = variant;
}
p_atomic_inc(&variant->ref_count);
__sync_fetch_and_add(&variant->ref_count, 1);
pthread_mutex_unlock(&cache->mutex);
return variant;
}
@@ -297,7 +295,7 @@ radv_pipeline_cache_insert_shader(struct radv_pipeline_cache *cache,
entry->rsrc2 = variant->rsrc2;
entry->code_size = code_size;
entry->variant = variant;
p_atomic_inc(&variant->ref_count);
__sync_fetch_and_add(&variant->ref_count, 1);
radv_pipeline_cache_add_entry(cache, entry);

View File

@@ -47,14 +47,12 @@
#include "compiler/shader_enums.h"
#include "util/macros.h"
#include "util/list.h"
#include "util/vk_alloc.h"
#include "main/macros.h"
#include "vk_alloc.h"
#include "radv_radeon_winsys.h"
#include "ac_binary.h"
#include "ac_nir_to_llvm.h"
#include "ac_gpu_info.h"
#include "ac_surface.h"
#include "radv_debug.h"
#include "radv_descriptor_set.h"
@@ -268,14 +266,10 @@ struct radv_physical_device {
char path[20];
const char * name;
uint8_t uuid[VK_UUID_SIZE];
uint8_t device_uuid[VK_UUID_SIZE];
int local_fd;
struct wsi_device wsi_device;
struct radv_extensions extensions;
bool has_rbplus; /* if RB+ register exist */
bool rbplus_allowed; /* if RB+ is allowed */
};
struct radv_instance {
@@ -288,7 +282,6 @@ struct radv_instance {
struct radv_physical_device physicalDevices[RADV_MAX_DRM_DEVICES];
uint64_t debug_flags;
uint64_t perftest_flags;
};
VkResult radv_init_wsi(struct radv_physical_device *physical_device);
@@ -350,8 +343,6 @@ struct radv_meta_state {
struct radv_pipeline *depthstencil_pipeline[NUM_DEPTH_CLEAR_PIPELINES];
} clear[1 + MAX_SAMPLES_LOG2];
VkPipelineLayout clear_color_p_layout;
VkPipelineLayout clear_depth_p_layout;
struct {
VkRenderPass render_pass[NUM_META_FS_KEYS];
@@ -424,22 +415,9 @@ struct radv_meta_state {
struct {
VkPipeline pipeline;
VkPipeline i_pipeline;
VkPipeline srgb_pipeline;
} rc[MAX_SAMPLES_LOG2];
} resolve_compute;
struct {
VkDescriptorSetLayout ds_layout;
VkPipelineLayout p_layout;
struct {
VkRenderPass srgb_render_pass;
VkPipeline srgb_pipeline;
VkRenderPass render_pass[NUM_META_FS_KEYS];
VkPipeline pipeline[NUM_META_FS_KEYS];
} rc[MAX_SAMPLES_LOG2];
} resolve_fragment;
struct {
VkPipeline decompress_pipeline;
VkPipeline resummarize_pipeline;
@@ -592,10 +570,6 @@ struct radv_descriptor_pool {
uint64_t size;
struct list_head vram_list;
uint8_t *host_memory_base;
uint8_t *host_memory_ptr;
uint8_t *host_memory_end;
};
struct radv_descriptor_update_template_entry {
@@ -611,6 +585,7 @@ struct radv_descriptor_update_template_entry {
uint32_t dst_stride;
uint32_t buffer_offset;
uint32_t buffer_count;
/* Only valid for combined image samplers and samplers */
uint16_t has_sampler;
@@ -751,6 +726,7 @@ struct radv_attachment_state {
struct radv_cmd_state {
uint32_t vb_dirty;
radv_cmd_dirty_mask_t dirty;
bool vertex_descriptors_dirty;
bool push_descriptors_dirty;
struct radv_pipeline * pipeline;
@@ -765,9 +741,9 @@ struct radv_cmd_state {
struct radv_descriptor_set * descriptors[MAX_SETS];
struct radv_attachment_state * attachments;
VkRect2D render_area;
struct radv_buffer * index_buffer;
uint32_t index_type;
uint64_t index_va;
uint32_t max_index_count;
uint32_t index_offset;
int32_t last_primitive_reset_en;
uint32_t last_primitive_reset_index;
enum radv_cmd_flush_bits flush_bits;
@@ -815,6 +791,8 @@ struct radv_cmd_buffer {
struct radv_cmd_buffer_upload upload;
bool record_fail;
uint32_t scratch_size_needed;
uint32_t compute_scratch_size_needed;
uint32_t esgs_ring_size_needed;
@@ -822,12 +800,7 @@ struct radv_cmd_buffer {
bool tess_rings_needed;
bool sample_positions_needed;
bool record_fail;
int ring_offsets_idx; /* just used for verification */
uint32_t gfx9_fence_offset;
struct radeon_winsys_bo *gfx9_fence_bo;
uint32_t gfx9_fence_idx;
};
struct radv_image;
@@ -847,29 +820,18 @@ void si_write_scissors(struct radeon_winsys_cs *cs, int first,
uint32_t si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
bool instanced_draw, bool indirect_draw,
uint32_t draw_vertex_count);
void si_cs_emit_write_event_eop(struct radeon_winsys_cs *cs,
enum chip_class chip_class,
bool is_mec,
unsigned event, unsigned event_flags,
unsigned data_sel,
uint64_t va,
uint32_t old_fence,
uint32_t new_fence);
void si_emit_wait_fence(struct radeon_winsys_cs *cs,
uint64_t va, uint32_t ref,
uint32_t mask);
void si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
enum chip_class chip_class,
uint32_t *fence_ptr, uint64_t va,
bool is_mec,
enum radv_cmd_flush_bits flush_bits);
enum chip_class chip_class,
bool is_mec,
enum radv_cmd_flush_bits flush_bits);
void si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
enum chip_class chip_class,
bool is_mec,
enum radv_cmd_flush_bits flush_bits);
void si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer);
void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
uint64_t src_va, uint64_t dest_va,
uint64_t size);
void si_cp_dma_prefetch(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
unsigned size);
void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
uint64_t size, unsigned value);
void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer);
@@ -894,8 +856,6 @@ void
radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_clear_subpass(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer);
void radv_cayman_emit_msaa_sample_locs(struct radeon_winsys_cs *cs, int nr_samples);
unsigned radv_cayman_get_maxdist(int log_samples);
void radv_device_init_msaa(struct radv_device *device);
@@ -1047,7 +1007,7 @@ struct radv_pipeline {
struct radv_pipeline_layout * layout;
bool needs_data_cache;
bool need_indirect_descriptor_sets;
struct radv_shader_variant * shaders[MESA_SHADER_STAGES];
struct radv_shader_variant *gs_copy_shader;
VkShaderStageFlags active_stages;
@@ -1071,7 +1031,6 @@ struct radv_pipeline {
unsigned prim;
unsigned gs_out;
uint32_t vgt_gs_mode;
bool vgt_primitiveid_en;
bool prim_restart_enable;
unsigned esgs_ring_size;
unsigned gsvs_ring_size;
@@ -1079,8 +1038,6 @@ struct radv_pipeline {
uint32_t ps_input_cntl_num;
uint32_t pa_cl_vs_out_cntl;
uint32_t vgt_shader_stages_en;
uint32_t vtx_base_sgpr;
uint8_t vtx_emit_num;
struct radv_prim_vertex_count prim_vertex_count;
bool can_use_guardband;
} graphics;
@@ -1100,11 +1057,6 @@ static inline bool radv_pipeline_has_tess(struct radv_pipeline *pipeline)
return pipeline->shaders[MESA_SHADER_TESS_EVAL] ? true : false;
}
uint32_t radv_shader_stage_to_user_data_0(gl_shader_stage stage, bool has_gs, bool has_tess);
struct ac_userdata_info *radv_lookup_user_sgpr(struct radv_pipeline *pipeline,
gl_shader_stage stage,
int idx);
struct radv_graphics_pipeline_create_info {
bool use_rectlist;
bool db_depth_clear;
@@ -1189,7 +1141,10 @@ struct radv_image {
*/
VkFormat vk_format;
VkImageAspectFlags aspects;
struct ac_surf_info info;
VkExtent3D extent;
uint32_t levels;
uint32_t array_size;
uint32_t samples; /**< VkImageCreateInfo::samples */
VkImageUsageFlags usage; /**< Superset of VkImageCreateInfo::usage. */
VkImageTiling tiling; /** VkImageCreateInfo::tiling */
VkImageCreateFlags flags; /** VkImageCreateInfo::flags */
@@ -1212,22 +1167,12 @@ struct radv_image {
uint32_t clear_value_offset;
};
/* Whether the image has a htile that is known consistent with the contents of
* the image. */
bool radv_layout_has_htile(const struct radv_image *image,
VkImageLayout layout,
unsigned queue_mask);
/* Whether the image has a htile that is known consistent with the contents of
* the image and is allowed to be in compressed form.
*
* If this is false reads that don't use the htile should be able to return
* correct results.
*/
VkImageLayout layout);
bool radv_layout_is_htile_compressed(const struct radv_image *image,
VkImageLayout layout,
unsigned queue_mask);
VkImageLayout layout);
bool radv_layout_can_expclear(const struct radv_image *image,
VkImageLayout layout);
bool radv_layout_can_fast_clear(const struct radv_image *image,
VkImageLayout layout,
unsigned queue_mask);
@@ -1240,7 +1185,7 @@ radv_get_layerCount(const struct radv_image *image,
const VkImageSubresourceRange *range)
{
return range->layerCount == VK_REMAINING_ARRAY_LAYERS ?
image->info.array_size - range->baseArrayLayer : range->layerCount;
image->array_size - range->baseArrayLayer : range->layerCount;
}
static inline uint32_t
@@ -1248,7 +1193,7 @@ radv_get_levelCount(const struct radv_image *image,
const VkImageSubresourceRange *range)
{
return range->levelCount == VK_REMAINING_MIP_LEVELS ?
image->info.levels - range->baseMipLevel : range->levelCount;
image->levels - range->baseMipLevel : range->levelCount;
}
struct radeon_bo_metadata;
@@ -1275,6 +1220,7 @@ struct radv_image_view {
struct radv_image_create_info {
const VkImageCreateInfo *vk_info;
uint32_t stride;
bool scanout;
};
@@ -1285,8 +1231,11 @@ VkResult radv_image_create(VkDevice _device,
void radv_image_view_init(struct radv_image_view *view,
struct radv_device *device,
const VkImageViewCreateInfo* pCreateInfo);
const VkImageViewCreateInfo* pCreateInfo,
struct radv_cmd_buffer *cmd_buffer,
VkImageUsageFlags usage_mask);
void radv_image_set_optimal_micro_tile_mode(struct radv_device *device,
struct radv_image *image, uint32_t micro_tile_mode);
struct radv_buffer_view {
struct radeon_winsys_bo *bo;
VkFormat vk_format;
@@ -1330,57 +1279,42 @@ radv_sanitize_image_offset(const VkImageType imageType,
}
}
static inline bool
radv_image_extent_compare(const struct radv_image *image,
const VkExtent3D *extent)
{
if (extent->width != image->info.width ||
extent->height != image->info.height ||
extent->depth != image->info.depth)
return false;
return true;
}
struct radv_sampler {
uint32_t state[4];
};
struct radv_color_buffer_info {
uint64_t cb_color_base;
uint64_t cb_color_cmask;
uint64_t cb_color_fmask;
uint64_t cb_dcc_base;
uint32_t cb_color_base;
uint32_t cb_color_pitch;
uint32_t cb_color_slice;
uint32_t cb_color_view;
uint32_t cb_color_info;
uint32_t cb_color_attrib;
uint32_t cb_color_attrib2;
uint32_t cb_dcc_control;
uint32_t cb_color_cmask;
uint32_t cb_color_cmask_slice;
uint32_t cb_color_fmask;
uint32_t cb_color_fmask_slice;
uint32_t cb_clear_value0;
uint32_t cb_clear_value1;
uint32_t cb_dcc_base;
uint32_t micro_tile_mode;
uint32_t gfx9_epitch;
};
struct radv_ds_buffer_info {
uint64_t db_z_read_base;
uint64_t db_stencil_read_base;
uint64_t db_z_write_base;
uint64_t db_stencil_write_base;
uint64_t db_htile_data_base;
uint32_t db_depth_info;
uint32_t db_z_info;
uint32_t db_stencil_info;
uint32_t db_z_read_base;
uint32_t db_stencil_read_base;
uint32_t db_z_write_base;
uint32_t db_stencil_write_base;
uint32_t db_depth_view;
uint32_t db_depth_size;
uint32_t db_depth_slice;
uint32_t db_htile_surface;
uint32_t db_htile_data_base;
uint32_t pa_su_poly_offset_db_fmt_cntl;
uint32_t db_z_info2;
uint32_t db_stencil_info2;
float offset_scale;
};
@@ -1409,8 +1343,8 @@ struct radv_subpass_barrier {
struct radv_subpass {
uint32_t input_count;
uint32_t color_count;
VkAttachmentReference * input_attachments;
uint32_t color_count;
VkAttachmentReference * color_attachments;
VkAttachmentReference * resolve_attachments;
VkAttachmentReference depth_stencil_attachment;

View File

@@ -72,8 +72,6 @@ static struct nir_ssa_def *
radv_load_push_int(nir_builder *b, unsigned offset, const char *name)
{
nir_intrinsic_instr *flags = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(flags, 0);
nir_intrinsic_set_range(flags, 16);
flags->src[0] = nir_src_for_ssa(nir_imm_int(b, offset));
flags->num_components = 1;
nir_ssa_dest_init(&flags->instr, &flags->dest, 1, 32, name);
@@ -122,10 +120,10 @@ build_occlusion_query_shader(struct radv_device *device) {
*/
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "occlusion_query");
b.shader->info.cs.local_size[0] = 64;
b.shader->info.cs.local_size[1] = 1;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "occlusion_query");
b.shader->info->cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1;
nir_variable *result = nir_local_variable_create(b.impl, glsl_uint64_t_type(), "result");
nir_variable *outer_counter = nir_local_variable_create(b.impl, glsl_int_type(), "outer_counter");
@@ -155,9 +153,9 @@ build_occlusion_query_shader(struct radv_device *device) {
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
global_id = nir_channel(&b, global_id, 0); // We only care about x here.
@@ -317,10 +315,10 @@ build_pipeline_statistics_query_shader(struct radv_device *device) {
*/
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "pipeline_statistics_query");
b.shader->info.cs.local_size[0] = 64;
b.shader->info.cs.local_size[1] = 1;
b.shader->info.cs.local_size[2] = 1;
b.shader->info->name = ralloc_strdup(b.shader, "pipeline_statistics_query");
b.shader->info->cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1;
nir_variable *output_offset = nir_local_variable_create(b.impl, glsl_int_type(), "output_offset");
@@ -347,9 +345,9 @@ build_pipeline_statistics_query_shader(struct radv_device *device) {
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info.cs.local_size[0],
b.shader->info.cs.local_size[1],
b.shader->info.cs.local_size[2], 0);
b.shader->info->cs.local_size[0],
b.shader->info->cs.local_size[1],
b.shader->info->cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
global_id = nir_channel(&b, global_id, 0); // We only care about x here.
@@ -609,10 +607,12 @@ VkResult radv_device_init_meta_query_state(struct radv_device *device)
radv_pipeline_cache_to_handle(&device->meta_state.cache),
1, &pipeline_statistics_vk_pipeline_info, NULL,
&device->meta_state.query.pipeline_statistics_query_pipeline);
fail:
if (result != VK_SUCCESS)
radv_device_finish_meta_query_state(device);
goto fail;
return VK_SUCCESS;
fail:
radv_device_finish_meta_query_state(device);
ralloc_free(occlusion_cs.nir);
ralloc_free(pipeline_statistics_cs.nir);
return result;
@@ -992,7 +992,13 @@ void radv_CmdCopyQueryPoolResults(
uint64_t avail_va = va + pool->availability_offset + 4 * query;
/* This waits on the ME. All copies below are done on the ME */
si_emit_wait_fence(cs, avail_va, 1, 0xffffffff);
radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0));
radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1));
radeon_emit(cs, avail_va);
radeon_emit(cs, avail_va >> 32);
radeon_emit(cs, 1); /* reference value */
radeon_emit(cs, 0xffffffff); /* mask */
radeon_emit(cs, 4); /* poll interval */
}
}
radv_query_shader(cmd_buffer, cmd_buffer->device->meta_state.query.pipeline_statistics_query_pipeline,
@@ -1015,7 +1021,13 @@ void radv_CmdCopyQueryPoolResults(
uint64_t avail_va = va + pool->availability_offset + 4 * query;
/* This waits on the ME. All copies below are done on the ME */
si_emit_wait_fence(cs, avail_va, 1, 0xffffffff);
radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0));
radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1));
radeon_emit(cs, avail_va);
radeon_emit(cs, avail_va >> 32);
radeon_emit(cs, 1); /* reference value */
radeon_emit(cs, 0xffffffff); /* mask */
radeon_emit(cs, 4); /* poll interval */
}
if (flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) {
uint64_t avail_va = va + pool->availability_offset + 4 * query;
@@ -1139,7 +1151,7 @@ void radv_CmdEndQuery(
break;
case VK_QUERY_TYPE_PIPELINE_STATISTICS:
radeon_check_space(cmd_buffer->device->ws, cs, 16);
radeon_check_space(cmd_buffer->device->ws, cs, 10);
va += pipelinestat_block_size;
@@ -1148,11 +1160,13 @@ void radv_CmdEndQuery(
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
si_cs_emit_write_event_eop(cs,
cmd_buffer->device->physical_device->rad_info.chip_class,
false,
EVENT_TYPE_BOTTOM_OF_PIPE_TS, 0,
1, avail_va, 0, 1);
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_BOTTOM_OF_PIPE_TS) |
EVENT_INDEX(5));
radeon_emit(cs, avail_va);
radeon_emit(cs, (avail_va >> 32) | EOP_DATA_SEL(1));
radeon_emit(cs, 1);
radeon_emit(cs, 0);
break;
default:
unreachable("ending unhandled query type");
@@ -1175,40 +1189,32 @@ void radv_CmdWriteTimestamp(
cmd_buffer->device->ws->cs_add_buffer(cs, pool->bo, 5);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 28);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 12);
switch(pipelineStage) {
case VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT:
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, COPY_DATA_COUNT_SEL | COPY_DATA_WR_CONFIRM |
COPY_DATA_SRC_SEL(COPY_DATA_TIMESTAMP) |
COPY_DATA_DST_SEL(V_370_MEM_ASYNC));
radeon_emit(cs, 0);
radeon_emit(cs, 0);
if (mec) {
radeon_emit(cs, PKT3(PKT3_RELEASE_MEM, 5, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_BOTTOM_OF_PIPE_TS) | EVENT_INDEX(5));
radeon_emit(cs, 3 << 29);
radeon_emit(cs, query_va);
radeon_emit(cs, query_va >> 32);
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, avail_va);
radeon_emit(cs, avail_va >> 32);
radeon_emit(cs, 1);
break;
default:
si_cs_emit_write_event_eop(cs,
cmd_buffer->device->physical_device->rad_info.chip_class,
mec,
V_028A90_BOTTOM_OF_PIPE_TS, 0,
3, query_va, 0, 0);
si_cs_emit_write_event_eop(cs,
cmd_buffer->device->physical_device->rad_info.chip_class,
mec,
V_028A90_BOTTOM_OF_PIPE_TS, 0,
1, avail_va, 0, 1);
break;
radeon_emit(cs, 0);
radeon_emit(cs, 0);
} else {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_BOTTOM_OF_PIPE_TS) | EVENT_INDEX(5));
radeon_emit(cs, query_va);
radeon_emit(cs, (3 << 29) | ((query_va >> 32) & 0xFFFF));
radeon_emit(cs, 0);
radeon_emit(cs, 0);
}
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
radeon_emit(cs, S_370_DST_SEL(mec ? V_370_MEM_ASYNC : V_370_MEMORY_SYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, avail_va);
radeon_emit(cs, avail_va >> 32);
radeon_emit(cs, 1);
assert(cmd_buffer->cs->cdw <= cdw_max);
}

View File

@@ -35,10 +35,6 @@
#include "main/macros.h"
#include "amd_family.h"
struct radeon_info;
struct ac_surf_info;
struct radeon_surf;
#define FREE(x) free(x)
enum radeon_bo_domain { /* bitfield */
@@ -75,6 +71,63 @@ struct radeon_winsys_cs {
uint32_t *buf; /* The base pointer of the chunk. */
};
struct radeon_info {
/* PCI info: domain:bus:dev:func */
uint32_t pci_domain;
uint32_t pci_bus;
uint32_t pci_dev;
uint32_t pci_func;
/* Device info. */
uint32_t pci_id;
enum radeon_family family;
const char *name;
enum chip_class chip_class;
uint32_t gart_page_size;
uint64_t gart_size;
uint64_t vram_size;
uint64_t visible_vram_size;
bool has_dedicated_vram;
bool has_virtual_memory;
bool gfx_ib_pad_with_type2;
bool has_uvd;
uint32_t sdma_rings;
uint32_t compute_rings;
uint32_t vce_fw_version;
uint32_t vce_harvest_config;
uint32_t clock_crystal_freq; /* in kHz */
/* Kernel info. */
uint32_t drm_major; /* version */
uint32_t drm_minor;
uint32_t drm_patchlevel;
bool has_userptr;
/* Shader cores. */
uint32_t r600_max_quad_pipes; /* wave size / 16 */
uint32_t max_shader_clock;
uint32_t num_good_compute_units;
uint32_t max_se; /* shader engines */
uint32_t max_sh_per_se; /* shader arrays per shader engine */
/* Render backends (color + depth blocks). */
uint32_t r300_num_gb_pipes;
uint32_t r300_num_z_pipes;
uint32_t r600_gb_backend_map; /* R600 harvest config */
bool r600_gb_backend_map_valid;
uint32_t r600_num_banks;
uint32_t num_render_backends;
uint32_t num_tile_pipes; /* pipe count from PIPE_CONFIG */
uint32_t pipe_interleave_bytes;
uint32_t enabled_rb_mask; /* GCN harvest config */
/* Tile modes. */
uint32_t si_tile_mode_array[32];
uint32_t cik_macrotile_mode_array[16];
};
#define RADEON_SURF_MAX_LEVEL 32
#define RADEON_SURF_TYPE_MASK 0xFF
#define RADEON_SURF_TYPE_SHIFT 0
#define RADEON_SURF_TYPE_1D 0
@@ -85,11 +138,93 @@ struct radeon_winsys_cs {
#define RADEON_SURF_TYPE_2D_ARRAY 5
#define RADEON_SURF_MODE_MASK 0xFF
#define RADEON_SURF_MODE_SHIFT 8
#define RADEON_SURF_MODE_LINEAR_ALIGNED 1
#define RADEON_SURF_MODE_1D 2
#define RADEON_SURF_MODE_2D 3
#define RADEON_SURF_SCANOUT (1 << 16)
#define RADEON_SURF_ZBUFFER (1 << 17)
#define RADEON_SURF_SBUFFER (1 << 18)
#define RADEON_SURF_Z_OR_SBUFFER (RADEON_SURF_ZBUFFER | RADEON_SURF_SBUFFER)
#define RADEON_SURF_HAS_SBUFFER_MIPTREE (1 << 19)
#define RADEON_SURF_HAS_TILE_MODE_INDEX (1 << 20)
#define RADEON_SURF_FMASK (1 << 21)
#define RADEON_SURF_DISABLE_DCC (1 << 22)
#define RADEON_SURF_TC_COMPATIBLE_HTILE (1 << 23)
#define RADEON_SURF_GET(v, field) (((v) >> RADEON_SURF_ ## field ## _SHIFT) & RADEON_SURF_ ## field ## _MASK)
#define RADEON_SURF_SET(v, field) (((v) & RADEON_SURF_ ## field ## _MASK) << RADEON_SURF_ ## field ## _SHIFT)
#define RADEON_SURF_CLR(v, field) ((v) & ~(RADEON_SURF_ ## field ## _MASK << RADEON_SURF_ ## field ## _SHIFT))
struct radeon_surf_level {
uint64_t offset;
uint64_t slice_size;
uint32_t npix_x;
uint32_t npix_y;
uint32_t npix_z;
uint32_t nblk_x;
uint32_t nblk_y;
uint32_t nblk_z;
uint32_t pitch_bytes;
uint32_t mode;
uint64_t dcc_offset;
uint64_t dcc_fast_clear_size;
bool dcc_enabled;
};
/* surface defintions from the winsys */
struct radeon_surf {
/* These are inputs to the calculator. */
uint32_t npix_x;
uint32_t npix_y;
uint32_t npix_z;
uint32_t blk_w;
uint32_t blk_h;
uint32_t blk_d;
uint32_t array_size;
uint32_t last_level;
uint32_t bpe;
uint32_t nsamples;
uint32_t flags;
/* These are return values. Some of them can be set by the caller, but
* they will be treated as hints (e.g. bankw, bankh) and might be
* changed by the calculator.
*/
uint64_t bo_size;
uint64_t bo_alignment;
/* This applies to EG and later. */
uint32_t bankw;
uint32_t bankh;
uint32_t mtilea;
uint32_t tile_split;
uint32_t stencil_tile_split;
uint64_t stencil_offset;
struct radeon_surf_level level[RADEON_SURF_MAX_LEVEL];
struct radeon_surf_level stencil_level[RADEON_SURF_MAX_LEVEL];
uint32_t tiling_index[RADEON_SURF_MAX_LEVEL];
uint32_t stencil_tiling_index[RADEON_SURF_MAX_LEVEL];
uint32_t pipe_config;
uint32_t num_banks;
uint32_t macro_tile_index;
uint32_t micro_tile_mode; /* displayable, thin, depth, rotated */
/* Whether the depth miptree or stencil miptree as used by the DB are
* adjusted from their TC compatible form to ensure depth/stencil
* compatibility. If either is true, the corresponding plane cannot be
* sampled from.
*/
bool depth_adjusted;
bool stencil_adjusted;
uint64_t dcc_size;
uint64_t dcc_alignment;
uint64_t htile_size;
uint64_t htile_slice_size;
uint64_t htile_alignment;
};
enum radeon_bo_layout {
RADEON_LAYOUT_LINEAR = 0,
RADEON_LAYOUT_TILED,
@@ -103,25 +238,16 @@ struct radeon_bo_metadata {
/* Tiling flags describing the texture layout for display code
* and DRI sharing.
*/
union {
struct {
enum radeon_bo_layout microtile;
enum radeon_bo_layout macrotile;
unsigned pipe_config;
unsigned bankw;
unsigned bankh;
unsigned tile_split;
unsigned mtilea;
unsigned num_banks;
unsigned stride;
bool scanout;
} legacy;
struct {
/* surface flags */
unsigned swizzle_mode:5;
} gfx9;
} u;
enum radeon_bo_layout microtile;
enum radeon_bo_layout macrotile;
unsigned pipe_config;
unsigned bankw;
unsigned bankh;
unsigned tile_split;
unsigned mtilea;
unsigned num_banks;
unsigned stride;
bool scanout;
/* Additional metadata associated with the buffer, in bytes.
* The maximum size is 64 * 4. This is opaque for the winsys & kernel.
@@ -208,7 +334,6 @@ struct radeon_winsys {
void (*cs_dump)(struct radeon_winsys_cs *cs, FILE* file, uint32_t trace_id);
int (*surface_init)(struct radeon_winsys *ws,
const struct ac_surf_info *surf_info,
struct radeon_surf *surf);
int (*surface_best)(struct radeon_winsys *ws,

View File

@@ -26,7 +26,7 @@
#include "radv_private.h"
#include "radv_meta.h"
#include "wsi_common.h"
#include "vk_util.h"
#include "util/vk_util.h"
static const struct wsi_callbacks wsi_cbs = {
.get_phys_device_format_properties = radv_GetPhysicalDeviceFormatProperties,
@@ -224,7 +224,7 @@ radv_wsi_image_create(VkDevice device_h,
*memory_p = memory_h;
*size = image->size;
*offset = image->offset;
*row_pitch = surface->u.legacy.level[0].nblk_x * surface->bpe;
*row_pitch = surface->level[0].pitch_bytes;
return VK_SUCCESS;
fail_alloc_memory:
radv_FreeMemory(device_h, memory_h, pAllocator);
@@ -438,7 +438,7 @@ VkResult radv_AcquireNextImageKHR(
VkResult result = swapchain->acquire_next_image(swapchain, timeout, semaphore,
pImageIndex);
if (fence && (result == VK_SUCCESS || result == VK_SUBOPTIMAL_KHR)) {
if (fence && result == VK_SUCCESS) {
fence->submitted = true;
fence->signalled = true;
}

View File

@@ -30,7 +30,6 @@
#include "radv_private.h"
#include "radv_cs.h"
#include "sid.h"
#include "gfx9d.h"
#include "radv_util.h"
#include "main/macros.h"
@@ -242,9 +241,6 @@ si_emit_config(struct radv_physical_device *physical_device,
radeon_set_context_reg(cs, R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET, 0);
radeon_set_context_reg(cs, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0);
radeon_set_context_reg(cs, R_028AA0_VGT_INSTANCE_STEP_RATE_0, 1);
if (physical_device->rad_info.chip_class >= GFX9)
radeon_set_context_reg(cs, R_028AB4_VGT_REUSE_OFF, 0);
radeon_set_context_reg(cs, R_028AB8_VGT_VTX_CNT_EN, 0x0);
if (physical_device->rad_info.chip_class < CIK)
radeon_set_config_reg(cs, R_008A14_PA_CL_ENHANCE, S_008A14_NUM_CLIP_SEQ(3) |
@@ -332,28 +328,24 @@ si_emit_config(struct radv_physical_device *physical_device,
raster_config_1 = 0x00000000;
break;
default:
if (physical_device->rad_info.chip_class <= VI) {
fprintf(stderr,
"radeonsi: Unknown GPU, using 0 for raster_config\n");
raster_config = 0x00000000;
raster_config_1 = 0x00000000;
}
fprintf(stderr,
"radeonsi: Unknown GPU, using 0 for raster_config\n");
raster_config = 0x00000000;
raster_config_1 = 0x00000000;
break;
}
/* Always use the default config when all backends are enabled
* (or when we failed to determine the enabled backends).
*/
if (physical_device->rad_info.chip_class <= VI) {
if (!rb_mask || util_bitcount(rb_mask) >= num_rb) {
radeon_set_context_reg(cs, R_028350_PA_SC_RASTER_CONFIG,
raster_config);
if (physical_device->rad_info.chip_class >= CIK)
radeon_set_context_reg(cs, R_028354_PA_SC_RASTER_CONFIG_1,
raster_config_1);
} else {
si_write_harvested_raster_configs(physical_device, cs, raster_config, raster_config_1);
}
if (!rb_mask || util_bitcount(rb_mask) >= num_rb) {
radeon_set_context_reg(cs, R_028350_PA_SC_RASTER_CONFIG,
raster_config);
if (physical_device->rad_info.chip_class >= CIK)
radeon_set_context_reg(cs, R_028354_PA_SC_RASTER_CONFIG_1,
raster_config_1);
} else {
si_write_harvested_raster_configs(physical_device, cs, raster_config, raster_config_1);
}
radeon_set_context_reg(cs, R_028204_PA_SC_WINDOW_SCISSOR_TL, S_028204_WINDOW_OFFSET_DISABLE(1));
@@ -377,31 +369,22 @@ si_emit_config(struct radv_physical_device *physical_device,
S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) |
S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE));
if (physical_device->rad_info.chip_class >= GFX9) {
radeon_set_uconfig_reg(cs, R_030920_VGT_MAX_VTX_INDX, ~0);
radeon_set_uconfig_reg(cs, R_030924_VGT_MIN_VTX_INDX, 0);
radeon_set_uconfig_reg(cs, R_030928_VGT_INDX_OFFSET, 0);
} else {
radeon_set_context_reg(cs, R_028400_VGT_MAX_VTX_INDX, ~0);
radeon_set_context_reg(cs, R_028404_VGT_MIN_VTX_INDX, 0);
radeon_set_context_reg(cs, R_028408_VGT_INDX_OFFSET, 0);
}
radeon_set_context_reg(cs, R_028400_VGT_MAX_VTX_INDX, ~0);
radeon_set_context_reg(cs, R_028404_VGT_MIN_VTX_INDX, 0);
radeon_set_context_reg(cs, R_028408_VGT_INDX_OFFSET, 0);
if (physical_device->rad_info.chip_class >= CIK) {
if (physical_device->rad_info.chip_class >= GFX9) {
radeon_set_sh_reg(cs, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, S_00B41C_CU_EN(0xffff));
} else {
radeon_set_sh_reg(cs, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, S_00B51C_CU_EN(0xffff));
radeon_set_sh_reg(cs, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0);
radeon_set_sh_reg(cs, R_00B31C_SPI_SHADER_PGM_RSRC3_ES, S_00B31C_CU_EN(0xffff));
/* If this is 0, Bonaire can hang even if GS isn't being used.
* Other chips are unaffected. These are suboptimal values,
* but we don't use on-chip GS.
*/
radeon_set_context_reg(cs, R_028A44_VGT_GS_ONCHIP_CNTL,
S_028A44_ES_VERTS_PER_SUBGRP(64) |
S_028A44_GS_PRIMS_PER_SUBGRP(4));
}
/* If this is 0, Bonaire can hang even if GS isn't being used.
* Other chips are unaffected. These are suboptimal values,
* but we don't use on-chip GS.
*/
radeon_set_context_reg(cs, R_028A44_VGT_GS_ONCHIP_CNTL,
S_028A44_ES_VERTS_PER_SUBGRP(64) |
S_028A44_GS_PRIMS_PER_SUBGRP(4));
radeon_set_sh_reg(cs, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, S_00B51C_CU_EN(0xffff));
radeon_set_sh_reg(cs, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0);
radeon_set_sh_reg(cs, R_00B31C_SPI_SHADER_PGM_RSRC3_ES, S_00B31C_CU_EN(0xffff));
radeon_set_sh_reg(cs, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, S_00B21C_CU_EN(0xffff));
if (physical_device->rad_info.num_good_compute_units /
@@ -452,41 +435,9 @@ si_emit_config(struct radv_physical_device *physical_device,
radeon_set_context_reg(cs, R_028C5C_VGT_OUT_DEALLOC_CNTL, 16);
}
if (physical_device->has_rbplus)
if (physical_device->rad_info.family == CHIP_STONEY)
radeon_set_context_reg(cs, R_028C40_PA_SC_SHADER_CONTROL, 0);
if (physical_device->rad_info.chip_class >= GFX9) {
unsigned num_se = physical_device->rad_info.max_se;
unsigned pc_lines = 0;
switch (physical_device->rad_info.family) {
case CHIP_VEGA10:
pc_lines = 4096;
break;
case CHIP_RAVEN:
pc_lines = 1024;
break;
default:
assert(0);
}
radeon_set_context_reg(cs, R_028060_DB_DFSM_CONTROL,
S_028060_PUNCHOUT_MODE(V_028060_FORCE_OFF));
radeon_set_context_reg(cs, R_028064_DB_RENDER_FILTER, 0);
/* TODO: We can use this to disable RBs for rendering to GART: */
radeon_set_context_reg(cs, R_02835C_PA_SC_TILE_STEERING_OVERRIDE, 0);
radeon_set_context_reg(cs, R_02883C_PA_SU_OVER_RASTERIZATION_CNTL, 0);
/* TODO: Enable the binner: */
radeon_set_context_reg(cs, R_028C44_PA_SC_BINNER_CNTL_0,
S_028C44_BINNING_MODE(V_028C44_DISABLE_BINNING_USE_LEGACY_SC) |
S_028C44_DISABLE_START_OF_PRIM(1));
radeon_set_context_reg(cs, R_028C48_PA_SC_BINNER_CNTL_1,
S_028C48_MAX_ALLOC_COUNT(MIN2(128, pc_lines / (4 * num_se))) |
S_028C48_MAX_PRIM_PER_BATCH(1023));
radeon_set_context_reg(cs, R_028C4C_PA_SC_CONSERVATIVE_RASTERIZATION_CNTL,
S_028C4C_NULL_SQUAD_AA_MASK_ENABLE(1));
radeon_set_uconfig_reg(cs, R_030968_VGT_INSTANCE_BASE_ID, 0);
}
si_emit_compute(physical_device, cs);
}
@@ -700,9 +651,6 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
multi_instances_smaller_than_primgroup = indirect_draw || (instanced_draw &&
num_prims < primgroup_size);
if (cmd_buffer->state.pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.prim_id_input)
ia_switch_on_eoi = true;
if (radv_pipeline_has_tess(cmd_buffer->state.pipeline)) {
/* SWITCH_ON_EOI must be set if PrimID is used. */
if (cmd_buffer->state.pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.tcs.uses_prim_id ||
@@ -719,8 +667,7 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
/* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
if (cmd_buffer->device->has_distributed_tess) {
if (radv_pipeline_has_gs(cmd_buffer->state.pipeline)) {
if (chip_class <= VI)
partial_es_wave = true;
partial_es_wave = true;
if (family == CHIP_TONGA ||
family == CHIP_FIJI ||
@@ -788,15 +735,10 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
assert(wd_switch_on_eop || !ia_switch_on_eop);
}
/* If SWITCH_ON_EOI is set, PARTIAL_ES_WAVE must be set too. */
if (chip_class <= VI && ia_switch_on_eoi)
if (ia_switch_on_eoi)
partial_es_wave = true;
if (radv_pipeline_has_gs(cmd_buffer->state.pipeline)) {
if (radv_pipeline_has_gs(cmd_buffer->state.pipeline) &&
cmd_buffer->state.pipeline->shaders[MESA_SHADER_GEOMETRY]->info.gs.uses_prim_id)
ia_switch_on_eoi = true;
/* GS requirement. */
if (SI_GS_PER_ES / primgroup_size >= cmd_buffer->device->gs_table_depth - 3)
partial_es_wave = true;
@@ -815,88 +757,22 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
S_028AA8_PARTIAL_ES_WAVE_ON(partial_es_wave) |
S_028AA8_PRIMGROUP_SIZE(primgroup_size - 1) |
S_028AA8_WD_SWITCH_ON_EOP(chip_class >= CIK ? wd_switch_on_eop : 0) |
/* The following field was moved to VGT_SHADER_STAGES_EN in GFX9. */
S_028AA8_MAX_PRIMGRP_IN_WAVE(chip_class == VI ?
max_primgroup_in_wave : 0) |
S_030960_EN_INST_OPT_BASIC(chip_class >= GFX9) |
S_030960_EN_INST_OPT_ADV(chip_class >= GFX9);
S_028AA8_MAX_PRIMGRP_IN_WAVE(chip_class >= VI ?
max_primgroup_in_wave : 0);
}
void si_cs_emit_write_event_eop(struct radeon_winsys_cs *cs,
enum chip_class chip_class,
bool is_mec,
unsigned event, unsigned event_flags,
unsigned data_sel,
uint64_t va,
uint32_t old_fence,
uint32_t new_fence)
{
unsigned op = EVENT_TYPE(event) |
EVENT_INDEX(5) |
event_flags;
unsigned is_gfx8_mec = is_mec && chip_class < GFX9;
if (chip_class >= GFX9 || is_gfx8_mec) {
radeon_emit(cs, PKT3(PKT3_RELEASE_MEM, is_gfx8_mec ? 5 : 6, 0));
radeon_emit(cs, op);
radeon_emit(cs, EOP_DATA_SEL(data_sel));
radeon_emit(cs, va); /* address lo */
radeon_emit(cs, va >> 32); /* address hi */
radeon_emit(cs, new_fence); /* immediate data lo */
radeon_emit(cs, 0); /* immediate data hi */
if (!is_gfx8_mec)
radeon_emit(cs, 0); /* unused */
} else {
if (chip_class == CIK ||
chip_class == VI) {
/* Two EOP events are required to make all engines go idle
* (and optional cache flushes executed) before the timestamp
* is written.
*/
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, op);
radeon_emit(cs, va);
radeon_emit(cs, ((va >> 32) & 0xffff) | EOP_DATA_SEL(data_sel));
radeon_emit(cs, old_fence); /* immediate data */
radeon_emit(cs, 0); /* unused */
}
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, op);
radeon_emit(cs, va);
radeon_emit(cs, ((va >> 32) & 0xffff) | EOP_DATA_SEL(data_sel));
radeon_emit(cs, new_fence); /* immediate data */
radeon_emit(cs, 0); /* unused */
}
}
void
si_emit_wait_fence(struct radeon_winsys_cs *cs,
uint64_t va, uint32_t ref,
uint32_t mask)
{
radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0));
radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit(cs, ref); /* reference value */
radeon_emit(cs, mask); /* mask */
radeon_emit(cs, 4); /* poll interval */
}
static void
si_emit_acquire_mem(struct radeon_winsys_cs *cs,
bool is_mec, bool is_gfx9,
bool is_mec,
unsigned cp_coher_cntl)
{
if (is_mec || is_gfx9) {
uint32_t hi_val = is_gfx9 ? 0xffffff : 0xff;
if (is_mec) {
radeon_emit(cs, PKT3(PKT3_ACQUIRE_MEM, 5, 0) |
PKT3_SHADER_TYPE_S(is_mec));
PKT3_SHADER_TYPE_S(1));
radeon_emit(cs, cp_coher_cntl); /* CP_COHER_CNTL */
radeon_emit(cs, 0xffffffff); /* CP_COHER_SIZE */
radeon_emit(cs, hi_val); /* CP_COHER_SIZE_HI */
radeon_emit(cs, 0xff); /* CP_COHER_SIZE_HI */
radeon_emit(cs, 0); /* CP_COHER_BASE */
radeon_emit(cs, 0); /* CP_COHER_BASE_HI */
radeon_emit(cs, 0x0000000A); /* POLL_INTERVAL */
@@ -913,47 +789,44 @@ si_emit_acquire_mem(struct radeon_winsys_cs *cs,
void
si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
enum chip_class chip_class,
uint32_t *flush_cnt,
uint64_t flush_va,
bool is_mec,
enum radv_cmd_flush_bits flush_bits)
{
unsigned cp_coher_cntl = 0;
uint32_t flush_cb_db = flush_bits & (RADV_CMD_FLAG_FLUSH_AND_INV_CB |
RADV_CMD_FLAG_FLUSH_AND_INV_DB);
if (flush_bits & RADV_CMD_FLAG_INV_ICACHE)
cp_coher_cntl |= S_0085F0_SH_ICACHE_ACTION_ENA(1);
if (flush_bits & RADV_CMD_FLAG_INV_SMEM_L1)
cp_coher_cntl |= S_0085F0_SH_KCACHE_ACTION_ENA(1);
if (chip_class <= VI) {
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_CB) {
cp_coher_cntl |= S_0085F0_CB_ACTION_ENA(1) |
S_0085F0_CB0_DEST_BASE_ENA(1) |
S_0085F0_CB1_DEST_BASE_ENA(1) |
S_0085F0_CB2_DEST_BASE_ENA(1) |
S_0085F0_CB3_DEST_BASE_ENA(1) |
S_0085F0_CB4_DEST_BASE_ENA(1) |
S_0085F0_CB5_DEST_BASE_ENA(1) |
S_0085F0_CB6_DEST_BASE_ENA(1) |
S_0085F0_CB7_DEST_BASE_ENA(1);
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_CB) {
cp_coher_cntl |= S_0085F0_CB_ACTION_ENA(1) |
S_0085F0_CB0_DEST_BASE_ENA(1) |
S_0085F0_CB1_DEST_BASE_ENA(1) |
S_0085F0_CB2_DEST_BASE_ENA(1) |
S_0085F0_CB3_DEST_BASE_ENA(1) |
S_0085F0_CB4_DEST_BASE_ENA(1) |
S_0085F0_CB5_DEST_BASE_ENA(1) |
S_0085F0_CB6_DEST_BASE_ENA(1) |
S_0085F0_CB7_DEST_BASE_ENA(1);
/* Necessary for DCC */
if (chip_class >= VI) {
si_cs_emit_write_event_eop(cs,
chip_class,
is_mec,
V_028A90_FLUSH_AND_INV_CB_DATA_TS,
0, 0, 0, 0, 0);
}
}
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_DB) {
cp_coher_cntl |= S_0085F0_DB_ACTION_ENA(1) |
S_0085F0_DB_DEST_BASE_ENA(1);
/* Necessary for DCC */
if (chip_class >= VI) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_CB_DATA_TS) |
EVENT_INDEX(5));
radeon_emit(cs, 0);
radeon_emit(cs, 0);
radeon_emit(cs, 0);
radeon_emit(cs, 0);
}
}
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_DB) {
cp_coher_cntl |= S_0085F0_DB_ACTION_ENA(1) |
S_0085F0_DB_DEST_BASE_ENA(1);
}
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_CB_META) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_CB_META) | EVENT_INDEX(0));
@@ -964,7 +837,8 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_DB_META) | EVENT_INDEX(0));
}
if (!flush_cb_db) {
if (!(flush_bits & (RADV_CMD_FLAG_FLUSH_AND_INV_CB |
RADV_CMD_FLAG_FLUSH_AND_INV_DB))) {
if (flush_bits & RADV_CMD_FLAG_PS_PARTIAL_FLUSH) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_PS_PARTIAL_FLUSH) | EVENT_INDEX(4));
@@ -979,54 +853,6 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
radeon_emit(cs, EVENT_TYPE(V_028A90_CS_PARTIAL_FLUSH) | EVENT_INDEX(4));
}
if (chip_class >= GFX9 && flush_cb_db) {
unsigned cb_db_event, tc_flags;
/* Set the CB/DB flush event. */
switch (flush_cb_db) {
case RADV_CMD_FLAG_FLUSH_AND_INV_CB:
cb_db_event = V_028A90_FLUSH_AND_INV_CB_DATA_TS;
break;
case RADV_CMD_FLAG_FLUSH_AND_INV_DB:
cb_db_event = V_028A90_FLUSH_AND_INV_DB_DATA_TS;
break;
default:
/* both CB & DB */
cb_db_event = V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT;
}
/* TC | TC_WB = invalidate L2 data
* TC_MD | TC_WB = invalidate L2 metadata
* TC | TC_WB | TC_MD = invalidate L2 data & metadata
*
* The metadata cache must always be invalidated for coherency
* between CB/DB and shaders. (metadata = HTILE, CMASK, DCC)
*
* TC must be invalidated on GFX9 only if the CB/DB surface is
* not pipe-aligned. If the surface is RB-aligned, it might not
* strictly be pipe-aligned since RB alignment takes precendence.
*/
tc_flags = EVENT_TC_WB_ACTION_ENA |
EVENT_TC_MD_ACTION_ENA;
/* Ideally flush TC together with CB/DB. */
if (flush_bits & RADV_CMD_FLAG_INV_GLOBAL_L2) {
tc_flags |= EVENT_TC_ACTION_ENA |
EVENT_TCL1_ACTION_ENA;
/* Clear the flags. */
flush_bits &= ~(RADV_CMD_FLAG_INV_GLOBAL_L2 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 |
RADV_CMD_FLAG_INV_VMEM_L1);
}
assert(flush_cnt);
uint32_t old_fence = (*flush_cnt)++;
si_cs_emit_write_event_eop(cs, chip_class, false, cb_db_event, tc_flags, 1,
flush_va, old_fence, *flush_cnt);
si_emit_wait_fence(cs, flush_va, *flush_cnt, 0xffffffff);
}
/* VGT state sync */
if (flush_bits & RADV_CMD_FLAG_VGT_FLUSH) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
@@ -1036,11 +862,7 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
/* Make sure ME is idle (it executes most packets) before continuing.
* This prevents read-after-write hazards between PFP and ME.
*/
if ((cp_coher_cntl ||
(flush_bits & (RADV_CMD_FLAG_CS_PARTIAL_FLUSH |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_INV_GLOBAL_L2 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2))) &&
if ((cp_coher_cntl || (flush_bits & RADV_CMD_FLAG_CS_PARTIAL_FLUSH)) &&
!is_mec) {
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);
@@ -1048,39 +870,27 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
if ((flush_bits & RADV_CMD_FLAG_INV_GLOBAL_L2) ||
(chip_class <= CIK && (flush_bits & RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2))) {
si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9,
cp_coher_cntl |
S_0085F0_TC_ACTION_ENA(1) |
S_0085F0_TCL1_ACTION_ENA(1) |
S_0301F0_TC_WB_ACTION_ENA(chip_class >= VI));
cp_coher_cntl |= S_0085F0_TC_ACTION_ENA(1);
if (chip_class >= VI)
cp_coher_cntl |= S_0301F0_TC_WB_ACTION_ENA(1);
} else if(flush_bits & RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2) {
cp_coher_cntl |= S_0301F0_TC_WB_ACTION_ENA(1) |
S_0301F0_TC_NC_ACTION_ENA(1);
/* L2 writeback doesn't combine with L1 invalidate */
si_emit_acquire_mem(cs, is_mec, cp_coher_cntl);
cp_coher_cntl = 0;
} else {
if(flush_bits & RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2) {
/* WB = write-back
* NC = apply to non-coherent MTYPEs
* (i.e. MTYPE <= 1, which is what we use everywhere)
*
* WB doesn't work without NC.
*/
si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9,
cp_coher_cntl |
S_0301F0_TC_WB_ACTION_ENA(1) |
S_0301F0_TC_NC_ACTION_ENA(1));
cp_coher_cntl = 0;
}
if (flush_bits & RADV_CMD_FLAG_INV_VMEM_L1) {
si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9,
cp_coher_cntl |
S_0085F0_TCL1_ACTION_ENA(1));
cp_coher_cntl = 0;
}
}
if (flush_bits & RADV_CMD_FLAG_INV_VMEM_L1)
cp_coher_cntl |= S_0085F0_TCL1_ACTION_ENA(1);
/* When one of the DEST_BASE flags is set, SURFACE_SYNC waits for idle.
* Therefore, it should be last. Done in PFP.
*/
if (cp_coher_cntl)
si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9, cp_coher_cntl);
si_emit_acquire_mem(cs, is_mec, cp_coher_cntl);
}
void
@@ -1097,118 +907,67 @@ si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer)
RADV_CMD_FLAG_VS_PARTIAL_FLUSH |
RADV_CMD_FLAG_VGT_FLUSH);
if (!cmd_buffer->state.flush_bits)
return;
enum chip_class chip_class = cmd_buffer->device->physical_device->rad_info.chip_class;
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 128);
uint32_t *ptr = NULL;
uint64_t va = 0;
if (chip_class == GFX9) {
va = cmd_buffer->device->ws->buffer_get_va(cmd_buffer->gfx9_fence_bo) + cmd_buffer->gfx9_fence_offset;
ptr = &cmd_buffer->gfx9_fence_idx;
}
si_cs_emit_cache_flush(cmd_buffer->cs,
cmd_buffer->device->physical_device->rad_info.chip_class,
ptr, va,
radv_cmd_buffer_uses_mec(cmd_buffer),
cmd_buffer->state.flush_bits);
radv_cmd_buffer_trace_emit(cmd_buffer);
if (cmd_buffer->state.flush_bits)
radv_cmd_buffer_trace_emit(cmd_buffer);
cmd_buffer->state.flush_bits = 0;
}
/* Set this if you want the 3D engine to wait until CP DMA is done.
* It should be set on the last CP DMA packet. */
#define CP_DMA_SYNC (1 << 0)
#define R600_CP_DMA_SYNC (1 << 0) /* R600+ */
/* Set this if the source data was used as a destination in a previous CP DMA
* packet. It's for preventing a read-after-write (RAW) hazard between two
* CP DMA packets. */
#define CP_DMA_RAW_WAIT (1 << 1)
#define CP_DMA_USE_L2 (1 << 2)
#define CP_DMA_CLEAR (1 << 3)
#define SI_CP_DMA_RAW_WAIT (1 << 1) /* SI+ */
#define CIK_CP_DMA_USE_L2 (1 << 2)
/* Alignment for optimal performance. */
#define SI_CPDMA_ALIGNMENT 32
#define CP_DMA_ALIGNMENT 32
/* The max number of bytes to copy per packet. */
#define CP_DMA_MAX_BYTE_COUNT ((1 << 21) - CP_DMA_ALIGNMENT)
/* The max number of bytes that can be copied per packet. */
static inline unsigned cp_dma_max_byte_count(struct radv_cmd_buffer *cmd_buffer)
{
unsigned max = cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9 ?
S_414_BYTE_COUNT_GFX9(~0u) :
S_414_BYTE_COUNT_GFX6(~0u);
/* make it aligned for optimal performance */
return max & ~(SI_CPDMA_ALIGNMENT - 1);
}
/* Emit a CP DMA packet to do a copy from one buffer to another, or to clear
* a buffer. The size must fit in bits [20:0]. If CP_DMA_CLEAR is set, src_va is a 32-bit
* clear value.
*/
static void si_emit_cp_dma(struct radv_cmd_buffer *cmd_buffer,
uint64_t dst_va, uint64_t src_va,
unsigned size, unsigned flags)
static void si_emit_cp_dma_copy_buffer(struct radv_cmd_buffer *cmd_buffer,
uint64_t dst_va, uint64_t src_va,
unsigned size, unsigned flags)
{
struct radeon_winsys_cs *cs = cmd_buffer->cs;
uint32_t header = 0, command = 0;
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? S_411_CP_SYNC(1) : 0;
uint32_t wr_confirm = !(flags & R600_CP_DMA_SYNC) ? S_414_DISABLE_WR_CONFIRM_GFX6(1) : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? S_414_RAW_WAIT(1) : 0;
uint32_t sel = flags & CIK_CP_DMA_USE_L2 ?
S_411_SRC_SEL(V_411_SRC_ADDR_TC_L2) |
S_411_DSL_SEL(V_411_DST_ADDR_TC_L2) : 0;
assert(size);
assert(size <= cp_dma_max_byte_count(cmd_buffer));
assert((size & ((1<<21)-1)) == size);
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 9);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9)
command |= S_414_BYTE_COUNT_GFX9(size);
else
command |= S_414_BYTE_COUNT_GFX6(size);
/* Sync flags. */
if (flags & CP_DMA_SYNC)
header |= S_411_CP_SYNC(1);
else {
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9)
command |= S_414_DISABLE_WR_CONFIRM_GFX9(1);
else
command |= S_414_DISABLE_WR_CONFIRM_GFX6(1);
}
if (flags & CP_DMA_RAW_WAIT)
command |= S_414_RAW_WAIT(1);
/* Src and dst flags. */
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9 &&
!(flags & CP_DMA_CLEAR) &&
src_va == dst_va)
header |= S_411_DSL_SEL(V_411_NOWHERE); /* prefetch only */
else if (flags & CP_DMA_USE_L2)
header |= S_411_DSL_SEL(V_411_DST_ADDR_TC_L2);
if (flags & CP_DMA_CLEAR)
header |= S_411_SRC_SEL(V_411_DATA);
else if (flags & CP_DMA_USE_L2)
header |= S_411_SRC_SEL(V_411_SRC_ADDR_TC_L2);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) {
radeon_emit(cs, PKT3(PKT3_DMA_DATA, 5, 0));
radeon_emit(cs, header);
radeon_emit(cs, sync_flag | sel); /* CP_SYNC [31] */
radeon_emit(cs, src_va); /* SRC_ADDR_LO [31:0] */
radeon_emit(cs, src_va >> 32); /* SRC_ADDR_HI [31:0] */
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, dst_va >> 32); /* DST_ADDR_HI [31:0] */
radeon_emit(cs, command);
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */
} else {
assert(!(flags & CP_DMA_USE_L2));
header |= S_411_SRC_ADDR_HI(src_va >> 32);
radeon_emit(cs, PKT3(PKT3_CP_DMA, 4, 0));
radeon_emit(cs, src_va); /* SRC_ADDR_LO [31:0] */
radeon_emit(cs, header); /* SRC_ADDR_HI [15:0] + flags. */
radeon_emit(cs, sync_flag | ((src_va >> 32) & 0xffff)); /* CP_SYNC [31] | SRC_ADDR_HI [15:0] */
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, (dst_va >> 32) & 0xffff); /* DST_ADDR_HI [15:0] */
radeon_emit(cs, command);
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */
}
/* CP DMA is executed in ME, but index buffers are read by PFP.
@@ -1216,7 +975,7 @@ static void si_emit_cp_dma(struct radv_cmd_buffer *cmd_buffer,
* indices. If we wanted to execute CP DMA in PFP, this packet
* should precede it.
*/
if ((flags & CP_DMA_SYNC) && cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
if (sync_flag && cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);
}
@@ -1224,14 +983,45 @@ static void si_emit_cp_dma(struct radv_cmd_buffer *cmd_buffer,
radv_cmd_buffer_trace_emit(cmd_buffer);
}
void si_cp_dma_prefetch(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
unsigned size)
/* Emit a CP DMA packet to clear a buffer. The size must fit in bits [20:0]. */
static void si_emit_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer,
uint64_t dst_va, unsigned size,
uint32_t clear_value, unsigned flags)
{
uint64_t aligned_va = va & ~(SI_CPDMA_ALIGNMENT - 1);
uint64_t aligned_size = ((va + size + SI_CPDMA_ALIGNMENT -1) & ~(SI_CPDMA_ALIGNMENT - 1)) - aligned_va;
struct radeon_winsys_cs *cs = cmd_buffer->cs;
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? S_411_CP_SYNC(1) : 0;
uint32_t wr_confirm = !(flags & R600_CP_DMA_SYNC) ? S_414_DISABLE_WR_CONFIRM_GFX6(1) : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? S_414_RAW_WAIT(1) : 0;
uint32_t dst_sel = flags & CIK_CP_DMA_USE_L2 ? S_411_DSL_SEL(V_411_DST_ADDR_TC_L2) : 0;
si_emit_cp_dma(cmd_buffer, aligned_va, aligned_va,
aligned_size, CP_DMA_USE_L2);
assert(size);
assert((size & ((1<<21)-1)) == size);
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 9);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) {
radeon_emit(cs, PKT3(PKT3_DMA_DATA, 5, 0));
radeon_emit(cs, sync_flag | dst_sel | S_411_SRC_SEL(V_411_DATA)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, clear_value); /* DATA [31:0] */
radeon_emit(cs, 0);
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, dst_va >> 32); /* DST_ADDR_HI [15:0] */
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */
} else {
radeon_emit(cs, PKT3(PKT3_CP_DMA, 4, 0));
radeon_emit(cs, clear_value); /* DATA [31:0] */
radeon_emit(cs, sync_flag | S_411_SRC_SEL(V_411_DATA)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, (dst_va >> 32) & 0xffff); /* DST_ADDR_HI [15:0] */
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */
}
/* See "copy_buffer" for explanation. */
if (sync_flag && cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);
}
radv_cmd_buffer_trace_emit(cmd_buffer);
}
static void si_cp_dma_prepare(struct radv_cmd_buffer *cmd_buffer, uint64_t byte_count,
@@ -1243,14 +1033,14 @@ static void si_cp_dma_prepare(struct radv_cmd_buffer *cmd_buffer, uint64_t byte_
*/
if (cmd_buffer->state.flush_bits) {
si_emit_cache_flush(cmd_buffer);
*flags |= CP_DMA_RAW_WAIT;
*flags |= SI_CP_DMA_RAW_WAIT;
}
/* Do the synchronization after the last dma, so that all data
* is written to memory.
*/
if (byte_count == remaining_size)
*flags |= CP_DMA_SYNC;
*flags |= R600_CP_DMA_SYNC;
}
static void si_cp_dma_realign_engine(struct radv_cmd_buffer *cmd_buffer, unsigned size)
@@ -1258,20 +1048,20 @@ static void si_cp_dma_realign_engine(struct radv_cmd_buffer *cmd_buffer, unsigne
uint64_t va;
uint32_t offset;
unsigned dma_flags = 0;
unsigned buf_size = SI_CPDMA_ALIGNMENT * 2;
unsigned buf_size = CP_DMA_ALIGNMENT * 2;
void *ptr;
assert(size < SI_CPDMA_ALIGNMENT);
assert(size < CP_DMA_ALIGNMENT);
radv_cmd_buffer_upload_alloc(cmd_buffer, buf_size, SI_CPDMA_ALIGNMENT, &offset, &ptr);
radv_cmd_buffer_upload_alloc(cmd_buffer, buf_size, CP_DMA_ALIGNMENT, &offset, &ptr);
va = cmd_buffer->device->ws->buffer_get_va(cmd_buffer->upload.upload_bo);
va += offset;
si_cp_dma_prepare(cmd_buffer, size, size, &dma_flags);
si_emit_cp_dma(cmd_buffer, va, va + SI_CPDMA_ALIGNMENT, size,
dma_flags);
si_emit_cp_dma_copy_buffer(cmd_buffer, va, va + CP_DMA_ALIGNMENT, size,
dma_flags);
}
void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
@@ -1288,15 +1078,15 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
* just to align the internal counter. Otherwise, the DMA engine
* would slow down by an order of magnitude for following copies.
*/
if (size % SI_CPDMA_ALIGNMENT)
realign_size = SI_CPDMA_ALIGNMENT - (size % SI_CPDMA_ALIGNMENT);
if (size % CP_DMA_ALIGNMENT)
realign_size = CP_DMA_ALIGNMENT - (size % CP_DMA_ALIGNMENT);
/* If the copy begins unaligned, we must start copying from the next
* aligned block and the skipped part should be copied after everything
* else has been copied. Only the src alignment matters, not dst.
*/
if (src_va % SI_CPDMA_ALIGNMENT) {
skipped_size = SI_CPDMA_ALIGNMENT - (src_va % SI_CPDMA_ALIGNMENT);
if (src_va % CP_DMA_ALIGNMENT) {
skipped_size = CP_DMA_ALIGNMENT - (src_va % CP_DMA_ALIGNMENT);
/* The main part will be skipped if the size is too small. */
skipped_size = MIN2(skipped_size, size);
size -= skipped_size;
@@ -1307,14 +1097,14 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
while (size) {
unsigned dma_flags = 0;
unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer));
unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT);
si_cp_dma_prepare(cmd_buffer, byte_count,
size + skipped_size + realign_size,
&dma_flags);
si_emit_cp_dma(cmd_buffer, main_dest_va, main_src_va,
byte_count, dma_flags);
si_emit_cp_dma_copy_buffer(cmd_buffer, main_dest_va, main_src_va,
byte_count, dma_flags);
size -= byte_count;
main_src_va += byte_count;
@@ -1328,8 +1118,8 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
size + skipped_size + realign_size,
&dma_flags);
si_emit_cp_dma(cmd_buffer, dest_va, src_va,
skipped_size, dma_flags);
si_emit_cp_dma_copy_buffer(cmd_buffer, dest_va, src_va,
skipped_size, dma_flags);
}
if (realign_size)
si_cp_dma_realign_engine(cmd_buffer, realign_size);
@@ -1345,14 +1135,14 @@ void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
assert(va % 4 == 0 && size % 4 == 0);
while (size) {
unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer));
unsigned dma_flags = CP_DMA_CLEAR;
unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT);
unsigned dma_flags = 0;
si_cp_dma_prepare(cmd_buffer, byte_count, size, &dma_flags);
/* Emit the clear packet. */
si_emit_cp_dma(cmd_buffer, va, value, byte_count,
dma_flags);
si_emit_cp_dma_clear_buffer(cmd_buffer, va, byte_count, value,
dma_flags);
size -= byte_count;
va += byte_count;

View File

@@ -396,13 +396,6 @@ vk_format_is_int(VkFormat format)
return channel >= 0 && desc->channel[channel].pure_integer;
}
static inline bool
vk_format_is_srgb(VkFormat format)
{
const struct vk_format_description *desc = vk_format_description(format);
return desc->colorspace == VK_FORMAT_COLORSPACE_SRGB;
}
static inline VkFormat
vk_format_stencil_only(VkFormat format)
{

View File

@@ -467,29 +467,25 @@ radv_amdgpu_winsys_bo_set_metadata(struct radeon_winsys_bo *_bo,
struct amdgpu_bo_metadata metadata = {0};
uint32_t tiling_flags = 0;
if (bo->ws->info.chip_class >= GFX9) {
tiling_flags |= AMDGPU_TILING_SET(SWIZZLE_MODE, md->u.gfx9.swizzle_mode);
} else {
if (md->u.legacy.macrotile == RADEON_LAYOUT_TILED)
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 4); /* 2D_TILED_THIN1 */
else if (md->u.legacy.microtile == RADEON_LAYOUT_TILED)
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 2); /* 1D_TILED_THIN1 */
else
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 1); /* LINEAR_ALIGNED */
if (md->macrotile == RADEON_LAYOUT_TILED)
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 4); /* 2D_TILED_THIN1 */
else if (md->microtile == RADEON_LAYOUT_TILED)
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 2); /* 1D_TILED_THIN1 */
else
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 1); /* LINEAR_ALIGNED */
tiling_flags |= AMDGPU_TILING_SET(PIPE_CONFIG, md->u.legacy.pipe_config);
tiling_flags |= AMDGPU_TILING_SET(BANK_WIDTH, util_logbase2(md->u.legacy.bankw));
tiling_flags |= AMDGPU_TILING_SET(BANK_HEIGHT, util_logbase2(md->u.legacy.bankh));
if (md->u.legacy.tile_split)
tiling_flags |= AMDGPU_TILING_SET(TILE_SPLIT, radv_eg_tile_split_rev(md->u.legacy.tile_split));
tiling_flags |= AMDGPU_TILING_SET(MACRO_TILE_ASPECT, util_logbase2(md->u.legacy.mtilea));
tiling_flags |= AMDGPU_TILING_SET(NUM_BANKS, util_logbase2(md->u.legacy.num_banks)-1);
tiling_flags |= AMDGPU_TILING_SET(PIPE_CONFIG, md->pipe_config);
tiling_flags |= AMDGPU_TILING_SET(BANK_WIDTH, util_logbase2(md->bankw));
tiling_flags |= AMDGPU_TILING_SET(BANK_HEIGHT, util_logbase2(md->bankh));
if (md->tile_split)
tiling_flags |= AMDGPU_TILING_SET(TILE_SPLIT, radv_eg_tile_split_rev(md->tile_split));
tiling_flags |= AMDGPU_TILING_SET(MACRO_TILE_ASPECT, util_logbase2(md->mtilea));
tiling_flags |= AMDGPU_TILING_SET(NUM_BANKS, util_logbase2(md->num_banks)-1);
if (md->u.legacy.scanout)
tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 0); /* DISPLAY_MICRO_TILING */
else
tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 1); /* THIN_MICRO_TILING */
}
if (md->scanout)
tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 0); /* DISPLAY_MICRO_TILING */
else
tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 1); /* THIN_MICRO_TILING */
metadata.tiling_info = tiling_flags;
metadata.size_metadata = md->size_metadata;

View File

@@ -90,26 +90,25 @@ static int ring_to_hw_ip(enum ring_type ring)
}
static void radv_amdgpu_request_to_fence(struct radv_amdgpu_ctx *ctx,
struct radv_amdgpu_fence *fence,
struct amdgpu_cs_fence *fence,
struct amdgpu_cs_request *req)
{
fence->fence.context = ctx->ctx;
fence->fence.ip_type = req->ip_type;
fence->fence.ip_instance = req->ip_instance;
fence->fence.ring = req->ring;
fence->fence.fence = req->seq_no;
fence->user_ptr = (volatile uint64_t*)(ctx->fence_map + (req->ip_type * MAX_RINGS_PER_TYPE + req->ring) * sizeof(uint64_t));
fence->context = ctx->ctx;
fence->ip_type = req->ip_type;
fence->ip_instance = req->ip_instance;
fence->ring = req->ring;
fence->fence = req->seq_no;
}
static struct radeon_winsys_fence *radv_amdgpu_create_fence()
{
struct radv_amdgpu_fence *fence = calloc(1, sizeof(struct radv_amdgpu_fence));
struct radv_amdgpu_cs_fence *fence = calloc(1, sizeof(struct amdgpu_cs_fence));
return (struct radeon_winsys_fence*)fence;
}
static void radv_amdgpu_destroy_fence(struct radeon_winsys_fence *_fence)
{
struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence;
free(fence);
}
@@ -118,23 +117,16 @@ static bool radv_amdgpu_fence_wait(struct radeon_winsys *_ws,
bool absolute,
uint64_t timeout)
{
struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence;
unsigned flags = absolute ? AMDGPU_QUERY_FENCE_TIMEOUT_IS_ABSOLUTE : 0;
int r;
uint32_t expired = 0;
if (fence->user_ptr) {
if (*fence->user_ptr >= fence->fence.fence)
return true;
if (!absolute && !timeout)
return false;
}
/* Now use the libdrm query. */
r = amdgpu_cs_query_fence_status(&fence->fence,
timeout,
flags,
&expired);
r = amdgpu_cs_query_fence_status(fence,
timeout,
flags,
&expired);
if (r) {
fprintf(stderr, "amdgpu: radv_amdgpu_cs_query_fence_status failed.\n");
@@ -627,16 +619,6 @@ static int radv_amdgpu_create_bo_list(struct radv_amdgpu_winsys *ws,
return r;
}
static struct amdgpu_cs_fence_info radv_set_cs_fence(struct radv_amdgpu_ctx *ctx, int ip_type, int ring)
{
struct amdgpu_cs_fence_info ret = {0};
if (ctx->fence_map) {
ret.handle = radv_amdgpu_winsys_bo(ctx->fence_bo)->bo;
ret.offset = (ip_type * MAX_RINGS_PER_TYPE + ring) * sizeof(uint64_t);
}
return ret;
}
static void radv_assign_last_submit(struct radv_amdgpu_ctx *ctx,
struct amdgpu_cs_request *request)
{
@@ -655,7 +637,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
{
int r;
struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx);
struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence;
struct radv_amdgpu_cs *cs0 = radv_amdgpu_cs(cs_array[0]);
amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request = {0};
@@ -694,7 +676,6 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
request.number_of_ibs = 1;
request.ibs = &cs0->ib;
request.resources = bo_list;
request.fence_info = radv_set_cs_fence(ctx, cs0->hw_ip, queue_idx);
if (initial_preamble_cs) {
request.ibs = ibs;
@@ -732,7 +713,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
{
int r;
struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx);
struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence;
amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request;
@@ -759,7 +740,6 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
request.resources = bo_list;
request.number_of_ibs = cnt + !!preamble_cs;
request.ibs = ibs;
request.fence_info = radv_set_cs_fence(ctx, cs0->hw_ip, queue_idx);
if (preamble_cs) {
ibs[0] = radv_amdgpu_cs(preamble_cs)->ib;
@@ -809,14 +789,14 @@ static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
{
int r;
struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx);
struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence;
struct radv_amdgpu_cs *cs0 = radv_amdgpu_cs(cs_array[0]);
struct radeon_winsys *ws = (struct radeon_winsys*)cs0->ws;
amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request;
uint32_t pad_word = 0xffff1000U;
if (radv_amdgpu_winsys(ws)->info.chip_class == SI)
if (radv_amdgpu_winsys(ws)->family == FAMILY_SI)
pad_word = 0x80000000;
assert(cs_count);
@@ -878,7 +858,6 @@ static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
request.resources = bo_list;
request.number_of_ibs = 1;
request.ibs = &ib;
request.fence_info = radv_set_cs_fence(ctx, cs0->hw_ip, queue_idx);
r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
if (r) {
@@ -931,7 +910,7 @@ static int radv_amdgpu_winsys_cs_submit(struct radeon_winsys_ctx *_ctx,
if (!cs->ws->use_ib_bos) {
ret = radv_amdgpu_winsys_cs_submit_sysmem(_ctx, queue_idx, cs_array,
cs_count, initial_preamble_cs, continue_preamble_cs, _fence);
} else if (can_patch && cs_count > AMDGPU_CS_MAX_IBS_PER_SUBMIT && cs->ws->batchchain) {
} else if (can_patch && cs_count > AMDGPU_CS_MAX_IBS_PER_SUBMIT && false) {
ret = radv_amdgpu_winsys_cs_submit_chained(_ctx, queue_idx, cs_array,
cs_count, initial_preamble_cs, continue_preamble_cs, _fence);
} else {
@@ -999,15 +978,6 @@ static struct radeon_winsys_ctx *radv_amdgpu_ctx_create(struct radeon_winsys *_w
goto error_create;
}
ctx->ws = ws;
assert(AMDGPU_HW_IP_NUM * MAX_RINGS_PER_TYPE * sizeof(uint64_t) <= 4096);
ctx->fence_bo = ws->base.buffer_create(&ws->base, 4096, 8,
RADEON_DOMAIN_GTT,
RADEON_FLAG_CPU_ACCESS);
if (ctx->fence_bo)
ctx->fence_map = (uint64_t*)ws->base.buffer_map(ctx->fence_bo);
if (ctx->fence_map)
memset(ctx->fence_map, 0, 4096);
return (struct radeon_winsys_ctx *)ctx;
error_create:
FREE(ctx);
@@ -1017,7 +987,6 @@ error_create:
static void radv_amdgpu_ctx_destroy(struct radeon_winsys_ctx *rwctx)
{
struct radv_amdgpu_ctx *ctx = (struct radv_amdgpu_ctx *)rwctx;
ctx->ws->base.buffer_destroy(ctx->fence_bo);
amdgpu_cs_ctx_free(ctx->ctx);
FREE(ctx);
}
@@ -1028,9 +997,9 @@ static bool radv_amdgpu_ctx_wait_idle(struct radeon_winsys_ctx *rwctx,
struct radv_amdgpu_ctx *ctx = (struct radv_amdgpu_ctx *)rwctx;
int ip_type = ring_to_hw_ip(ring_type);
if (ctx->last_submission[ip_type][ring_index].fence.fence) {
if (ctx->last_submission[ip_type][ring_index].fence) {
uint32_t expired;
int ret = amdgpu_cs_query_fence_status(&ctx->last_submission[ip_type][ring_index].fence,
int ret = amdgpu_cs_query_fence_status(&ctx->last_submission[ip_type][ring_index],
1000000000ull, 0, &expired);
if (ret || !expired)

View File

@@ -42,19 +42,10 @@ enum {
MAX_RINGS_PER_TYPE = 8
};
struct radv_amdgpu_fence {
struct amdgpu_cs_fence fence;
volatile uint64_t *user_ptr;
};
struct radv_amdgpu_ctx {
struct radv_amdgpu_winsys *ws;
amdgpu_context_handle ctx;
struct radv_amdgpu_fence last_submission[AMDGPU_HW_IP_DMA + 1][MAX_RINGS_PER_TYPE];
struct radeon_winsys_bo *fence_bo;
uint64_t *fence_map;
struct amdgpu_cs_fence last_submission[AMDGPU_HW_IP_DMA + 1][MAX_RINGS_PER_TYPE];
};
static inline struct radv_amdgpu_ctx *

View File

@@ -35,39 +35,63 @@
#include "radv_amdgpu_surface.h"
#include "sid.h"
#include "ac_surface.h"
#ifndef NO_ENTRIES
#define NO_ENTRIES 32
#endif
static int radv_amdgpu_surface_sanity(const struct ac_surf_info *surf_info,
const struct radeon_surf *surf)
#ifndef NO_MACRO_ENTRIES
#define NO_MACRO_ENTRIES 16
#endif
#ifndef CIASICIDGFXENGINE_SOUTHERNISLAND
#define CIASICIDGFXENGINE_SOUTHERNISLAND 0x0000000A
#endif
static int radv_amdgpu_surface_sanity(const struct radeon_surf *surf)
{
unsigned type = RADEON_SURF_GET(surf->flags, TYPE);
if (!(surf->flags & RADEON_SURF_HAS_TILE_MODE_INDEX))
return -EINVAL;
if (!surf->blk_w || !surf->blk_h)
/* all dimension must be at least 1 ! */
if (!surf->npix_x || !surf->npix_y || !surf->npix_z ||
!surf->array_size)
return -EINVAL;
if (!surf->blk_w || !surf->blk_h || !surf->blk_d)
return -EINVAL;
switch (surf->nsamples) {
case 1:
case 2:
case 4:
case 8:
break;
default:
return -EINVAL;
}
switch (type) {
case RADEON_SURF_TYPE_1D:
if (surf_info->height > 1)
if (surf->npix_y > 1)
return -EINVAL;
/* fall through */
case RADEON_SURF_TYPE_2D:
case RADEON_SURF_TYPE_CUBEMAP:
if (surf_info->depth > 1 || surf_info->array_size > 1)
if (surf->npix_z > 1 || surf->array_size > 1)
return -EINVAL;
break;
case RADEON_SURF_TYPE_3D:
if (surf_info->array_size > 1)
if (surf->array_size > 1)
return -EINVAL;
break;
case RADEON_SURF_TYPE_1D_ARRAY:
if (surf_info->height > 1)
if (surf->npix_y > 1)
return -EINVAL;
/* fall through */
case RADEON_SURF_TYPE_2D_ARRAY:
if (surf_info->depth > 1)
if (surf->npix_z > 1)
return -EINVAL;
break;
default:
@@ -76,28 +100,453 @@ static int radv_amdgpu_surface_sanity(const struct ac_surf_info *surf_info,
return 0;
}
static void *ADDR_API radv_allocSysMem(const ADDR_ALLOCSYSMEM_INPUT * pInput)
{
return malloc(pInput->sizeInBytes);
}
static ADDR_E_RETURNCODE ADDR_API radv_freeSysMem(const ADDR_FREESYSMEM_INPUT * pInput)
{
free(pInput->pVirtAddr);
return ADDR_OK;
}
ADDR_HANDLE radv_amdgpu_addr_create(struct amdgpu_gpu_info *amdinfo, int family, int rev_id,
enum chip_class chip_class)
{
ADDR_CREATE_INPUT addrCreateInput = {0};
ADDR_CREATE_OUTPUT addrCreateOutput = {0};
ADDR_REGISTER_VALUE regValue = {0};
ADDR_CREATE_FLAGS createFlags = {{0}};
ADDR_E_RETURNCODE addrRet;
addrCreateInput.size = sizeof(ADDR_CREATE_INPUT);
addrCreateOutput.size = sizeof(ADDR_CREATE_OUTPUT);
regValue.noOfBanks = amdinfo->mc_arb_ramcfg & 0x3;
regValue.gbAddrConfig = amdinfo->gb_addr_cfg;
regValue.noOfRanks = (amdinfo->mc_arb_ramcfg & 0x4) >> 2;
regValue.backendDisables = amdinfo->backend_disable[0];
regValue.pTileConfig = amdinfo->gb_tile_mode;
regValue.noOfEntries = ARRAY_SIZE(amdinfo->gb_tile_mode);
if (chip_class == SI) {
regValue.pMacroTileConfig = NULL;
regValue.noOfMacroEntries = 0;
} else {
regValue.pMacroTileConfig = amdinfo->gb_macro_tile_mode;
regValue.noOfMacroEntries = ARRAY_SIZE(amdinfo->gb_macro_tile_mode);
}
createFlags.value = 0;
createFlags.useTileIndex = 1;
addrCreateInput.chipEngine = CIASICIDGFXENGINE_SOUTHERNISLAND;
addrCreateInput.chipFamily = family;
addrCreateInput.chipRevision = rev_id;
addrCreateInput.createFlags = createFlags;
addrCreateInput.callbacks.allocSysMem = radv_allocSysMem;
addrCreateInput.callbacks.freeSysMem = radv_freeSysMem;
addrCreateInput.callbacks.debugPrint = 0;
addrCreateInput.regValue = regValue;
addrRet = AddrCreate(&addrCreateInput, &addrCreateOutput);
if (addrRet != ADDR_OK)
return NULL;
return addrCreateOutput.hLib;
}
static int radv_compute_level(ADDR_HANDLE addrlib,
struct radeon_surf *surf, bool is_stencil,
unsigned level, unsigned type, bool compressed,
ADDR_COMPUTE_SURFACE_INFO_INPUT *AddrSurfInfoIn,
ADDR_COMPUTE_SURFACE_INFO_OUTPUT *AddrSurfInfoOut,
ADDR_COMPUTE_DCCINFO_INPUT *AddrDccIn,
ADDR_COMPUTE_DCCINFO_OUTPUT *AddrDccOut)
{
struct radeon_surf_level *surf_level;
ADDR_E_RETURNCODE ret;
AddrSurfInfoIn->mipLevel = level;
AddrSurfInfoIn->width = u_minify(surf->npix_x, level);
AddrSurfInfoIn->height = u_minify(surf->npix_y, level);
if (type == RADEON_SURF_TYPE_3D)
AddrSurfInfoIn->numSlices = u_minify(surf->npix_z, level);
else if (type == RADEON_SURF_TYPE_CUBEMAP)
AddrSurfInfoIn->numSlices = 6;
else
AddrSurfInfoIn->numSlices = surf->array_size;
if (level > 0) {
/* Set the base level pitch. This is needed for calculation
* of non-zero levels. */
if (is_stencil)
AddrSurfInfoIn->basePitch = surf->stencil_level[0].nblk_x;
else
AddrSurfInfoIn->basePitch = surf->level[0].nblk_x;
/* Convert blocks to pixels for compressed formats. */
if (compressed)
AddrSurfInfoIn->basePitch *= surf->blk_w;
}
ret = AddrComputeSurfaceInfo(addrlib,
AddrSurfInfoIn,
AddrSurfInfoOut);
if (ret != ADDR_OK)
return ret;
surf_level = is_stencil ? &surf->stencil_level[level] : &surf->level[level];
surf_level->offset = align64(surf->bo_size, AddrSurfInfoOut->baseAlign);
surf_level->slice_size = AddrSurfInfoOut->sliceSize;
surf_level->pitch_bytes = AddrSurfInfoOut->pitch * (is_stencil ? 1 : surf->bpe);
surf_level->npix_x = u_minify(surf->npix_x, level);
surf_level->npix_y = u_minify(surf->npix_y, level);
surf_level->npix_z = u_minify(surf->npix_z, level);
surf_level->nblk_x = AddrSurfInfoOut->pitch;
surf_level->nblk_y = AddrSurfInfoOut->height;
if (type == RADEON_SURF_TYPE_3D)
surf_level->nblk_z = AddrSurfInfoOut->depth;
else
surf_level->nblk_z = 1;
switch (AddrSurfInfoOut->tileMode) {
case ADDR_TM_LINEAR_ALIGNED:
surf_level->mode = RADEON_SURF_MODE_LINEAR_ALIGNED;
break;
case ADDR_TM_1D_TILED_THIN1:
surf_level->mode = RADEON_SURF_MODE_1D;
break;
case ADDR_TM_2D_TILED_THIN1:
surf_level->mode = RADEON_SURF_MODE_2D;
break;
default:
assert(0);
}
if (is_stencil)
surf->stencil_tiling_index[level] = AddrSurfInfoOut->tileIndex;
else
surf->tiling_index[level] = AddrSurfInfoOut->tileIndex;
surf->bo_size = surf_level->offset + AddrSurfInfoOut->surfSize;
/* Clear DCC fields at the beginning. */
surf_level->dcc_offset = 0;
surf_level->dcc_enabled = false;
/* The previous level's flag tells us if we can use DCC for this level. */
if (AddrSurfInfoIn->flags.dccCompatible &&
(level == 0 || AddrDccOut->subLvlCompressible)) {
AddrDccIn->colorSurfSize = AddrSurfInfoOut->surfSize;
AddrDccIn->tileMode = AddrSurfInfoOut->tileMode;
AddrDccIn->tileInfo = *AddrSurfInfoOut->pTileInfo;
AddrDccIn->tileIndex = AddrSurfInfoOut->tileIndex;
AddrDccIn->macroModeIndex = AddrSurfInfoOut->macroModeIndex;
ret = AddrComputeDccInfo(addrlib,
AddrDccIn,
AddrDccOut);
if (ret == ADDR_OK) {
surf_level->dcc_offset = surf->dcc_size;
surf_level->dcc_fast_clear_size = AddrDccOut->dccFastClearSize;
surf_level->dcc_enabled = true;
surf->dcc_size = surf_level->dcc_offset + AddrDccOut->dccRamSize;
surf->dcc_alignment = MAX2(surf->dcc_alignment, AddrDccOut->dccRamBaseAlign);
}
}
if (!is_stencil && AddrSurfInfoIn->flags.depth &&
surf_level->mode == RADEON_SURF_MODE_2D && level == 0) {
ADDR_COMPUTE_HTILE_INFO_INPUT AddrHtileIn = {0};
ADDR_COMPUTE_HTILE_INFO_OUTPUT AddrHtileOut = {0};
AddrHtileIn.flags.tcCompatible = AddrSurfInfoIn->flags.tcCompatible;
AddrHtileIn.pitch = AddrSurfInfoOut->pitch;
AddrHtileIn.height = AddrSurfInfoOut->height;
AddrHtileIn.numSlices = AddrSurfInfoOut->depth;
AddrHtileIn.blockWidth = ADDR_HTILE_BLOCKSIZE_8;
AddrHtileIn.blockHeight = ADDR_HTILE_BLOCKSIZE_8;
AddrHtileIn.pTileInfo = AddrSurfInfoOut->pTileInfo;
AddrHtileIn.tileIndex = AddrSurfInfoOut->tileIndex;
AddrHtileIn.macroModeIndex = AddrSurfInfoOut->macroModeIndex;
ret = AddrComputeHtileInfo(addrlib,
&AddrHtileIn,
&AddrHtileOut);
if (ret == ADDR_OK) {
surf->htile_size = AddrHtileOut.htileBytes;
surf->htile_slice_size = AddrHtileOut.sliceSize;
surf->htile_alignment = AddrHtileOut.baseAlign;
}
}
return 0;
}
static void radv_set_micro_tile_mode(struct radeon_surf *surf,
struct radeon_info *info)
{
uint32_t tile_mode = info->si_tile_mode_array[surf->tiling_index[0]];
if (info->chip_class >= CIK)
surf->micro_tile_mode = G_009910_MICRO_TILE_MODE_NEW(tile_mode);
else
surf->micro_tile_mode = G_009910_MICRO_TILE_MODE(tile_mode);
}
static unsigned cik_get_macro_tile_index(struct radeon_surf *surf)
{
unsigned index, tileb;
tileb = 8 * 8 * surf->bpe;
tileb = MIN2(surf->tile_split, tileb);
for (index = 0; tileb > 64; index++)
tileb >>= 1;
assert(index < 16);
return index;
}
static int radv_amdgpu_winsys_surface_init(struct radeon_winsys *_ws,
const struct ac_surf_info *surf_info,
struct radeon_surf *surf)
{
struct radv_amdgpu_winsys *ws = radv_amdgpu_winsys(_ws);
unsigned mode, type;
unsigned level, mode, type;
bool compressed;
ADDR_COMPUTE_SURFACE_INFO_INPUT AddrSurfInfoIn = {0};
ADDR_COMPUTE_SURFACE_INFO_OUTPUT AddrSurfInfoOut = {0};
ADDR_COMPUTE_DCCINFO_INPUT AddrDccIn = {0};
ADDR_COMPUTE_DCCINFO_OUTPUT AddrDccOut = {0};
ADDR_TILEINFO AddrTileInfoIn = {0};
ADDR_TILEINFO AddrTileInfoOut = {0};
int r;
r = radv_amdgpu_surface_sanity(surf_info, surf);
r = radv_amdgpu_surface_sanity(surf);
if (r)
return r;
AddrSurfInfoIn.size = sizeof(ADDR_COMPUTE_SURFACE_INFO_INPUT);
AddrSurfInfoOut.size = sizeof(ADDR_COMPUTE_SURFACE_INFO_OUTPUT);
AddrDccIn.size = sizeof(ADDR_COMPUTE_DCCINFO_INPUT);
AddrDccOut.size = sizeof(ADDR_COMPUTE_DCCINFO_OUTPUT);
AddrSurfInfoOut.pTileInfo = &AddrTileInfoOut;
type = RADEON_SURF_GET(surf->flags, TYPE);
mode = RADEON_SURF_GET(surf->flags, MODE);
compressed = surf->blk_w == 4 && surf->blk_h == 4;
struct ac_surf_config config;
/* MSAA and FMASK require 2D tiling. */
if (surf->nsamples > 1 ||
(surf->flags & RADEON_SURF_FMASK))
mode = RADEON_SURF_MODE_2D;
memcpy(&config.info, surf_info, sizeof(config.info));
config.is_3d = !!(type == RADEON_SURF_TYPE_3D);
config.is_cube = !!(type == RADEON_SURF_TYPE_CUBEMAP);
/* DB doesn't support linear layouts. */
if (surf->flags & (RADEON_SURF_Z_OR_SBUFFER) &&
mode < RADEON_SURF_MODE_1D)
mode = RADEON_SURF_MODE_1D;
return ac_compute_surface(ws->addrlib, &ws->info, &config, mode, surf);
/* Set the requested tiling mode. */
switch (mode) {
case RADEON_SURF_MODE_LINEAR_ALIGNED:
AddrSurfInfoIn.tileMode = ADDR_TM_LINEAR_ALIGNED;
break;
case RADEON_SURF_MODE_1D:
AddrSurfInfoIn.tileMode = ADDR_TM_1D_TILED_THIN1;
break;
case RADEON_SURF_MODE_2D:
AddrSurfInfoIn.tileMode = ADDR_TM_2D_TILED_THIN1;
break;
default:
assert(0);
}
/* The format must be set correctly for the allocation of compressed
* textures to work. In other cases, setting the bpp is sufficient. */
if (compressed) {
switch (surf->bpe) {
case 8:
AddrSurfInfoIn.format = ADDR_FMT_BC1;
break;
case 16:
AddrSurfInfoIn.format = ADDR_FMT_BC3;
break;
default:
assert(0);
}
} else {
AddrDccIn.bpp = AddrSurfInfoIn.bpp = surf->bpe * 8;
}
AddrDccIn.numSamples = AddrSurfInfoIn.numSamples = surf->nsamples;
AddrSurfInfoIn.tileIndex = -1;
/* Set the micro tile type. */
if (surf->flags & RADEON_SURF_SCANOUT)
AddrSurfInfoIn.tileType = ADDR_DISPLAYABLE;
else if (surf->flags & RADEON_SURF_Z_OR_SBUFFER)
AddrSurfInfoIn.tileType = ADDR_DEPTH_SAMPLE_ORDER;
else
AddrSurfInfoIn.tileType = ADDR_NON_DISPLAYABLE;
AddrSurfInfoIn.flags.color = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
AddrSurfInfoIn.flags.depth = (surf->flags & RADEON_SURF_ZBUFFER) != 0;
AddrSurfInfoIn.flags.cube = type == RADEON_SURF_TYPE_CUBEMAP;
AddrSurfInfoIn.flags.display = (surf->flags & RADEON_SURF_SCANOUT) != 0;
AddrSurfInfoIn.flags.pow2Pad = surf->last_level > 0;
AddrSurfInfoIn.flags.opt4Space = 1;
/* DCC notes:
* - If we add MSAA support, keep in mind that CB can't decompress 8bpp
* with samples >= 4.
* - Mipmapped array textures have low performance (discovered by a closed
* driver team).
*/
AddrSurfInfoIn.flags.dccCompatible = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER) &&
!(surf->flags & RADEON_SURF_DISABLE_DCC) &&
!compressed && AddrDccIn.numSamples <= 1 &&
((surf->array_size == 1 && surf->npix_z == 1) ||
surf->last_level == 0);
AddrSurfInfoIn.flags.noStencil = (surf->flags & RADEON_SURF_SBUFFER) == 0;
AddrSurfInfoIn.flags.compressZ = AddrSurfInfoIn.flags.depth;
/* noStencil = 0 can result in a depth part that is incompatible with
* mipmapped texturing. So set noStencil = 1 when mipmaps are requested (in
* this case, we may end up setting stencil_adjusted).
*
* TODO: update addrlib to a newer version, remove this, and
* use flags.matchStencilTileCfg = 1 as an alternative fix.
*/
if (surf->last_level > 0)
AddrSurfInfoIn.flags.noStencil = 1;
/* Set preferred macrotile parameters. This is usually required
* for shared resources. This is for 2D tiling only. */
if (AddrSurfInfoIn.tileMode >= ADDR_TM_2D_TILED_THIN1 &&
surf->bankw && surf->bankh && surf->mtilea && surf->tile_split) {
/* If any of these parameters are incorrect, the calculation
* will fail. */
AddrTileInfoIn.banks = surf->num_banks;
AddrTileInfoIn.bankWidth = surf->bankw;
AddrTileInfoIn.bankHeight = surf->bankh;
AddrTileInfoIn.macroAspectRatio = surf->mtilea;
AddrTileInfoIn.tileSplitBytes = surf->tile_split;
AddrTileInfoIn.pipeConfig = surf->pipe_config + 1; /* +1 compared to GB_TILE_MODE */
AddrSurfInfoIn.flags.opt4Space = 0;
AddrSurfInfoIn.pTileInfo = &AddrTileInfoIn;
/* If AddrSurfInfoIn.pTileInfo is set, Addrlib doesn't set
* the tile index, because we are expected to know it if
* we know the other parameters.
*
* This is something that can easily be fixed in Addrlib.
* For now, just figure it out here.
* Note that only 2D_TILE_THIN1 is handled here.
*/
assert(!(surf->flags & RADEON_SURF_Z_OR_SBUFFER));
assert(AddrSurfInfoIn.tileMode == ADDR_TM_2D_TILED_THIN1);
if (ws->info.chip_class == SI) {
if (AddrSurfInfoIn.tileType == ADDR_DISPLAYABLE) {
if (surf->bpe == 2)
AddrSurfInfoIn.tileIndex = 11; /* 16bpp */
else
AddrSurfInfoIn.tileIndex = 12; /* 32bpp */
} else {
if (surf->bpe == 1)
AddrSurfInfoIn.tileIndex = 14; /* 8bpp */
else if (surf->bpe == 2)
AddrSurfInfoIn.tileIndex = 15; /* 16bpp */
else if (surf->bpe == 4)
AddrSurfInfoIn.tileIndex = 16; /* 32bpp */
else
AddrSurfInfoIn.tileIndex = 17; /* 64bpp (and 128bpp) */
}
} else {
if (AddrSurfInfoIn.tileType == ADDR_DISPLAYABLE)
AddrSurfInfoIn.tileIndex = 10; /* 2D displayable */
else
AddrSurfInfoIn.tileIndex = 14; /* 2D non-displayable */
AddrSurfInfoOut.macroModeIndex = cik_get_macro_tile_index(surf);
}
}
surf->bo_size = 0;
surf->dcc_size = 0;
surf->dcc_alignment = 1;
surf->htile_size = surf->htile_slice_size = 0;
surf->htile_alignment = 1;
/* Calculate texture layout information. */
for (level = 0; level <= surf->last_level; level++) {
r = radv_compute_level(ws->addrlib, surf, false, level, type, compressed,
&AddrSurfInfoIn, &AddrSurfInfoOut, &AddrDccIn, &AddrDccOut);
if (r)
break;
if (level == 0) {
surf->bo_alignment = AddrSurfInfoOut.baseAlign;
surf->pipe_config = AddrSurfInfoOut.pTileInfo->pipeConfig - 1;
radv_set_micro_tile_mode(surf, &ws->info);
/* For 2D modes only. */
if (AddrSurfInfoOut.tileMode >= ADDR_TM_2D_TILED_THIN1) {
surf->bankw = AddrSurfInfoOut.pTileInfo->bankWidth;
surf->bankh = AddrSurfInfoOut.pTileInfo->bankHeight;
surf->mtilea = AddrSurfInfoOut.pTileInfo->macroAspectRatio;
surf->tile_split = AddrSurfInfoOut.pTileInfo->tileSplitBytes;
surf->num_banks = AddrSurfInfoOut.pTileInfo->banks;
surf->macro_tile_index = AddrSurfInfoOut.macroModeIndex;
} else {
surf->macro_tile_index = 0;
}
}
}
/* Calculate texture layout information for stencil. */
if (surf->flags & RADEON_SURF_SBUFFER) {
AddrSurfInfoIn.bpp = 8;
AddrSurfInfoIn.flags.depth = 0;
AddrSurfInfoIn.flags.stencil = 1;
/* This will be ignored if AddrSurfInfoIn.pTileInfo is NULL. */
AddrTileInfoIn.tileSplitBytes = surf->stencil_tile_split;
for (level = 0; level <= surf->last_level; level++) {
r = radv_compute_level(ws->addrlib, surf, true, level, type, compressed,
&AddrSurfInfoIn, &AddrSurfInfoOut, &AddrDccIn, &AddrDccOut);
if (r)
return r;
/* DB uses the depth pitch for both stencil and depth. */
if (surf->stencil_level[level].nblk_x != surf->level[level].nblk_x)
surf->stencil_adjusted = true;
if (level == 0) {
/* For 2D modes only. */
if (AddrSurfInfoOut.tileMode >= ADDR_TM_2D_TILED_THIN1) {
surf->stencil_tile_split =
AddrSurfInfoOut.pTileInfo->tileSplitBytes;
}
}
}
}
/* Recalculate the whole DCC miptree size including disabled levels.
* This is what addrlib does, but calling addrlib would be a lot more
* complicated.
*/
#if 0
if (surf->dcc_size && surf->last_level > 0) {
surf->dcc_size = align64(surf->bo_size >> 8,
ws->info.pipe_interleave_bytes *
ws->info.num_tile_pipes);
}
#endif
return 0;
}
static int radv_amdgpu_winsys_surface_best(struct radeon_winsys *rws,

View File

@@ -28,5 +28,6 @@
#include <amdgpu.h>
void radv_amdgpu_surface_init_functions(struct radv_amdgpu_winsys *ws);
ADDR_HANDLE radv_amdgpu_addr_create(struct amdgpu_gpu_info *amdinfo, int family, int rev_id, enum chip_class chip_class);
#endif /* RADV_AMDGPU_SURFACE_H */

View File

@@ -29,7 +29,6 @@
#include "radv_amdgpu_surface.h"
#include "radv_debug.h"
#include "amdgpu_id.h"
#include "ac_surface.h"
#include "xf86drm.h"
#include <stdio.h>
#include <stdlib.h>
@@ -40,30 +39,302 @@
#include "radv_amdgpu_bo.h"
#include "radv_amdgpu_surface.h"
#define CIK_TILE_MODE_COLOR_2D 14
#define CIK__GB_TILE_MODE__PIPE_CONFIG(x) (((x) >> 6) & 0x1f)
#define CIK__PIPE_CONFIG__ADDR_SURF_P2 0
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_8x16 4
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_16x16 5
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_16x32 6
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_32x32 7
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x16_8x16 8
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_8x16 9
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_8x16 10
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_16x16 11
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x16 12
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x32 13
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x64_32x32 14
#define CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_8X16 16
#define CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_16X16 17
static unsigned radv_cik_get_num_tile_pipes(struct amdgpu_gpu_info *info)
{
unsigned mode2d = info->gb_tile_mode[CIK_TILE_MODE_COLOR_2D];
switch (CIK__GB_TILE_MODE__PIPE_CONFIG(mode2d)) {
case CIK__PIPE_CONFIG__ADDR_SURF_P2:
return 2;
case CIK__PIPE_CONFIG__ADDR_SURF_P4_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_16x32:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_32x32:
return 4;
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x16_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x32:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x64_32x32:
return 8;
case CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_8X16:
case CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_16X16:
return 16;
default:
fprintf(stderr, "Invalid CIK pipe configuration, assuming P2\n");
assert(!"this should never occur");
return 2;
}
}
static const char *
get_chip_name(enum radeon_family family)
{
switch (family) {
case CHIP_TAHITI: return "AMD RADV TAHITI";
case CHIP_PITCAIRN: return "AMD RADV PITCAIRN";
case CHIP_VERDE: return "AMD RADV CAPE VERDE";
case CHIP_OLAND: return "AMD RADV OLAND";
case CHIP_HAINAN: return "AMD RADV HAINAN";
case CHIP_BONAIRE: return "AMD RADV BONAIRE";
case CHIP_KAVERI: return "AMD RADV KAVERI";
case CHIP_KABINI: return "AMD RADV KABINI";
case CHIP_HAWAII: return "AMD RADV HAWAII";
case CHIP_MULLINS: return "AMD RADV MULLINS";
case CHIP_TONGA: return "AMD RADV TONGA";
case CHIP_ICELAND: return "AMD RADV ICELAND";
case CHIP_CARRIZO: return "AMD RADV CARRIZO";
case CHIP_FIJI: return "AMD RADV FIJI";
case CHIP_POLARIS10: return "AMD RADV POLARIS10";
case CHIP_POLARIS11: return "AMD RADV POLARIS11";
case CHIP_POLARIS12: return "AMD RADV POLARIS12";
case CHIP_STONEY: return "AMD RADV STONEY";
default: return "AMD RADV unknown";
}
}
static bool
do_winsys_init(struct radv_amdgpu_winsys *ws, int fd)
{
if (!ac_query_gpu_info(fd, ws->dev, &ws->info, &ws->amdinfo))
return false;
struct amdgpu_buffer_size_alignments alignment_info = {};
struct amdgpu_heap_info vram, visible_vram, gtt;
struct drm_amdgpu_info_hw_ip dma = {};
struct drm_amdgpu_info_hw_ip compute = {};
drmDevicePtr devinfo;
int r;
int i, j;
/* Get PCI info. */
r = drmGetDevice2(fd, 0, &devinfo);
if (r) {
fprintf(stderr, "amdgpu: drmGetDevice2 failed.\n");
goto fail;
}
ws->info.pci_domain = devinfo->businfo.pci->domain;
ws->info.pci_bus = devinfo->businfo.pci->bus;
ws->info.pci_dev = devinfo->businfo.pci->dev;
ws->info.pci_func = devinfo->businfo.pci->func;
drmFreeDevice(&devinfo);
/* LLVM 5.0 is required for GFX9. */
if (ws->info.chip_class >= GFX9 && HAVE_LLVM < 0x0500) {
fprintf(stderr, "amdgpu: LLVM 5.0 is required, got LLVM %i.%i\n",
HAVE_LLVM >> 8, HAVE_LLVM & 255);
return false;
/* Query hardware and driver information. */
r = amdgpu_query_gpu_info(ws->dev, &ws->amdinfo);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_gpu_info failed.\n");
goto fail;
}
ws->addrlib = amdgpu_addr_create(&ws->info, &ws->amdinfo);
r = amdgpu_query_buffer_size_alignment(ws->dev, &alignment_info);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_buffer_size_alignment failed.\n");
goto fail;
}
r = amdgpu_query_heap_info(ws->dev, AMDGPU_GEM_DOMAIN_VRAM, 0, &vram);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram) failed.\n");
goto fail;
}
r = amdgpu_query_heap_info(ws->dev, AMDGPU_GEM_DOMAIN_VRAM,
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED, &visible_vram);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(visible_vram) failed.\n");
goto fail;
}
r = amdgpu_query_heap_info(ws->dev, AMDGPU_GEM_DOMAIN_GTT, 0, &gtt);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(gtt) failed.\n");
goto fail;
}
r = amdgpu_query_hw_ip_info(ws->dev, AMDGPU_HW_IP_DMA, 0, &dma);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(dma) failed.\n");
goto fail;
}
r = amdgpu_query_hw_ip_info(ws->dev, AMDGPU_HW_IP_COMPUTE, 0, &compute);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(compute) failed.\n");
goto fail;
}
ws->info.pci_id = ws->amdinfo.asic_id; /* TODO: is this correct? */
ws->info.vce_harvest_config = ws->amdinfo.vce_harvest_config;
switch (ws->info.pci_id) {
#define CHIPSET(pci_id, name, cfamily) case pci_id: ws->info.family = CHIP_##cfamily; break;
#include "pci_ids/radeonsi_pci_ids.h"
#undef CHIPSET
default:
fprintf(stderr, "amdgpu: Invalid PCI ID.\n");
goto fail;
}
if (ws->info.family >= CHIP_TONGA)
ws->info.chip_class = VI;
else if (ws->info.family >= CHIP_BONAIRE)
ws->info.chip_class = CIK;
else if (ws->info.family >= CHIP_TAHITI)
ws->info.chip_class = SI;
else {
fprintf(stderr, "amdgpu: Unknown family.\n");
goto fail;
}
/* family and rev_id are for addrlib */
switch (ws->info.family) {
case CHIP_TAHITI:
ws->family = FAMILY_SI;
ws->rev_id = SI_TAHITI_P_A0;
break;
case CHIP_PITCAIRN:
ws->family = FAMILY_SI;
ws->rev_id = SI_PITCAIRN_PM_A0;
break;
case CHIP_VERDE:
ws->family = FAMILY_SI;
ws->rev_id = SI_CAPEVERDE_M_A0;
break;
case CHIP_OLAND:
ws->family = FAMILY_SI;
ws->rev_id = SI_OLAND_M_A0;
break;
case CHIP_HAINAN:
ws->family = FAMILY_SI;
ws->rev_id = SI_HAINAN_V_A0;
break;
case CHIP_BONAIRE:
ws->family = FAMILY_CI;
ws->rev_id = CI_BONAIRE_M_A0;
break;
case CHIP_KAVERI:
ws->family = FAMILY_KV;
ws->rev_id = KV_SPECTRE_A0;
break;
case CHIP_KABINI:
ws->family = FAMILY_KV;
ws->rev_id = KB_KALINDI_A0;
break;
case CHIP_HAWAII:
ws->family = FAMILY_CI;
ws->rev_id = CI_HAWAII_P_A0;
break;
case CHIP_MULLINS:
ws->family = FAMILY_KV;
ws->rev_id = ML_GODAVARI_A0;
break;
case CHIP_TONGA:
ws->family = FAMILY_VI;
ws->rev_id = VI_TONGA_P_A0;
break;
case CHIP_ICELAND:
ws->family = FAMILY_VI;
ws->rev_id = VI_ICELAND_M_A0;
break;
case CHIP_CARRIZO:
ws->family = FAMILY_CZ;
ws->rev_id = CARRIZO_A0;
break;
case CHIP_STONEY:
ws->family = FAMILY_CZ;
ws->rev_id = STONEY_A0;
break;
case CHIP_FIJI:
ws->family = FAMILY_VI;
ws->rev_id = VI_FIJI_P_A0;
break;
case CHIP_POLARIS10:
ws->family = FAMILY_VI;
ws->rev_id = VI_POLARIS10_P_A0;
break;
case CHIP_POLARIS11:
ws->family = FAMILY_VI;
ws->rev_id = VI_POLARIS11_M_A0;
break;
case CHIP_POLARIS12:
ws->family = FAMILY_VI;
ws->rev_id = VI_POLARIS12_V_A0;
break;
default:
fprintf(stderr, "amdgpu: Unknown family.\n");
goto fail;
}
ws->addrlib = radv_amdgpu_addr_create(&ws->amdinfo, ws->family, ws->rev_id, ws->info.chip_class);
if (!ws->addrlib) {
fprintf(stderr, "amdgpu: Cannot create addrlib.\n");
return false;
goto fail;
}
ws->info.num_sdma_rings = MIN2(ws->info.num_sdma_rings, MAX_RINGS_PER_TYPE);
ws->info.num_compute_rings = MIN2(ws->info.num_compute_rings, MAX_RINGS_PER_TYPE);
assert(util_is_power_of_two(dma.available_rings + 1));
assert(util_is_power_of_two(compute.available_rings + 1));
ws->use_ib_bos = ws->info.chip_class >= CIK;
/* Set hardware information. */
ws->info.name = get_chip_name(ws->info.family);
ws->info.gart_size = gtt.heap_size;
ws->info.vram_size = vram.heap_size;
ws->info.visible_vram_size = visible_vram.heap_size;
/* convert the shader clock from KHz to MHz */
ws->info.max_shader_clock = ws->amdinfo.max_engine_clk / 1000;
ws->info.max_se = ws->amdinfo.num_shader_engines;
ws->info.max_sh_per_se = ws->amdinfo.num_shader_arrays_per_engine;
ws->info.has_uvd = 0;
ws->info.vce_fw_version = 0;
ws->info.has_userptr = TRUE;
ws->info.num_render_backends = ws->amdinfo.rb_pipes;
ws->info.clock_crystal_freq = ws->amdinfo.gpu_counter_freq;
ws->info.num_tile_pipes = radv_cik_get_num_tile_pipes(&ws->amdinfo);
ws->info.pipe_interleave_bytes = 256 << ((ws->amdinfo.gb_addr_cfg >> 4) & 0x7);
ws->info.has_virtual_memory = TRUE;
ws->info.sdma_rings = MIN2(util_bitcount(dma.available_rings),
MAX_RINGS_PER_TYPE);
ws->info.compute_rings = MIN2(util_bitcount(compute.available_rings),
MAX_RINGS_PER_TYPE);
/* Get the number of good compute units. */
ws->info.num_good_compute_units = 0;
for (i = 0; i < ws->info.max_se; i++)
for (j = 0; j < ws->info.max_sh_per_se; j++)
ws->info.num_good_compute_units +=
util_bitcount(ws->amdinfo.cu_bitmap[i][j]);
memcpy(ws->info.si_tile_mode_array, ws->amdinfo.gb_tile_mode,
sizeof(ws->amdinfo.gb_tile_mode));
ws->info.enabled_rb_mask = ws->amdinfo.enabled_rb_pipes_mask;
memcpy(ws->info.cik_macrotile_mode_array, ws->amdinfo.gb_macro_tile_mode,
sizeof(ws->amdinfo.gb_macro_tile_mode));
ws->info.gart_page_size = alignment_info.size_remote;
if (ws->info.chip_class == SI)
ws->info.gfx_ib_pad_with_type2 = TRUE;
ws->use_ib_bos = ws->family >= FAMILY_CI;
return true;
fail:
return false;
}
static void radv_amdgpu_winsys_query_info(struct radeon_winsys *rws,
@@ -82,7 +353,7 @@ static void radv_amdgpu_winsys_destroy(struct radeon_winsys *rws)
}
struct radeon_winsys *
radv_amdgpu_winsys_create(int fd, uint64_t debug_flags, uint64_t perftest_flags)
radv_amdgpu_winsys_create(int fd, uint32_t debug_flags)
{
uint32_t drm_major, drm_minor, r;
amdgpu_device_handle dev;
@@ -106,7 +377,6 @@ radv_amdgpu_winsys_create(int fd, uint64_t debug_flags, uint64_t perftest_flags)
if (debug_flags & RADV_DEBUG_NO_IBS)
ws->use_ib_bos = false;
ws->batchchain = !!(perftest_flags & RADV_PERFTEST_BATCHCHAIN);
LIST_INITHEAD(&ws->global_bo_list);
pthread_mutex_init(&ws->global_bo_list_lock, NULL);
ws->base.query_info = radv_amdgpu_winsys_query_info;

View File

@@ -29,7 +29,6 @@
#define RADV_AMDGPU_WINSYS_H
#include "radv_radeon_winsys.h"
#include "ac_gpu_info.h"
#include "addrlib/addrinterface.h"
#include <amdgpu.h>
#include "util/list.h"
@@ -42,8 +41,10 @@ struct radv_amdgpu_winsys {
struct amdgpu_gpu_info amdinfo;
ADDR_HANDLE addrlib;
uint32_t rev_id;
unsigned family;
bool debug_all_bos;
bool batchchain;
pthread_mutex_t global_bo_list_lock;
struct list_head global_bo_list;
unsigned num_buffers;

View File

@@ -29,7 +29,6 @@
#ifndef RADV_AMDGPU_WINSYS_PUBLIC_H
#define RADV_AMDGPU_WINSYS_PUBLIC_H
struct radeon_winsys *radv_amdgpu_winsys_create(int fd, uint64_t debug_flags,
uint64_t perftest_flags);
struct radeon_winsys *radv_amdgpu_winsys_create(int fd, uint32_t debug_flags);
#endif /* RADV_AMDGPU_WINSYS_PUBLIC_H */

View File

@@ -37,7 +37,6 @@ LOCAL_C_INCLUDES += \
LOCAL_EXPORT_C_INCLUDE_DIRS += \
$(intermediates)/nir \
$(MESA_TOP)/src/compiler \
$(MESA_TOP)/src/compiler/nir
LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, \

View File

@@ -186,6 +186,7 @@ NIR_GENERATED_FILES = \
NIR_FILES = \
nir/nir.c \
nir/nir.h \
nir/nir_array.h \
nir/nir_builder.h \
nir/nir_clone.c \
nir/nir_constant_expressions.h \
@@ -207,7 +208,6 @@ NIR_FILES = \
nir/nir_lower_64bit_packing.c \
nir/nir_lower_alu_to_scalar.c \
nir/nir_lower_atomics.c \
nir/nir_lower_atomics_to_ssbo.c \
nir/nir_lower_bitmap.c \
nir/nir_lower_clamp_color_outputs.c \
nir/nir_lower_clip.c \

View File

@@ -622,14 +622,6 @@ struct ast_type_qualifier {
* is used.
*/
unsigned inner_coverage:1;
/** \name Layout qualifiers for GL_ARB_bindless_texture */
/** \{ */
unsigned bindless_sampler:1;
unsigned bindless_image:1;
unsigned bound_sampler:1;
unsigned bound_image:1;
/** \} */
}
/** \brief Set of flags, accessed by name. */
q;
@@ -844,7 +836,6 @@ public:
/* List of ast_declarator_list * */
exec_list declarations;
bool is_declaration;
const glsl_type *type;
};

View File

@@ -299,18 +299,12 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
* values must not diverge between shader invocations run together. If the
* values *do* diverge, then the behavior of the operation requiring a
* dynamically uniform expression is undefined.
*
* From section 4.1.7 of the ARB_bindless_texture spec:
*
* "Samplers aggregated into arrays within a shader (using square
* brackets []) can be indexed with arbitrary integer expressions."
*/
if (array->type->without_array()->is_sampler()) {
if (!state->is_version(400, 320) &&
!state->ARB_gpu_shader5_enable &&
!state->EXT_gpu_shader5_enable &&
!state->OES_gpu_shader5_enable &&
!state->has_bindless()) {
!state->OES_gpu_shader5_enable) {
if (state->is_version(130, 300))
_mesa_glsl_error(&loc, state,
"sampler arrays indexed with non-constant "

View File

@@ -107,35 +107,35 @@ verify_image_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state,
* qualifiers. [...] It is legal to have additional qualifiers
* on a formal parameter, but not to have fewer."
*/
if (actual->data.memory_coherent && !formal->data.memory_coherent) {
if (actual->data.image_coherent && !formal->data.image_coherent) {
_mesa_glsl_error(loc, state,
"function call parameter `%s' drops "
"`coherent' qualifier", formal->name);
return false;
}
if (actual->data.memory_volatile && !formal->data.memory_volatile) {
if (actual->data.image_volatile && !formal->data.image_volatile) {
_mesa_glsl_error(loc, state,
"function call parameter `%s' drops "
"`volatile' qualifier", formal->name);
return false;
}
if (actual->data.memory_restrict && !formal->data.memory_restrict) {
if (actual->data.image_restrict && !formal->data.image_restrict) {
_mesa_glsl_error(loc, state,
"function call parameter `%s' drops "
"`restrict' qualifier", formal->name);
return false;
}
if (actual->data.memory_read_only && !formal->data.memory_read_only) {
if (actual->data.image_read_only && !formal->data.image_read_only) {
_mesa_glsl_error(loc, state,
"function call parameter `%s' drops "
"`readonly' qualifier", formal->name);
return false;
}
if (actual->data.memory_write_only && !formal->data.memory_write_only) {
if (actual->data.image_write_only && !formal->data.image_write_only) {
_mesa_glsl_error(loc, state,
"function call parameter `%s' drops "
"`writeonly' qualifier", formal->name);
@@ -235,8 +235,6 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
formal->name);
return false;
}
val->variable_referenced()->data.must_be_shader_input = 1;
}
/* Verify that 'out' and 'inout' actual parameters are lvalues. */
@@ -283,7 +281,7 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
mode, formal->name,
actual->variable_referenced()->name);
return false;
} else if (!actual->is_lvalue(state)) {
} else if (!actual->is_lvalue()) {
_mesa_glsl_error(&loc, state,
"function parameter '%s %s' is not an lvalue",
mode, formal->name);
@@ -740,8 +738,8 @@ convert_component(ir_rvalue *src, const glsl_type *desired_type)
if (src->type->is_error())
return src;
assert(a <= GLSL_TYPE_IMAGE);
assert(b <= GLSL_TYPE_IMAGE);
assert(a <= GLSL_TYPE_BOOL);
assert(b <= GLSL_TYPE_BOOL);
if (a == b)
return src;
@@ -769,12 +767,6 @@ convert_component(ir_rvalue *src, const glsl_type *desired_type)
case GLSL_TYPE_INT64:
result = new(ctx) ir_expression(ir_unop_i642u, src);
break;
case GLSL_TYPE_SAMPLER:
result = new(ctx) ir_expression(ir_unop_unpack_sampler_2x32, src);
break;
case GLSL_TYPE_IMAGE:
result = new(ctx) ir_expression(ir_unop_unpack_image_2x32, src);
break;
}
break;
case GLSL_TYPE_INT:
@@ -917,22 +909,6 @@ convert_component(ir_rvalue *src, const glsl_type *desired_type)
break;
}
break;
case GLSL_TYPE_SAMPLER:
switch (b) {
case GLSL_TYPE_UINT:
result = new(ctx)
ir_expression(ir_unop_pack_sampler_2x32, desired_type, src);
break;
}
break;
case GLSL_TYPE_IMAGE:
switch (b) {
case GLSL_TYPE_UINT:
result = new(ctx)
ir_expression(ir_unop_pack_image_2x32, desired_type, src);
break;
}
break;
}
assert(result != NULL);
@@ -1528,7 +1504,8 @@ emit_inline_matrix_constructor(const glsl_type *type,
* components with zero.
*/
glsl_base_type param_base_type = first_param->type->base_type;
assert(first_param->type->is_float() || first_param->type->is_double());
assert(param_base_type == GLSL_TYPE_FLOAT ||
param_base_type == GLSL_TYPE_DOUBLE);
ir_variable *rhs_var =
new(ctx) ir_variable(glsl_type::get_instance(param_base_type, 4, 1),
"mat_ctor_vec",
@@ -1537,7 +1514,7 @@ emit_inline_matrix_constructor(const glsl_type *type,
ir_constant_data zero;
for (unsigned i = 0; i < 4; i++)
if (first_param->type->is_float())
if (param_base_type == GLSL_TYPE_FLOAT)
zero.f[i] = 0.0;
else
zero.d[i] = 0.0;
@@ -1952,13 +1929,6 @@ ast_function_expression::handle_method(exec_list *instructions,
return ir_rvalue::error_value(ctx);
}
static inline bool is_valid_constructor(const glsl_type *type,
struct _mesa_glsl_parse_state *state)
{
return type->is_numeric() || type->is_boolean() ||
(state->has_bindless() && (type->is_sampler() || type->is_image()));
}
ir_rvalue *
ast_function_expression::hir(exec_list *instructions,
struct _mesa_glsl_parse_state *state)
@@ -1991,21 +1961,9 @@ ast_function_expression::hir(exec_list *instructions,
/* Constructors for opaque types are illegal.
*
* From section 4.1.7 of the ARB_bindless_texture spec:
*
* "Samplers are represented using 64-bit integer handles, and may be "
* converted to and from 64-bit integers using constructors."
*
* From section 4.1.X of the ARB_bindless_texture spec:
*
* "Images are represented using 64-bit integer handles, and may be
* converted to and from 64-bit integers using constructors."
*/
if (constructor_type->contains_atomic() ||
(!state->has_bindless() && constructor_type->contains_opaque())) {
_mesa_glsl_error(& loc, state, "cannot construct %s type `%s'",
state->has_bindless() ? "atomic" : "opaque",
if (constructor_type->contains_opaque()) {
_mesa_glsl_error(& loc, state, "cannot construct opaque type `%s'",
constructor_type->name);
return ir_rvalue::error_value(ctx);
}
@@ -2048,7 +2006,7 @@ ast_function_expression::hir(exec_list *instructions,
state);
}
if (!is_valid_constructor(constructor_type, state))
if (!constructor_type->is_numeric() && !constructor_type->is_boolean())
return ir_rvalue::error_value(ctx);
/* Total number of components of the type being constructed. */
@@ -2078,7 +2036,7 @@ ast_function_expression::hir(exec_list *instructions,
return ir_rvalue::error_value(ctx);
}
if (!is_valid_constructor(result->type, state)) {
if (!result->type->is_numeric() && !result->type->is_boolean()) {
_mesa_glsl_error(& loc, state, "cannot construct `%s' from a "
"non-numeric data type",
constructor_type->name);
@@ -2170,51 +2128,10 @@ ast_function_expression::hir(exec_list *instructions,
/* Type cast each parameter and, if possible, fold constants.*/
foreach_in_list_safe(ir_rvalue, ir, &actual_parameters) {
const glsl_type *desired_type;
/* From section 5.4.1 of the ARB_bindless_texture spec:
*
* "In the following four constructors, the low 32 bits of the sampler
* type correspond to the .x component of the uvec2 and the high 32
* bits correspond to the .y component."
*
* uvec2(any sampler type) // Converts a sampler type to a
* // pair of 32-bit unsigned integers
* any sampler type(uvec2) // Converts a pair of 32-bit unsigned integers to
* // a sampler type
* uvec2(any image type) // Converts an image type to a
* // pair of 32-bit unsigned integers
* any image type(uvec2) // Converts a pair of 32-bit unsigned integers to
* // an image type
*/
if (ir->type->is_sampler() || ir->type->is_image()) {
/* Convert a sampler/image type to a pair of 32-bit unsigned
* integers as defined by ARB_bindless_texture.
*/
if (constructor_type != glsl_type::uvec2_type) {
_mesa_glsl_error(&loc, state, "sampler and image types can only "
"be converted to a pair of 32-bit unsigned "
"integers");
}
desired_type = glsl_type::uvec2_type;
} else if (constructor_type->is_sampler() ||
constructor_type->is_image()) {
/* Convert a pair of 32-bit unsigned integers to a sampler or image
* type as defined by ARB_bindless_texture.
*/
if (ir->type != glsl_type::uvec2_type) {
_mesa_glsl_error(&loc, state, "sampler and image types can only "
"be converted from a pair of 32-bit unsigned "
"integers");
}
desired_type = constructor_type;
} else {
desired_type =
glsl_type::get_instance(constructor_type->base_type,
ir->type->vector_elements,
ir->type->matrix_columns);
}
const glsl_type *desired_type =
glsl_type::get_instance(constructor_type->base_type,
ir->type->vector_elements,
ir->type->matrix_columns);
ir_rvalue *result = convert_component(ir, desired_type);
/* Attempt to convert the parameter to a constant valued expression.

Some files were not shown because too many files have changed in this diff Show More