Compare commits

...

2479 Commits

Author SHA1 Message Date
Bruce Cherniak
277621bbb7 swr: Remove need to allocate vertex buffer scratch space all in one go
Deferred deletion (via "fence_work") has obsoleted the need to allocate
all client vertex buffer scratch space in a single chunk.  Scratch
allocations are now valid until the referenced fence is complete.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-06-29 13:23:33 -05:00
Bruce Cherniak
2b27dcd075 swr: conditionally validate vertex buffer state
Vertex buffer state doesn't need to be validated on every call,
only on dirty _NEW_VERTEX or indexed draws.

Unconditional validation was introduced as part of patch 330d0607ed,
"remove pipe_index_buffer and set_index_buffer", with the expectation
we'd optimize later.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-06-29 13:23:33 -05:00
Tim Rowley
867e111769 swr: set dynamic vertex size
Reduces the memory footprint of the frontend processing by packing
vertices.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-29 13:23:33 -05:00
Eric Engestrom
9f6110ad32 scons: wait on subprocess' completion
Windows doesn't allow you to move a file that's opened, and Popen()
doesn't wait on its subprocess' completion before returning, which leads
to broken Windows build.

Fixes: 3fd425aed7 "build systems: uniformize git_sha1.h generation"
Suggested-by: Scott D Phillips <scott.d.phillips@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-29 17:38:26 +01:00
Eric Engestrom
3fd425aed7 build systems: uniformize git_sha1.h generation
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-29 16:24:58 +01:00
Marek Olšák
ccfac28835 radeonsi: set COMPUTE_DISPATCH_INITIATOR.ORDER_MODE = 1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-29 16:19:35 +02:00
Marek Olšák
af52e61935 radeonsi: use the DISPATCH packets to force COMPUTE_START_X/Y/Z = 0
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-29 16:19:35 +02:00
Rob Herring
a3d98ca62f Android: use symlinks for driver loading
Instead of having special driver loading logic for Android, create
symlinks to gallium_dri.so so we can use the standard loading logic.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-06-29 09:09:49 -05:00
Rob Herring
4aaa21d12e Android: i965: remove libdrm_intel dependency
Commit 7dd20bc3ee ("anv/i965: drop libdrm_intel dependency completely")
removed the libdrm_intel dependency for automake, but Android builds still
depended on it. Now the build requires a newer version of i915_drm.h and
fails on Android builds:

src/mesa/drivers/dri/i965/brw_performance_query.c:616:9: error: use of undeclared identifier 'I915_OA_FORMAT_A32u40_A4u32_B8_C8'
   case I915_OA_FORMAT_A32u40_A4u32_B8_C8:
        ^
src/mesa/drivers/dri/i965/brw_performance_query.c:1887:18: error: use of undeclared identifier 'I915_PARAM_SLICE_MASK'
      gp.param = I915_PARAM_SLICE_MASK;
                 ^
src/mesa/drivers/dri/i965/brw_performance_query.c:1893:18: error: use of undeclared identifier 'I915_PARAM_SUBSLICE_MASK'
      gp.param = I915_PARAM_SUBSLICE_MASK;
                 ^

Remove the libdrm_intel dependency for Android builds and add the necessary
include paths for the local copy of i915_drm.h.

Fixes: 7dd20bc ("anv/i965: drop libdrm_intel dependency completely")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-29 12:33:27 +01:00
Mauro Rossi
b693fd8464 android: anv: drop libdrm_intel dependency
In addition to Rob Herring "Android: i965: remove libdrm_intel dependency",
we can drop libdrm_intel dependency in anv for Android.

Please check if libdrm has to stay as shared dependency and drop this comment line.

Fixes: 7dd20bc ("anv/i965: drop libdrm_intel dependency completely")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-29 12:31:00 +01:00
Lucas Stach
4fb9f97047 etnaviv: fix memory leak when BO allocation fails
The resource struct is already allocated at this point and should be
freed properly.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-29 11:34:50 +02:00
Lucas Stach
b2a87ce34f etnaviv: fill in layer_stride for imported resources
The layer stride information is used in various parts of the driver,
so it needs to be present regardless if the driver allocated the
buffer itself or merely imported it from an external source.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-29 11:34:24 +02:00
Lionel Landwerlin
d8bf2861ad anv: use devinfo for number of thread/eu
It turns out Gen9LP has fewer threads per EU (6 vs 7).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2017-06-29 10:07:52 +01:00
Juan A. Suarez Romero
93b8dc4b94 intel: tools: add intel_aub.h as part of aubinator
Include intel_aub.h in the Makefile.tools.am

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-29 10:03:40 +02:00
Juan A. Suarez Romero
be5fe2153b intel: automake: include Makefile.drm.am
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-29 10:03:40 +02:00
Kenneth Graunke
40f842ab57 mesa: Require mipmap completeness for glCopyImageSubData() at times.
This patch makes glCopyImageSubData require mipmap completeness when the
texture object's built-in sampler object has a mipmapping MinFilter.
This is apparently the de facto behavior and mandated by Android's CTS.

One exception is that we ignore format based completeness rules
(specifically integer formats with linear filtering), as this is
also the de facto behavior that until recently was mandated by the
OpenGL 4.5 CTS.

This was discussed with both the OpenGL and OpenGL ES working groups,
and while everyone agrees this behavior is unfortunate and complicated,
it is what it is at this point.  There was little appetite for relaxing
restrictions given that all conformant Android drivers followed the
mipmapping rule, and all conformant GL 4.5 implementations ignored the
integer/linear rule.

Fixes (on i965):
dEQP-GLES31.functional.debug.negative_coverage.*.buffer.copy_image_sub_data

Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16224
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-06-28 22:29:41 -07:00
Timothy Arceri
6120fbc444 mesa: tidy up white space in pixelstore.c 2017-06-29 14:14:03 +10:00
Ian Romanick
e0acd62536 mesa: Refactor error checking for GL_TEXTURE_BASE_LEVEL vs texture targets
Add a big spec quotation justifying the error generated, which has
changed over the GL versions.

v2: Compact the spec quote based on a Khronos bug and discussion with Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2017-06-28 16:38:05 -07:00
Kenneth Graunke
d2c4f714d1 i965: Drop index buffer re-alignment code.
This shouldn't ever happen - GL requires it to be aligned:

   "Clients must align data elements consistent with the requirements
    of the client platform, with an additional base-level requirement
    that an offset within a buffer to a datum comprising N basic
    machine units be a multiple of N."

Mesa should reject unaligned index buffers for us - we shouldn't have
to handle them in the driver.

Note that Gallium already makes this assumption.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-28 16:21:44 -07:00
Timothy Arceri
c1b1cad586 mesa: add KHR_no_error support for glBlendFunc*()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
f21a764092 mesa: create some glBlendFunc*() helper functions
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
87bc32166a mesa: add KHR_no_error support for glBindFragDataLocation*()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
aed0fc5efd mesa: add bind_frag_data_location() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
cb209dae99 mesa: add KHR_no_error support for glGetUniformLocation()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
cc88eb97e0 mesa: inline _mesa_finish()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
d8143a4bde mesa: add KHR_no_error support for glDisableVertexA*A*()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
73e0140acc mesa: move error handling into disable_vertex_array_attrib() callers
This will let us just call disable_vertex_array_attrib() for
KHR_no_error support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
d731b18933 mesa: add KHR_no_error support for glEnableVertexA*A*()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:11 +10:00
Timothy Arceri
8e77fceedb mesa: add KHR_no_error support for glLogicOp()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:10 +10:00
Timothy Arceri
ccbcb3ca17 mesa: add logic_op() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:10 +10:00
Timothy Arceri
774580c8b9 mesa: add KHR_no_error support for glPixelStore*()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:10 +10:00
Timothy Arceri
9853ca6037 mesa: add pixel_storei() helper
Will be used to add KHR_no_error support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:10 +10:00
Timothy Arceri
7d8937d23c mesa: remove redundant error check
We do the same check in the shared code in the set_tex_parameterf()
call.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-29 08:54:10 +10:00
Ian Romanick
2e3f40272e mesa: GL_TEXTURE_BORDER_COLOR exists in OpenGL 1.0, so don't depend on GL_ARB_texture_border_clamp
On NV20 (and probably also on earlier NV GPUs that lack
GL_ARB_texture_border_clamp) fixes the following piglit tests:

    gl-1.0-beginend-coverage gltexparameter[if]{v,}
    push-pop-texture-state
    texwrap 1d
    texwrap 1d proj
    texwrap 2d proj
    texwrap formats

All told, 49 more tests pass on NV20 (10de:0201).

No changes on Intel CI run or RV250 (1002:4c66).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-28 14:51:05 -07:00
Ian Romanick
36bd4a5f21 genxml: Silence about a billion unused parameter warnings
v2: Use textwrap.dedent to make the source line a lot shorter.
Shortening (?) the line was requested by Jason.

v3: Simplify the texwrap.dedent usage.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-06-28 14:50:14 -07:00
Chad Versace
a56f0203c3 mesa: Fix Android build
The format_fallback.py script wants two arguments: 'csv-file' and
'out-file'.

Fixes: 20c99eaece "mesa: Add _mesa_format_fallback_rgbx_to_rgba() [v2]"
Reported-by: Rob Herring <robh@kernel.org>
2017-06-28 14:41:45 -07:00
Eero Tamminen
c35fd58688 i965: Fix anisotropic filtering for mag filter
Commit f8d69beed4 moving sampler
handling to genxml messed up change done by commit
6a7c5257ca.

This broke rendering in SynMark CSDof and TexFilterAniso tests.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101607

Thanks to Kevin, who spotted the actual typo!
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-28 13:33:28 -07:00
Lucas Stach
ec43605189 etnaviv: fix shader miscompilation with more than 16 labels
The labels array may change its virtual address on a reallocation, so
it is invalid to cache pointers into the array. Rather than using the
pointer directly, remember the array index.

Fixes miscompilation of shaders in glmark2 ideas, leading to GPU hangs.

Fixes: c9e8b49b (etnaviv: gallium driver for Vivante GPUs)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-28 22:04:30 +02:00
Dave Airlie
ff422500cc ac/nir: remove last remnants of v16i8
llvm doesn't need this workaround anymore.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-28 20:22:30 +01:00
Alex Smith
909184ac9c ac/nir: Use correct LLVM intrinsics for atomic ops on imageBuffers
The buffer intrinsics should be used instead of the image ones.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-28 21:05:04 +02:00
James Legg
69a17da037 ac/nir: assert printfs will fit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-28 21:05:04 +02:00
James Legg
6fc41bb4d5 ac/nir: Make intrinsic_name buffer long enough
When using cmpswap on an image, it was being trunctated to
lvm.amdgcn.image.atomic.cmpswa, with the coords type missing entirely.

v2: Add stable CC

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-28 21:05:04 +02:00
Chad Versace
2cde8ff545 i965/dri: Support R8G8B8A8 and R8G8B8X8 configs
The Android framework requires support for EGLConfigs with
HAL_PIXEL_FORMAT_RGBX_8888 and HAL_PIXEL_FORMAT_RGBA_8888.

Even though all RGBX formats are disabled on gen9 by
brw_surface_formats.c, the new configs work correctly on Broxton thanks
to _mesa_format_fallback_rgbx_to_rgba().

On GLX, this creates no new configs, and therefore breaks no existing
apps. See in-patch comments for explanation. I tested with glxinfo and
glxgears on Skylake.

On Wayland, this also creates no new configs, and therfore breaks no
existing apps. (I tested with mesa-demos' eglinfo and es2gears_wayland
on Skylake). The reason differs from GLX, though. In
dri2_wl_add_configs_for_visual(), the format table contains only
B8G8R8X8, B8G8R8A8, and B5G6B5; and dri2_add_config() correctly matches
EGLConfig to format by inspecting channel masks.

On Android, in Chrome OS, I tested this on a Broxton device. I confirmed
that the Google Play Store's EGLSurface used HAL_PIXEL_FORMAT_RGBA_8888,
and that an Asteroid game's EGLSurface used HAL_PIXEL_FORMAT_RGBX_8888.
Both apps worked well. (Disclaimer: I didn't test this patch on Android
with Mesa master. I backported this patch series to an older Android
branch).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-28 09:42:27 -07:00
Juan A. Suarez Romero
89d4008ac8 mesa: do not use format string as literal string
This fixes a couple of  errors when building in Android:

external/mesa3d/src/mesa/main/shaderapi.c:293:49: error: format string
is not a string literal (potentially insecure)
[-Werror,-Wformat-security]
         _mesa_error(ctx, GL_INVALID_OPERATION, caller);
                                                ^~~~~~
external/mesa3d/src/mesa/main/shaderapi.c:293:49: note: treat the string
as an argument to avoid this
         _mesa_error(ctx, GL_INVALID_OPERATION, caller);
                                                ^
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-28 16:19:55 +02:00
Brian Paul
7bbcf3ac70 scons: add code to generate format_fallback.c file
Fixes: a1983223d8 "mesa: Add _mesa_format_fallback_rgbx_to_rgba() [v2]"

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-28 04:12:25 -06:00
Samuel Pitoiset
e529ade0ea mesa: add KHR_no_error support for glClear()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
34e8d0e4ba mesa: add clear() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
48400e0bd6 mesa: add KHR_no_error support for glBindAttribLocation()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
34e5b39f37 mesa: add bind_attrib_location() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
352adb53db mesa: add KHR_no_error support for gl*ReadBuffer()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
91fcba9914 mesa: create read_buffer_err() and always inline read_buffer()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
89bc3ed7a3 mesa: add KHR_no_error support for glVertex*AttribBinding()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
401fa69132 mesa: add KHR_no_error support for glShaderStorageBlockBinding()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
edd5082861 mesa: add shader_storage_block_binding() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
6a2c1e76f2 mesa: add KHR_no_error support for glUniformBlockBinding()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
277135c1ed mesa: add uniform_block_binding() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
f543107256 mesa: add KHR_no_error support for glFenceSync()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
dd71fd1dd3 mesa: add fence_sync() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
6e0cd29132 mesa: add KHR_no_error support for glClientWaitSync()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
20ff1f9db7 mesa: add client_wait_sync() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
78d3510f0c mesa: add KHR_no_error support for glCheckFramebufferStatus()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
b87a2cbec4 mesa: add KHR_no_error support for gl*Renderbuffers()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
beb74c9b87 mesa: prepare create_render_buffers() for KHR_no_error support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
836b48a836 mesa: add KHR_no_error support for gl*ProgramPipelines()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
89510d26a9 mesa: prepare create_program_pipelines() for KHR_no_error support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:13 +02:00
Samuel Pitoiset
18e31bb252 mesa: add KHR_no_error support for gl*Samplers()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
455b1a3a4b mesa: prepare create_samplers() helper for KHR_no_error support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
56f428817f mesa: add KHR_no_error support for gl*Textures()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
ab6d383e32 mesa: prepare create_textures() helper for KHR_no_error support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
821b806d23 mesa: fix an error message in create_textures()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
7c02267673 mesa: add KHR_no_error support for gl*Buffers()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
064bb7499c mesa: prepare create_buffers() helper for KHR_no_error support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
50f9f510c9 mesa: add KHR_no_error support for glBindTextureUnit()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
81968cb748 mesa: add bind_texture_unit() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
3561d93668 mesa: add KHR_no_error support for glDepthRangeIndexed()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
9628282f1e mesa: add KHR_no_error support for glDepthFunc()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
bbc03839d1 mesa: add depth_func() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
eaa477104c mesa: add KHR_no_error support for glFrontFace()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
d77ad9da63 mesa: add front_face() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
d700ade81a mesa: add KHR_no_error support for glCullFace()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
ac92b75002 mesa: add cull_face() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
03dc92ad97 mesa: add KHR_no_error support for glCreateShader() and glCreateShaderObjectARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
c1782e44d0 mesa: rename create_shader() to create_shader_err()
And add a no_error variant.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
868c9c244d mesa: pass the 'caller' function to create_shader()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
8863996940 mesa: add KHR_no_error support for glAttachShader() and glAttachObjectARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
f04a5b4df0 mesa: rename attach_shader() to attach_shader_err()
And add a no_error variant.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Samuel Pitoiset
3ae7777d25 mesa: pass the 'caller' function to attach_shader()
In order to fix GL error messages.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-28 10:25:12 +02:00
Ben Crocker
c26be2b2e9 mapi: Enable assembly language API acceleration for PPC64LE (V2)
Implement assembly language API acceleration for PPC64LE,
analogous to long-standing implementations for X86 and X86-64.

See also similar implementation in libglvnd.

Tested with Piglit.

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
2017-06-28 08:20:45 +01:00
Chad Versace
74db56b97a i965: Add a RGBX->RGBA fallback for glEGLImageTextureTarget2D()
This enables support for importing RGBX8888 EGLImage textures on
Skylake.

Chrome OS needs support for RGBX8888 EGLImage textures because because
the Android framework produces HAL_PIXEL_FORMAT_RGBX8888 winsys
surfaces, which the Chrome OS compositor consumes as dma_bufs.  On
hardware for which RGBX is unsupported or disabled, normally core Mesa
provides the RGBX->RGBA fallback during glTexStorage.  But the DRIimage
code bypasses core Mesa, so we must do the fallback in i965.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 16:56:30 -07:00
Chad Versace
a1983223d8 mesa: Add _mesa_format_fallback_rgbx_to_rgba() [v2]
The new function takes a mesa_format and, if the format is an alpha
format with a non-alpha variant, returns the non-alpha format.
Otherwise, it returns the original format.

Example:
  input -> output

  // Fallback exists
  MESA_FORMAT_R8G8B8X8_UNORM -> MESA_FORMAT_R8G8B8A8_UNORM
  MESA_FORMAT_RGBX_UNORM16 -> MESA_FORMAT_RGBA_UNORM16

  // No fallback
  MESA_FORMAT_R8G8B8A8_UNORM -> MESA_FORMAT_R8G8B8A8_UNORM
  MESA_FORMAT_Z_FLOAT32 -> MESA_FORMAT_Z_FLOAT32

i965 will use this for EGLImages and DRIimages.

v2 (Jason Ekstrand):
 - Use mako
 - Rework to be easier to read
 - Write directly to the output file

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 16:56:28 -07:00
Marek Olšák
4a10d6154e radeonsi: move instance divisors into a constant buffer
Shader key size: 107 -> 47

Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors
are loaded from a constant buffer.

The shader code doing the division is huge. Is it something we need to
worry about? Does any app use instance divisors >= 2?

VS prolog disassembly:
    s_load_dwordx4 s[12:15], s[0:1], 0x80  ; C00A0300 00000080
    s_nop 0                                ; BF800000
    s_waitcnt lgkmcnt(0)                   ; BF8C007F
    s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
    s_waitcnt lgkmcnt(0)                   ; BF8C007F
    v_cvt_f32_u32_e32 v4, s14              ; 7E080C0E
    v_rcp_iflag_f32_e32 v4, v4             ; 7E084704
    v_mul_f32_e32 v4, 0x4f800000, v4       ; 0A0808FF 4F800000
    v_cvt_u32_f32_e32 v4, v4               ; 7E080F04
    v_mul_hi_u32 v5, v4, s14               ; D2860005 00001D04
    v_mul_lo_i32 v6, v4, s14               ; D2850006 00001D04
    v_cmp_eq_u32_e64 s[12:13], 0, v5       ; D0CA000C 00020A80
    v_sub_i32_e32 v5, vcc, 0, v6           ; 340A0C80
    v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
    v_mul_hi_u32 v5, v5, v4                ; D2860005 00020905
    v_add_i32_e32 v6, vcc, v5, v4          ; 320C0905
    v_subrev_i32_e32 v4, vcc, v5, v4       ; 36080905
    v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
    v_mul_hi_u32 v5, v4, v1                ; D2860005 00020304
    v_add_i32_e32 v4, vcc, s8, v0          ; 32080008
    v_mul_lo_i32 v6, v5, s14               ; D2850006 00001D05
    v_add_i32_e32 v7, vcc, 1, v5           ; 320E0A81
    v_cmp_ge_u32_e64 s[12:13], v1, v6      ; D0CE000C 00020D01
    v_sub_i32_e32 v6, vcc, v1, v6          ; 340C0D01
    v_cmp_le_u32_e32 vcc, s14, v6          ; 7D960C0E
    v_cndmask_b32_e64 v8, 0, -1, s[12:13]  ; D1000008 00318280
    v_cndmask_b32_e64 v6, 0, -1, vcc       ; D1000006 01A98280
    v_and_b32_e32 v6, v8, v6               ; 260C0D08
    v_cmp_eq_u32_e32 vcc, 0, v6            ; 7D940C80
    v_cndmask_b32_e32 v6, v7, v5, vcc      ; 000C0B07
    v_add_i32_e32 v5, vcc, -1, v5          ; 320A0AC1
    v_cmp_eq_u32_e32 vcc, 0, v8            ; 7D941080
    v_cndmask_b32_e32 v5, v6, v5, vcc      ; 000A0B06
    v_add_i32_e32 v5, vcc, s9, v5          ; 320A0A09

v2: set prefer_mono for fetched instance divisors

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 19:55:09 +02:00
Marek Olšák
aef998fe4b radeonsi: check nr_cbufs in other places before flushing CB
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:46:12 +02:00
Marek Olšák
f9a7e7fe14 radeonsi: use #pragma pack to pack si_shader_key
sizeof(struct si_shader_key):
  Before reverting the 2 commits: 120 bytes
  After reverting the 2 commits: 128 bytes
  With #pragma pack: 107 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:45:07 +02:00
Marek Olšák
77d2a98353 Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs"
This reverts commit 7b2240ac9c.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:45:07 +02:00
Marek Olšák
dbe45e1180 Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy"
This reverts commit 6b6fed3a3c.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:45:07 +02:00
Marek Olšák
984f7feeb4 mesa: optimize GL_PRIMITIVE_RESTART_NV more
And other client state changes don't have to call
update_derived_primitive_restart_state.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:45:07 +02:00
Marek Olšák
bcf5d5ce40 mesa: fix clip plane enable breakage
Broken by:

commit 00173d91b7
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Sat Jun 10 12:09:43 2017 +0200

    mesa: don't flag _NEW_TRANSFORM for st/mesa if possible

It also optimizes the case slightly for GL core.

It doesn't try to fix that glEnable might be a bad place to do the
clip plane transformation.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-06-27 18:45:07 +02:00
Leo Liu
fad0b47219 radeon/vcn: enable h264 decode entension support
It's enabled through message buffer for UVD

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-06-27 10:59:44 -04:00
Charmaine Lee
b2e78e79d7 svga: clean up format_cap_table
Per Jose's suggestion, this patch cleans up format_cap_table to remove
the unnecessary default cap value for vgpu10 formats since those devcap values
can be retrieved from the device.

Tested with MTT conform, glretrace, piglit in HWv13 and HWv8.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-27 07:49:03 -06:00
Charmaine Lee
122ca27a48 svga: fix the default devcap for SVGA3D_Z_D24S8_INT
The default devcap for format SVGA3D_Z_D24S8_INT in HWv8 when its devcap is
not explicitly advertised should be set to zero to match the default value
in the device.

Tested with MTT piglit in HW version 8.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-27 07:49:02 -06:00
Charmaine Lee
eea6223184 svga: create buffer surfaces for incompatible bind flags
In cases where certain bind flags cannot be enabled together,
such as CONSTANT_BUFFER cannot be combined with any other flags,
a separate host surface will be created.
For example, if a stream output buffer is reused as a constant buffer,
two host surfaces will be created, one for stream output,
and another one for constant buffer. Data will be copied from the
stream output surface to the constant buffer surface.

Fixes piglit test ext_transform_feedback-immediate-reuse-index-buffer,
                  ext_transform_feedback-immediate-reuse-uniform-buffer

Tested with MTT piglit, MTT glretrace, Nature, NobelClinician Viewer, Tropics.

v2: Fix bind flags compatibility check as suggested by Brian.
v3: Use the list utility to maintain the buffer surface list.
v4: Use the SAFE rev of LIST_FOR_EACH_ENTRY

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-27 07:49:02 -06:00
Charmaine Lee
7abfb0b0d5 svga: do not unconditionally enable streamout bind flag
Currently we unconditionally enable streamout bind flag at
buffer resource creation time. This is not necessary if the buffer
is never used as a streamout buffer. With this patch, we enable
streamout bind flag as indicated by the state tracker. If the buffer
is later bound to streamout and does not already has streamout bind
flag enabled, we will recreate the buffer with
the new set of bind flags. Buffer content will be copied
from the old buffer to the new one.

Tested with MTT piglit, Nature, Tropics, Lightsmark.

v2: Fix bind flags check as suggested by Brian.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-27 07:49:02 -06:00
Charmaine Lee
b549f5e6b1 svga: pass tobind_flags to svga_buffer_handle
This is to prepare for more bind_flags optimization
in subsequent patches.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-27 07:49:02 -06:00
Charmaine Lee
4a79b508a4 svga: pass bind_flags to surface create functions
This is to prepare for other bind_flags optimization
in subsequent patches.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-27 07:49:02 -06:00
Brian Paul
ce608784d0 pipe_loader_sw: fix compilation warning
Add the new 'flags' parameter to pipe_loader_sw_create_screen().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-27 07:49:02 -06:00
Eric Engestrom
b3eda74acf mesa: add missing include
src/mesa/drivers/x11/xm_dd.c:688:7: warning: implicit declaration of function ‘_mesa_update_draw_buffer_bounds’; did you mean ‘_mesa_has_ARB_draw_buffers_blend’? [-Wimplicit-function-declaration]
       _mesa_update_draw_buffer_bounds(ctx, ctx->DrawBuffer);
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Marek Olšák <marek.olsak@amd.com>
Fixes: 585c5cf8a5 ("mesa: don't update draw buffer bounds in
			      _mesa_update_state")
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-27 14:33:49 +01:00
Lionel Landwerlin
3e0d54d270 i965: perf: add support for Geminilake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:30 +03:00
Lionel Landwerlin
9a50fc7cfc i965: perf: add support for Kabylake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:30 +03:00
Lionel Landwerlin
8ff086fa68 i965: perf: use gen_device_info rather then brw_context
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:30 +03:00
Robert Bragg
e277ff41c0 i965: perf: ensure isolated timer reports while idle don't confuse filtering
From experimentation in IGT, we found that the OA unit might label
some report as "idle" (using an invalid context ID), right after a
report for a given context. Deltas generated by those reports actually
belong to the previous context, even though they're not labelled as
such.

This change makes ensure that while reading OA reports, we only
consider the GPU actually idle after 2 reports with an invalid context
ID.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:29 +03:00
Lionel Landwerlin
31b11f69f7 i965: perf: keep on reading reports until delimiting timestamp
Due to an underlying hardware race condition, we have no guarantee
that all the reports coming from the OA buffer related to the workload
we're trying to measure have landed to memory by the time all the work
submitted has completed. That means we need to keep on reading the OA
stream until we read a report with a timestamp more recent than the
timestamp recored by the MI_REPORT_PERF_COUNT at the end of the
performance query.

v2: fix uninitialized offset variable to 0 (Lionel)

v3: rework the reading to avoid blocking the user of the API unless
    requested (Rob)

v4: fix a bug that makes the i965 driver reading the perf stream when
    not necessary, leading to very long counter accumulation times
    (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:29 +03:00
Robert Bragg
1fc7b95127 i965: Add Gen8+ INTEL_performance_query support
Enables access to OA unit metrics on Gen8+ via INTEL_performance_query.

v2: make use of new parameters coming from gen_device_info (Lionel)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:29 +03:00
Robert Bragg
243909d41e i965: Add XML OA metric sets for Gen8+
Also updates Makefile.am to generate corresponding normalization code.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:29 +03:00
Robert Bragg
e74972a3a6 i965: Add Gen8+ sys_vars for generated OA code
In preparation for adding XML OA metric set descriptions for Gen 8 and 9
which will result in auto generated code that depends on a number of new
system variables ($EuSubslicesTotalCount, $EuThreadsCount and
$SliceMask) this adds corresponding members to brw->perf.sys_vars.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:29 +03:00
Lionel Landwerlin
7dd20bc3ee anv/i965: drop libdrm_intel dependency completely
With Ken's work to drop the library dependency on libdrm_intel, we now
only depend on libdrm for the kernel uapi headers it provides. It
seems like we're better off just embeddeding those headers ourselves,
making the lives of people developping news features tightly
integrated with the kernel a tiny bit easier.

This change also makes it a bit more obvious what cflags/libs are
required by the i915 drivers vs i965, by renaming INTEL_CFLAGS/LIBS
into I915_CFLAGS/LIBS.

Headers were generated from drm-tip on the following commit :

   commit 6d61e70ccc21606ffb8a0a03bd3aba24f659502b
   Merge: 338ffbf7cb5e c0bc126f97fb
   Author: Dave Airlie <airlied@redhat.com>
   Date:   Tue Jun 27 07:24:49 2017 +1000

       Backmerge tag 'v4.12-rc7' into drm-next

v2: Use installed files from the kernel (Daniel Vetter)

v3: Use headers from drm-next rather than drm-tip (Dave/Daniel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:29 +03:00
Lionel Landwerlin
3c50ebce25 i915: use different CFLAGS/LIBS variables than i965/anv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-27 14:10:29 +03:00
Lionel Landwerlin
230691b8e5 aubinator: import intel_aub.h from libdrm
This enables us to compile aubinator without the libdrm dependency.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:28 +03:00
Lionel Landwerlin
adafe4b733 i965: perf: minimize the chances to spread queries across batchbuffers
Counter related to timings will be sensitive to any delay introduced
by the software. In particular if our begin & end of performance
queries end up in different batches, time related counters will
exhibit biffer values caused by the time it takes for the kernel
driver to load new requests into the hardware.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-27 14:10:25 +03:00
Juan A. Suarez Romero
7ee409dd4e nir: implement GLSL.std.450 NMax, NMIn and NClamp operations
v2: NIR fmax/fmin already handles NaN (Connor).

Reviewed by: Elie Tournier <elie.tournier@collabora.com>
2017-06-27 12:01:11 +02:00
Juan A. Suarez Romero
b5ae17fe59 nir: add support for 64-bit in SmoothStep function
According to GLSL.std.450 spec, SmoothStep expects input to be a
floating-point type, but it does not restrict the bitsize.

Current implementation relies on inputs to be 32-bit.

This commit extends the support to 64-bit size inputs.

Reviewed by: Elie Tournier <elie.tournier@collabora.com>
2017-06-27 12:01:11 +02:00
Juan A. Suarez Romero
4195a9450b nir: sge operation is defined for floating-point types
According to GLSL.std.450 spec, the operand for step() function must be
a floating-point. It does not restrict the value to 32-bit floats.

Reviewed by: Elie Tournier <elie.tournier@collabora.com>
2017-06-27 12:01:11 +02:00
Topi Pohjolainen
b3bf453686 i965: Separate gen < 8 and gen >= 8 paths explicitly in wrap_mode()
Makes coverity happier.

Fix indentation in gen >= 8 block while at it.

CID: 1413020
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-27 10:20:35 +03:00
Topi Pohjolainen
fbcc9555c5 intel/anv: Add missing break in anv_CreateDevice()
CID: 1413018
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-27 10:19:55 +03:00
Nicolai Hähnle
2ce126df3a ac/nir: convert emit helpers to ac_llvm_context
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:30 +10:00
Nicolai Hähnle
58d496c8e2 ac/nir: remove unused nir_to_llvm_context::has_ddxy
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:30 +10:00
Nicolai Hähnle
6ecef25545 ac/nir: implement nir_op_f2b
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:30 +10:00
Nicolai Hähnle
dacf73e527 ac/nir: implement nir_op_{b2i,i2b}
Booleans in NIR are ~0 for true, b2i returns 0/1.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:30 +10:00
Nicolai Hähnle
77d7764d5e ac/nir: convert type helpers to ac_llvm_context
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:30 +10:00
Nicolai Hähnle
b7bd49158e ac/llvm: fix type of second llvm.cttz.* parameter
LLVM has required an i1 here for a long time. llvm.ctlz.* was fixed in
commit edd23e0606 ("ac/llvm: fix various findMSB bugs").

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:30 +10:00
Nicolai Hähnle
e8ba03d32a ac/shader_info: fix a comment
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:29 +10:00
Nicolai Hähnle
edfd3be77e ac: add ac_llvm_context::v8i32
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:29 +10:00
Nicolai Hähnle
331a574732 ac: add ac_llvm_context::{i,f}32_{0,1}
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:29 +10:00
Nicolai Hähnle
7bf8c944dc ac: add ac_llvm_context::{i16, i64, f16, f64}
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:29 +10:00
Ilia Mirkin
4a79f2be33 nv50/ir: fix combineLd/St to update existing records as necessary
Previously the logic would decide that the record is kept, which
translates into keep = false in the caller, which meant that these
passes did not run.

While it's right that keep = false which means that a new record does
not need to be added, we do still have to perform the usual list
maintenance. It's easiest to do this pre-merge rather than post.

The lowering that clip/cull distance passes produce triggers this bug in
TCS (since reading outputs is done differently in other stages), but it
should be possible to achieve it with the right sequence of regular
reads/writes.

Fixes: KHR-GL45.cull_distance.functional
Fixes: generated_tests/spec/arb_tessellation_shader/execution/tes-input/tes-input-gl_ClipDistance.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-06-26 20:24:19 -04:00
Ilia Mirkin
7d56ae5eb2 nv50/ir: adjust overlapping logic to take fileIndex-relative offsets
If the fileIndex is different, that means they are in logically
different spaces. However if there's also a relative offset, then they
could end up pointing at the same spot again.

Also add a note about potential for multiple buffers to overlap even if
they're at different file indexes. However that's potentially lowered
away by the point that this logic hits.

Not known to fix any specific application or test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-26 20:24:19 -04:00
Ilia Mirkin
55a8c11705 nv50/ir: VFETCH is also considered a load for MemoryOpt
This has no effect since in practice this will only play for
memory-backed files, for which VFETCH will never happen.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-26 20:24:19 -04:00
Ilia Mirkin
c12f8305a8 nv50,nvc0: remove IDX from bufctx immediately, to avoid conflicts with clear
The idxbuf could linger, and when a clear happened, which also uses the
3d bufctx, we could get an error trying to access it.

This fixes spurious crashes/errors in CTS tests.

Fixes: 61d8f3387d ("nv50,nvc0: clear index buffer bufctx bin unconditionally")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-06-26 20:23:04 -04:00
Ilia Mirkin
8c02ee4a8b nv50/ir: fetch indirect sources BEFORE the op that uses them
All the BuildUtil helpers just insert the operation into the current BB.
So we have to take care that any fetchSrc() operations happen before the
operation whose setIndirect() it goes into.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-06-26 20:22:46 -04:00
Timothy Arceri
9545139ce5 mesa: skip FLUSH_VERTICES() if no samplers were changed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-27 09:20:29 +10:00
Timothy Arceri
191ff86d53 mesa: don't set _NEW_PROGRAM_CONSTANTS for non-bindless opaque uniforms
v2: rebase on new _mesa_flush_vertices_for_uniforms() helper

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-27 09:17:16 +10:00
Rob Herring
c4291a3283 Android: add renderonly files to libmesa_gallium
vc4 now depends on renderonly functions, but these weren't added to the
Android build resulting in the following errors:

src/gallium/drivers/vc4/vc4_resource.c:380: error: undefined reference to 'renderonly_scanout_destroy'
src/gallium/drivers/vc4/vc4_resource.c:681: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
src/gallium/drivers/vc4/vc4_screen.c:625: error: undefined reference to 'renderonly_dup'
src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'

Fixes: 7029ec05e2 ("gallium: Add renderonly-based support for pl111+vc4.")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-06-26 16:10:42 -07:00
Timothy Arceri
a00a277da9 mesa: add KHR_no_error support for glCopyTexImage*D()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:27:11 +10:00
Timothy Arceri
8bf02efed3 mesa: add no error support to copyteximage()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:27:11 +10:00
Timothy Arceri
167f6a33fa mesa: create copyteximage_err() helper and always inline copyteximage()
This will be useful in the following patch when we add KHR_no_error
support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:27:11 +10:00
Timothy Arceri
8b9eccc061 mesa: tidy up copyteximage()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:27:11 +10:00
Ian Romanick
f73c63a175 i915: On Gen <= 3 there are no array textures
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-26 15:20:09 -07:00
Ian Romanick
122e6dc451 i915: On Gen <= 3 there is no W-tiling
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-26 15:20:09 -07:00
Ian Romanick
97d332ce0e i915: Remove unused fields intel_mipmap_tree::logical_(width|height|depth)0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-26 15:20:09 -07:00
Ian Romanick
ca8e8d5520 i915: Remove unused field intel_mipmap_tree::array_spacing_lod0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-26 15:20:09 -07:00
Ian Romanick
e5a632a256 i915: On Gen <= 3 there is no multisampling
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-26 15:20:09 -07:00
Ian Romanick
7b7a0ba04c i915: Trivial code reformatting
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-26 15:20:09 -07:00
Ian Romanick
b08981009d i915,i965: Don't condition use of GLSL clear on the current API
Meta always sets the API to API_OPENGL_COMPAT, so the current API
setting is irrelevant.

   text	   data	    bss	    dec	    hex	filename
7154994	 256860	  37332	7449186	 71aa62	32-bit i965_dri.so before
7154978	 256860	  37332	7449170	 71aa52	32-bit i965_dri.so after
6788451	 328056	  50704	7167211	 6d5ceb	64-bit i965_dri.so before
6788419	 328056	  50704	7167179	 6d5ccb	64-bit i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-26 15:20:09 -07:00
Timothy Arceri
7719f52d5f mesa: add KHR_no_error support for glCopyTex{ture}SubImage*D()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:15:09 +10:00
Timothy Arceri
b480211058 mesa: add copy_texture_sub_image_no_error() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:15:09 +10:00
Timothy Arceri
3034c4c725 mesa: remove redundant NULL check
This can never be NULL in any of the entry paths.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:15:09 +10:00
Timothy Arceri
c7f7a375d9 mesa: create copy_texture_sub_image_err() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:15:09 +10:00
Timothy Arceri
45498aff82 mesa: make _mesa_copy_texture_sub_image() static
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:15:09 +10:00
Timothy Arceri
bc0af44a5a mesa: add KHR_no_error support for gl{Compressed}TexImage*D()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:11:02 +10:00
Timothy Arceri
51f4ebdbdc mesa: add no error support to teximage()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:11:02 +10:00
Timothy Arceri
ca5f1e82de mesa: create wrapper around teximage()
This is used to inline KHR_no_error logic without inlining
the function into all its callers.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:11:02 +10:00
Timothy Arceri
62abf6862f mesa: fix unused variable warning in release builds
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-27 08:03:29 +10:00
Marek Olšák
ccf963ed29 radeonsi: don't flush and wait for CB after depth-only rendering
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-26 23:35:19 +02:00
Ian Romanick
1b101ca809 blorp: Use normalized coordinates on Gen6
Apparently, the sampler has some sort of precision issues for
non-normalized texture coordinates with linear filtering.  This caused
some small precision issues in scaled blits.  Work around this by using
normalized coordinates.  There is some extra work necessary because Gen6
uses TEX (instead of TXF) for some multisample resolve blits.

Fixes piglit.spec.arb_framebuffer_object.fbo-blit-stretch on SNB.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68365
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-06-26 13:41:11 -07:00
Marek Olšák
25ea7aa5cd mesa/glthread: don't include pthread.h
Not needed. This fixes the Windows build.
2017-06-26 22:23:31 +02:00
Nanley Chery
d6748f1fc4 anv/gpu_memcpy: Rename the gpu_memcpy function
A GPU memcpy function could alternatively be implemented using MI_*
commands. Provide more detail into how this one operates in case another
memcpy function is created.

v2:
- Update the commit message.
v3:
- Use 'memcpy' instead of 'cpy' (Jason Ekstrand)
- Shorten 'streamout' to 'so'

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
1415e7a997 anv/blorp: Provide surface states for CCS resolves
In the future, we plan on using this method to resolve images whose
surface state fast-clear value is dynamically updated during command
buffer execution. Start using it now for testing and to reduce churn
later on.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
4b2a2b70e0 anv/blorp: Add a surface-state-based CCS resolve function
This will be used in the next patch.

v2:
- Omit BLORP_BATCH_NO_EMIT_DEPTH_STENCIL (Jason Ekstrand)
- Update commit message.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
d1119ab7b6 blorp/clear: Add a binding-table-based CCS resolve function
v2:
- Do layered resolves.
(Jason Ekstrand):
- Replace "bt" suffix with "attachment".
- Rename helper function to prepare_ccs_resolve.
- Move blorp_params_init() into helper function.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
6235f08ff8 anv: Adjust params of color buffer transitioning functions
Splitting out these fields will make the color buffer transitioning
function simpler when it gains more features.

v2: Remove unintended blank line (Iago Toral)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
e15b1c41a4 anv/blorp: Remove 3D subresource transition workaround
For 3D image subresources undergoing a layout transition via
PipelineBarrier, we increase the number of fast-cleared layers to match
the intended behaviour of KHR_maintenance1. When such subresources
undergo layout transitions between subpasses, we don't do this to avoid
failing incorrect CTS tests. Instead, unify the behaviour in both
scenarios, and wait for the CTS tests to catch up. See CL 1111 for the
test fix and Vulkan issue #849 for more information.

On SKL+, this causes 3 test failures under:
dEQP-VK.pipeline.render_to_image.3d.*

v2: Add a reference to the Vulkan issue (Iago Toral).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
5ca2fbcee2 anv/cmd_buffer: Adjust the image view reloc function
Make the function take in an image instead of an image view. This
enables us to record relocations for surfaces states created outside of
the anv_CreateImageView path.

v2 (Jason Ekstrand):
- Use image->offset instead of surf_offset in aux_offset calculation.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
5f4f50419c anv/cmd_buffer: Adjust layout transition aspect checking
Reflect the fact that an image view or subresource range with the color
aspect cannot have any other aspect.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
bc838fc759 anv: Add and use color auxiliary buffer helpers
v2:
- Check for aux levels in layer helper (Jason Ekstrand)
- Don't assert aux is present, return 0 if it isn't.
- Use the helpers.
v3:
- Make the helpers aspect-agnostic (Jason Ekstrand)
- Drop anv_image_has_color_aux()

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
8aaa13467d intel/isl: Only create a CCS buffer if the image supports rendering
v2: Omit the commit message.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
b934330191 intel/isl: Limit CCS to one level and layer on gen7
v2 (Jason Ekstrand):
- Remove Vulkan-specific terminology from the commit title.
- Replace '== 7' with '<= 7' to hint that this is a new feature on BDW+.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
6b23c65f3a intel/blorp: Check for layer fast-clear restriction
v2: Update commit title (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Nanley Chery
b46a071758 intel/blorp: Assert levels and layers are in range
v2 (Jason Ekstrand):
- Update commit title.
- Check aux level and layer as well.
v3 (Jason Ekstrand):
- Move the non-aux layer check.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-26 11:09:12 -07:00
Lucas Stach
28550c7875 etnaviv: only flush resource to self if no scanout buffer exists
Currently a resource flush may trigger a self resolve, even if a scanout buffer
exists, but is up to date. If a scanout buffer exists we only ever want to
flush the resource to the scanout buffer. This fixes a performance regression.

Fixes: dda956340c (etnaviv: resolve tile status when flushing resource)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-26 20:06:01 +02:00
Christian Gmeiner
d8b2ccdb88 etnaviv: add support for snorm textures
Based on a patch from Wladimir J. van der Laan and untested due
to lack of hardware. Binary blob emits those formats if GPU supports
HALTI1 (faked with ibvivhook).

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-26 19:59:42 +02:00
Christian Gmeiner
3bbf8dcfe4 etnaviv: add R8G8 texture support
Passes texwrap GL_ARB_texture_rg piglit (with faked full texture rg support).

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-26 19:56:59 +02:00
Christian Gmeiner
751ae6afbe etnaviv: add support for swizzled texture formats
Passes all ext_texture_swizzle piglits.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-26 19:56:39 +02:00
Christian Gmeiner
0ddcccac4f etnaviv: add support for extended texture formats
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-26 19:49:30 +02:00
Chad Versace
31c3c440b5 glapi: Fix -Wduplicate-decl-specifier due to double-const
Fix all lines in src/mesa/main/marshal_generated.c that declare
double-const variables. Below is all such lines, with duplicates
removed:

   $ grep 'const const' marshal_generated.c | sort -u
   const const GLboolean * pointer = cmd->pointer;
   const const GLvoid * indices = cmd->indices;
   const const GLvoid * pointer = cmd->pointer;

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-26 10:26:23 -07:00
Eric Engestrom
2b237ff64c anv: use Mesa's u_atomic.h header
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-26 18:21:22 +01:00
Eric Engestrom
a2ae2d1fb0 radv: use Mesa's u_atomic.h header
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-26 18:21:22 +01:00
Bruce Cherniak
6fafba0e67 swr: set an explicit clear_rect if scissor is not enabled.
Fix regression of "no rendering" on simple apps like glxgears by
setting an explicit full surface clear_rect when scissor is not
enabled.

This regressed with commit 00173d91 "st/mesa: don't set 16
scissors and 16 viewports if they're unused" due to an assumption
that a default scissor rect is always set, which was the case prior
to this optimization.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-06-26 11:30:08 -05:00
Tim Rowley
0e1e5a2b14 swr/rast: adjust std::string usage to fix build
Some combinations of c++ compilers and standard libraries had problems
with the string::replace code we were using previously.

This should fix the travis-ci system.

Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-26 11:29:27 -05:00
Eric Engestrom
3c7c82cef0 travis: add missing libs: xdamage + xfixes
> configure: error: Package requirements (x11 xext xdamage >= 1.1 xfixes
> x11-xcb xcb xcb-glx >= 1.8.1 xcb-dri2 >= 1.8) were not met:
> No package 'xdamage' found
> No package 'xfixes' found

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-26 15:44:37 +01:00
Nicolai Hähnle
f17d78becc radeonsi: support indirect indexing in INTERP_* opcodes
The hardware doesn't support it, so we just interpolate all array elements
and then use indirect indexing on the resulting vector.

Clearly, this is not very efficient. There is an argument to be had for
adding if/else, or perhaps even pulling the data out of LDS directly.
Both don't really seem worth the effort, considering that it seems nobody
actually uses this feature.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-26 14:02:06 +02:00
Ben Crocker
162c42f8ed egl_dri2: swrastGetDrawableInfo: set *x, *y [v2]
In swrastGetDrawableInfo, set *x and *y, not just *w and *h;
this fixes a crash later in drisw_update_tex_buffer when the
(formerly) uninitialized x and y values are used to construct
an address in a call to llvmpipe_transfer_map.

Fixes crash in Piglit test
"spec@egl 1.4@eglcreatepbuffersurface and then glclear"
(<piglit dir>/bin/egl-create-pbuffer-surface -auto)
that occurred intermittently, e.g. when the uninitialized x and y in
drisw_update_tex_buffer just happened to contain absurd non-zero values.

v2: Initialize in case if function succeeds or fails, just like *w/*h.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-26 12:48:19 +01:00
Emil Velikov
c58af5cbb2 egl: fold _eglError() + return EGL_FALSE
The function _eglError() already explicitly returns EGL_FALSE,
explicitly to simplify the callers. Make use of it.

While EGL_FALSE is numerically identical to false, NULL, EGL_NO_FOO,
storage is not the same so we cannot use it for "everything".

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-26 12:41:00 +01:00
Emil Velikov
d42b09580a egl: drop _eglInitImage() return type
Function cannot fail and always returns true.

v2: Inline the one line function in the header

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-26 12:40:22 +01:00
Juan A. Suarez Romero
860919a3b2 glsl: do not call link_xfb_stride_layout_qualifiers() for fragment shaders
xfb only applies to the latest stage before the fragment shader, so
there is no need to invoke it in the fragment shader.

Fixes:
KHR-GL45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL45.enhanced_layouts.xfb_stride_of_empty_list_and_api

v2: do reset only if shaders provide an explicit stride

v3: do not call link_xfb_stride_layout_qualifiers() for fragment shaders
(Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-26 12:00:22 +02:00
Constantine Charlamov
abc7b110b6 r600g: fix crash when file in R600_TRACE doesn't exist
…and print error in such case. Which probably is not a rare event btw
because fopen doesn't expand ~ to $HOME.

Also get rid of unused "bool ret" variable.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 17:39:54 +10:00
Constantine Charlamov
3d466f3e9f r600g: take into account offset to system inputs at tgsi_interp_egcm()
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=100785

v2: I was too much twiddling whether to initialize nsys_inputs at the beginning of shader initialization or for allocation of system values, and by the time I decided to go with the first one, I forgot to change it back.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 16:32:36 +10:00
Constantine Charlamov
469e2ed473 r600g: get rid of trailing whitespace
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 16:30:10 +10:00
Dave Airlie
27380d6b3e r600/asm: add support for other GDS operations.
This adds support for the GDS operations needed to do atomic
counters.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 16:27:51 +10:00
Dave Airlie
ccab3f7e1b r600: don't merge GDS into VTX
We don't want vtx/tex instructions ending up in GDS sections.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 16:23:21 +10:00
Dave Airlie
043f16eba1 r600: for memory instructions dump index gpr for read indirects also.
This just makes sure we can see the index gpr in the asm dumps.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 16:23:21 +10:00
Dave Airlie
ac8fb9800a r600: add support for vertex fetches via texture cache
On evergreen we can route vertex fetches via the texture cache,
and this is required for some images support. So add support
to the asm builder for it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 16:23:20 +10:00
Dave Airlie
b050b91e33 r600: route indirect address register correctly for vtx fetches.
This was found during writing the images code, we need to
make sure we route the correct index register.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 16:23:20 +10:00
Dave Airlie
4a34f3244a radv/meta: don't need vertex info for resolve shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 01:24:10 +01:00
Marek Olšák
0715b3c2ee drirc: whitelist glthread for a few games
Performance deltas:
    Alien Isolation: +17% (it varies depending on the location)
    Borderlands 2: +50% (it varies depending on the location)
    BioShock Infinite: +76% (benchmark)
    Civilization 6: +20% (benchmark)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
4f38b48e05 mesa/glthread: decrease the batch size for better perf scaling
This is the key to better performance.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
09f6915bf8 gallium/hud: add glthread counters
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
8f4bc8a324 gallium/hud: add API-thread-busy for monitoring the thread load
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
11cf079b67 gallium/hud: add hud_pane::hud pointer
for later use

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
5fa69be3c8 mesa/glthread: add glthread "perf" counters and pass them to gallium HUD
for HUD integration in following commits. This valuable profiling data
will allow us to see on the HUD how well glthread is able to utilize
parallelism. This is better than benchmarking, because you can see
exactly what's happening and you don't have to be CPU-bound.

u_threaded_context has the same counters.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
833f3c1c31 gallium/hud: move struct hud_context to hud_private.h
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
7492201c4e gallium/hud: rename API-thread-busy to main-thread-busy
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
d1513edaa0 mesa/glthread: switch to u_queue and redesign the batch management
This mirrors exactly how u_threaded_context works.
If you understand this, you also understand u_threaded_context.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
1e37a5054b mesa/glthread: remove HAVE_PTHREAD guards
we are switching to util_queue.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Marek Olšák
6884c95ab4 util: move pipe_thread_is_self from gallium to src/util
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 02:17:03 +02:00
Bas Nieuwenhuizen
78bef01da2 radv: Remove unused args of radv_image_view_init.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-26 01:24:50 +02:00
Bas Nieuwenhuizen
789f480029 radv: Use correct image layout for blit based copies.
v2: Don't pass layout to image view usage mask.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 0628580eff "radv: Specify semantics of HTILE layout helpers."
2017-06-26 01:24:29 +02:00
Grigori Goronzy
95fb1c187a mesa/marshal: add custom marshalling for glNamedBuffer(Sub)Data
These entry points are used by Alien Isolation and caused
synchronization with glthread. The async marshalling implementation
is similar to glBuffer(Sub)Data. However unlike Buffer(Sub)Data
we don't need to worry about EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD,
as this isn't applicable to these DSA variants.

Results in an approximately 6x drop in glthread synchronizations and a
~30% FPS jump in Alien Isolation (Medium preset, Athlon 860K, RX 480).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-26 09:06:23 +10:00
Dave Airlie
6a68170c83 radv: handle primitive id input into fragment shader with no geom shader
Fixes:
dEQP-VK.pipeline.framebuffer_attachment.no_attachments
dEQP-VK.pipeline.framebuffer_attachment.no_attachments_ms

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 08:45:30 +10:00
Dave Airlie
2a87ddbdcb radv: compile fragment shader first.
This reorders things as we need something from the fs for the vs key.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 08:45:26 +10:00
Dave Airlie
a563f611c3 radv: set prim_id for geometry shaders
Noticed in passing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 08:45:22 +10:00
Dave Airlie
4042892cee radv: set use_prim_id for tess shaders correctly.
Just noticed in passing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-26 08:45:14 +10:00
Pierre Moreau
afb8f2d4a3 nv50/ir: Properly fold constants in SPLIT operation
Fixes: b7d9677d ("nv50/ir: constant fold OP_SPLIT")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-25 15:23:46 +02:00
Marek Olšák
e25950808f radeonsi/gfx9: don't overallocate shader binaries
It's not needed. The hw doesn't fetch ahead over page boundaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-24 23:04:37 +02:00
Lucas Stach
d6b9ba36a4 st/dri2: implement image offset query
This trivially adds support for the image offset query, which is needed
for the zwp_linux_dmabuf based EGL platform wayland implementation.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-06-24 16:57:55 +01:00
Samuel Pitoiset
cb577e379e mesa: only flush vertices when the viewport is different
This prevents glViewport() and friends to always flush and
trigger _NEW_VIEWPORT.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-24 16:47:43 +02:00
Samuel Pitoiset
4178cea06d mesa: remove useless comments in the viewport code path
No need to explain why calling a driver callback is needed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-24 16:47:38 +02:00
Roland Scheidegger
8bfe451ed3 llvmpipe: initialize default fb correctly in setup
If lp_setup_bind_framebuffer() is never called, then setup fb x1/y1 was not
correctly initialized. This can happen if there's never a fb set - both
cso and llvmpipe would consider setting this with no cbufs and no zsbuf a
redundant change and therefore it would never get set.
We rely on this setup fb rect being initialized correctly for the tri intersect
tests, throwing away tris which don't intersect. Not initializing it meant
we'd then say it intersected, and we'd try to bin that despite that we have
no actual tiles to bin it to, leading to assertion failures (pretty harmless
since tile 0/0 always exists nevertheless as tiles are statically allocated,
albeit that should change at some point).
(Note probably not an issue with gl state tracker)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-24 00:18:43 +02:00
Jason Ekstrand
f7f2fa8eb1 i965/miptree: Rework aux enabling
This commit replaces the complex and confusing set of disable flags with
two fairly straightforward fields which describe the intended auxiliary
surface usage and whether or not the miptree supports fast clears.
Right now, supports_fast_clear can be entirely derived from aux_usage
but that will not always be the case.

This commit makes functional changes.  One of these changes is that it
re-enables multisampled fast-clears which were accidentally disabled in
cec30a6669 around a year ago.  Fixing this
improves the SynMark v7 DeferredAA test by around ~3% on some gen9
hardware.  This commit also gets us closer to enabling CCS_E for
window-system buffers which are Y-tiled.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-23 12:30:24 -07:00
Jason Ekstrand
f1fa4be871 i965: Clamp clear colors to the representable range
Starting with Sky Lake, we can clear to arbitrary floats or integers.
Unfortunately, the hardware isn't particularly smart when it comes
sampling from that clear color.  If the clear color is out of range for
the surface format, it will happily return whatever we put in the
surface state packet unmodified.  In order to avoid returning bogus
values for surfaces with a limited range, we need to do some clamping.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-23 12:30:24 -07:00
Jason Ekstrand
793b312b4a i965: Don't bother with HiZ in renderbuffer_move_to_temp
This function is only used on gen4-5 which don't support HiZ.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-23 12:30:24 -07:00
Jason Ekstrand
764cce442e i965/miptree: Rename the non_msrt_mcs functions to _ccs
While we're here, we also make the two support checks static since there
are no users outside intel_mipmap_tree.c.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-23 12:30:24 -07:00
Jason Ekstrand
a7059a764e i965/miptree: Delete the layered rendering resolve
We never fast-clear more than the base slice (LOD 0, layer 0) anyway, so
layered rendering without a resolve is always perfectly safe.  Should
this ever change in the future, we'll have to put some sort of resolve
back in but we can cross that bridge when we come to it.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-23 12:30:24 -07:00
Anuj Phogat
7896dee349 anv/cnl: Don't write to Cache Mode Register 1 on gen10+
For PartialResolveDisableInVC field recommendation is to
always set this to 0 and that's the default value of the bit.
So, we have nothing left to write to CACHE_MODE_1.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-23 11:16:00 -07:00
Anuj Phogat
b980553309 i965/cnl: Don't write to Cache Mode Register 1 on gen10+
With below optimizations gone in gen10+ we have nothing left out to
write to CACHE_MODE_1:
Float Blend Optimization Enable: This bit have been removed in gen10+
Partial Resolve Disable in VC: Recommendation is to always set this
field to 0 in gen10+ and that's the default value of the bit.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-23 11:16:00 -07:00
Marek Olšák
f6e98e99e3 radeonsi: unreference vertex buffers when destroying the context
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-23 19:53:54 +02:00
Edmondo Tommasina
2ea16f08f3 drirc: Add glsl_correct_derivatives_after_discard for The Witcher 2
This fixes the long-standing problem with black transitions in The Wicher 2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98238

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák
ee16796d54 radeonsi: implement the workaround for Rocket League - postponed TGSI kill
Do KILL at the end of shaders so as not to break WQM.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100070

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák
a98a04ec80 gallium/radeon: pass create_screen flags to r600_common_screen_init
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák
118b2008ba st/dri: add a drirc workaround for Rocket League
This needs to be passed to gallium drivers.

No game fix is planned at this time.

The addition of glsl_correct_derivatives_after_discard is
generally a good thing for mesa compatibility with the broader GL
driver ecosystem.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100070

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák
6b0f6e693b st/dri: get drirc options before creating pipe_screen
dri_init_options_get_screen_flags will return the flags for create_screen().

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák
76f379330a gallium: allow passing 'unsigned flags' to create_screen()
for drirc options

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák
516488bb51 mesa: don't flush vertices in glClientActiveTexture
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-23 19:50:20 +02:00
Marek Olšák
522173aee4 mesa: don't flag _NEW_ARRAY for GL_PRIMITIVE_RESTART_NV
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-23 19:50:20 +02:00
Roland Scheidegger
c7688d2de5 llvmpipe:fix using 32bit rasterization mistakenly, causing overflows
We use the bounding box (triangle extents) to figure out if 32bit rasterization
could potentially overflow. However, we used the bounding box which already got
rounded up to 0 for negative coords for this, which is incorrect, leading to
overflows and hence bogus rendering in some of our private use.

It might be possible to simplify this somehow (we're now using 3 different
boxes for binning) but I don't quite see how.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-23 19:39:29 +02:00
Roland Scheidegger
672d245ffe llvmpipe: fill in debug vertex info for tri rasterization
This is pretty useful for debugging rasterization issues, so turn it on
based on DEBUG (the actual existence of the fields is also conditionalized
on DEBUG, lines fill it out the same too).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-06-23 19:39:29 +02:00
Marek Olšák
c2f82fc1d3 Revert "radeonsi: don't emit partial flushes at the end of IBs (v2)"
This reverts commit c9040dc9e7.

People have reported it causes corruption on VI, and I see GPU hangs
on GFX9.
2017-06-23 19:13:55 +02:00
Samuel Pitoiset
7f7487f262 mesa: remove spurious flush in _mesa_Viewport()
I don't think this is actually required, if the viewport
values are different from the ones stored in the context, we
already flush and trigger _NEW_VIEWPORT in
set_viewport_no_notify().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-23 16:26:25 +02:00
Samuel Pitoiset
2f76b45415 mesa: remove spurious flush in _mesa_DepthRange()
I don't think this is actually required, if the depth range
values are different from the ones stored in the context, we
already flush and trigger _NEW_VIEWPORT in
set_depth_range_no_notify().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-23 16:26:25 +02:00
Samuel Pitoiset
f314a532fd mesa: do not trigger _NEW_TEXTURE_STATE in glActiveTexture()
This looks like useless because gl_context::Texture::CurrentUnit
is not used by _mesa_update_texture_state() and friends.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-23 16:26:24 +02:00
Samuel Pitoiset
c244c25ce3 mesa: add KHR_no_error support for glViewport()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:43 +02:00
Samuel Pitoiset
ad0afa87b8 mesa: add viewport() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:43 +02:00
Samuel Pitoiset
128822c59f mesa: add KHR_no_error support for glViewportArrayv()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:43 +02:00
Samuel Pitoiset
e1d6de7a1e mesa: add viewport_array() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:43 +02:00
Samuel Pitoiset
0a667f03bb mesa: add KHR_no_error support for glViewportIndexed*()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:43 +02:00
Samuel Pitoiset
efd42b5791 mesa: rename ViewportIndexedf() to viewport_indexed_err()
While are at it, add a 'context' parameter for consistency.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:43 +02:00
Samuel Pitoiset
52a448c7d0 mesa: add KHR_no_error support for glClipControl()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:42 +02:00
Samuel Pitoiset
5a6779c722 mesa: add clip_control() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-23 09:26:42 +02:00
Rafael Antognolli
9fd0aee17d i965: Convert upload_default_color to genxml.
This function was moved to genX_state_upload.c but was still not using genxml.
By converting it to genxml, we make some things simpler, like setting
haswell's border color state, but others are more complex, since the structs
used by each gen are different.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-22 16:51:51 -07:00
Rafael Antognolli
e547915935 i965: Remove unused code and delete file.
The sampler state code was all moved to genxml, so we can get rid of these
functions and delete the file.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-22 16:51:51 -07:00
Rafael Antognolli
e30bbe32a3 i965: Convert vs, gs, tcs, tes and cs samplers to genxml.
Since they just use the code that is already available in genX_state_upload.c,
convert them in one batch.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-22 16:51:51 -07:00
Rafael Antognolli
f8d69beed4 i965: Convert fs sampler state to use genxml.
Also convert some auxiliary functions used by it, and copy
upload_default_color to genX_state_upload.c.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-22 16:51:47 -07:00
Rafael Antognolli
9b78a52042 genxml: fix gen5 sampler border color state.
Based on the current code, gen5 and gen6 have the same sampler border color
state struct. So fix the gen5 one to match gen6.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-22 16:38:44 -07:00
Rafael Antognolli
f43c21cbbd aubinator: Dump sampler state pointers on gen6 too.
We already have a function to dump sampler states, so do that for gen6
too.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-22 16:38:44 -07:00
Chad Versace
ecd8f85802 anv: Fix -Wswitch in anv_layout_to_aux_usage()
anv_layout_to_aux_usage() lacked a case for
VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR. Add an unreachable case, because we
don't support the extension.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-22 15:18:24 -07:00
Chad Versace
55f335bd30 i965: Fix -Wunused-variable in gen8_write_pma_stall_bits()
Trivial fix.  'ctx' was unused.
2017-06-22 14:44:06 -07:00
Anusha Srivatsa
de7ed0ba55 i965/CFL: Add PCI Ids for Coffee Lake.
Coffee Lake has a gen9 graphics following KBL.
From 3D perspective, CFL is a clone of KBL/SKL features.

v2: Change commit message, correct alignment <Anuj Phogat>
v3: Update IDs.
v4: Initialize l3_banks, correct nomenclature <Anuj>

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Acked-by: Benjamin Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-22 14:28:43 -07:00
Anuj Phogat
43d11b128c intel: Enable vulkan build for gen10
This patch just enables building Vulkan libs for gen10. We
still don't have gen 10 support enabled on Vulkan.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-22 14:17:46 -07:00
Anuj Phogat
ac6bc0e034 anv/cnl: Generate and use gen10 functions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-22 14:17:45 -07:00
Anuj Phogat
c17e214a6b anv/cnl: Don't set FloatBlendOptimizationEnable{Mask}
This field is remove from CACHE_MODE_1 register in gen10.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-22 14:17:45 -07:00
Anuj Phogat
bf1d2c37c6 anv/cnl: Use GENX(xx) in place of GEN9_xx
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-22 14:17:45 -07:00
Anuj Phogat
1e5a5d18d1 anv/cnl: Add #defines for MOCS and genX(x)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-22 14:17:45 -07:00
Anuj Phogat
ceed55e7bb intel/genxml: Add Gen10 CACHE_MODE_1 definitions
Few of the fields in this register are changed as compared
to gen9.xml.

V2: Remove some fields which are not valid anymore.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-22 14:17:45 -07:00
Anuj Phogat
6338b63270 intel/genxml: Rename StartInstanceLocation to StartingInstanceLocation
This is required because we already have a macro defined with
the name StartInstanceLocation.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-22 14:17:45 -07:00
Anuj Phogat
8869c8b3dc intel/genxml: Rename IndirectStatePointer to BorderColorPointer
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-22 14:17:45 -07:00
Anuj Phogat
97f75fdfd0 intel/genxml: Combine DataDWord{0, 1} fields in to ImmediateData field
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-22 14:17:45 -07:00
Anuj Phogat
c61b909d14 intel/genxml: Add INSTDONE registers in gen10
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-22 14:17:45 -07:00
Anuj Phogat
03fddd3c1d intel/genxml: Add better support for MI_MATH in gen10
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-22 14:17:45 -07:00
Chad Versace
a9e5e9f5ec i965/dri: Add intel_screen param to intel_create_winsys_renderbuffer
The param is currently unused. It will later be used it to support
R8G8B8X8 EGLConfigs on Skylake.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-22 12:44:28 -07:00
Chad Versace
4b9cbfa0b0 i965: Move brw_context format arrays to intel_screen
This allows us to query the driver's supported formats in i965's DRI code,
where often there is available a DRIscreen but no GL context.

To reduce diff noise, this patch does not completely remove
brw_context's format arrays. It just redeclares them as pointers which
point to the arrays in intel_screen.

Specifically, move these two arrays from brw_context to intel_screen:
    mesa_to_isl_render_format[]
    mesa_format_supports_render[]

And add a new array to intel_screen,
    mesa_format_supportex_texture[]
which brw_init_surface_formats() copies to ctx->TextureFormatSupported.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-22 12:44:28 -07:00
Chad Versace
c09b2aefae i965: Rename some vague format members of brw_context
I'm swimming in a vortex of formats. Mesa formats, isl formats, DRI
formats, GL formats, etc.

It's easy to misinterpret the following brw_context members unless
you've recently read their definition.  In upcoming patches, I change
them from embedded arrays to simple pointers; after that, even their
definition doesn't help, because the MESA_FORMAT_COUNT hint will no
longer be present.

Rename them to prevent further confusion. While we're renaming, choose
shorter names too.

    -format_supported_as_render_target
    +mesa_format_supports_render

    -render_target_format
    +mesa_to_isl_render_format

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-22 12:43:53 -07:00
Chad Versace
ffbf50b1c6 egl: Rename 'count' in ${platform}_add_configs_for_visuals (v2)
Rename 'count' to 'config_count'. I didn't understand what the variable
did until I untangled the for-loops. Now the next person won't have that
problem.

v2: Rebase. Fix typo. Apply to all platforms (for emil).

Reviewed-by: Eric Engestrom <eric@engestrom.ch>  (v1)
2017-06-22 12:35:49 -07:00
Chad Versace
a6fad55961 egl/x11: Declare EGLConfig attrib array inside loop
No behavioral change. Just a readability cleanup.

Instead of modifying this small array on each loop iteration, we now
initialize it in-place with the values it needs.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-22 12:35:49 -07:00
Chad Versace
f8ad7f4054 egl/drm: Declare EGLConfig attrib array inside loop
No behavioral change. Just a readability cleanup.

Instead of modifying this small array on each loop iteration, we now
initialize it in-place with the values it needs.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-22 12:35:49 -07:00
Chad Versace
bd789098a5 egl/android: Declare EGLConfig attrib array inside loop (v2)
No behavioral change. Just a readability cleanup.

Instead of modifying this small array on each loop iteration, we now
initialize it in-place with the values it needs.

v2: Rebase.

Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v1)
2017-06-22 12:35:49 -07:00
Chad Versace
cd717cbe1a egl/dri2: Declare loop vars inside the loop
That is, consistently do this:

    for (int i = 0; ...)

No behavioral change.
This patch touches only egl_dri2.c.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-22 12:35:49 -07:00
Chad Versace
98497dfd6a egl/wayland: Declare loop vars inside the loop
That is, consistently do this:

    for (int i = 0; ...)

No behavioral change.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-22 12:35:49 -07:00
Chad Versace
927625ca60 egl/surfaceless: Move loop vars inside the loop
That is, consistently do this:

    for (int i = 0; ...)

No behavioral change.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-22 12:35:49 -07:00
Chad Versace
263d4b8b1c egl/x11: Declare loop vars inside the loop
That is, consistently do this:

    for (int i = 0; ...)

No behavioral change.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-22 12:35:49 -07:00
Chad Versace
c31146f080 egl/drm: Move loop vars inside the loop
That is, consistently do this:

    for (int i = 0; ...)

No behavioral change.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-22 12:35:49 -07:00
Chad Versace
09455123f3 egl/android: Declare loop vars inside their loops (v2)
That is, consistently do this:

    for (int i = 0; ...)

No behavioral change.

v2: Rebase.

Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v1)
2017-06-22 12:35:49 -07:00
Brian Paul
9e57a2cbcf svga: minor whitespace fixes in svga_pipe_vertex.c 2017-06-22 13:33:48 -06:00
Brian Paul
041f8ae9f6 svga: check return value from svga_set_shader( SVGA3D_SHADERTYPE_GS, NULL)
If the call fails we need to flush the command buffer and retry.  In this
case, we were failing to unbind the GS which led to subsequent errors.

This fixes a bug replaying a Cinebench R15 apitrace in a Linux guest.
VMware bug 1894451

cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-22 13:33:48 -06:00
Charmaine Lee
3fbdab8778 svga: fix pre-mature flushing of the command buffer
When surface_invalidate is called to invalidate a newly created surface
in svga_validate_surface_view(), it is possible that the command
buffer is already full, and in this case, currently, the associated wddm
winsys function will flush the command buffer and resend the invalidate
surface command. However, this can pre-maturely flush the command buffer
if there is still pending image updates to be patched.

To fix the problem, this patch will add a return status to the
surface_invalidate interface and if it returns FALSE, the caller will
call svga_context_flush() to do the proper context flush.
Note, we don't call svga_context_flush() if surface_invalidate()
fails when flushing the screen surface cache though, because it is
already in the process of context flush, all the image updates are already
patched, calling svga_context_flush() can trigger a deadlock.
So in this case, we call the winsys context flush interface directly
to flush the command buffer.

Fixes driver errors and graphics corruption running Tropics. VMware bug 1891975.

Also tested with MTT glretrace, piglit and various OpenGL apps such as
Heaven, CinebenchR15, NobelClinicianViewer, Lightsmark, GoogleEarth.

cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-22 13:33:48 -06:00
George Kyriazis
08cb8cf256 swr: invalidate attachment on transition change
Consider the following RT attachment order:
1. Attach surfaces attachments 0 & 1, and render with them
2. Detach 0 & 1
3. Re-attach 0 & 1 to different surfaces
4. Render with the new attachment

The definition of a tile being resolved is that local changes have been
flushed out to the surface, hence there is no need to reload the tile before
it's written to.  For an invalid tile, the tile has to be reloaded from
the surface before rendering.

Stage (2) was marking hot tiles for attachements 0 & 1 as RESOLVED,
which means that the hot tiles can be written out to memory with no
need to read them back in (they are "clean").  They need to be marked as
resolved here, because a surface may be destroyed after a detach, and we
don't want to have un-resolved tiles that may force a readback from a
NULL (destroyed) surface.  (Part of a destroy is detach all attachments first)

Stage (3), during the no att -> att transition, we  need to realize that the
"new" surface tiles need to be fetched fresh from the new surface, instead
of using the resolved tiles, that belong to a stale attachment.

This is done by marking the hot tiles as invalid in stage (3), when we realize
that a new attachment is being made, so that they are re-fetched during
rendering in stage (4).

Also note that hot tiles are indexed by attachment.

- Fixes VTK dual depth-peeling tests.
- No piglit changes

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-06-22 11:51:08 -05:00
Juan A. Suarez Romero
87a2d3963a Revert "getteximage: Return correct error value when texure object is not found"
From OpenGL 4.5 spec PDF, section '8.11. Texture Queries', page 236:
  "An INVALID_VALUE error is generated if texture is not the name of
   an existing texture object."

Same wording applies to the compressed version.

But turns out this is a spec bug, and Khronos is fixing it for the next
revisions.

The proposal is to return INVALID_OPERATION in these cases.

This reverts commit 633c959fae.

v2:
- Use _mesa_lookup_texture_err (Samuel Pitoiset)

v3:
- _mesa_lookup_texture_err() already handles texture > 0 (Samuel
Pitoiset)
- Just revert 633c959fae (Juan A. Suarez)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-22 18:48:18 +02:00
Eric Engestrom
c87f73724e egl: properly count configs
dri2_conf represents another config (which shouldn't be counted)
if it doesn't have the requested ID.

Reported-by: Liu Zhiquan <zhiquan.liu@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-22 17:32:31 +01:00
Chad Versace
5e884353e6 egl/android: Change order of EGLConfig generation (v2)
Many Android apps (such as Google's official NDK GLES2 example app), and
even portions the core framework code (such as SystemServiceManager in
Nougat), incorrectly choose their EGLConfig.  They neglect to match the
EGLConfig's EGL_NATIVE_VISUAL_ID against the window's native format, and
instead choose the first EGLConfig whose channel sizes match those of
the native window format while ignoring the channel *ordering*.

We can detect such buggy clients in logcat when they call
eglCreateSurface, by detecting the mismatch between the EGLConfig's
format and the window's format.

As a workaround, this patch changes the order of EGLConfig generation
such that all EGLConfigs for HAL pixel format i precede those for HAL
pixel format i+1. In my (chadversary) testing on Android Nougat, this
was good enough to pacify the buggy clients.

v2: Rebase to make patch cherry-pickable to stable.

Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-22 08:58:45 -07:00
Ville Syrjälä
1c409fe4c1 i915: Fix gl_Fragcoord interpolation
gl_FragCoord contains the window coordinates so it seems to me that
we should not use perspective correct interpolation for it. At least
now I get similar output as i965/swrast/llvmpipe produce.

This fixes dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_w.
dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_xyz was already
passing, though I'm not quite sure how it managed to do that.

v2: Add definitons for the S3 "wrap shortest" bits as well (Ian)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-22 17:26:17 +03:00
Eric Engestrom
b81cfc7340 egl: simplify dri_config conditionals
In the same spirit as 858f2f2ae6 (egl/dri2: ease srgb __DRIconfig
conditionals), let's merge dri_single_config and dri_double_config into
a single dri_config[2].

This moves the `if (double) dri_double_config else dri_single_config`
logic to `dri_config[double]`, reducing code duplication and making it
easier to read.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-22 14:54:36 +01:00
Marek Olšák
bcd67b1711 radeonsi/gfx9: enable DCC fast clear
It seems to work now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Marek Olšák
db37c0be13 radeonsi/gfx9: don't ever flush the TC metadata cache
The closed Vulkan driver doesn't do it either.

Also remove some old comments that aren't useful.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Marek Olšák
920f20f039 radeonsi/gfx9: use TC L2 for fast color clear with CP DMA
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Marek Olšák
c1754b69dc radeonsi: fix DCC fast clear for luminance and alpha formats
I reproduced this bug on Polaris11 and Raven.

I can't get this bug on Fiji. The reason might be that Fiji doesn't use
2D tiling for the test due to higher 2D tiling alignment requirements.

Fixes piglit: spec@ext_framebuffer_object@fbo-fast-clear

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Marek Olšák
c9040dc9e7 radeonsi: don't emit partial flushes at the end of IBs (v2)
The kernel sort of does the same thing with fences.

v2: do emit partial flushes on SI

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Andres Gomez
5352174d49 anv: FORMAT_FEATURE_TRANSFER_SRC/DST_BIT_KHR not used with VkFormatProperties.bufferFeatures
VK_FORMAT_FEATURE_TRANSFER_[SRC|DST]_BIT_KHR is a flag value of the
VkFormatFeatureFlagBits enum that can only be hold and checked against
the linearTilingFeatures or optimalTilingFeatures members of the
VkFormatProperties struct but not the bufferFeatures member.

>From the Vulkan® 1.0.51, with the VK_KHR_maintenance1 extension,
section 32.3.2 docs for VkFormatProperties:

   "* linearTilingFeatures is a bitmask of VkFormatFeatureFlagBits
      specifying features supported by images created with a tiling
      parameter of VK_IMAGE_TILING_LINEAR.

    * optimalTilingFeatures is a bitmask of VkFormatFeatureFlagBits
      specifying features supported by images created with a tiling
      parameter of VK_IMAGE_TILING_OPTIMAL.

    * bufferFeatures is a bitmask of VkFormatFeatureFlagBits
      specifying features supported by buffers."

    ...

    Bits which can be set in the VkFormatProperties features
    linearTilingFeatures, optimalTilingFeatures, and bufferFeatures
    are:

    typedef enum VkFormatFeatureFlagBits {

    ...

      VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR = 0x00004000,
      VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR = 0x00008000,

    ...

    } VkFormatFeatureFlagBits;

    ...

    The following bits may be set in linearTilingFeatures and
    optimalTilingFeatures, specifying that the features are supported
    by images or image views created with the queried
    vkGetPhysicalDeviceFormatProperties::format:

    ...

    * VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR specifies that an image
      can be used as a source image for copy commands.

    * VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR specifies that an image
      can be used as a destination image for copy commands and clear
      commands."

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-22 13:45:22 +03:00
Chandu Babu N
1d4cbcdf28 change va max_entrypoints
As encode support is added along with decode, increase max_entrypoints to two.
vaMaxNumEntrypoints was returning incorrect value and causing
memory corruption before this commit

v2: assert when max_entrypoints needs to be bigger

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-06-22 12:10:57 +02:00
Chandu Babu N
b1a359b7d8 st/va: Fix leak in VAAPI subpictures
sampler view allocated in vaAssociateSubpicture is not cleared
in vaiDeassociateSubpicture.

Reviewed-by: Christian König <christian.koenig@amd.com>
2017-06-22 12:09:43 +02:00
Timothy Arceri
9e9f7840bd glsl: tidy up int declaration
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-22 20:06:38 +10:00
Timothy Arceri
95927bb27f glsl: fix typo in comment
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-22 20:06:32 +10:00
Samuel Pitoiset
a285caaf25 mesa: fix using texture id 0 with glTextureSubImage*()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:36 +02:00
Samuel Pitoiset
45eb87e5e5 mesa: fix using texture id 0 with gl*TextureParameter*()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:30 +02:00
Samuel Pitoiset
7f47c31f8c mesa: fix using texture id 0 with VDPAURegisterSurfaceNV()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:22 +02:00
Samuel Pitoiset
51a7e0d14f mesa: fix using texture id 0 with glTextureStorage*()
This fixes an assertion in debug build, and probably a crash
in release build.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:19 +02:00
Samuel Pitoiset
1f38363e68 mesa: pass the 'caller' function to texturestorage() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:17 +02:00
Samuel Pitoiset
8a7ab8d418 mesa: use _mesa_lookup_texture_err() in get_tex_obj_for_clear()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:15 +02:00
Samuel Pitoiset
048de9e34a mesa: remove unused _mesa_delete_nameless_texture()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:13 +02:00
Samuel Pitoiset
75044f0854 mesa: check for allocation failures in _mesa_new_texture_object()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 10:41:10 +02:00
Nicolai Hähnle
da2e52b382 radeonsi: use the correct LLVMTargetMachineRef in si_build_shader_variant
si_build_shader_variant can actually be called directly from one of
normal-priority compiler threads. In that case, the thread_index is
only valid for the normal tm array.

v2:
- use the correct sel/shader->compiler_ctx_state

Fixes: 86cc809726 ("radeonsi: use a compiler queue with a low priority for optimized shaders")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-22 09:45:23 +02:00
Marek Olšák
79bd1d4f8b radeonsi/gfx9: keep reusing the same buffer/address for the gfx9 flush fence
instead of using a monotonic suballocator

v2: initialize the memory at context creation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
c66fc618cc radeonsi/gfx9: enable the constant engine
I think this kernel commit fixes it:
     drm/amdgpu:use FRAME_CNTL for new GFX ucode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
d7141d8bc0 radeonsi/gfx9: indirect buffers and all CP packets use TC L2
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
2638250fec radeonsi: flush CB after MSAA only when transitioning from CB to textures
The main flush before texturing is done after the FMASK decompress pass.

CB after MSAA rendering is not flushed in set_framebuffer_state and also
not in memory_barrier if the current color buffer is MSAA. We fully rely
on the FMASK decompress pass for the flushing.

Some CB decompress and resolve passes need an explicit flush before and
after.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
51c219739c radeonsi: unify CB_RESOLVE blitter invocation code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
2263610827 radeonsi: flush DB caches only when transitioning from DB to texturing
Use the mechanism of si_decompress_textures, but instead of doing
the actual decompression, just flag the DB cache flush there.

This removes a lot of unnecessary DB cache flushes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
fdca690e91 radeonsi: add separate HUD counters for CB and DB cache flushes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
b007744051 st/mesa: don't set the border color if it's unused
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
743ad599a9 st/mesa: don't set 16 scissors and 16 viewports if they're unused
Only do so if there is a shader writing gl_ViewportIndex.
This removes a lot of CPU overhead for the most common case.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
2ec1e32d11 st/mesa: fix pipe_rasterizer_state::scissor with multiple viewports
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
d7e52327f0 st/mesa: simplify st_update_viewport
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
a8753b254d st/mesa: remove redundant sample_mask checking
cso does that too

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
2108b73cf3 st/mesa: use precomputed st_fb_orientation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
91579254db mesa: don't call _mesa_update_clip_plane in the GL core profile
It uses the projection matrix to transform the clip plane.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-22 01:51:02 +02:00
Marek Olšák
602a3e50e5 st/mesa: set st_context::...num_samplers to 0 when there are no samplers
This was missed during my st/mesa series.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
f368ea37a2 st/mesa: unify fail paths for update_single_texture
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
d14bb37a0a st/mesa: don't call u_sampler_view_default_template for sampler views
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
bfe1e7737a st/mesa: always set sampler swizzle according to the texture base format
Mainly don't (indirectly) call util_format_description here.

If the driver supports texture swizzling, this will always do the right
thing. If the driver doesn't support it, it doesn't matter.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
25723857d9 st/mesa: samplers only need to track whether GLSL >= 130
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
3ee1c9b126 st/mesa: simplify get_texture_format_swizzle
- Don't check GL_NONE (that was only for buffers).
- Don't use util_format_is_depth_or_stencil.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
f0ecd36ef8 st/mesa: add an entirely separate codepath for setting up buffer views
Remove handling of buffers from all texture paths.
This simplifies things for both buffers and textures.

get_sampler_view_format is also cleaned up not to call
util_format_is_depth_and_stencil.

v2: also update st_NewTextureHandle

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2017-06-22 01:51:02 +02:00
Marek Olšák
fbd9cc6169 st/mesa: don't return an error from update_single_texture
It can just return a NULL sampler view, which is better than not doing
anything at all.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
6f4ead6bfd st/mesa: clean up trivial dereferences in update_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
5766c18f59 st/mesa: don't check MaxTextureImageUnits in update_textures
The linker takes care of it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
4c0bce921b st/mesa: don't call st_shader_stage_to_ptarget in update_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
c9a16fde80 cso: inline a few frequently-used functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
0e0fc1ce71 cso: don't return errors from sampler functions
No code checks the errors.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
4d6fab245e cso: don't track the number of sampler states bound
This removes 2 loops from hot codepaths and adds 1 loop to a rare codepath
(restore_sampler_states), and makes sanitize_hash() slightly worse.

Sampler states, when bound, are not unbound for draw calls that don't need
them. That's OK, because bound sampler states don't add any overhead.

This results in lower CPU overhead in most cases.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
c845984690 st/mesa: sink and simplify texBaseFormat getting for sampler states
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
8aba778fa2 st/mesa: don't set sampler states for TBOs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
222a910a9b st/mesa: optimize sampler state translation code
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
39ab9fb36c st/mesa: sink code needed for apply_texture_swizzle_to_border_color
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
588371b772 st/mesa: simplify update_shader_samplers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
18d498a1ae st/mesa: when binding sampler states, don't check the max sampler limit
The GLSL linker takes care of it.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
fd86876fe4 st/mesa: don't unbind sampler states if none are used
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
b43c887a9b st/mesa: unify update_gp/tcp/tep code
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
56a28ace35 st/mesa: don't search through shader variants if there is only one
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
dbf1413014 st/mesa: don't track shader variants in st_context
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
1c818fff0c st/mesa: move blend color into its own state atom
This is now sensible thanks to the NewBlendColor flag.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
0b03d82f9c st/mesa: check correctly if multisampling is enabled
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
9c499e6759 st/mesa: don't invoke st_finalize_texture & st_convert_sampler for TBOs
This is a v2 of the previous patch (v1 didn't skip st_finalize_texture).

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
c0ed52f614 mesa: simplify _mesa_is_image_unit_valid for buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
caf39d6df9 mesa: don't flag _NEW_PROGRAM_CONSTANTS for GLSL programs for st/mesa
v2: also update _mesa_uniform_handle for bindless textures

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-22 01:51:02 +02:00
Kenneth Graunke
b7ba745032 glsl: Track whether uniforms are active per stage
for finer granularity state flagging

v2: Marek - use a bitmask, add shader cache support

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
670c4dd395 mesa: don't flag _NEW_PROGRAM_CONSTANTS for non-GLSL programs for st/mesa
This has the benefit that we get to set up constants for exactly
the shader stage that needs it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
0b70d6ec56 mesa: flush vertices before updating ctx->_Shader
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
0160a59f29 mesa: set driver flags for glPopAttrib(GL_ENABLE_BIT) properly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
df0f6a0af3 mesa: don't flag _NEW_POLYGON_STIPPLE for st/mesa
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
58a02196b9 mesa: don't flag _NEW_LINE for st/mesa
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
bc871a1baf mesa: don't flag _NEW_POLYGON for st/mesa
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
00173d91b7 mesa: don't flag _NEW_TRANSFORM for st/mesa if possible
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
55f1106637 mesa: don't flag _NEW_TRANSFORM for Transform.RasterPositionUnclipped
It's not a driver state, it's for glRasterPos.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
1f3dc332f5 mesa: don't flag _NEW_TRANSFORM for primitive restart
It's a draw state.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
29db5c1dcc mesa: don't flag _NEW_VIEWPORT for st/mesa if possible
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
c8363eb027 mesa: flush vertices before changing viewports
Cc: 17.1 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
86840d3f08 mesa: don't flag _NEW_MULTISAMPLE for st/mesa
There are several new driver flags here so that it maps nicely to gallium.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
581d77315b mesa: don't flag _NEW_COLOR for st/mesa if possible
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
b677e96078 mesa: use DriverFlags.NewAlphaTest to communicate alphatest changes to st/mesa
Now AlphaFunc avoids the blend state update in st/mesa and avoids
_mesa_update_state_locked.

The GL_ALPHA_TEST enable won't trigger blend state updates in st/mesa
after st/mesa stops relying on _NEW_COLOR.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
37b834923d mesa: don't flag _NEW_DEPTH for st/mesa
skipping _mesa_update_state_locked

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
a9315627bc mesa: make _mesa_set_varying_vp_inputs a no-op in GL core profile
just don't set _NEW_VARYING_VP_INPUTS.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
95d9fdc6f8 mesa: remove _NEW_BUFFER_OBJECT
not used

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
be15028ede mesa: don't flag _NEW_SCISSOR for st/mesa
Not needed and we get to bypass _mesa_update_state_locked that would be
a no-op.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
124df5d71a mesa: don't execute most of _mesa_update_state_locked for GL core profile
There is plenty of legacy stuff here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
9c1a6a2082 mesa: simplify handling the return value of update_program
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
bc4e914f95 mesa: simplify a loop in _mesa_update_texture_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
68a0e15f44 mesa: replace VP/FP/ATIfs _Enabled flags with helper functions
These are only used in the GL compatibility profile.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
585c5cf8a5 mesa: don't update draw buffer bounds in _mesa_update_state
st/mesa doesn't need the draw bounds for draw calls. I've added the call
where it's necessary in core Mesa and drivers, but I suspect that most
drivers can just move the call to the right places.

The core Mesa places aren't hot paths, so the call overhead doesn't matter
there.

For now, only st/mesa is made such that this function is invoked very
rarely.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
ab784e0fee mesa: remove update_framebuffer_size
For the default framebuffer, _mesa_resize_framebuffer updates it.
For FBOs, _mesa_test_framebuffer_completeness updates it.

This code is redundant.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
e7a091936f mesa: replace ctx->Polygon._FrontBit with a helper function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
c19b08b079 mesa: replace ctx->VertexProgram._TwoSideEnabled with a helper function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
480bf7731b mesa: stop using _NEW_STENCIL with st/mesa, use DriverFlags.NewStencil instead
This bypasses _mesa_update_state_locked.

Before:
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    3.99 million
   DrawArrays   ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    4.56 million

After:
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    4.93 million
   DrawArrays   ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    5.84 million

It's quite a difference in the draw call rate when ctx->NewState stays
equal to 0 the whole time.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:51:02 +02:00
Marek Olšák
c2408838c8 mesa: replace _mesa_update_stencil() with helper functions
The idea is to remove the dependency on _mesa_update_state_locked,
so that st/mesa can skip it for stencil state updates, and then stop
setting _NEW_STENCIL in mesa/main if the driver is st/mesa.

The main motivation is to stop invoking _mesa_update_state_locked for
certain state groups.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-22 01:48:30 +02:00
Marek Olšák
d28cc798bd meta: do the full FBO completeness check in decompress_texture_image
_mesa_update_state will no longer recompute Width/Height if the framebuffer
is complete. We now rely on the FBO completeness check to do it.

The only code that needs to be fixed seems to be this one.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-22 01:48:30 +02:00
Pohjolainen, Topi
6a86795a3d i965/gen6: Use isl-based miptree also for stencil rbs
Fixes dEQP-EGL.functional.image.render_multiple_contexts.
gles2_renderbuffer_stencil_stencil_buffer

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-21 16:25:24 -07:00
Ian Romanick
c06b1d3c16 i965: Remove spurious mutex frobbing around call to intel_miptree_blit
These locks were added in 2f28a0dc, but I don't see anything in the
intel_miptree_blit path that should make this necessary.

When asked, Kristian says:

    I doubt it's needed now with the new blorp. If I remember correctly,
    I had to drop the lock there since intel_miptree_blit() could hit
    the XY blit path that requires a fast clear resolve. The fast
    resolve being meta, would then try to lock the texture again.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2017-06-21 14:34:56 -07:00
Eric Engestrom
4a1238a452 egl: turn one more boolean int into a bool
Same as the previous commit, but this one was split out because it's
a bit more complicated: this field is given as a pointer to a function,
so the function had to be changed as well, and the function was use in
a bunch of places, which needed updating as well.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-21 21:42:14 +01:00
Eric Engestrom
60f984262c egl: turn boolean ints into bools
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-21 21:42:08 +01:00
Jason Ekstrand
17918a0372 i965/miptree: Move isl_surf_get_(hiz|mcs)_surf out of the assert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101535
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101538
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101539
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-21 11:21:19 -07:00
Rafael Antognolli
78b843af3c intel/genxml: Use the same naming convention for Floating Point Mode.
In newer gens, this field has a prefix and the non-IEEEE-745 mode is called
"Alternate", instead of simply "Alt".

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Rafael Antognolli
ce728594fd intel/genxml: Normalize URB Data field in WM_STATE.
On gen6+, this is called "Dispatch GRF Start Register For Constant/Setup Data
0", while on gen5 and lower it's called only "Dispatch GRF Start Register For
URB Data", but it's essentially the same thing (URB data), so rename it to
match newer gens and simplify the C code that handles it.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Rafael Antognolli
44415056e7 intel/genxml: Rename field on WM_STATE to match gen6+.
"Pixel Shader Kill Pixel" -> "Pixel Shader Kills Pixel", which is how it's
called on newer gens.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Rafael Antognolli
82c66965ac intel/genxml: Normalize fields on WM_STATE.
On gen4, WM_STATE only has one Kernel Start Pointer and one GRF Register
Count, but we can make the code that handles this on multiple gens simpler if
we add an index 0 to it too.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Rafael Antognolli
eddb1ebccf intel/genxml: Add missing field to CLIP_STATE.
Just because it's not set doesn't mean that it doesn't exist. And since the
field is there on newer gens, having it on gen5 simplifies the code when
porting gen5 and lower.

Also add missing value to API Mode on CLIP_STATE on gen4.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Rafael Antognolli
9a5ae19cbb intel/genxml: Fix type of UserClipFlags ClipTest Enable Bitmask.
This is a bitmask, so it can't be a boolean. Also rename it so it matches
gen6+.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Rafael Antognolli
19d1defcd5 intel/genxml: Add missing fields to CLIP_STATE on gen4-5.
These fields are set by brw_clip_unit, so we need them when converting to
genxml.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Rafael Antognolli
faa4f5c42d intel/genxml: Normalize GS_STATE.
Rename "Rendering Enable" to "Rendering Enabled", so it matches gen6+.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-21 10:16:05 -07:00
Ville Syrjälä
0eef03a6f2 i915: Always emit W on gen3
Unlike the older gen2 hardware, gen3 performs perspective
correct interpolation even for the primary/secondary colors.
To do that it naturally needs us to emit W for the vertices.

Currently we emit W only when at least one texture coordinate
set gets emitted. This means the interpolation of color will
change depending on whether texcoords/varyings are used or not.
That's probably not what anyone would expect, so let's just
always emit W to get consistent behaviour. Trying to avoid
emitting W seems like more hassle than it's worth, especially
as bspec seems to suggest that the hardware will perform the
perspective division anyway.

This used to be broken until it was accidentally fixed it in
commit c349031c27 ("i915: Fix texcoord vs. varying collision
in fragment programs") by introducing a bug that made the driver
always emit W. After fixing that bug in commit c1eedb43f3
("i915: Fix wpos_tex vs. -1 comparison") we went back to the
old behaviour and caused an apparent regression.

Fixes: c1eedb43f3 ("i915: Fix wpos_tex vs. -1 comparison")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101451
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-21 13:10:58 +03:00
Samuel Pitoiset
26fbdb12f4 mesa: add KHR_no_error support for glStencilOp()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:26 +02:00
Samuel Pitoiset
5407662570 mesa: add stencil_op() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:24 +02:00
Samuel Pitoiset
e6659c560a mesa: add KHR_no_error support for glStencilFunc()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:22 +02:00
Samuel Pitoiset
db967dcb05 mesa: add stencil_func() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:18 +02:00
Samuel Pitoiset
b9e2d5c18d mesa: add KHR_no_error support for glStencilOpSeparate()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:16 +02:00
Samuel Pitoiset
0614b7a6f7 mesa: add stencil_op_separate() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:14 +02:00
Samuel Pitoiset
d222e14ffa mesa: add KHR_no_error support for glStencilMaskSeparate()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:12 +02:00
Samuel Pitoiset
8ab0aaa350 mesa: add stencil_mask_separate() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:10 +02:00
Samuel Pitoiset
9c49c9d8dd mesa: add KHR_no_error support for glStencilFuncSeparate()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:09 +02:00
Samuel Pitoiset
6f10d93ea4 mesa: add stencil_func_separate() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-21 08:47:06 +02:00
Lucas Stach
629003b5b8 etnaviv: fix blend color for RB swapped rendertargets
Same as with the colormasks, the blend color needs to be swizzled according
to the rendertarget format.

Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-21 07:45:15 +02:00
Jason Ekstrand
1bd0acab21 spirv: Work around the Doom shader bug
Doom shipped with a broken version of GLSLang which handles samplers as
function arguments in a way that isn't spec-compliant.  In particular,
it creates a temporary local sampler variable and copies the sampler
into it.  While Dave has had a hack patch out for a while that gets it
working, we've never landed it because we've been hoping that a game
update would come out with fixed shaders.  Unfortunately, no game update
appears on to be on the horizon and I've found this issue in yet another
application so I think we're stuck working around it.  Hopefully, we can
delete this code one day.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99467
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-20 18:51:26 -07:00
Ian Romanick
93055576ae glsl: Update build instructions for int64.glsl
Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-20 17:45:49 -07:00
Elie Tournier
7e46be3dec glsl: Fix indent in dump code
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-20 17:45:49 -07:00
Ilia Mirkin
8754c5359f st/xvmc: deal with drivers wanting different texture formats
Previously, texture formats were being used unconditionally without
checking. However nv30 supports neither RGBX8 nor R4A4/A4R4 formats. Add
sufficient fallbacks so that the nv30 driver can have working OSD.

Tested on a NV44A/PCI.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-20 20:20:55 -04:00
Ben Skeggs
7bae3ef812 nvc0: fix transfer of larger rectangles with DmaCopy on gk104 and up
By treating the rectangles as 1cpp, we can run up against some internal
copy engine limits and trigger a MEM2MEM_RECT_OUT_OF_BOUNDS error check
at launch time.

This commit enables the REMAP hardware, which allows us to specify both
the component size and number of components for a transfer.  We're then
able to pass in the real width/nblocksx values and not hit the limits.

There's a couple of "supported" CPPs in the list that we can't actually
hit, but are there simply because they're possible.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-20 20:18:54 -04:00
Ben Skeggs
ec3d489d5b nvc0: copy engine surface params are only relevant for tiled surfaces
Aside from reducing pushbuf usage in some situations, this commit should
have no other effect, and is just to make it somewhat obvious that those
methods have zero effect on linear surfaces.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-20 20:18:54 -04:00
Dave Airlie
72c8c68458 st/mesa: fix assert to be simpler
I just noticed a warning with a non-debug build, but really
this could all be one line, and I'm not even 100% the assert
makes sense here.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-21 08:59:18 +10:00
Lionel Landwerlin
030abc6109 intel: compiler/i965: fix is_broxton checks
In 5f2fe9302c is_geminilake was introduced for the differenciate
broxton from geminilake. Unfortunately I failed as verifying that
is_broxton is throughout the code base to mean Gen9lp.

Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-20 23:26:42 +01:00
Plamena Manolova
b3b6121115 mesa/main: Move NULL pointer check.
In blit_framebuffer we're already doing a NULL
pointer check for readFb and drawFb so it makes
sense to do it before we actually use the pointers.

CID: 1412569
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-20 13:57:20 -07:00
George Kyriazis
12f52942f8 swr: Include definition of missing function
Inline function SWR_MULTISAMPLE_POS::PrecalcSampleData() was missing
definition.  Include definition in core/state_funcs.h.

Fixes windows build.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-06-20 14:42:42 -05:00
Ben Widawsky
3e1055591b i965/cnl: Add l3 configuration for Cannonlake
V2 (Anuj):
Squash the changes in one patch rebase on master.
Address the review comments made by Francisco Jerez.
Do the URB allocation per slice (not per bank).

V3 (Anuj):
Update the comment.
Format the table as other l3 config tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
---
V1 was sent out with the heading:
"i965/cnl: Properly handle l3 configuration"
2017-06-20 12:18:26 -07:00
Anuj Phogat
1024dad4d9 i965: Add a variable for way size per bank in get_l3_way_size()
Adding this variable better explains the computation of L3 way
size in the function.

V2: Use const variable for way_size_per_bank.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-20 12:18:26 -07:00
Anuj Phogat
8521559e08 i965: Fix broxton 2x6 l3 config
The new table added in this patch matches with the table
in gfxspecs. We were programming the wrong values earlier.

V2: Update the comment.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-20 12:18:26 -07:00
Ian Romanick
4eb3747544 i965: Fall back to normal blorp clear instead of meta clear
When intel_miptree_alloc_non_msrt_mcs fails, fall back to normal blorp
color clear instead of falling back to meta.  With this change,
brw_blorp_clear_color can never fail.

v2: Combine two if-statements to remove a level of indentation.
Suggested by Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Ian Romanick
cbb941cdec intel/blorp: Apply source offset in the TEX case
Previously the offset was only applied in the TXF case.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Ian Romanick
990f2be139 intel/blorp: Apply Gen4 coord. normalization after cubemap sizes are adjusted
Otherwise the values used for coordinate normalization use the wrong
sizes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Jason Ekstrand
b2dd61196e intel/blorp: Set needs_(dst|src)_offset for Gen4 cubemaps
We call convert_to_single_slice so they may end up with a non-trivial
offset that needs to be taken into account.

v2 (idr): Also set needs_src_offset.  Suggested by Jason.

Fixes ES2-CTS.functional.texture.specification.basic_copyteximage2d.cube_rgba
and ES2-CTS.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
on G45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101284
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Ian Romanick
cc14286930 meta/blit: Silence unused parameter warning
drivers/common/meta_blit.c: In function ‘setup_glsl_msaa_blit_scaled_shader’:
drivers/common/meta_blit.c:62:58: warning: unused parameter ‘filter’ [-Wunused-parameter]
                                    GLenum target, GLenum filter)
                                                          ^~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Ian Romanick
37164da272 meta: Silence unused parameter warning
drivers/common/meta.c:2694:71: warning: unused parameter ‘dims’ [-Wunused-parameter]
 copytexsubimage_using_blit_framebuffer(struct gl_context *ctx, GLuint dims,
                                                                       ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Ian Romanick
691beaf241 i965: Fix incorrect comment
There is no intel_miptree_slice_has_hiz function, but there is a
intel_miptree_level_has_hiz function.  I assume that's the correct one
to use.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:01 -07:00
Samuel Pitoiset
4f00b2bc7e mesa: simplify _mesa_IsVertexArray()
_mesa_lookup_vao() already returns NULL if id is zero.

v2: - change the conditional (Ian)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
2017-06-20 19:15:17 +02:00
Eric Engestrom
cb3e01ca71 mesa/format_info: use designated initialiser list
Also, make that table const, since no-one is supposed to modify it anyway.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-20 17:55:13 +01:00
Eric Anholt
45b0172693 vc4: Clean up release build warnings using MAYBE_UNUSED.
These variables are all used in an assert(), so release builds see no
usages.
2017-06-20 09:09:09 -07:00
Eric Anholt
743dcdd936 vc4: Allow VBOs to be mapped during execution.
There's no reason we can't -- the mappings we expose are basically
equivalent to persistent/coherent, already.

Improves mesa-demos drawoverhead (no state change) performance by
5.21362% +/- 1.25078% (n=11).
2017-06-20 09:05:44 -07:00
Brian Paul
d8148ed10a gallium/vbuf: avoid segfault when we get invalid glDrawRangeElements()
A common user error is to call glDrawRangeElements() with the 'end'
argument being one too large.  If we use the vbuf module to translate
some vertex attributes this error can cause us to read past the end of
the mapped hardware buffer, resulting in a crash.

This patch adjusts the vertex count to avoid that issue.  Typically,
the vertex_count gets decremented by one.

This fixes crashes with the Unigine Tropics and Sanctuary demos with older
VMware hardware versions.  The issue isn't hit with VGPU10 because we
don't hit this fallback.

No piglit changes.

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 08:03:18 -06:00
Brian Paul
2a9d8a45a6 gallium/vbuf: add some const qualifiers
Helps understandability a bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 08:03:12 -06:00
Brian Paul
ed83e73c4e translate: whitespace fixes in translate_generic.c 2017-06-20 07:56:34 -06:00
Brian Paul
ceb9ca7fa5 softpipe: remove unused softpipe_context::line_stipple_counter
Trivial.
2017-06-20 07:56:34 -06:00
Samuel Pitoiset
ea2492b62f radeonsi: set correct usage flag according to image access type
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 13:01:18 +02:00
Marek Olšák
58af1f6bb0 winsys/amdgpu: fix a deadlock when waiting for submission_in_progress
First this happens:

1) amdgpu_cs_flush (lock bo_fence_lock)
   -> amdgpu_add_fence_dependency
   -> os_wait_until_zero (wait for submission_in_progress) - WAITING

2) amdgpu_bo_create
   -> pb_cache_reclaim_buffer (lock pb_cache::mutex)
   -> pb_cache_is_buffer_compat
   -> amdgpu_bo_wait (lock bo_fence_lock) - WAITING

So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't
continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job,
but the CS ioctl is trying to release a buffer:

3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
   -> amdgpu_cs_context_cleanup
   -> pb_reference
   -> pb_destroy
   -> amdgpu_bo_destroy_or_cache
   -> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK

The simple solution is not to wait for submission_in_progress, which we
need in order to create the list of dependencies for the CS ioctl. Instead
of building the list of dependencies as a direct input to the CS ioctl,
build the list of dependencies as a list of fences, and make the final list
of dependencies in the CS thread itself.

Therefore, amdgpu_cs_flush doesn't have to wait and can continue.
Then, amdgpu_bo_create can continue and return. And then amdgpu_cs_submit_ib
can continue.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-20 12:53:46 +02:00
Samuel Pitoiset
afeaa2e98a radeonsi: update all resident texture descriptors when needed
To avoid useless DCC fetches when DCC is disabled, descriptors
have to be updated in order to reflect this change. This is
quite similar to how we update descriptors of bound textures.

As a side effect, this should also prevent VM faults when
bindless textures are invalidated, because the VA in the
descriptor has to be updated accordingly as well.

I don't see any performance improvements with DOW3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:55 +02:00
Samuel Pitoiset
f00e80e3f7 radeonsi: keep track of the sampler state for texture handles
Needed for updating all resident texture descriptors when
dirty_tex_counter changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:52 +02:00
Lionel Landwerlin
bf5ca4f0b2 i965: perf: use gen_device_info rather then brw_context
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Lionel Landwerlin
6d759cbd49 intel: common: add number of thread per eu
This will be used by to normalize OA counters.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Lionel Landwerlin
c77d98ef32 intel: common: express timestamps units in frequency
Rather than storing the period as a double that looses some precision.

Also fixes the Gen9LP timestamp frequency which is no 19200123 but
19200000 as pointed by Ville :

https://lists.freedesktop.org/archives/intel-gfx/2017-April/125126.html

Finally add the Cannonlake timestamp frequency.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Lionel Landwerlin
e5743ee014 i965: convert MI_REPORT_PERF_COUNT to genxml
Also make it available from gen7 only to gen7+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Lionel Landwerlin
a26f8d99a6 i965: perf: fix codegen with single operand equation
We did support single value operand equations, but not single variable
operand ones. In particular we were failing on "$Sampler0Bottleneck".

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Lionel Landwerlin
5f2fe9302c intel: common: add flag to identify platforms by name
The perf infrastructure needs to identify specific platforms, not just
generations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Topi Pohjolainen
b539f6958e i965/wm: Use stored hiz surface instead of creating copy
Now the last user of intel_miptree_get_aux_isl_surf() is gone.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:57 +03:00
Topi Pohjolainen
7e4ea22762 i965/blorp: Use hiz surface instead of creating copy
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:57 +03:00
Topi Pohjolainen
f60e23cb57 i965/miptree/gen7+: Use isl for hiz layouts
v2: Use better assert by checking isl_surf_get_hiz_surf()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:57 +03:00
Topi Pohjolainen
67b44a8423 i965/miptree: Drop BO_ALLOC_FOR_RENDER in intel_miptree_alloc_mcs()
because buffers get unconditionally initialised by cpu writing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:57 +03:00
Topi Pohjolainen
1a43d774b6 i965/miptree: Use isl for mcs layouts
and pass the ccs isl surface to blorp instead of creating a
copy.

v2 (Jason): Explain ccs change and use better assert checking
            isl_surf_get_mcs_surf()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:57 +03:00
Topi Pohjolainen
31bd461816 i965/miptree: Refactor aux surface allocation
v2 (Jason): Drop unused argument in intel_alloc_aux_buffer() and
            move assignment of "buf->surf" in intel_alloc_aux_buffer()
            into this patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:57 +03:00
Topi Pohjolainen
7e25410563 i965/gen6: Use isl for hiz
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:57 +03:00
Topi Pohjolainen
59e5519afa i965/miptree: Refactor isl aux usage resolver
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:56 +03:00
Topi Pohjolainen
d8a4b8bc88 i965/gen6: Use isl for stencil surfaces
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:56 +03:00
Topi Pohjolainen
0e816c9deb i965/miptree: Prepare range getter for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:56 +03:00
Topi Pohjolainen
a808eb172a i965/miptree: Prepare stencil mapping for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:56 +03:00
Topi Pohjolainen
7294cde750 i965/blorp: Prepare for isl based miptrees
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:56 +03:00
Topi Pohjolainen
3cf470f2b6 i965: Add isl based miptree creator
v2: Use new brw_bo_alloc_tiled() interface

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:57:44 +03:00
Topi Pohjolainen
5d125f999e i965/miptree: Add option to resolve offsets using isl_surf
v2 (Nanley): Add comment telling why "level -= mt->first_level"

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
71ac909137 i965: Prepare slice copy for isl based miptrees
v2 (Jason): Fix a helper variable only used for assert -
            open code instead.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
de158c1e43 i965/tex: Prepare image update for isl based miptrees
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
bb9c4113dc i965: Prepare framebuffer validator for isl based miptrees
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
c05817ffc5 i965: Prepare slice validator for isl based miptrees
v2 (Nanley): Minify depth in case of 3D surface. Also moved to
             .c file to get minify() without additional
             header inclusions

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
143e3a679a i965: Prepare image validation for isl based miptrees
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
41a7a0e548 i965: Prepare up/downsampling for isl based miptrees
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
02fa622037 i965/miptree: Add isl surface
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
5a3105fe9a i965: Add helper for converting isl tiling to bufmgr tiling
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:45 +03:00
Topi Pohjolainen
a7480d3f03 i965/miptree: Refactor mapping table alloc
v2 (Nanley): Use minify() instead of direct shift

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:41:39 +03:00
Topi Pohjolainen
335543699a i965/gen6: Declare minify(depth, level) layers for 3D stencil
Keeps following patch refactoring the table allocation
non-functional.

Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:18:53 +03:00
Topi Pohjolainen
a5e1c9f1d5 i965/gen4: Add support for single layer in alignment workaround
On gen < 6 one doesn't have level or layer specifiers available
for render and depth targets. In order to support rendering to
specific level/layer, driver needs to manually offset the surface
to the desired slice.
There are, however, alignment restrictions to respect as well and
in come cases the only option is to use temporary single slice
surface which driver copies after rendering to the full miptree.

Current alignment workaround introduces new texture images which
are added to the parent texture object. Texture validation later
on copies the additional levels back to the surface that contains
the full mipmap.
This only works for non-arrayed surfaces and driver currently
creates new arrayed images in vain - individual layers within the
newly created are still unaligned the same as before.

This patch drops this mechanism and instead attaches single
temporary slice into the render buffer. This gets immediately
copied back to the mipmapped and/or arrayed surface just after
the render is done.

Sitting on top of earlier series cleaning up the depth buffer
state, this patch additionally fixes the following piglit tests:

    arb_framebuffer_object.fbo-generatemipmap-cubemap.g965m64
    arb_texture_cube_map.copyteximage cube.g965m64
    arb_texture_cube_map.copyteximage cube.ilkm64
    arb_pixel_buffer_object.texsubimage array pbo.g965m64
    ext_framebuffer_object.fbo-cubemap.g965m64
    ext_texture_array.copyteximage 1d_array.g45m64
    ext_texture_array.copyteximage 1d_array.g965m64
    ext_texture_array.copyteximage 1d_array.ilkm64
    ext_texture_array.copyteximage 2d_array.g45m64
    ext_texture_array.copyteximage 2d_array.g965m64
    ext_texture_array.copyteximage 2d_array.ilkm64
    ext_texture_array.fbo-array.g965m64
    ext_texture_array.fbo-generatemipmap-array.g965m64
    ext_texture_array.gen-mipmap.g965m64

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:18:53 +03:00
Topi Pohjolainen
a9c59c10a5 i965/miptree: Separate src and dst slice specifiers in slice copy
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:18:53 +03:00
Topi Pohjolainen
920c8e89c5 i965/miptree: Clarify face/level/layer in slice copy
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-19 22:18:53 +03:00
Jonas Kulla
a52ee32a9a anv: Fix L3 cache programming on Bay Trail
Valid values for URBAllocation start at 32, so substract that
before programming the register.

This was missed when porting from the GL driver.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-19 12:05:52 -07:00
Marek Olšák
3fc99f1299 radeonsi: fix dumping shader descriptors into ddebug logs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:16:20 +02:00
Marek Olšák
f9dc29a9a5 radeonsi: add a workaround for inexact SNORM8 blitting again
GFX9 is affected.

We only have tests for GL_x_SNORM where x is R8, RG8, RGB8, and RGBA8.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
0f827b51c0 radeonsi/gfx9: fix TC-compatible stencil compression
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
8a264dd829 radeonsi/gfx9: fix TXF_LZ with 1D textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
353b60cab5 radeonsi/gfx9: disable sparse buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
064f07fef3 ac/sid.h: don't use parentheses in PKT3_RELEASE_MEM definition
The parses skips the line if it contains parentheses.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
ed291cea3d ac: parse EVENT_WRITE_EOP, RELEASE_MEM, WAIT_REG_MEM, NOWHERE
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
66b6babbea st/mesa: simplify returning GL_VENDOR
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:09:52 +02:00
Marek Olšák
92b4ca4550 st/mesa: remove the "Gallium 0.4 on" prefix from GL_RENDERER
If you want to keep it for your driver, please raise your hand.
The prefix will probably have to be added into the driver instead of here.

I cringe when I look at my long renderer string:
  Gallium 0.4 on AMD Radeon R9 Fury Series (DRM 3.17.0 / 4.11.0-staging-01277-gab25a9e, LLVM 5.0.0)

I'm sincerely sorry for all apps that detect Mesa by expecting "Gallium"
in the string.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-06-19 20:09:52 +02:00
Marek Olšák
61dc2c964e st/mesa: don't update MSAA states for GL_FRAMEBUFFER_SRGB
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:09:52 +02:00
Kenneth Graunke
6a7c5257ca i965: Ignore anisotropic filtering in nearest mode.
This fixes both Europa Universalis IV and Stellaris rendering on i965.
This was tested on SKL.

This fix was discovered by Jakub Szuppe at Stream HPC
(https://streamhpc.com/).

bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96958
bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95530
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
2017-06-19 10:09:06 -07:00
Iago Toral Quiroga
b70d6a2de1 glsl: gl_Max{Vertex,Fragment}UniformComponents exist in all desktop GL versions
The current implementation assumed that these were replaced in GLSL >= 4.10
by gl_Max{Vertex,Fragment}UniformVectors, however this is not true: both
built-ins should be produced from GLSL 4.10 onwards.

This was raised by new CTS tests that are in development.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 14:43:54 +02:00
Emil Velikov
4a7222518d docs: update calendar, add news item and link release notes for 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-19 12:23:07 +01:00
Emil Velikov
42098bf9b2 docs: add sha256 checksums for 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-19 12:20:52 +01:00
Emil Velikov
b55dfb7be3 docs: add release notes for 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-19 12:20:51 +01:00
Nicolai Hähnle
b28938ffce st/glsl_to_tgsi: use correct writemask when converting generic intrinsics
This fixes a bug when lowering ballotARB: previously, using writemask 0xf,
emit_asm would create TGSI_OPCODE_BALLOT instructions that span two registers
to cover 4 64-bit channels. This could trample over other a neighbouring
temporary.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101360
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-19 12:07:05 +02:00
Nicolai Hähnle
25e5534734 gallium/radeon/gfx9: fix PBO texture uploads to compressed textures
st/mesa creates a surface that reinterprets the compressed blocks as
RGBA16UI or RGBA32UI. We have to adjust width0 & height0 accordingly to
avoid out-of-bounds memory accesses by CB.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-19 12:05:15 +02:00
Nicolai Hähnle
4d5bb1b987 r600: fix off-by-one in egd_tables.py
Port of the corresponding fix in sid_tables.py.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-19 12:05:12 +02:00
Nicolai Hähnle
67e49a7f65 amd/common: fix off-by-one in sid_tables.py
The very last entry in the sid_strings_offsets table ended up missing,
leading to out-of-bounds reads and potential crashes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-19 12:03:59 +02:00
Iago Toral Quiroga
b72b7c541d i965: update MaxTextureRectSize to match PRMs and comply with OpenGL 4.1+
We were exposing 4096, but we can do up to 8192 in Gen4-6 and up to
16384 in gen7+. OpenGL 4.1+ requires at least 16384.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 07:55:48 +02:00
Samuel Pitoiset
10d104207a mesa: add KHR_no_error support for gl*UniformHandleui64*ARB
Similar to _mesa_uniform() except that we have to call
validate_uniform_parameters() instead of validate_uniform().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-18 14:21:05 +02:00
Samuel Pitoiset
304de4edb9 mesa: add KHR_no_error support for glGetImageHandleARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-18 14:21:04 +02:00
Samuel Pitoiset
530ff887eb mesa: add KHR_no_error support for glGetTexture*HandleARB()
It would be nice to have a no_error path for
_mesa_test_texobj_completeness() because this function doesn't
only test if the texture is complete.

Anyway, that seems enough for now and a bunch of checks are
skipped with this patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-18 14:21:01 +02:00
Samuel Pitoiset
0fb2c89c71 mesa: add KHR_no_error support for glMake{Image,Texture}Handle*ResidentARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-18 14:20:59 +02:00
Samuel Pitoiset
d7bee4a022 mesa: add KHR_no_error support for glIs{Image,Texture}HandleResidentARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-18 14:20:57 +02:00
Samuel Pitoiset
6ff6863c32 radeonsi: reduce overhead for resident textures which need color decompression
This is done by introducing a separate list.

si_decompress_textures() is now 5x faster.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:38 +02:00
Samuel Pitoiset
06ed251c32 radeonsi: reduce overhead for resident textures which need depth decompression
This is done by introducing a separate list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:36 +02:00
Samuel Pitoiset
705a6a560e radeonsi: use util_dynarray_foreach for bindless resources
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:34 +02:00
Samuel Pitoiset
db73595018 mesa/util: add util_dynarray_clear() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:32 +02:00
Samuel Pitoiset
8d9e76ce1f gallium/radeon: add a new HUD query for the number of resident handles
Useful for debugging performance issues when ARB_bindless_texture
is enabled. This query doesn't make a distinction between texture
and image handles.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:08:08 +02:00
Topi Pohjolainen
e08171ef53 i965/gen4: Refactor depth/stencil rebase
Effectively there is the same code twice, once for depth and
again for stencil.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-18 10:46:44 +03:00
Topi Pohjolainen
84b195b361 i965: Drop depth/stencil miptree pointers in alignment workaround
In brw_workaround_depthstencil_alignment() corresponding
renderbuffers are always set to refer to the same temp miptrees.
There is no need to carry them in context.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-18 10:46:44 +03:00
Topi Pohjolainen
cd0804c359 i965/gen4: Simplify depth/stencil invalidate check
There is no separate stencil on gen < 6.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-18 10:46:44 +03:00
Topi Pohjolainen
bb5d3fe96a i965/gen4: Remove redundant check for depth when rebasing stencil
In case of gen < 6 stencil (if present) is always combined with
depth. Both stencil and depth attachments point to the same
physical surface.
Alignment workaround starts by considering depth and updates
stencil accordingly. Current logic continues with stencil and
in vain considers the case where depth would refer to different
surface than stencil.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-18 10:46:44 +03:00
Topi Pohjolainen
04524ac0d4 i965/gen4: Remove non-existing stencil and hiz buffer setup
Separate stencil and hiz are only enabled for gen6+.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-18 10:46:44 +03:00
Mauro Rossi
58d337941e android: ac: add missing libdrm_amdgpu shared dependency
Fixes building errors in amd/common:

target  C: libmesa_amd_common <= external/mesa/src/amd/common/ac_gpu_info.c
...
target  C: libmesa_amd_common <= external/mesa/src/amd/common/ac_surface.c
...

external/mesa/src/amd/common/ac_gpu_info.h:31:10: fatal error: 'amdgpu.h' file not found
         ^
2 errors

Fixes: 98a2492 ("ac_surface: use radeon_info from ac_gpu_info")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-06-17 18:38:31 +01:00
Emil Velikov
68aa39d5c2 r600: include libelf headers only as needed
Headers are required only when building with OpenCL. As we're building
w/o it libelf may be missing, hence we'll error out as below:

src/gallium/drivers/r600/evergreen_compute.c:27:10:
fatal error: 'gelf.h' file not found
         ^
1 error generated.

Fixes: d96a210842 ("r600g,compute: provide local copy of functions from
ac_binary.c")
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-17 16:57:18 +01:00
Emil Velikov
1f958c1337 radeonsi: include ac_binary.h for struct ac_shader_binary
The header embeds the struct so it needs the header inclusion instead of
the dummy forward declaration.

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Tom Stellard <tstellar@redhat.com>
Fixes: 32206c5e56 ("radeonsi: Add radeon_shader_binary member to struct
si_shader")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-17 11:38:02 +01:00
Emil Velikov
7e1c42cf89 r600, radeon: move radeon_shader_binary_{init,clean} back to radeon
Those are used by r600 and radeonsi, so moving them within the former
was a bad idea.

Fixes: d96a210842 ("r600g,compute: provide local copy of functions
from ac_binary.c")
Cc: Jan Vesely <jan.vesely@rutgers.edu>
Cc: Aaron Watry <awatry@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-17 11:37:58 +01:00
Emil Velikov
84bf7e5ad6 ac: resolve conflicts introduced with "ac: remove amdgpu.h dependency"
The commit did not add the relevant includes - in particular
stdint.h and stdbool.h for the respective standard types.

At the same time, the amdgpu_device_handle typedef redeclaration was
off.

Fixes: 81945ded0d ("ac: remove amdgpu.h dependency")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101471
Cc: Mark Janes <mark.a.janes@intel.com>
Cc: Gregor Münch <gr.muench@gmail.com>
Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reported-by: Gregor Münch <gr.muench@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-17 11:37:51 +01:00
Topi Pohjolainen
6967285981 i965/gen4: Set depth offset when there is stencil attachment only
Current version fails to set depthstencil.depth_offset when there
is only stencil attachment (it does set the intra tile offsets
though). Fixes piglits:

g45,g965,ilk:   depthstencil-render-miplevels 1024 s=z24_s8
g45,ilk:        depthstencil-render-miplevels 273 s=z24_s8

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-17 06:38:56 +03:00
Topi Pohjolainen
a8e89cd539 i965/gen6: Remove dead code in hiz surface setup
In intel_hiz_miptree_buf_create() the miptree is unconditionally
created with MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-17 06:38:56 +03:00
Topi Pohjolainen
0d1af164e1 intel/isl/gen6: Allow arrayed stencil
Nothing prevents arrayed stencil surfaces even though hardware
doesn't support mipmapping.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-17 06:38:56 +03:00
Brian Paul
e3f5b8ac16 svga: add new num-failed-allocations HUD query
This counter is incremented if we fail to allocate memory for
vertex/index/const buffers, textures, etc.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 17:04:08 -06:00
Brian Paul
b27281c110 gallium/hud: support GALLIUM_HUD_DUMP_DIR feature on Windows
Use a dummy implementation of the access() function.  Use \ path separator.
Add a few comments.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 17:04:02 -06:00
Brian Paul
d6cb912d65 svga: add a few minor comments
Trivial.
2017-06-16 17:03:01 -06:00
Brian Paul
15f4c3ada4 mesa: whitespace fixes in enable.c
Remove trailing whitespace, replace tabs w/ spaces, etc.  Trivial.
2017-06-16 17:03:01 -06:00
Rafael Antognolli
c2b5a26dc2 i965: Convert SF_STATE to genxml.
This patch finishes the work done by Ken of converting SF_STATE to genxml, and
merges it with gen6+ code for emitting that state.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Rafael Antognolli
3a767f8b06 genxml: The viewport state offset is actually an address.
This fixes code generation on gen45.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Rafael Antognolli
ad109c16c2 genxml: Rename fields to match gen6+.
"Anti-aliasing Enable" to "Anti-Aliasing Enable".

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Rafael Antognolli
1b42cd52a2 genxml: Rename SF_STATE field to match gen6+.
Rename "Use Point Width State" to "Point Width Source". It accepts the same
values and has the same meaning as gen6+, so lets keep them with the same name
to simplify the code.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Rafael Antognolli
bd40c71132 i965: aa_line_distance_mode should be before the padding.
It seems that it was never set correctly.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Tim Rowley
a6237e4b7f swr/rast: Fix read-back of viewport array index
Binner/clipper read viewport array index from the vertex header as needed.
Move viewport state to BACKEND_STATE.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
9b448da60f swr/rast: Refactor includes to limit simdintrin.h usage
Reduces the files rebuilt after modifying simdintrin.h from
84 to 64.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
08a466aec0 swr/rast: Fix read-back of render target array index
The last FE stage can emit render target array index. Currently we only
check to see if GS is emitting it. Moved the state to BACKEND_STATE and
plumbed the driver to set it.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
17cdd1e796 swr/rast: Adjust cast for gcc warning
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
bea00a7b6e swr/rast: Don't transition hottile resolved->dirty during store tiles
Fixes crash when dumping render targets and RT surface has been deleted.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
5c08bfbd17 swr/rast: gen_llvm_types.py support for SIMD256/SIMD512
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
21baadfe58 swr/rast: Properly size GS stage scratch space
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
3695c8ec1e swr/rast: Fix early z / query interaction
For certain cases, we perform early z for optimization. The GL_SAMPLES_PASSED
query was providing erroneous results because we were counting the number
of samples passed before the fragment shader, which did not work if the
fragment shader contained a discard.

Account properly for discard and early z, by anding the zpass mask with
the post fragment shader active mask, after the fragment shader.

Fixes the following piglit tests:
    - occlusion-query-discard
    - occlusion_query_meta_fragments

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
b7eb86c617 swr/rast: Share vertex memory between VS input/output
Removes large simdvertex stack allocation.

Vertex shader must ensure reads happen before writes.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
7f3be3f0b8 swr/rast: Add support for dynamic vertex size for VS output
Add support for dynamic vertex size for the vertex shader output.

Add new state in SWR_FRONTEND_STATE to specify the size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
8e5d11cd7b swr/rast: SIMD16 FE - improve calcDeterminantIntVertical
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
01eca81cd4 swr/rast: Add support to PA for variable sized vertices
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
b10cdb217a swr/rast: Rework attribute layout
Move fixed attributes to the top and pack single component SGVs.
WIP to support dynamically allocated vertex size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
36ac8ba511 swr/rast: Remove explicit primitive id slot in the vertex layout
- Remove any special casing in the PS stage when primitive ID is input.
  Treat as a normal attribute that must be set up properly in the FE linkage.
- Remove primitive id from the PS_CONTEXT and TRI_FLAGS

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
8716e0d8b4 swr/rast: Fix invalid 16-bit format traits for A1R5G5B5
Correctly handle formats of <= 16 bits where the component bits don't
add up to the pixel size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
a25093de71 swr/rast: Implement JIT shader caching to disk
Disabled by default; currently doesn't cache shaders (fs,gs,vs).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Brian Paul
1c33dc77f7 gallium/docs: improve docs for SAMPLE_POS, SAMPLE_INFO, TXQS, MSAA semantics
For the SAMPLE_POS and SAMPLE_INFO opcodes, clarify resource vs. render
target queries, range of postion values, swizzling, etc.  We basically
follow the DX10.1 conventions.

For the TXQS opcode and TGSI_SEMANTIC_SAMPLEID, clarify return value
and type.

For the TGSI_SEMANTIC_SAMPLEPOS system value, clarify the range of
positions returned.

v2: use 'undef' for unused vector components.  Use (0.5, 0.5, undef, undef)
for sample pos when MSAA not applicable.

v3: Add note that OPCODE_SAMPLE_INFO, OPCODE_SAMPLE_POS are not used yet
and the information is subject to change.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-06-16 14:07:31 -06:00
Brian Paul
005c978c5a svga: add some missing SVGA_STATS_* enum values, prefix strings
To fix the build when VMX86_STATS is defined.
Also, some minor whitespace changes to match upstream code.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 14:06:53 -06:00
Alex Deucher
5c603b902b radeonsi: add new polaris12 pci id
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
2017-06-16 16:03:16 -04:00
Bruce Cherniak
80b587ba27 swr: Don't crash when encountering a VBO with stride = 0.
The swr driver uses vertex_buffer->stride to determine the number
of elements in a VBO. A recent change to the state-tracker made it
possible for VBO's with stride=0. This resulted in a divide by zero
crash in the driver. The solution is to use the pre-calculated vertex
element stream_pitch in this case.

This patch fixes the crash in a number of piglit and VTK tests introduced
by 17f776c27b.

There are several VTK tests that still crash and need proper handling of
vertex_buffer_index.  This will come in a follow-on patch.

v2: Correctly update all parameters for VBO constants (stride = 0).
    Also fixes the remaining crashes/regressions that v1 did
    not address, without touching vertex_buffer_index.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-06-16 13:45:24 -05:00
Anuj Phogat
c07271fef0 intel/isl: Add the maximum surface size limit
V2: Use 2^31 bytes (2GB) surface size limit on pre-gen9 and
    2^38 bytes for gen9+.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-06-16 09:05:05 -07:00
Anuj Phogat
7022978237 intel/isl: Use uint64_t to store total surface size
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-06-16 09:05:05 -07:00
Chris Wilson
05d5caffc4 i965: Mark freshly allocate bo as idle
When created, buffers are idle, so mark them as such to save an early
ioctl or mistakenly assuming the fresh buffer is busy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-16 16:20:28 +01:00
Christian Gmeiner
82db591155 etnaviv: add rs-operations sw query
It could be useful to get the number of emited resolve operations when
doing driver optimizations.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-16 15:28:12 +02:00
Lucas Stach
5065549e2a etnaviv: advertise correct max LOD bias
The maximum LOD bias supported is the same as the max texture level
supported.

Fixes piglit: ext_texture_lod_bias

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
8644b59b5d etnaviv: mask correct channel for RB swapped rendertargets
Now that we support RB swapped targets by using a shader variant, we
must derive the color mask from both the blend state and the bound
framebuffer.

Fixes piglit: fbo-colormask-formats

Fixes: 7f62ffb68a ("etnaviv: add support for rb swap")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
d6aa2ba2b2 etnaviv: replace translate_clear_color with util_pack_color
This replaces the open coded etnaviv version of the color pack with the
common util_pack_color.

Fixes piglits:
arb_color_buffer_float-clear
fcc-front-buffer-distraction
fbo-clearmipmap

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
6633880e7e etnaviv: remove bogus assert
etna_resource_copy_region handles resources with multiple samples
by falling back to the software path. There is no need to kill the
application there.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
ff490eb8fd etnaviv: use padded width/height for resource copies
When copying a resource fully we can just blit the whole level. This allows
to use the RS even for level sizes not aligned to the RS min alignment. This
is especially useful, as etna_copy_resource is part of the software fallback
paths (used in etna_transfer), that are used for doing unaligned copies.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
2a6183d416 etnaviv: don't try RS blit if blit region is unaligned
If the blit region is not aligned to the RS min alignment don't try
to execute the blit, but fall back to the software path.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Emil Velikov
d5199cdd7a Revert "amd/common: add missing libdrm include path"
This reverts commit 44b29dd7b6.

Should no longer be required as of last patch.

Cc: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-16 12:41:44 +01:00
Emil Velikov
81945ded0d ac: remove amdgpu.h dependency
Add a couple of forward declarations and drop the amdgpu.h requirement.

With this we can build the r300 and r600 drivers without the need for
amdgpu.

v2:
 - Add amdgpu.h include in the C file (Marek)
 - Add a comment about pre C11 typedef redeclaration warning (Eric)

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101189
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-16 12:41:44 +01:00
Jan Vesely
d96a210842 r600g,compute: provide local copy of functions from ac_binary.c
This is a verbatim copy of the code. The functions can be cleaned up since
r600 does not use all the stuff that gcn does.
The symbol names have been changed since we still use ac_binary.h header
(for struct definition)

v2: Add ifdef guard around r600_binary_clean call (Aaron)
    Remove stray comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-By: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-16 12:41:44 +01:00
Jan Vesely
d41b7b0104 r600: android: amdgpu_common is only required when building OpenCL
v2: split off Android changes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-16 12:41:44 +01:00
Eric Engestrom
311c091658 egl/display: make platform detection thread-safe
Imagine there are 2 threads that both call _eglGetNativePlatform()
simultaneously:
- thread 1 completes the first "if (native_platform ==
  _EGL_INVALID_PLATFORM)" check and is preempted to do something else
- thread 2 executes the whole function, does "native_platform =
  _EGL_NATIVE_PLATFORM" and just before returning it's preempted
- thread 1 wakes up and calls _eglGetNativePlatformFromEnv() which
  returns _EGL_INVALID_PLATFORM because no env vars are set, updates
  native_platform and then gets preempted again
- thread 2 wakes up and returns wrong _EGL_INVALID_PLATFORM

Solve this by doing the detection in a local var and only overwriting
the global one at the end, if no other thread has updated it since.

This means the platform detected in the thread might not be the platform
returned by the function, but this is a different issue that will need
to be discussed when this becomes possible.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-06-16 11:02:06 +01:00
Eric Engestrom
4ca9ae587c egl/display: only detect the platform once
My refactor missed the fact that `native_platform` is static.
Add the proper guard around the detection code, as it might not be
necessary, and only print the debug message when a detection was
actually performed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Fixes: 7adb9b0948 ("egl/display: remove unnecessary code and
                              make it easier to read")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-06-16 11:02:05 +01:00
Thomas Hellstrom
9d81ab7376 svga: Relax the format checks for copy_region_vgpu10 somewhat
The new generic checks were actually more restrictive than the previous svga-
specific tests and not vice versa. So bypass the common format checks for
copy_region_vgpu10.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
a37eede540 svga: Fix incorrect format conversion blit destination
The blit.dst.resource member that was used as destination was
modified earlier in the function, effectively making us try to blit
the content onto itself. Fix this and also add a debug printout when the
format conversion blits fail.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
5732ac3ecc svga: Fix srgb copy_region regression
This fixes a tf2 srgb copy_region regression from
"svga: Rework the blit and resource_copy_region functionality v3"

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
14f888a2ba svga: Prefer accelerated blits over cpu copy region
This reduces the number of cpu copy_region fallbacks on a Nvidia system
running the piglit command

./publish/bin/piglit run  -1 -t copy -t blit tests/quick

from 64789 to 780

Previously this has caused a regression in piglit test
spec@!opengl 1.0@gl-1.0-scissor-copypixels, but I'm currently not able to
reproduce that regression.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
4c3e8f141b svga: Support accelerated conditional blitting
The blitter has functions to save and restore the conditional rendering state,
but we currently don't save the needed info.

Since also the copy_region_vgpu10 path supports conditional blitting,
we instead use the same function as the clearing routines and move
that function to svga_pipe_query.c

Note that we still haven't implemented conditional blitting with
the software fallbacks.

Fixes piglit nv_conditional_render::copyteximage

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
71f857d6ab svga: Use utility functions to help determine whether we can use copy_region
It seems like the SVGA tests are in general more stringent than the utility
tests, but they also miss some blitter features like filters and window
rectangles, and if new blitter features are added in the future, it might
be possible that we forget adding tests for those.

So in addition to the SVGA tests, use the utility tests to restrict the
situations where we can use copy_region.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
f4c2d4bd4a svga: Rework the blit and resource_copy_region functionality v3
This work was initially trigged by the fact that imported surfaces may
be backed by other SVGA3D formats than the default. Therefore some fixes were
needed to avoid using the copy_region_vgpu10() functionality for incompatible
SVGA3D formats where the pipe formats were OK. This situation happens when
using dri3.

Also in some situations, for example where a R8G8_UNORM surface is backed by
an SVGA3D_NV12 format, we can't use the copy_region functionality at all and
thus need to fall back to the quad blitter also for the resource_copy_region
function. This situation doesn't happen currently, but will if we start using
video textures.

The patch makes the blit- and copy_region paths similar and the decision whether
to use a certain gpu command should now be easy to locate. Probably the
resource_copy_region path will suffer from a minor additional cpu overhead,
but on the other hand there are more cases now that we accelerate, since
we try harder before falling back to cpu copies / blits.

v2: Addressed review comments and fixed up piglit failures by sometimes
preferring cpu_copy_region() over blit().
v3: Removed a stray test statement. Updated commit message.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Kenneth Graunke
ad412d6319 i965: Improve conditional rendering in fallback paths.
We need to fall back in a couple of cases:
- Sandybridge (it just doesn't do this in hardware)
- Occlusion queries on Gen7-7.5 with command parser version < 2
- Transform feedback overflow queries on Gen7, or on Gen7.5 with
  command parser version < 7

In these cases, we printed a perf_debug message and fell back to
_mesa_check_conditional_render(), which stalls until the full
query result is available.  Additionally, the code to handle this
was a bit of a mess.

We can do better by using our normal conditional rendering code,
and setting a new state, BRW_PREDICATE_STATE_STALL_FOR_QUERY, when
we would have set BRW_PREDICATE_STATE_USE_BIT.  Only if that state
is set do we perf_debug and potentially stall.  This means we avoid
stalls when we have a partial query result (i.e. we know it's > 0,
but don't have the full value).  The perf_debug should trigger less
often as well.

Still, this is primarily intended as a cleanup.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-15 22:42:50 -07:00
Emil Velikov
1b03323e17 configure.ac: remove manual AC_SUBST for pthread-stubs
Unneeded, since the PKG_CHECK_MODULES macro already does the
substitution of the package Cflags/Libs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-15 23:24:26 +01:00
Emil Velikov
e5aa806e5f configure.ac: add -pthread to PTHREAD_LIBS
As described inline - follow what's written in the manual and what works
for all platforms that Mesa supports.

We want to untangle things leaving only -pthread, yet that has a
potential of causing regressions. Thus we'll do it as a follow-up patch.

As a nice side-effect this resolves issues, where the system lacks
libpthread.so, yet the linker does not warn about it and we and up with
unresolved symbols.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101071
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-15 23:24:26 +01:00
Timothy Arceri
fcbb93e860 mesa: stop assigning unused storage for non-bindless opaque types
The storage was once used by get_sampler_uniform_value() but that
was fixed long ago to use the uniform storage assigned by the
linker.

By not assigning storage for images/samplers the constant buffer
for gallium drivers will be reduced which could result in small
perf improvements.

V2: rebase on ARB_bindless_texture

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-16 08:09:03 +10:00
Robert Foss
a4d3371176 egl/android: Fix typ-o
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-06-15 22:35:10 +01:00
Brian Paul
c8f344ed2d draw: check for line_width != 1.0f in validate_pipeline()
We shouldn't use the wide line stage if the line width is 1.
This check isn't strictly needed because all drivers are (now)
specifying a line wide threshold of at least 1.0 pixels, but
let's play it safe.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-15 13:53:00 -06:00
Brian Paul
c2b92dada0 svga: clamp device line width to at least 1 to fix HWv8 line stippling
The line stipple fallback code for virtual HW version 8 didn't work.

With HW version 8, we were getting zero when querying the max line
widths (AA and non-AA).  This means we were setting the draw module's
wide line threshold to zero.  This caused the wide line stage to always
get enabled.  That caused the line stipple module to fall because the
wide line stage was clobbering the rasterization state with a state
object setting the line stipple pattern to 0xffff.

Now the wide_lines variable in draw's validate_pipeline() will not
be incorrectly set.

Also improve debug output.

BTW, also this fixes several other piglit tests: polygon-mode,
primitive- restart-draw-mode, and line-flat-clip-color since they
all use the draw module fallback.

See VMware bug 1895811.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-15 13:53:00 -06:00
Brian Paul
c9f4e069ba draw: whitespace and formatting fixes
Trivial.
2017-06-15 13:53:00 -06:00
Brian Paul
c2e00c29b7 automake: increase the MESA_GIT_SHA1 hash id length from 7 to 10 digits
The SCons build has been using 10 digits of the git hash id for the
MESA_GIT_SHA1 string in git_sha1.h for about a year now.  I bumped it
up after running into a case where a 7-digit hash ID was ambiguous.

This patch makes the same change for the autotools build.

The command "git log | grep "^commit" | cut -b 8-14 | sort | uniq -d"
shows there are currently 17 cases where 7 digits of hash id are
ambiguous on master (probably quite a few more if we'd consider other
branches).

Instead of using "git log -n 1 --oneline" use
"git rev-parse --short=10 HEAD" to get the HEAD hash id.

v2: use printf instead of sed, per Eric's suggestion.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-15 13:53:00 -06:00
Eric Anholt
7029ec05e2 gallium: Add renderonly-based support for pl111+vc4.
This follows the model of imx (display) and etnaviv (render): pl111 is a
display-only device, so when asked to do GL for it, we see if we have a
vc4 renderer, make the vc4 screen, and have vc4 call back to pl111 to do
scanout allocations.

The difference from etnaviv is that we share the same BO between vc4 and
pl111, rather than having a vc4 bo and a pl11 bo and copies between the
two.  The only mismatch between their requirements is that vc4 requires
4-pixel (at 32bpp) stride alignment, while pl111 requires that stride
match width.  The kernel will reject any modesets to an incorrect stride,
so the 3D driver doesn't need to worry about that.

v2: Rebase on Android rework, drop unused include.
v3: Fix another Android bug, from Rob Herring's build-testing.

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-15 11:41:22 -07:00
Eric Anholt
7a17191305 etnaviv: Only use renderonly_get_handle for GEM handles.
Note that for requests for Prime FDs or flink names, we return handles to
the etanviv BO, not the scanout BO.  This is at least better than previous
behavior of returning GEM handles for a request for an FD or flink name.

And add an assert that renderonly_get_handle is only used for getting the
GEM handle.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-15 11:41:22 -07:00
Mauro Rossi
d5a9608076 android: r600/eg: add support for tracing IBs after a hang.
The rules to generate egd_tables.h are added in Android makefile

Fixes: f42fb00 "r600/eg: add support for tracing IBs after a hang."
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-15 15:46:44 +01:00
Mauro Rossi
d5523d912c svga: fix git_sha1.h include path in Android.mk (v3)
Adds libmesa_git_sha1 static (dummy) library to generate git_sha1.h
with some polishing to header dependency on .git/HEAD and scripted rules.

The now redundant generation rules are removed from Android.gen.mk
libmesa_git_sha1 whole static depedency is added to libmesa_pipe_svga,
libmesa_dricore and libmesa_st_mesa modules

Fixes the following building error:

external/mesa/src/gallium/drivers/svga/svga_screen.c:26:10:
fatal error: 'git_sha1.h' file not found
         ^
1 error generated.

Fixes: 1ce3a27 ("svga: Add the ability to log messages to
vmware.log on the host.")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-15 15:46:44 +01:00
Andres Gomez
5d87667fed bin/get-fixes-pick-list.sh: better identify multiple "fixes:" tags
We were not considering as multiple fixes lines with:
Fixes: $sha_1, Fixes: $sha_2

Now, we split the lines so we will consider them individually, as in:
Fixes: $sha_1,
Fixes: $sha_2

Additionally, we try to get the SHA from split lines so:
Fixes:
$sha_1

Will be considered as:
Fixes: $sha_1

v2:
 - Treat empty spaces earlier in fix lines (Emil)
 - Fold 2 lines into one to gather fix commit ids (Emil)

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
2017-06-15 15:53:21 +03:00
Andres Gomez
f1590363c9 bin/get-fixes-pick-list.sh: parse just the commit message
We were parsing the whole diff, although the candidates were
identified only by the commit message.

Now, we only use the commit message for parsing.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
2017-06-15 15:53:21 +03:00
Samuel Pitoiset
e8df89d2c5 gallium/radeon: fix initialization of new resource bindless fields
r600_resource objects are not calloc'd.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-15 11:56:21 +02:00
Lucas Stach
4026744fcb gbm: implement FD import with modifier
This implements a way to import FDs with modifiers on plain GBM devices,
without the need to go through EGL. This is mostly to the benefit of
gbm_gralloc, which can keep its dependencies low.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-06-15 10:43:36 +01:00
Lucas Stach
71b78b6b0c gbm: add API to to import FD with modifier
This allows to import an FD with an explicit modifier passed through
userspace protocols.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-06-15 10:43:23 +01:00
Emil Velikov
18d4a6f964 i965: gen4_blorp_exec.h to the sources list
We tend to use the sources, as opposed to EXTRA_DIST to include the
headers.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-15 10:29:47 +01:00
Michel Dänzer
176e761513 gallium/util: Break recursion in pipe_resource_reference
It calling itself recursively prevented it from being inlined, resulting
in a copy being generated in every compilation unit referencing it. This
bloated the text segment of the Gallium mega-driver *_dri.so by ~4%,
and might also have impacted performance.

Fixes: ecd6fce261 ("mesa/st: support lowering multi-planar YUV")
v2:
* Add comment above pipe_resource_next_reference [Samuel Pitoiset]
v3:
* Use loop to unreference the full chain of resources referenced via
  the next members [Timothy Arceri]
v4:
* Stop chasing ->next chain at the first sub-resource which isn't
  destroyed [Nicolai Hähnle]

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-15 11:24:59 +09:00
Samuel Pitoiset
1c00af4264 mesa: fix 'make check' by moving bindless functions at the right place
Fixes: 5f249b9f05 ("mapi: add GL_ARB_bindless_texture entry points")
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-06-15 10:44:38 +09:00
Jason Ekstrand
1d132712fe i965/miptree: Use the new simple alloc_tiled for CCS buffers
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
21d83f54b3 i965/bufmgr: Add a new, simpler, bo_alloc_tiled
ISL already has all of the complexity required to figure out the correct
surface pitch and size taking tile alignment into account.  When we get
a surface out of ISL, the pitch and size are already correct and using
brw_bo_alloc_tiled_2d doesn't actually gain us anything other than extra
asserts we have to do in order to ensure that the bufmgr code and ISL
agree.  This new helper doesn't try to be smart but just allocates the
BO you ask for and sets up the tiling.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
6ee0530c35 i965/bufmgr: Rename bo_alloc_tiled to bo_alloc_tiled_2d
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
862493f7cb i965: Use blorp for depth/stencil clears on gen6+
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
f762962f7f i965: Set step_rate = 0 for interleaved vertex buffers
Before, we weren't setting step rate so we got whatever old value
happened to be lying around.  This can lead to some interesting
rendering errors.  In particular, if you run the OpenGL ES CTS with
dEQP-GLES3.functional.instanced.types.mat2x4 immediately followed by one
of the dEQP-GLES3.functional.transform_feedback.* tests, the transform
feedback test gets stale instancing data from the other test and fails.
The only thing that is causing this to not be a problem today is that we
use meta for clears and meta is setting up vertex buffers via the VBO or
non-interleaved path and setting step_rate to 0 for us.  When blorp
depth/stencil clears are enabled, meta is no longer sitting between the
two tests and the stale data starts causing noticeable problems.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
b3569e7445 i965: Disable the interleaved vertex optimization when instancing
Instance divisor is a property of the vertex buffer and not the vertex
element so if we ever see anything other than 0, bail.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
7175561598 intel/blorp: Work around Sandy Bridge occlusion query issue
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
39a13c08dc i965/blorp: Set no_depth_or_stencil correctly
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
b14852997a i965: Remove some unneeded fields from brw_context
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
ea225d4da4 i965: Remove some of the remnants of meta
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
96f9d4de7d intel/isl: Properly set SeparateStencilBufferEnable on gen5-6
On gen5-6, SeparateStencilBufferEnable and HierarchicalDepthBufferEnable
come hand in hand and we have to set either both or neither.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
ee0e29dd02 i965/miptree: Choose the stencil layout in miptree_create_layout
This ensures that we get the correct layout for all stencil buffers, not
just those which are created as separate stencil for a depth buffer.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
6f6aa0f462 mesa: Add a BUFFER_BITS mask for depth+stencil
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
83ab6327c1 i965/blorp: Set aux_usage to NONE for miplevels without HiZ
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Aaron Watry
e4d06e4c53 radeon/winsys: Limit max allocation size to 70% of VRAM
The CL CTS queries the max allocation size, and then attempts to
allocate buffers of that size. If not enough contiguous RAM/VRAM is
available, this causes errors in the radeon kernel module due to
inability to allocate the required memory.

It's a bit of a hack, but experimentally on my system, I can use ~3/4
of the card's VRAM for a single global/constant buffer allocation given
current GUI/compositor use.

For a 1GB Pitcairn (HD7850) this gets me from the reported clinfo values of:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           1500153446 (1.397GiB)
Max constant buffer size                        1500153446 (1.397GiB)

To:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           751619276 (716MiB)
Max constant buffer size                        751619276 (716MiB)

Fixes: OpenCL CTS test/conformance/api/min_max_mem_alloc_size,
       OpenCL CTS test/conformance/api/min_max_constant_buffer_size

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 19:38:55 -05:00
Kenneth Graunke
b6d56c747c i965: Use a line end cap width of 0.5 unless smooth lines enabled.
This updates the Gen4-5 code to use a line end cap width of 0.5
for non-smooth lines, and 1.0 for smooth lines - which is what we
do on Gen6+.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
6563d5287b i965: Use brw_get_line_width() in Gen4-5 SF_STATE code.
This unifies the Gen4-5 and Gen6+ line width calculations.

I believe it also fixes a bug - we weren't rounding the line width
to the nearest integer.  The GL 4.5 (and GL 2.1) specs "Wide Lines"
section says:

"The actual width of non-antialiased lines is determined by rounding
 the supplied width to the nearest integer, then clamping it to the
 implementation-dependent maximum non-antialiased line width."

We don't need to care about _NEW_MULTISAMPLE here because multisampling
doesn't exist on Gen4-5, so the state shouldn't change.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
af373ea4a2 genxml: Fix Gen4-5 SF_STATE "Line Width" fixed point type.
It's a U3.1.  It became a U3.7 on Sandybridge.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
3793410369 i965: Stop using BRW_RASTRULE_LOWER_RIGHT on Gen4-5.
This effectively reverts Robert Ellison's 2009 commit
cc8afbd386.

I'm not seeing any GL spec text indicating that UPPER won't work.
On Gen6+, this bit moved to 3DSTATE_WM as a single bit, controlling
UPPER_LEFT vs. UPPER_RIGHT.  There is no way to request LOWER_RIGHT,
so UPPER_RIGHT is the best you can do.

In the G45 docs, it's marked as "Reserved" as well, but we just
decided to use it anyway.

This patch unifies the behavior between Gen4-5 and Gen6+.

Note that this is separate from point sprite texcoord behavior.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
6d4e031d9a i965: When gl_PointSize is unwritten, default to 1.0 on Gen4-5.
Modern GL specifications say that the point size should be 1.0 when
gl_PointSize is unwritten and the last enabled stage is a geometry
or tessellation shader.  If it's a vertex shader, though, both the
GL specs and ES 3.0 spec say that it's undefined - so since Gen4-5
only support vertex shaders, there's no actual requirement to do this.

Since there is a cost associated (an extra dirty bit, which may cause
SF_STATE to be emitted more often), it may not be a good idea.

The real benefit is that it makes all generations behave identically.
And that seems somewhat nice...

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
3d34e27522 i965: Make Gen4-5 SF_STATE use the point size calculations from Gen6+.
Apparently, Nanhai made the Gen4-5 point size calculations round to the
nearest integer in commit 8d5231a358,
"according to spec".  When Eric first ported the driver to Sandybridge,
he did not implement this rounding.

In the GL 2.1 and 3.0 specs "Basic Point Rasterization" section, it does
say "If antialiasing and point sprites are disabled, the actual width is
determined by rounding the supplied width to the nearest integer, then
clamping it to the implementation-dependent maximum non-antialised point
width."

In contrast, GL 3.1 and later do not appear to contain this rounding.

It might be reasonable to round, given that we only implement GL 2.1.
Of course, if we were to do that, we should actually implement the AA
vs. non-AA distinction.  Brian added an XXX comment reminding us to fix
this 10 years ago, but it never happened.

I think a better plan is to follow the newer, unrounded behavior.  This
is what we do on Gen6+ and it passes all the relevant conformance tests.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Jason Ekstrand
d9261275cc i965: Do an end-of-pipe sync after flushes
According to the docs, a simple CS stall is insufficient to ensure that
the memory from the flush is visible and an end-of-pipe sync is needed.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:42 -07:00
Jason Ekstrand
314ec7b46f i965/blorp: Do an end-of-pipe sync around CCS ops
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:40 -07:00
Jason Ekstrand
96e7b7ac54 i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:39 -07:00
Topi Pohjolainen
7b607aae3f i965: Add an end-of-pipe sync helper
v2 (Jason Ekstrand):
 - Take a flags parameter to control the flushes
 - Refactoring

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:22 -07:00
Jason Ekstrand
b771d9a136 i965: Unify the two emit_pipe_control functions
These two functions contain almost identical logic except for one SNB
workaround required for render target cache flushes.  They may as well
call into the same code so we only have to handle the work-arounds in
one place.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:21 -07:00
Jason Ekstrand
a8ea68bc93 i965: Take a uint64_t immediate in emit_pipe_control_write
It's a 64-bit value.  Splitting it up just makes the function arguments
awkward.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:19 -07:00
Jason Ekstrand
86da08367b i965: Flush around state base address
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:06 -07:00
Kenneth Graunke
244c2a5d2c i965: Print "force dual color blending" in FS recompile debug output.
I forgot to add this when introducing the new key field.  It doesn't
happen often - just with the Unigine workarounds.  But we may as well
have it, so we get an accurate picture of why recompiles happen.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-06-14 14:30:11 -07:00
Eric Le Bihan
2154defcd6 Fix khrplatform.h not installed if EGL is disabled.
KHR/khrplatform.h is required by the EGL, GLES and VG headers, but is
only installed if Mesa3d is compiled with EGL support.

This patch installs this header file unconditionally.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77240
Signed-off-by: Eric Le Bihan <eric.le.bihan.dev@free.fr>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-14 16:55:13 +01:00
Ville Syrjälä
c1eedb43f3 i915: Fix wpos_tex vs. -1 comparison
wpos_tex used to be a GLuint so assigning -1 to it and
later comparing with -1 worked correctly, but commit
c349031c27 ("i915: Fix texcoord vs. varying collision in
fragment programs") changed wpos_tex to uint8_t and hence
broke the comparison. To fix this define a more explicit
invalid value for wpos_tex.

gcc warns us:
i915_fragprog.c:1255:57: warning: comparison is always true due to limited range of data type [-Wtype-limits]
    if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
                                                         ^

And clang says:
i915_fragprog.c:1255:57: warning: comparison of constant -1 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
   if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
                                            ~~~~~~~~~~~ ^  ~~

Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: c349031c27 ("i915: Fix texcoord vs. varying collision in fragment programs")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-14 18:22:52 +03:00
Samuel Pitoiset
5f8b654b47 tgsi/scan: add missing 'static' to tgsi_is_bindless_image_file()
This should fix compilation errors in some situations.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101418
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 15:30:39 +02:00
Chuck Atkins
ad69b037b1 configure.ac: Reduce zlib requirement from 1.2.8 to 1.2.3.
Testing with zlib versions 1.2.{3,4,5,6,7,8} showed no difference in
functionality, correctness, or zlib API usage and 1.2.3 is the oldest
version available in still actively deployed production Linux
distributions (RHEL/CentOS 6 and SuSE 11).

Build 17.1.1 against the system supplied zlib-devel packages for 1.2.3
in EL6 and 1.2.7 on EL7. I then swapped out the zlib version at runtime
via LD_LIBRARY_PATH with ones build from the release tarballs from
zlib.net

Testwise - I ran the piglit shader profile with --quick addded to the
tests since I figured that would exercise the shader cache, which would
in turn use zlib.

Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
[Emil Velikov: add hunk about version/piglit testing]
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-14 12:03:22 +01:00
Samuel Pitoiset
65d1e4d1eb radeonsi: enable ARB_bindless_texture
This has only been tested on RX480.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
285ec4463b radeonsi: add support for loading bindless images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
950b5ffa31 radeonsi: add support for loading bindless samplers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
0c2834c5b2 radeonsi: invalidate buffers which are made resident if needed
When a buffer becomes resident, check if it has been invalidated,
if so update the descriptor and the dirty flag.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
811756dfd0 radeonsi: upload new descriptors when resident buffers are invalidated
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.

Instead, use the WRITE_DATA packet which allows to update memory
directly but graphics/compute have to be idle in case the GPU is
reading the descriptor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
48fe8a6210 radeonsi: only decompress resident textures/images when used
When the current bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
2c3a7d5840 radeonsi: track use of bindless samplers/images from tgsi_shader_info
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
e1813a8635 radeonsi: decompress resident textures/images before graphics/compute
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
d7e1a66bb5 radeonsi: decompress DCC for resident textures/images
Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN for avoiding
useless DCC fetches, but I postpone this optimization for a
separate series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
a45e198e2d radeonsi: only add descriptors in presence of resident handles
This won't help much except for applications that use a ton
of resident handles. Though, this will reduce the winsys
overhead a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
333c8f65cf radeonsi: add all resident buffers to the current CS
Resident buffers have to be added to every new command stream.
Though, this could be slightly improved when current shaders
don't use any bindless textures/images but usually applications
tend to use bindless for almost every draw call, and the winsys
thread might help when buffers are added early.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
9cc328eef6 radeonsi: implement ARB_bindless_texture
This implements the Gallium interface. Decompression of resident
textures/images will follow in the next patches.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
77bbdcdfcd radeonsi: add a slab allocator for bindless descriptors
For each texture/image handles, we need to allocate a new
buffer for the bindless descriptor. But when the number of
buffers added to the current CS becomes high, the overhead
in the winsys (and in the kernel) is important.

To reduce this bottleneck, the idea is to suballocate the
bindless descriptors using a slab similar to the one used
in the winsys.

Currently, a buffer can hold 1024 bindless descriptors but
this limit is arbitrary and could be changed in the future
for some reasons. Once a slab is allocated the "base" buffer
is added to a per-context list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
86d7b7f01a radeonsi: add si_set_shader_image_desc() helper
To share some common code between bound and bindless images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
410b4ec06d radeonsi: add si_set_sampler_view_desc() helper
To share some common code between bound and bindless textures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
2ce04d7c1a radeonsi: add si_init_descriptor_list() helper
This will be used in order to initialize resident descriptors
for bindless textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
08ba871549 st/mesa: enable ARB_bindless_texture
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
b1288fad3c st/mesa: disable per-context seamless cubemap when using texture handles
The ARB_bindless_texture spec say:

   "If ARB_seamless_cubemap (or OpenGL 4.0, which includes it) is
    supported, the per-context seamless cubemap enable is ignored
    and treated as disabled when using texture handles."

   "If AMD_seamless_cubemap_per_texture is supported, the seamless
    cube map texture parameter of the underlying texture does apply
    when texture handles are used."

The per-context seamless cubemap flag should only be enabled for
bound textures/samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
76b8758253 st/mesa: make bindless samplers/images bound to units resident
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
66a2589d00 st/mesa: add infrastructure for storing bound texture/image handles
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
b6b915afa4 st/mesa: add st_create_{texture,image}_handle_from_unit() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
6f96bd7318 st/mesa: add st_convert_image_from_unit() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
32b4aa3499 st/mesa: make convert_sampler_from_unit() non-static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
9c1558d222 st/mesa: make update_single_texture() non-static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
3dd062ce2a st/mesa: implement ARB_bindless_texture
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
6447abf373 tgsi/scan: record bindless samplers/images usage
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
dd1ec664f5 st/glsl_to_tgsi: teach rename_temp_registers() about bindless samplers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
77cbded995 st/glsl_to_tgsi: teach the DCE pass about bindless samplers/images
When a texture (or an image) instruction uses a bindless sampler
(respectively a bindless image), make sure the DCE pass won't
remove code when the resource is a temporary variable.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
5d59226a7f st/glsl_to_tgsi: add support for bindless pack/unpack operations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
afafcbae30 st/glsl_to_tgsi: add support for bindless images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
d2f84d541e st/glsl_to_tgsi: add support for bindless samplers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
556f70b404 tgsi/ureg: accept TGSI_FILE_{CONSTANT,INPUT} for dst registers
For example, TGSI_OPCODE_STORE for bindless images might use
a constant buffer or a shader input.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
ed4fbb84d1 tc: add ARB_bindless_texture support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
e53e374b26 trace: add ARB_bindless_texture support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
02743d63cc ddebug: add ARB_bindless_texture support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
8a68b4de08 gallium: add ARB_bindless_texture interface
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
973822bcee gallium: add PIPE_CAP_BINDLESS_TEXTURE
Whether bindless texture operations are supported by the
underlying driver.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
990c8d15ac mesa: fix setting uniform variables for bindless samplers/images
This fixes a 64-bit vs 32-bit mismatch when setting an array
of bindless samplers. Also, we need to unconditionally set
size_mul to 2 when the underlying uniform is bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
fe9f7095e8 mesa: handle bindless uniforms bound to texture/image units
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
804e6f2b76 mesa: associate uniform storage to bindless samplers/images
When a bindless sampler/image is bound to a texture/image unit,
we have to overwrite the constant value by the resident handle
directly in the constant buffer before the next draw.

One solution is to keep track of a pointer to the data.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
878a6e6eed mesa: pass gl_program to _mesa_associate_uniform_storage()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
70f2573103 mesa: update textures for bindless samplers bound to texture units
This is analogous to the existing SamplerUnits and SamplerTargets,
but it loops over bindless samplers bound to texture units.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
9eaad42c58 mesa: add update_single_program_texture_state() helper
This will also be used for looping over bindless samplers bound
to texture units.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
bf60db5a4b mesa: add update_single_shader_texture_used() helper
This will also be used for looping over bindless samplers bound
to texture units.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
064d6263c5 glsl: add ir_variable::contains_bindless()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
31154f0975 glsl: set the explicit binding value for bindless samplers/images
This handles a situation like:

layout (bindless_sampler, binding = 7) uniform sampler2D;

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
e3c6fba5d6 glsl: pass the ir_variable object to set_opaque_binding()
In order to set the explicit binding value for bindless
samplers/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
3a087dd7a4 glsl: process uniform images declared bindless
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
9e756de7d1 glsl: process uniform samplers declared bindless
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
444b703a88 mesa: add infrastructure for bindless samplers/images bound to units
Yes, ARB_bindless_texture allows to do this. In other words, in
a situation like:

layout (bindless_sampler) uniform sampler2D tex;

The 'tex' sampler uniform can be either set with glUniform1()
(old-style bound samplers) or with glUniformHandleui() (resident
handles).

When glUniform1() is used, we have to somehow make the texture
resident "under the hood". This is done by requesting a texture
handle to the driver, making the handle resident in the current
context and overwriting the value directly in the constant buffer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
4fe2a6ba7a mesa: store bindless samplers as PROGRAM_UNIFORM
Old-style samplers (ie. bound samplers) are stored as
PROGRAM_SAMPLER, while bindless ones are PROGRAM_UNIFORM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
156bcbaca6 mesa: keep track of the current variable in add_uniform_to_shader
Bindless samplers are considered PROGRAM_UNIFORM but
add_uniform_to_shader::visit_field() is based on glsl_type.

Because only ir_variable knows if the uniform variable is
bindless via ir_variable::bindless, store it instead of
adding a new parameter to visit_field().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
41257fddc8 mesa: refuse to change tex buffers when a handle is allocated
The ARB_bindless_texture spec says:

   "The error INVALID_OPERATION is generated by BufferData if it is
    called to modify a buffer object bound to a buffer texture while
    that texture object is referenced by one or more texture handles."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
028a9b54c4 mesa: refuse to change textures when a handle is allocated
The ARB_bindless_texture spec says:

   "The error INVALID_OPERATION is generated by TexImage*, CopyTexImage*,
    CompressedTexImage*, TexBuffer*, TexParameter*, as well as other
    functions defined in terms of these, if the texture object to be
    modified is referenced by one or more texture or image handles."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
eb9c708ee2 mesa: refuse to update tex parameters when a handle is allocated
The ARB_bindless_texture spec says:

   "The ARB_bindless_texture spec says: "The error INVALID_OPERATION
    is generated by TexImage*, CopyTexImage*, CompressedTexImage*,
    TexBuffer*, TexParameter*, as well as other functions defined in
    terms of these, if the texture object to be modified is referenced
    by one or more texture or image handles."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
67ab372c60 mesa: refuse to update sampler parameters when a handle is allocated
The ARB_bindless_texture spec says:

   "The error INVALID_OPERATION is generated by SamplerParameter* if
    <sampler> identifies a sampler object referenced by one or more
    texture handles."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
326a82a255 mesa: add support for glUniformHandleui64*ARB()
Bindless sampler/image handles are represented using 64-bit
unsigned integers.

The ARB_bindless_texture spec says:

   "The error INVALID_OPERATION is generated by UniformHandleui64{v}ARB
   if the sampler or image uniform being updated has the "bound_sampler"
   or "bound_image" layout qualifier"."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
afb141156f mesa: add support for unsigned 64-bit vertex attributes
This adds support in the VBO and array code to handle unsigned
64-bit vertex attributes as specified by ARB_bindless_texture.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:35 +02:00
Samuel Pitoiset
1fe7b1f972 mesa: implement ARB_bindless_texture
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:35 +02:00
Samuel Pitoiset
6649b840c3 mesa/util: add a hash table wrapper which support 64-bit keys
Needed for bindless handles which are represented using
64-bit unsigned integers. All hash table implementations should
be uniformized later on.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:35 +02:00
Samuel Pitoiset
eeb34af5be mesa: move some hash declarations to hash.h
These will be used by the bindless hash tables to initialize
the default deleted key value.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-14 10:04:35 +02:00
Samuel Pitoiset
30471eb745 mesa/util: add new util_dynarray_delete_unordered helper
This helper function will be used for managing dynamic arrays of
resident texture/image handles.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:35 +02:00
Samuel Pitoiset
5f249b9f05 mapi: add GL_ARB_bindless_texture entry points
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:35 +02:00
Aaron Watry
d364ab4a61 clover/device: Get device/host unified memory from pipe driver
clinfo no longer reports my discrete GCN card as unified memory

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-13 21:26:09 -05:00
Henri Verbeet
1307ed430a gallium/radeon: Include the family name in the renderer string if it's not equal to the marketing name.
The "family" name is often more informative than the "marketing" name. More
importantly, applications, like for example Wine, may recognise GPUs based on
the existing "family" names.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
2017-06-13 19:23:18 +02:00
Brian Paul
def8d1d23f gallium/docs: clarify TGSI_SEMANTIC_SAMPLEMASK, again
I've since discovered the fragment shader sample mask system value (which
corresponds to gl_SampleMaskIn).

v2: It's a system value, not a shader input.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-13 08:02:43 -06:00
Brian Paul
26500c3fad st/mesa: unmap the stream_uploader buffer before drawing
Some drivers require that the vertex buffers be unmapped prior to
drawing.  This change unmaps the stream_uploader buffer after we've
uploaded the zero-stride attributes (unless the driver supports
rendering with mapped buffers).

This fixes a regression in the VMware driver since 17f776c27b.
Some Mesa demos such as mandelbrot and brick would display black
quads instead of the expected rendering.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-13 07:52:54 -06:00
Brian Paul
e5eb9b4363 gallium/util: whitespace, formatting fixes in u_upload_mgr.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-13 07:52:54 -06:00
Eric Engestrom
40a8385d8b egl: improve dri2_fallback_swap_buffers_with_damage()
Let's (try to) set damages before swapping buffers.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-06-13 12:12:50 +01:00
Jose Fonseca
f2d71df0ca softpipe: Match pipe_context::render_condition prototype.
To silence compiler warnings.  Trivial.
2017-06-13 11:53:16 +01:00
Jose Fonseca
e1d4c966dc llvmpipe: Match pipe_context::render_condition prototype.
To silence compiler warnings.  Trivial.
2017-06-13 11:53:07 +01:00
Samuel Pitoiset
6d8a387f78 st_glsl_to_tgsi: init index to 0 before get_deref_offsets()
Fixes: 8ec4975cd8 ("st_glsl_to_tgsi: don't try and pass 32-bit values to get_deref_offsets")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101401
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-06-13 17:36:29 +09:00
Nicolai Hähnle
8dddb9788a glsl: simplify an assertion in lower_ubo_reference
Struct types are now equal when they're structurally equal.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:53 +02:00
Nicolai Hähnle
de32c8378c glsl: simplify validate_intrastage_arrays
Struct types are now equal when they are structurally equal.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:50 +02:00
Nicolai Hähnle
d21a35d63c glsl: simplify varying matching
Unnamed struct types are now equal if they have the same field.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:48 +02:00
Nicolai Hähnle
f97c92ae11 glsl: remove redundant record_compare check when linking globals
Unnamed struct types are now equal across stages based on the fields they
contain, so overriding the type to make sure names match has become
unnecessary.

The check was originally introduced in commit 955c93dc08 ("glsl: Match
unnamed record types across stages.")

v2: clarify the commit message

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:45 +02:00
Nicolai Hähnle
835b1435f2 glsl: stop considering unnamed and named structures equal
Previously, if an unnamed and a named struct contained the same fields,
they were considered the same type during linking of globals.

The discussion around commit e018ea81bf ("glsl: Structures must have
same name to be considered same type.") doesn't seem to have considered
this thoroughly, and I see no evidence that an unnamed struct should
ever be considered to be the same type as a named struct.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:40 +02:00
Nicolai Hähnle
77ea2ada5a glsl: give all unnamed structs the same name
As a result, unnamed structs defined in different places of the program
are considered the same types if they have the same fields in the same
order.

This will simplify matching of global variables whose type is an unnamed
struct.

It also fixes a memory leak when the same shader containing unnamed
structs is compiled over and over again: instead of creating a new type
each time, the existing type is re-used.

Finally, this does have the effect that some previously rejected programs
are now accepted, such as:

   struct {
      float a;
   } s1;

   struct {
      float a;
   } s2;

   s2 = s1;

C/C++ do not allow that, but GLSL does seem to want to treat unnamed
structs with the same fields as the same type at least during linking
(and apparently, some applications require it), so it seems odd to treat
them as different types elsewhere.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:36 +02:00
Nicolai Hähnle
597b2486b8 glsl: do not add unnamed struct types to the symbol table
We removed the need for lookups, and we will assign them all the same
name in the future.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:32 +02:00
Nicolai Hähnle
0cb1f25d86 glsl: do not lookup struct types by typename
This changes the logic during the conversion of the declaration list

   struct S {
      ...
   } v;

from AST to IR, but should not change the end result.

When assigning the type of v, instead of looking `S' up in the symbol
table, we read the type from the member variable of ast_struct_specifier.

This change is necessary for the subsequent change to how anonymous types
are handled.

v2: remove a type override when redefining a structure; should be
    the same type in that case anyway

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:29 +02:00
Nicolai Hähnle
d6ec0aa7ed glsl: fix a race condition when inserting new types
By splitting glsl_type::mutex into two, we can avoid dropping the hash
mutex while creating the new type instance (e.g. struct/record,
interface).

This fixes a time-of-check/time-of-use race where two threads would
simultaneously attempt to create the same type but end up with different
instances of glsl_type.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-13 09:35:10 +02:00
Timothy Arceri
2e28e8b199 st/mesa: skip texture validation logic when nothing has changed
Based on the same logic in the i965 driver 2f225f6145 and
16060c5adc.

perf reports st_finalize_texture() going from 0.60% -> 0.16% with
this change when running the Xonotic benchmark from PTS.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-13 11:24:32 +10:00
Dave Airlie
95c0591087 ac/gpu: drop duplicated code line.
has_hw_decode is assigned twice.

Pointed out by coverity.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-13 10:01:40 +10:00
Dave Airlie
9cce302951 radv: move assert down in radv_bind_descriptor_set
coverity complains about the deref before NULL check.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-13 10:01:36 +10:00
Dave Airlie
b9e76b0c44 radv: return correct error on invalid handle from vkAllocateMemory
Coverity pointed out this was returning uninitialised.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-13 09:30:19 +10:00
Dave Airlie
8ec4975cd8 st_glsl_to_tgsi: don't try and pass 32-bit values to get_deref_offsets
Just use a temporary 16-bit index.

This fixes coverity issue, pointed to me by Ilia.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-13 09:29:54 +10:00
Dave Airlie
ca69b5e78c u_dynarray: fix coverity warning about ignoring return value from reralloc
>>>     Ignoring storage allocated by "reralloc_size(buf->mem_ctx, buf->data, buf->size)" leaks it.

Reviewed-by: Thomas Helland<thomashelland90@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-13 06:40:25 +10:00
Dave Airlie
53587b7105 glsl/lower_distance: only set max_array_access for 1D clip dist arrays
The max_array_access field applies to the first dimension, which means
we only want to set it for the 1D clip dist arrays.

This fixes an ir_validate assert seen with
KHR-GL44.cull_distance.functional
on nouveau and radeon with debug builds.

Fixes: a08c4ebbe (glsl: rewrite clip/cull distance lowering pass)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-12 20:37:06 +01:00
Lionel Landwerlin
1c5d4c9d74 i965: fix missing break
Pretty obvious missing break statement.

CID: 1412564
Fixes: 641405f797 "i965: Use the new tracking mechanism for HiZ"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed by: Elie Tournier <elie.tournier@collabora.com>
2017-06-12 20:30:19 +01:00
Marek Olšák
4951b0adbd radeonsi: pack si_context better
there isn't much to gain here

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
6d43d352cc radeonsi: pack si_framebuffer better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
ca815f1ead radeonsi: pack si_sampler_view better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
29bf2530d8 radeonsi: pack si_buffer_resources better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
cf5ce61148 radeonsi: pack struct si_descriptors better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
217114dd73 radeonsi: pack struct si_vertex_elements better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
e80a056ff9 radeonsi: replace si_vertex_elements::elements with separate fields
It makes si_vertex_elements a little smaller.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
c8b6f42e25 radeonsi: rename si_vertex_element -> si_vertex_elements
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
7be6186e0c radeonsi: allocate si_state_rasterizer::pm4_poly_offset only when needed
Each element has over 700 bytes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
a828f5d783 radeonsi: pack si_state_rasterizer fields
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
6b6fed3a3c radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy
The previous patch helps with this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
7b2240ac9c radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs
the next patch will benefit from this

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
1621b33d73 radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
30882ba0dd radeonsi: don't emit DB_STENCIL_CONTROL if it has no effect
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
6743dc01fd radeonsi: fix missing num_L2_invalidates increment
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
c503381864 radeonsi: get rid of more compressed_colortex_mask names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
3d8259194d gallium/noop: fix sampler views
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
7448342a1f gallium/docs: clarify gen_name/get_vendor/get_device_vendor behavior
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
0d62e8a727 st/mesa: call check_program_state only when needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
9a22c85618 r600g: set pipe_context::priv = NULL
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101254

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák
e8be83f7f8 vl,omx,va,vdpau,xvmc: don't set the priv pointer in context_create
Unused and radeonsi ignores it anyway.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Juan A. Suarez Romero
621a784529 r600/eg: distribute egd_tables.py in the dist file
Otherwise, `make distcheck` will fail.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 11:35:24 +02:00
Juan A. Suarez Romero
4152edbcde i965: include gen4_blorp_exec.h into EXTRA_DIST
Otherwise, `make distcheck` will fail.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-12 10:32:18 +02:00
Kenneth Graunke
b7153c3e9f i965: Call intel_prepare_render() from intel_update_state()
The resolve code looks at the current color draw buffers.  These are not
valid until intel_prepare_render() is called.  You can end up with one
color buffer bound, but where the renderbuffer has zero width/height and
no miptree allocated.

You can get a call chain like: _mesa_Clear -> _mesa_update_state ->
intel_update_state, where no brw driver hooks were called, so there is
no other point at which we could have called this.

Fixes crashes in KWin where Clear was causing intel_disable_rb_aux_buffer
to crash on irb != NULL but irb->mt == NULL.

According to Tapani, this also fixes crashes seen on Android.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2017-06-12 01:10:36 -07:00
Grazvydas Ignotas
fae3b13905 radv: fix trace dumping for !use_ib_bos
Fixes trace dumping crash for SI or when RADV_DEBUG=noibs is set.

Fixes: 97dfff5410 "radv: Dump command buffer on hang."
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-11 23:07:09 +03:00
Grazvydas Ignotas
f56aa25ac5 radv: don't even attempt to prefetch on SI
Before bcae327469 this was emitting CP DMA packet even on SI, but
apparently hasn't caused too many problems. After that commit the
CP DMA code now always sets the CIK+ only bit for prefetch. Just
follow radeonsi there and don't try to prefetch at all.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101334
Fixes: bcae327469 "radv: realign cp dma code with radeonsi"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-11 14:28:40 +03:00
Grazvydas Ignotas
f490200973 radv: assert on CP_DMA_USE_L2 for SI
The register header (and radeonsi comment) states V_411_SRC_ADDR_TC_L2
is for CIK+ only, so let's assert on earlier ASICs.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-11 14:28:08 +03:00
Harish Krupo
9827547313 egl/android: support for EGL_KHR_partial_update
This patch adds support for the EGL_KHR_partial_update extension for
android platform. It passes 36/37 tests in dEQP for EGL_KHR_partial_update.
1 test not supported.

v2: add fallback for eglSetDamageRegionKHR (Tapani)

v3: The native_window_set_surface_damage call is available only from
    Android version 6.0. Reintroduce the ANDROID_VERSION guard and
    advertise extension only if version is >= 6.0. (Emil Velikov)

v4: use newly introduced ANDROID_API_LEVEL guard rather than
    ANDROID_VERSION guard to advertise the extension.The extension
    is advertised only if ANDROID_API_LEVEL >= 23 (Android 6.0 or
    greater). Add fallback function for platforms other than Android.
    Fix possible math overflow. (Emil Velikov)
    Return immediately when n_rects is 0. Place function's entrypoint
    in alphabetical order. (Eric Engestrom)

v5: Replace unnecessary calloc with malloc (Eric)
    Check for BAD_ALLOC error (Emil)
    Check for error in native_window_set_damage_region. (Emil, Tapani,
    Eric).

Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-06-11 01:02:09 +01:00
Marius Gräfe
f3c0bbe18a gallium: fixed modulo zero crashes in tgsi interpreter (v2)
softpipe throws integer division by zero exceptions on windows
when using % with integers in a geometry shader.

v2: Made error results consistent with existing div/mod zero handling in
    tgsi. 64 bit signed integer division by zero returns zero like in
    micro_idiv, unsigned returns ~0u like in micro_udiv.
    Modulo operations always set all result bits to one (like in
    micro_umod).

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-06-10 16:40:13 +02:00
Grazvydas Ignotas
29b9f35704 nir: make various getters take const pointers
This will allow to constify other things.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-10 16:48:45 +03:00
Ben Widawsky
e179a3438a i965/cnl: Add a preliminary device for Cannonlake
v2 (Anuj):
Rebased on master and updated pci ids
Remove redundant initialization of max_wm_threads to 64 * 12.
For gen9+ max_wm_threads are initialized in gen_get_device_info().

v3 (Anuj):
Move the patch to end of series.
Remove unused gt1, gt2, gt3 functions.
Remove l3_banks variable. Variable is now available on master.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-09 16:03:00 -07:00
Jason Ekstrand
f2cbf738b4 anv: Don't advertise support on anything above gen9
This will prevent the driver from even trying to work on Cannon Lake
until we get actual support added.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-09 16:03:00 -07:00
Anuj Phogat
9acc93feeb i965/cnl: Enable CCS_E and RT support for few formats
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat
61f171292e i965/cnl: Reformat surface_format_info table to accomodate gen10+
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat
f9e31a26d4 i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3
v1: By Ben Widawsky <benjamin.widawsky@intel.com>
v2: v1 had an assert only for VS. Add the restriction for GS, HS and
    DS as well and make sure the allocated sizes are not multiple of 3.
v3: Move the entry_size checks in to compiler code (Ken)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-09 16:02:59 -07:00
Anuj Phogat
b76659997e i965/cnl: Don't resolve single sampled color rb in case of sRGB formats
As sRGB now supports lossless compression, we also need to stop resolving
single sampled color render buffers for sRGB formats in Gen 10.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Ben Widawsky
640f5d3957 i965/cnl: Implement depth count workaround
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat
8c43e33560 i965/cnl: Start using CNL MOCS defines
CNL MOCS defines are duplicates of SKL MOCS defines.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat
111881abac i965/cnl: Handle gen10 in switch cases across the driver
V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec()
    gen10_init_atoms() (Jason)
    Remove Vulkan changes. Do them later in a separate patch.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat
30e749c8f1 i965/cnl: Update few assertions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat
56b4d82729 i965/cnl: Add cnl bits in aubinator
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat
dd6c27ace1 i965/cnl: Add pci id for INTEL_DEVID_OVERRIDE
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat
dc83ce7a16 i965/cnl: Wire up android Mesa build files for gen10
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-06-09 16:02:58 -07:00
Anuj Phogat
e01c5a6824 i965/cnl: Wire up Mesa build files for gen10
V2: Remove isl_gen10.c and isl_gen10.h

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-09 16:02:58 -07:00
Anuj Phogat
2417d5ca19 intel/genxml: Update genx_bits for gen10+
This commit adds a gen10 case to the switch statement and
drops some unneeded code for handling gen numbers which
doesn't work on gen10 and above.

V2: Drop "z = float(z)" and the "z *= 10" lines

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat
98b95a3735 i965/cnl: Add gen10 specific function declarations
These declarations will help the code start compiling
once we wire up the makefiles for gen10. Later patches
will start using these functions for gen10.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat
2704ccc646 i965/cnl: Include gen10_pack.h
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat
a48cb9cf7f i965/cnl: Define genX(x) and GENX(x) for gen10
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Jason Ekstrand
aa416f515a i965/genxml: Add gen10.xml
V2(Anuj):
Add default value for length of 3DPRIMITIVE command
Add values for 'Attribute Active Component Format'
Rename few fields to match gen9.xml

V3 (Ander Conselvan de Oliveira)
Add gen10 alias for MOCS
Make 3DSTATE_CONSTANT_BODY on Gen10 use arrays

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-09 16:00:49 -07:00
Ben Widawsky
d968f072bc i965: Make feature macros gen8 based
All the "features" of the hardware are similar starting with GEN8, so remove as
much of the GEN9 uniqueness as possible. This makes implementing future gen
platforms a bit easier.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 15:27:14 -07:00
Dave Airlie
51553c0bea radv: set fmask state to all 0s when no fmask. (v2)
The shader reads the descriptor to decide if it should take the
fmask value, however we weren't initing it always, which meant
random crap, esp with MSAA depth textures.

Fixes random hangs with:
dEQP-VK.glsl.builtin_var.fragdepth.*

v2: check fmask_state is not NULL

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-09 20:41:55 +01:00
Matt Turner
71651b3139 i965: Temporarily disable async mappings on non-LLC
Fixes regressions from commits e0a9b261e5 and a16355d67d by
neutering async mappings on non-LLC to be synchronous, like they were
before those two commits. :(

The failing tests include

piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_index_only
piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_combined_vertex_and_index
piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_separate_vertex_and_index
piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_vertex_only
piglit-test piglit.spec.arb_pixel_buffer_object.texsubimage-unpack pbo
2017-06-09 12:14:28 -07:00
Rafael Antognolli
d42fc65bb3 mesa/main/debug: Check if we successfully reopened the ppm file.
Since we created the file, we should be able to reopen it for appending, but
some weird filesystem error could cause that to be false. So simply check
whether we could reopen it or not.

CID: 1177144
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-09 10:21:16 -07:00
Brian Paul
81e15a5dea tgsi: clarify TGSI_SEMANTIC_SAMPLEMASK documentation
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 08:51:56 -06:00
Frank Richter
0ef39e588f gallium/wgl: Allow context creation even if SetPixelFormat() wasn't called
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101326
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-09 08:51:45 -06:00
Varad Gautam
f84bb6a9d9 st/dri: support format modifier queries
ask the driver for supported modifiers for a given format.

v2: move to __DRIimageExtension v16.
v3: fail if the supplied format is not supported by driver.
v4: purge PIPE_CAP_QUERY_DMABUF_ATTRIBS.
v5:
- move to __DRIimageExtension v15, pass external_only to the driver.

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v4)
Cc: Lucas Stach <l.stach@pengutronix.de>
2017-06-09 14:12:37 +01:00
Varad Gautam
e0965a2c8e gallium: introduce format modifier querying
format modifiers tokens are driver specific, and hence, need to come
in from the driver. this allows drivers to be queried for supported
format modifiers for EGL_EXT_image_dma_buf_import_modifiers.

v2: rebase to master.
v3: drivers must return false on query failure.
v4: use pscreen->is_format_supported instead of adding a separate
    format query handle, remove PIPE_CAP_QUERY_DMABUF_ATTRIBS.
    (Lucas Stach)
v5: add external_only parameter.

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-09 14:12:37 +01:00
Varad Gautam
cf748242d1 st/dri: support format queries
ask the driver for supported dmabuf formats

v2: rebase to master.
v3: return false on failure.
v4: use pscreen->is_format_supported instead of adding a new query.
    (Lucas Stach)
v5: stylefix to conform to formatting rules (Brian Paul). add fourcc list
    here instead of using struct image_format from v4.

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v4)
Cc: Lucas Stach <l.stach@pengutronix.de>
2017-06-09 14:12:37 +01:00
Varad Gautam
82b3d1fa9a st/dri: implement DRIimage creation from dmabufs with modifiers
support importing dmabufs into DRIimage while taking format modifiers
in account, as per DRIimage extension version 15.

v2: initialize winsys modifier to DRM_FORMAT_MOD_INVALID (Daniel Stone)
v3: do not bump DRIimageExtension version. split out winsys changes.

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-09 14:12:37 +01:00
Varad Gautam
f61a8ba168 st/dri: implement createImageWithModifiers in DRIimage
adds a pscreen->resource_create_with_modifiers() to create textures
with modifier.

v2:
- stylefixes (Emil Velikov)
- don't return selected modifier from resource_create_with_modifiers. we can
  use the winsys_handle to get this.

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v1)
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-09 14:12:37 +01:00
Varad Gautam
d33fe8b84e st/dri: enable DRIimage modifier queries
return the modifier selected by the driver when creating this image.

v2: since we can use winsys_handle->modifier to serve these, remove
    DRIimage->modifier from v1.
    use DRM_API_HANDLE_TYPE_KMS instead of DRM_API_HANDLE_TYPE_FD to avoid
    ownership transfer. (Lucas)

Suggested-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-09 14:12:37 +01:00
Varad Gautam
3f8513172f gallium/winsys/drm: introduce modifier field to winsys_handle
we use this to import resources with format modifiers, and to support
per-resource modifier queries.

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-09 14:12:37 +01:00
Samuel Pitoiset
cde963ec35 mesa: make use of NewScissorTest driver flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:33:28 +02:00
Samuel Pitoiset
328191f26c mesa: make use of NewScissorRect driver flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:33:26 +02:00
Samuel Pitoiset
4c037af9cc mesa: add gl_driver_flags::NewScissor{Rect,Test}
_NEW_SCISSOR mesa flag is set when a scissor test is enabled/disabled
or when a new rectangle is defined. However, it triggers too much
changes in the state tracker.

Actually, ST_NEW_RASTERIZER should only be called when a scissor
test is enabled/disabled, while ST_NEW_SCISSOR should be called
in both situations.

In other words, this will avoid to update the rasterizer every
time a new rectangle is defined using glScissor*().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:33:22 +02:00
Tapani Pälli
8fac894f9b egl: fix _eglQuerySurface in EGL_BUFFER_AGE_EXT case
Specification states that in case of error, value should not be
written, patch changes buffer age queries to return -1 in case of
error so that we can skip changing the value.

In addition, small change to droid_query_buffer_age to return 0
in case buffer does not have a back buffer available.

Fixes:
   dEQP-EGL.functional.negative_partial_update.not_postable_surface

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2017-06-09 07:39:22 +03:00
Dave Airlie
c2464271a0 radv: introduce perf test env var and allow to enable chaining
We have some features that seem to slow things down or cause other
possible undesireable side effects, but it would be nice to test
games etc with them easily.

I forsee multisample DCC and maybe some shader opt changes using this.

For now use it for batch chaining.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-09 02:15:25 +01:00
Timothy Arceri
d0a26edc25 mesa: add KHR_no_error support to glDrawRangeElements*()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-09 09:58:07 +10:00
Timothy Arceri
87cb44d9b0 mesa: rework _ae_invalidate_state() so that it just sets a dirty flag
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:13:46 +10:00
Timothy Arceri
b57bc7473b mesa: remove redundant _ae_invalidate_state() call
The FLUSH_VERTICES(ctx, _NEW_ARRAY) above this will already cause
this to be called.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:13:46 +10:00
Timothy Arceri
bc70bad59b mesa: inline vbo_exec_invalidate_state() and call from mesa core
Rather than calling it indirectly in each driver.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:13:46 +10:00
Timothy Arceri
99987fe92e mesa: rework vbo_exec_init()
Here we make some assumptions about the AEcontext and set the
recalculate bools directly.

Some formating fixes are also made while we are here.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:13:46 +10:00
Timothy Arceri
f77740f14b mesa: stop passing state bitfield to UpdateState()
The code comment which seems to have been added in cab974cf6c
(from year 2000) says:

   "Set ctx->NewState to zero to avoid recursion if
   Driver.UpdateState() has to call FLUSH_VERTICES().  (fixed?)"

As far as I can tell nothing in any of the UpdateState() calls
should cause it to be called recursively.

V2: add a wrapper around the osmesa update function so it can still
    be used internally.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-09 09:13:46 +10:00
Timothy Arceri
f627ac6e35 st/mesa: add st_invalidate_buffers() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-09 09:13:46 +10:00
Timothy Arceri
df27aba422 r200/radeon: stop calling _ae_invalidate_state() directly
It is already called via _vbo_InvalidateState().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-09 09:13:46 +10:00
Tim Rowley
0b80b02502 swr: relax c++ requirement from c++14 to c++11
Remove c++14 generic lambda to keep compiler requirement at c++11.

No regressions on piglit or vtk test suites.

Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

CC: mesa-stable@lists.freedesktop.org
2017-06-08 18:07:52 -05:00
Juan A. Suarez Romero
a625d58ee1 radeonsi: call LLVMAddEarlyCSEMemSSAPass only for LLVM >= 4.0
LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0.

Fixes: 257b538 ("radeonsi: do EarlyCSEMemSSA LLVM pass)

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-06-08 23:32:32 +02:00
Marek Olšák
6940361796 gallium/radeon: don't allocate HTILE in a separate buffer
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák
c6451b1209 radeonsi: rename depth decompress functions
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák
d8a577d96e radeonsi: rename shader resource decompress masks to their true meaning
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák
da26de5ff7 radeonsi: rename is_compressed_colortex -> color_needs_decompression
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák
391673af7a radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2)
The workaround causes a massive performance decrease on 1-SE parts.
(Cape Verde, Hainan, Oland)

The performance regression is already part of 17.0 and 17.1.

v2: check tess_uses_prim_id

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák
4b8d0c2b1d radeonsi: don't update dependent states if it has no effect (v2)
This and the previous clip_regs commit decrease IB sizes and the number of
si_update_shaders invocations as follows:

                 IB size   si_update_shaders calls
Borderlands 2      -10%            -27%
Deus Ex: MD         -5%            -11%
Talos Principle     -8%            -30%

v2: always dirty cb_render_state in set_framebuffer_state

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Varad Gautam
f804e0672e i965: Add format/modifier advertising
v2: Rebase and reuse tiling/modifier map. (Daniel Stone)
v3: bump DRIimageExtension to version 15, fill external_only array.
v4: Y-tiling works since gen 6

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-08 22:27:30 +01:00
Varad Gautam
c303772e5b i965: Support dmabuf import with modifiers
Add support for createImageFromDmaBufs2, adding a modifier to the
original, and allow importing CCS resources with auxiliary data from
dmabufs.

v2: avoid DRIimageExtension version bump, pass single modifier to
    createImageFromDmaBufs2.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-08 22:27:30 +01:00
Daniel Stone
f58e6358bf i965: Improve same-buffer restriction for imports
Intel hardware requires that all planes of an image come from the same
buffer, which is currently implemented by testing that all FDs are
numerically the same.

However, when going through a winsys (e.g.) or anything which transits
FDs individually, the FDs may be different even if the underlying buffer
is the same.

Instead of checking the FDs for equality, we must check if they actually
point to the same buffer (Jason).

Reviewed-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-08 22:27:30 +01:00
Ben Widawsky
37cdcaf386 i965: Allocate tile aligned height
This patch shouldn't actually do anything because the libdrm function
should already do this alignment. However, it preps us for a future
patch where we add in the CCS AUX size, and in the process it serves as
a good place to find bisectable issues if libdrm or kernel does
something incorrectly.

v2: Do proper alignment for X tiling, and make sure non-tiled case is
handled (Jason)
v3: Rebase (Daniel)

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-08 22:27:30 +01:00
Daniel Stone
78703881ff i965: Move fallback size assignment out of bufmgr
The bufmgr took a mandatory size argument, which would only be used if
the kernel size query failed, i.e. an older kernel. It didn't actually
check that the BO size was sufficient for use.

Pull the check out of the bufmgr, and actually check that the BO is
sufficiently-sized for our import one level up. This also resolves a
chicken/egg we have when importing bufers without explicit modifiers,
namely that we need the tiling mode to calculate the size, but we need
the BO imported to query the tiling mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-08 22:27:30 +01:00
Daniel Stone
6b18d4aaec i965: Invert image modifier/tiling inference
When allocating images, we record a tiling mode and then work backwards
to infer the modifier. Unfortunately this is the wrong way around, since
it is a one:many mapping (e.g. TILING_Y can be plain Y-tiling, or
Y-tiling with CCS).

Invert the mapping, so we record a modifier first and then map this to a
tiling mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-08 22:27:30 +01:00
Daniel Stone
11e549ae3f egl/dri2: Avoid sign extension when building modifier
Since the EGL attributes are signed integers, a straight OR would
also perform sign extension,

Fixes: 6f10e7c37a ("egl/dri2: Create EGLImages with dmabuf modifiers")
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-08 22:27:30 +01:00
Vinson Lee
142536a0e3 i915g: Add blitter_context argument.
Fix build error.

  CC       i915_surface.lo
i915_surface.c:108:63: error: too few arguments to function call, expected 4, have 3
   util_blitter_default_src_texture(&src_templ, src, src_level);
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           ^
../../../../src/gallium/auxiliary/util/u_blitter.h:271:1: note: 'util_blitter_default_src_texture' declared here
void util_blitter_default_src_texture(struct blitter_context *blitter,
^

Fixes: a893c91697 ("gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101340
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-08 13:47:39 -07:00
Lucas Stach
978e6876f1 etnaviv: flush resource when binding as sampler view
As TS is also allowed on sampler resources, we need to make sure to resolve
to self when binding the resource as a texture, to avoid stale content
being sampled.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-08 18:29:36 +02:00
Lucas Stach
f25390afa4 etnaviv: don't flush resource to self without TS
A resolve to self is only necessary if the resource is fast cleared, so
there is never a need to do so if there is no TS allocated.

Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-08 18:29:36 +02:00
Lucas Stach
0f888ad4be etnaviv: upgrade DISCARD_RANGE to DISCARD_WHOLE_RESOURCE if possible
Stolen from VC4. As we don't do any fancy reallocation tricks yet, it's
possible to upgrade also coherent mappings and shared resources.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-08 18:29:36 +02:00
Lucas Stach
d4e6de9e38 etnaviv: simplify transfer tiling handling
There is no need to special case compressed resources, as they are already
marked as linear on allocation. With that out of the way, there is room to
cut down on the number of if clauses used.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-08 18:29:36 +02:00
Lucas Stach
6e628ee3f3 etnaviv: don't read back resource if transfer discards contents
Reduces bandwidth usage of transfers which discard the buffer contents,
as well as skipping unnecessary command stream flushes and CPU/GPU
synchronization.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-08 18:29:36 +02:00
Lucas Stach
c3b2c7a75f etnaviv: honor PIPE_TRANSFER_UNSYNCHRONIZED flag
This gets rid of quite a bit of CPU/GPU sync on frequent vertex buffer
uploads and I haven't seen any of the issues mentioned in the comment,
so this one seems stale.

Ignore the flag if there exists a temporary resource, as those ones are
never busy.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-08 18:29:36 +02:00
Lucas Stach
a276c32a08 etnaviv: slim down resource waiting
cpu_prep() already does all the required waiting, so the only thing that
needs to be done is flushing the commandstream, if a GPU write is pending.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-08 18:29:36 +02:00
Rob Herring
ada3c3aa3d glsl: Fix gl_shader_stage enum unsigned comparison
Replace -1 with MESA_SHADER_NONE enum value to fix sign related warning:

external/mesa3d/src/compiler/glsl/link_varyings.cpp:1415:25: warning: comparison of constant -1 with expression of type 'gl_shader_stage' is always true [-Wtautological-constant-out-of-range-compare]
        (consumer_stage != -1 && consumer_stage != MESA_SHADER_FRAGMENT))) {
         ~~~~~~~~~~~~~~ ^  ~~

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-06-08 07:26:04 -05:00
Rob Herring
6150ea794b Android: vulkan: fix build error due to extra )
Commit 621b3410f5 ("util/vulkan: Move Vulkan utilities to
src/vulkan/util") broke the Android build with the following error:

build/core/binary.mk:1427: error: external/mesa3d/src/vulkan/Android.mk: libmesa_vulkan_util: Unused source files: util/vk_util.h).

Fixes: 621b3410f5 ("util/vulkan: Move Vulkan utilities to src/vulkan/util")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-06-08 07:26:04 -05:00
Iago Toral Quiroga
ce53e8e61b Fix glcpp test expectations
With commit f7741985be we have changed some preprocessor
error messages and warnings. Adapt related glcpp tests
expectations accordingly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101336
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-08 09:46:36 +02:00
Vlad Golovkin
f4df2a196e util: make set's deleted_key_value declaration consistent with hash table one
This also silences following clang warnings:
no previous extern declaration for non-static variable 'deleted_key' [-Werror,-Wmissing-variable-declarations]
const void *deleted_key = &deleted_key_value;
            ^
no previous extern declaration for non-static variable 'deleted_key_value'
      [-Werror,-Wmissing-variable-declarations]
uint32_t deleted_key_value;
         ^

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 09:26:44 +02:00
Jason Ekstrand
f1ba51b940 i965: Delete intel_resolve_map
Now that we've moved over to the new array mechanism, it's no longer
needed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
641405f797 i965: Use the new tracking mechanism for HiZ
This is similar to the previous commit only for HiZ.  For HiZ, apart
from everything looking different, there is really only one functional
change:  We now track the ISL_AUX_STATE_COMPRESSED_NO_CLEAR state.
Previously, if you rendered to a resolved slice of the miptree and then
did a fast-clear with a different clear color, that slice would get
resolved even though it hadn't been fast-cleared.  Now that we can track
COMPRESSED_NO_CLEAR, we know that it doesn't have any blocks in the
"clear" state so we can skip the resolve.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
e6c69264ed i965/miptree: Make level_has_hiz take a const miptree
Acked-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
b2c6290b01 i965: Wholesale replace the color resolve tracking code
This commit reworks the resolve tracking for CCS and MCS to use the new
isl_aux_state enum.  This should provide much more accurate and easy to
reason about tracking.  In order to understand, for instance, the
intel_miptree_prepare_ccs_access function, one only has to go look at
the giant comment for the isl_aux_state enum and follow the arrows.
Unfortunately, there's no good way to split this up without making a
real mess so there are a bunch of changes in here:

 1) We now do partial resolves.  I really have no idea how this ever
    worked before.  So far as I can tell, the only time the old code
    ever did a partial resolve was when it was using CCS_D where a
    partial resolve and a full resolve are the same thing.

 2) We are now tracking 4 states instead of 3 for CCS_E.  In particular,
    we distinguish between compressed with clear and compressed without
    clear.  The end result is that you will never get two partial
    resolves in a row.

 3) The texture view rules are now more correct.  Previously, we would
    only bail if compression was not supported by the destination
    format.  However, this is not actually correct.  Not all format
    pairs are supported for texture views with CCS even if both support
    CCS individually.  Fortunately, ISL has a helper for this.

 4) We are no longer using intel_resolve_map for tracking aux state but
    are instead using a simple array of enum isl_aux_state indexed by
    level and layer.  This is because, now that we're tracking 4
    different states, it's no longer clear which should be the "default"
    and array lookups are faster than linked list searches.

 5) The new code is very assert-happy.  Incorrect transitions will now
    get caught by assertions rather than by rendering corruption.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
46fd924899 i965: Delete most of the old resolve interface
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
f296c22989 i965: Use the new get/set_aux_state functions for color clears
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
38563e95d5 i965: Move blorp to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
554f7d6d02 i965: Move depth to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
170e4b366a i965: Move images to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
8cb3b4a586 i965: Move framebuffer fetch to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
79df134d56 i965: Remove an unneeded render_cache_set_check_flush
This is only needed to fix rendering corruptions caused by not flushing
after doing a resolve operation.  The resolve now does all the needed
flushing so this is unnecessary.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
49e4d8cce2 i965: Move color rendering to the new resolve functions
This also removes an unneeded brw_render_cache_set_check_flush() call.
We were calling it in the case where the surface got resolved to satisfy
the flushing requirements around resolves.  However, blorp now does this
itself, so the extra is just redundant.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
c0f5225264 i965: Move texturing to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
421d713eec i965: Use the new resolve function for several simple cases
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
5ec344e420 i965/miptree: Add new entrypoints for resolve management
This commit adds a new unified interface for doing resolves.  The basic
format is that, prior to any surface access such as texturing or
rendering, you call intel_miptree_prepare_access.  If the surface was
written, you call intel_miptree_finish_write.  These two functions take
parameters which tell them whether or not auxiliary compression and fast
clears are supported on the surface.  Later commits will add wrappers
around these two functions for texturing, rendering, etc.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
a59c7f834c intel/isl: Add an enum for describing auxiliary compression state
This enum describes all of the states that a auxiliary compressed
surface can have.  All of the states as well as normative language for
referring to each of the compression operations is provided in the
truly colossal comment for the new isl_aux_state enum.  There is also
a diagram showing how surfaces move between the different states.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
c89b795db4 i965: Combine render target resolve code
We have two different bits of resolve code for render targets: one in
brw_draw where it's always been and one in brw_context to deal with sRGB
on gen9.  Let's pull them together.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
0607ca42da i965: Be a bit more conservative about certain resolves
There are several places where we were resolving the entire miptree
when we really only needed to resolve a single slice.  Let's avoid the
unneeded resolving.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
3b65f9499c i965/blorp: Move MCS allocation earlier for clears
This way it happens before we call get_aux_state.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
076defba7a i965/blorp: Refactor do_single_blorp_clear
Previously, we had two checks for can_fast_clear and a tiny bit of
shared code in between.  This commit pulls all of the fast clear code
together and duplicates the tiny bit that declares some surface structs
and calls blorp_surf_for_miptree.  The duplication is no real loss and
we're about to change the two in slightly different ways.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
7a9c37eb7b i965/blorp: Take an explicit fast clear op in resolve_color
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
4afe282a35 i965/miptree: Move color resolve on map to intel_miptree_map
None of the other methods such as blit work with CCS either so we need
to do the resolve for all maps.  This change also makes us only resolve
the one slice we're mapping and not the entire image.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
ad7fa063ae i965: Inline renderbuffer_att_set_needs_depth_resolve
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
c15b2f53f4 i965: Get rid of intel_renderbuffer_resolve_*
There is exactly one caller so it's a bit pointless to have all of this
plumbing.  Just inline it at the one place it's used.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
25d00e72e4 i965/miptree: Refactor intel_miptree_resolve_color
The new version now takes a range of levels as well as a range of
layers.  It should also be a tiny bit faster because it only walks the
resolve_map list once instead of once per layer.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
64b829244b i965/miptree: Clean up the depth resolve helpers a little
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
97f6f411db i965/surface_state: Images can't handle CCS at all
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand
5097fcbfdc i965: Mark depth surfaces as needing a HiZ resolve after blitting
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Dave Airlie
cb2a13e895 st_glsl_to_tgsi: cleanup variable storage search.
I forgot to put the cleanup in earlier.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-08 13:29:29 +10:00
Rob Herring
f4b5510872 mesa/main: fix gl_buffer_index enum comparison
For clang, enums are unsigned by default and gives the following warning:

external/mesa3d/src/mesa/main/buffers.c:764:21: warning: comparison of constant -1 with expression of type 'gl_buffer_index' is always false [-Wtautological-constant-out-of-range-compare]
      if (srcBuffer == -1) {
          ~~~~~~~~~ ^  ~~

Replace -1 with an enum value to fix this.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-06-07 20:44:26 -05:00
Rob Herring
18348a383d glsl: fix bounds check in blob_overwrite_bytes
clang gives a warning in blob_overwrite_bytes because offset type is
size_t which is unsigned:

src/compiler/glsl/blob.c:110:15: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
   if (offset < 0 || blob->size - offset < to_write)
       ~~~~~~ ^ ~

Remove the less than 0 check to fix this.

Additionally, if offset is greater than blob->size, the 2nd check would
be false due to unsigned math. Rewrite the check to avoid subtraction.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-06-07 20:44:26 -05:00
Dave Airlie
4453fbb024 st_glsl_to_tgsi: replace variables tracking list with a hash table
This removes the linear search which is fail when number of variables
goes up to 30000 or so.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-08 07:57:50 +10:00
Dave Airlie
3008161d28 st_glsl_to_tgsi: rewrite rename registers to use array fully.
Instead of having to search the whole array, just use the whole
thing and store a valid bit in there with the rename.

Removes this from the profile on some of the fp64 tests

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-08 07:56:33 +10:00
Dave Airlie
3bc7169793 st_glsl_to_tgsi: bump index back up to 32-bit
with some of the fp64 emulation, we are seeing shaders coming in with
> 32K temps, they go out with 40 or so used, but while doing register
renumber we need to store a lot of them.

So bump this fields back up to 32-bit.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-08 07:21:06 +10:00
Marek Olšák
e93a141f64 util/u_queue: fix a use-before-initialization race for queue->threads
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-07 23:19:30 +02:00
Grazvydas Ignotas
19f6cc3cba ac/nir: remove another unused variable
Declared by each loop already.
Trivial.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
2017-06-08 00:02:42 +03:00
Grazvydas Ignotas
5bbbe91799 radv/meta: remove an unused variable
Trivial.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-08 00:02:36 +03:00
Grazvydas Ignotas
7dfa54399c ac/nir: convert several ifs to a switch
Also solve "outinfo may be used uninitialized" warning by putting in an
unreachable().

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-08 00:02:26 +03:00
Grazvydas Ignotas
ae3262c1f2 ac/nir: mark some arguments const
Most functions are only inspecting nir, so nir related arguments can be
marked const. Some more can be done if/when some nir changes are
accepted.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-08 00:02:02 +03:00
Samuel Li
c705caaff9 radeonsi: Use libdrm to get chipset name
v2: Add a func pointer to radeon_winsys to support radeon later.

Change-Id: I614ea71424f9e5c97e4ae68654315d28c89eaa5f
Signed-off-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-06-07 21:53:36 +02:00
Thomas Helland
4ba4f0e976 util: Add extern c to u_dynarray.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Thomas Helland
cfb696dc82 nir: Delete nir_array.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Thomas Helland
e558a7a988 nir: Port to u_dynarray
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Thomas Helland
bc3a2be6c9 nir: Remove unused include
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Thomas Helland
9cb42ae997 util: Port nir_array functionality to u_dynarray
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Thomas Helland
07653f159f util: Remove unused includes and convert to lower-case memory ops
Also, prepare for the next commit by correcting some coding style
changes. This should be all non-functional changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Thomas Helland
f0372814a9 util: Move u_dynarray to src/util
This will be used as the basis for unification

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Thomas Helland
a66befc3c8 gallium: Add missing includes
These will need to be in place to avoid regressions when
removing these includes from the u_dynarray

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-07 21:07:24 +02:00
Marek Olšák
bacaceb78a radeonsi: update clip_regs on shader state changes only when it's needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:20 +02:00
Marek Olšák
2b7fd9df9a radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:18 +02:00
Marek Olšák
140b3c5019 radeonsi: add a new helper si_get_vs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:16 +02:00
Samuel Pitoiset
878bd981bf radeonsi: isolate real framebuffer changes from the decompression passes (v3)
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.

v2: Marek - set the flags outside the loop
          - also clear and set framebuffer.do_update_surf_dirtiness there
          - do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:14 +02:00
Marek Olšák
257b538fd2 radeonsi: do EarlyCSEMemSSA LLVM pass
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.

This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)

The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown
and GRID Autosport.

EarlyCSEMemSSA has a -0.01% change in code size compared EarlyCSE.

SGPRS: 1935420 -> 1938076 (0.14 %)
VGPRS: 1645504 -> 1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size: 61981592 -> 61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:09 +02:00
Marek Olšák
e9409c86e7 radeonsi: remove 8 bytes from si_shader_key
We can use a union in si_shader_key::mono.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:06 +02:00
Marek Olšák
2b8b9a56ef radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC
Heaven LDS usage for LS+HS is below. The masks are "outputs_written"
for LS and HS. Note that 32K is the maximum size.

Before:
  heaven_x64: ls=1f1 tcs=1f1, lds=32K
  heaven_x64: ls=31 tcs=31, lds=24K
  heaven_x64: ls=71 tcs=71, lds=28K

After:
  heaven_x64: ls=3f tcs=3f, lds=24K
  heaven_x64: ls=7 tcs=7, lds=13K
  heaven_x64: ls=f tcs=f, lds=17K

All other apps have a similar decrease in LDS usage, because
the "outputs_written" masks are similar. Also, most apps don't write
POSITION in these shader stages, so there is room for improvement.
(tight per-component input/output packing might help even more)

It's unknown whether this improves performance.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:14:15 +02:00
Thomas Hellstrom
2c4ec3f93f svga: Always set the alpha value to 1 when sampling using an XRGB view
If the XRGB view is sampling from an ARGB svga format, change
PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels.
Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which
could be both insufficient and incorrect.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
df4d6003dc svga: Fix imported surface view creation
When deciding to create a view with or without an alpha channel we need to
look at the SVGA3D format and not the PIPE format.

This fixes the glx-tfp piglit test for dri3/xa.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
c2138a066c svga: Set alpha to 1 for non-alpha views
Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those
and similar cases we need to set the alpha value to 1 when sampling.

Fixes piglit glx::glx-tfp

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
1887faf73b svga: Allow format differences in 16-bit RGBA surface sharing
For the purpose of surface sharing, treat SVGA3D_R5G6B5 and
SVGA3D_B5G6R5_UNORM as identical formats.
This fixes the following piglit tests with dri3/xa:

glx@glx-visuals-depth -pixmap
glx@glx-visuals-stencil -pixmap

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Singh Rawat <drawat@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
b8b0a3dc5c dri/vmwgfx: Disable a couple of glx extensions also for Ubuntu unity / compiz
It appears like the GLX_EXT_buffer_age extension also prevents Compiz /
Ubuntu Unity from performing partial buffer swaps when it otherwise
feels like doing so. So try to get them back again. We also disable
GLX_OML_sync_control since it appears it had a favourable impact on
gnome-shell.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
37e8341db4 dri: Turn of a couple of glx extensions for gnome-shell on vmwgfx.
Increases performance on vmwgfx since we're avoiding full buffer damage and
since we can't sync to vertical retrace anyway.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
48f4baf63f st/dri: Allow gallium drivers to turn off two GLX extensions
Allow gallium drivers to turn off GLX_EXT_buffer_age and
GLX_OML_sync_control if needed, using driconf.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
9d3f177e4b dri: Optionally turn off a couple of GLX extensions based on driconf options
With GLX_EXT_buffer_age turned on, gnome-shell will use full-screen damage
with GLX, which severely hurts performance with architectures that emulate
page-flips with copies. Like vmware. We would like to be able to turn off that
extension. Similarly, typically the GLX_OML_sync_control doesn't make much
sense on a virtual architecture since we don't really sync to the host's
vertical retrace. We'd like to be able to turn it off as well.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-07 19:43:54 +02:00
Thomas Hellstrom
ff2978b449 st/dri: Allow dri users to query also driver options
There will be situations where we want to control, for example, the
GLX behaviour based on applications and drivers. So allow DRI users access
to the driver options.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-07 19:43:54 +02:00
Marek Olšák
7d67cbefe0 radeonsi: clean up decompress blend state names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:45 +02:00
Marek Olšák
882c18bf1c gallium/radeon: clean up a misleading statement from the old days
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:43 +02:00
Marek Olšák
66176e6f14 radeonsi: don't use 1D tiling for Z/S on VI to get TC-compatible HTILE
It's always good to have fewer decompress blits.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:42 +02:00
Marek Olšák
d2ee423b69 radeonsi: enable TC-compatible stencil compression on VI
Most things are in place. Ideally we won't see decompress blits for stencil
anymore.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:39 +02:00
Marek Olšák
e003e3c4c0 st/mesa: don't keep framebuffer state in st_context
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:46:21 +02:00
Marek Olšák
f34abf77e9 st/mesa: cache pipe_surface for GL_FRAMEBUFFER_SRGB changes
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:46:21 +02:00
Marek Olšák
f7523f1ef6 st/mesa: use gl_driver_flags::NewFramebufferSRGB
also call st_init_driver_flags when st_context is initialized.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:46:21 +02:00
Marek Olšák
ac0aff7222 mesa: add gl_driver_flags::NewFramebufferSRGB
_NEW_BUFFERS updates too much stuff.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:46:21 +02:00
Marek Olšák
3effce4fb0 radeonsi/gfx9: prevent a race when the previous shader's main part is missing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
b5bc826ead radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
ffbaba6072 radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9
LS is merged into TCS. If there is no TCS, LS is merged into fixed-func
TCS. The problem is the fixed-func TCS was ignored by scratch update
functions, so LS didn't have the scratch buffer set up.

Note that Mesa 17.1 doesn't have merged shaders.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
6e2c07749b radeonsi: move streamout state update out of si_update_shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
294be5279d radeonsi: remove dead code in declare_input_fs
Colors are interpolated in the PS prolog. This was never used.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
8147c4a4a5 radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
86cc809726 radeonsi: use a compiler queue with a low priority for optimized shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
89b6c93ae3 util/u_queue: add an option to set the minimum thread priority
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
6f2947fa79 radeonsi: decrease the number of compiler threads to num CPUs - 1
Reserve one core for other things (like draw calls).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
38bd468a78 radeonsi: drop unfinished shader compilations when destroying shaders
If we enqueue too many jobs and destroy the GL context, it may take
several seconds before the jobs finish. Just drop them instead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák
33e507ec23 util/u_queue: add a way to remove a job when we just want to destroy it
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Rob Clark
812fd1aaa8 freedreno/a5xx: set SP_BLEND_CONTROL properly
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-07 12:32:00 -04:00
Rob Clark
5b60004525 freedreno/a5xx: LRZ support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-07 12:32:00 -04:00
Rob Clark
313f6360aa freedreno: drop timestamp field
unused.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-07 12:32:00 -04:00
Rob Clark
5589ba983d freedreno/a5xx: refactor out helper for LRZ flush
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-07 12:32:00 -04:00
Rob Clark
e26a7c1cf2 freedreno: reshuffle FD_MESA_DEBUG bitmask
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-07 12:32:00 -04:00
Rob Clark
613410c8fc freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-07 12:32:00 -04:00
Marek Olšák
a893c91697 gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible
so that we can use TXF.

The cubemap blit pixel shader code size: 148 -> 92 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:10:50 +02:00
Marek Olšák
4a88c7774c gallium/u_blitter: use TXF if possible
This fixes piglit:
    arb_texture_view-rendering-r32ui

TEX (image_sample) flushes denorms to 0 with FP32 textures on GCN, but such
a texture can contain integer data written using an integer render view.
If we do a transfer blit with TEX, denorms are flushed to 0. Luckily,
TXF (image_load) doesn't do that.

TXF also doesn't need to load the sampler state, so blit shaders don't have
to do s_load_dwordx4.

TXF doesn't do CLAMP_TO_EDGE, so it can only be used if the src box is
in bounds, or if we clamp manually (this commit doesn't).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:10:50 +02:00
Marek Olšák
0604568527 gallium/u_blitter: use TEX_LZ if it's supported
The sampler views always have first_level == last_level.
Now radeonsi doesn't have to use the WQM. (a few SALU removed)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:10:50 +02:00
Marek Olšák
eedca3323e gallium/util: add _LZ and TXF options to simple shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:10:50 +02:00
Marek Olšák
20c2785f7c gallium/ureg: add TEX/TXF_LZ opcodes to ureg
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:10:50 +02:00
Jason Ekstrand
dd294fd2d9 i965: Use BLORP for all HiZ ops
BLORP has been capable of doing gen8-style HiZ ops for a while now.  We
might as well start using it.  The one downside is that this may cause a
bit more state emission since we still re-emit most things for BLORP.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
bacae7221b blorp: Use FullSurfaceDepthandStencilClear for blorp_hiz_op
The blorp_hiz_op entrypoint always acts on a full subresource of a HiZ
buffer so we can just set the flag unconditionally.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
a2152775fd i965: Move the post-HiZ-clear flush/stall to intel_hiz_exec
This also changes it to be predicated so we only do the flush/stall on
clears and HiZ resolves.  The docs only say it's needed for clears but
empirical evidence says it's also needed for HiZ resolves.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
9cb6ac62fb intel/blorp: Plumb through access to the workaround BO
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101283
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Nanley Chery
ed5801864e anv/blorp: Move the depth cache flush outside of BLORP
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
fbd8a33f61 intel/blorp: Refactor the HiZ op interface
This commit does a few things:

 1) Now that BLORP can do HiZ ops on gen8+, drop the gen6 prefix.
 2) Switch parameters to uint32_t to match the rest of blorp.
 3) Take a range of layers and loop internally.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
42b10bbfe0 i965/blorp: Inline gen6_blorp_exec
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
acbd02450b i965: Perform HiZ flush/stall prior to HiZ resolves
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
acb9a2ef8f i965: Move the pre-depth-clear flush/stalls to intel_hiz_exec
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
252b004a51 i965/blorp: Take a layer range in intel_hiz_exec
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand
f9fd976e8a i965/miptree: Store fast clear colors in an isl_color_value
This commit, out of necessity, makes a number of changes at once:

 1) Changes intel_mipmap_tree to store the clear color for both color
    and depth as an isl_color_value.

 2) Changes the depth/stencil emit code to do the format conversion of
    the depth clear value on Haswell and earlier instead of pulling a
    uint32_t directly from the miptree.

 3) Changes ISL's depth/stencil emit code to perform the format
    conversion of the depth clear value on Haswell and earlier instead
    of assuming that the depth value in the float is pre-converted.

 4) Changes blorp to pass the depth value through as a float.

 5) Changes the Vulkan driver to pass the depth value to blorp as a
    float rather than a uint.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 08:54:54 -07:00
Thomas Hellstrom
1253d58983 dri3/GLX: Fix drawable invalidation v2
A number of internal VMware apitrace traces image comparisons fail with
dri3 because the viewport transformation becomes incorrect after an X
drawable resize. The incorrect viewport transformation sometimes persist
until the second draw-call after a swapBuffer.

Comparing with the dri2 glx code there are a couple of places where dri2
invalidates the drawable in the absence of server-triggered invalidation,
where dri3 doesn't do that. When these invalidation points are added to
dri3, the image comparisons become correct.

v2:
Addressed review comment by Michel Dänzer.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-and-tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-06-07 11:23:56 +02:00
Kenneth Graunke
09c3a00f10 i965: Fix alpha to one with dual color blending.
The BLEND_STATE documentation says that alpha to one must be disabled
when dual color blending is enabled.  However, it appears that it simply
fails to override src1 alpha to one.

We can work around this by leaving alpha to one enabled, but overriding
SRC1_ALPHA to ONE and ONE_MINUS_SRC1_ALPHA to ZERO.  This appears to be
what the other driver does, and it looks like it works despite the
documentation saying not to do it.

Fixes spec/ext_framebuffer_multisample/alpha-to-one-dual-src-blend *
Piglit tests.

v2: Add UNUSED to shut up warning on generations which don't use this.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-07 02:13:49 -07:00
Samuel Pitoiset
98d5667f4b mesa: add KHR_no_error support for glTexSubImage*D()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:40 +02:00
Samuel Pitoiset
7b104d9c50 mesa: add texsubimage() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:36 +02:00
Samuel Pitoiset
3e34fc0363 mesa: make _mesa_texture_sub_image() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:35 +02:00
Samuel Pitoiset
c2b6a63130 mesa: rename texsubimage() to texsubimage_err()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:34 +02:00
Samuel Pitoiset
287a7a0ca6 mesa: add KHR_no_error support for glCopyImageSubData()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:33 +02:00
Samuel Pitoiset
41df4b1d7e mesa: add copy_image_subdata() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:28 +02:00
Samuel Pitoiset
4485c28e1f mesa: add prepare_target() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:27 +02:00
Samuel Pitoiset
185a79a549 mesa: rename prepare_target() to prepare_target_err()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:25 +02:00
Samuel Pitoiset
6fedb31785 mesa: add KHR_no_error support for glBlitNamedFramebuffer()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:21 +02:00
Samuel Pitoiset
25304a44da mesa: add blit_named_framebuffer() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:18 +02:00
Samuel Pitoiset
a9600318ee mesa: add KHR_no_error support for glBlitFramebuffer()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:16 +02:00
Samuel Pitoiset
d496b879ed mesa: add validate_depth_buffer() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:14 +02:00
Samuel Pitoiset
f88c367ba9 mesa: add validate_stencil_buffer() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:11 +02:00
Samuel Pitoiset
bf0bf23f94 mesa: add validate_color_buffer() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:08 +02:00
Samuel Pitoiset
4f805edd3f mesa: wrap blit_framebuffer() into blit_framebuffer_err()
Also add ALWAYS_INLINE to blit_framebuffer().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:04:01 +02:00
Samuel Pitoiset
cb1d5f4639 mesa: add 'no_error' parameter to blit_framebuffer()
The whole GLES3 block has been moved before the buffer validation
checks.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:57 +02:00
Samuel Pitoiset
63a60584d1 mesa: make _mesa_blit_framebuffer() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:56 +02:00
Samuel Pitoiset
c231590f8d mesa: add KHR_no_error support for glBindBuffer()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:54 +02:00
Samuel Pitoiset
b019c4e6e8 mesa: add KHR_no_error support for glInvalidateBufferData()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:53 +02:00
Samuel Pitoiset
9ab285e588 mesa: add KHR_no_error support for glInvalidateBufferSubData()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:52 +02:00
Samuel Pitoiset
ec0c2eb845 mesa: add invalidate_buffer_subdata() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:47 +02:00
Samuel Pitoiset
2933ed56ce mesa: add KHR_no_error support for glBindVertexBuffers()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:47 +02:00
Samuel Pitoiset
5da83140df mesa: add KHR_no_error support for glVertexArrayVertexBuffers()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:45 +02:00
Samuel Pitoiset
a11c7e3fb5 mesa: add vertex_array_vertex_buffers_err() helper
This also adds a 'no_error' parameter to vertex_array_vertex_buffer()
to be used in a following patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 11:03:40 +02:00
Samuel Pitoiset
f075c2bc0b mesa: add KHR_no_error support for glScissor*()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 09:09:19 +02:00
Samuel Pitoiset
e2524a21cb mesa: add scissor() and scissor_array() helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 09:09:17 +02:00
Samuel Pitoiset
80ae5c128d mesa: rename ScissorIndexed() to scissor_indexed_err()
And move GET_CURRENT_CONTEXT() into the APIENTRY calls
for consistency.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 09:09:13 +02:00
Samuel Pitoiset
e8de0e124f mesa: use _mesa_set_scissor() in ScissorIndexed()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 09:09:11 +02:00
Samuel Pitoiset
ee38dfe9a5 mesa: make _mesa_scissor_bounding_box() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 09:09:04 +02:00
Samuel Pitoiset
8614f31be2 mesa: inline update_image_transfer_state() into _mesa_update_pixel()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 09:06:05 +02:00
Samuel Pitoiset
51854def8a mesa: remove useless check in _mesa_update_pixel()
The only caller is _mesa_update_state_locked() which already
checks if _NEW_PIXEL is set before calling _mesa_update_pixel().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-07 09:05:25 +02:00
Iago Toral Quiroga
f7741985be glcpp: fix #undef to match latest spec update and GLSLang implementation
GLSL ES spec includes the following:

   "It is an error to undefine or to redefine a built-in
    (pre-defined) macro name."

But desktop GLSL doesn't. This has sparked some discussion
in Khronos, and the final conclusion was to update the
GLSL 4.50 spec to include the following:

   "By convention, all macro names containing two consecutive
    underscores ( __ ) are reserved for use by underlying
    software layers.  Defining or undefining such a name in a
    shader does not itself result in an error, but may result
    in unintended behaviors that stem from having multiple
    definitions of the same name.  All macro names prefixed
    with “GL_” (“GL” followed by a single underscore) are also
    reserved, and defining or undefining such a name results in
    a compile-time error."

In other words, undefining GL_* names should be an error, but
undefining other names with a double underscore in them is
not strictly prohibited in desktop GLSL.

This patch fixes the preprocessor to apply these rules,
following exactly the implementation already present
in GLSLang. This fixes some tests in CTS.

Khronos bug:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16003

Fixes:
KHR-GL45.shaders.preprocessor.definitions.undefine_core_profile_vertex
KHR-GL45.shaders.preprocessor.definitions.undefine_core_profile_fragment

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-07 07:50:46 +02:00
Dave Airlie
1ec4f008a2 ac/nir: move gpr counting inside argument handling.
This just moves this code in here to it's cleaner.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 06:00:30 +01:00
Dave Airlie
7b46e2a74b ac/nir: assign argument param pointers in one place.
Instead of having the fragile code to do a second pass, just
give the pointers you want params in to the initial code,
then call a later pass to assign them.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 06:00:23 +01:00
Dave Airlie
b19cafd441 ac/nir: consolidate setting userdata location
Just pass a pointer and increment inside the function,
makes the code less error prone.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 05:59:57 +01:00
Timothy Arceri
d0cec1fce1 glthread: remove extra _mesa_glthread_finish() from generated code
The other user of print_sync_dispatch() was ending up with code that
looked like:

      _mesa_glthread_finish(ctx);
      _mesa_glthread_restore_dispatch(ctx);
      _mesa_glthread_finish(ctx);

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-07 14:53:38 +10:00
Anuj Phogat
8d02916e0c intel: Fix broxton 2x6 way size computation
This patch is undoing the changes to way size computation
in broxton 2x6, made by below commit:

Commit: 0d576fbfbe
Author:     Anuj Phogat <anuj.phogat@gmail.com>
i965: Simplify l3 way size computations

By making use of l3_banks field in gen_device_info struct
l3_way_size for gen7+ = 2 * l3_banks.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101306
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 21:30:51 -07:00
Dave Airlie
86eff151b1 radv: move chip_class extraction down further.
This seems to matter here in a profile, without this we spend a lot
more time exiting this function with no flush bits.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 10:25:20 +10:00
Dave Airlie
00fe30f376 radv: move lots of index related things into the bind.
This just moves lots of stuff to the bind stage rather than
dealing with it in the draw stage.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 10:24:37 +10:00
Dave Airlie
734ea16bdb radv: move calculating the vertex sgpr to the pipeline.
There is no need to calculate this at draw time.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 10:24:36 +10:00
Dave Airlie
3f48021b86 radv: rename and make global some functions.
I want to use these in the pipeline setup stage.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 10:24:36 +10:00
Eric Engestrom
63a8a88ac4 tree-wide: remove trailing backslash
Simple search for a backslash followed by two newlines.
If one of the newlines were to be removed, this would cause issues, so
let's just remove these trailing backslashes.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-07 01:18:09 +01:00
Dave Airlie
f0b82bc545 radv/gfx9: use correct register setting for uconfig regs
Thanks to Marek for pointing this out.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-07 08:09:03 +10:00
Bas Nieuwenhuizen
59c2e2a061 radv: Remove SI num RB override for occlusion queries.
radeonsi doesn't have it anymore either.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-06 23:23:43 +02:00
Bas Nieuwenhuizen
d607b83b79 radv: Split out updating the vertex descriptors.
Simple refactor.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-06 23:23:43 +02:00
Bas Nieuwenhuizen
58c8aae241 radv: Move pipeline stuff from flush_state to emit_graphics_pipeline.
No functional changes.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-06 23:23:43 +02:00
Bas Nieuwenhuizen
e08f741678 radv: Add early exit for cache flushes.
No sense checking each bit separately in the common case of none
being set.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-06 23:23:43 +02:00
Bas Nieuwenhuizen
4ec89727b2 radv: Remove vertex_descriptors_dirty.
Redundant.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-06 23:23:43 +02:00
Bas Nieuwenhuizen
fe0b8d1e8b radv: Don't use a divide by index_size.
Divides are pretty slow, and this is in the hot path of a draw.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-06 23:23:43 +02:00
Chris Wilson
7063696b71 i965: Explicitly disallow tiled memcpy path on Gen4 with swizzling.
The manual detiling paths are not prepared to handle Gen4-G45 with
swizzling enabled, so explicitly disable them.  (They're already
disabled because these platforms don't have LLC but a future patch could
enable this path).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:56 -07:00
Matt Turner
bc17155fd0 i965: Remove brw_bo_map_unsynchronized()
Call brw_bo_map() directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:47 -07:00
Matt Turner
e0a9b261e5 i965: Use unsynchronized mappings for BufferSubData on non-LLC
Now that unsynchronized maps actually work, we can use them, like we do
on LLC platforms.

On Broxton, the performance of Unigine Valley 1.1-rc1 is improved by
37.6656% +/- 0.401389% (n=20) at 1280x720/QUALITY_LOW, and by
20.862% +/- 2.20901% (n=3) at 1920x1080/QUALITY_LOW.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:47 -07:00
Matt Turner
a16355d67d i965: Make unsynchronized maps unsynchronized on non-LLC
On Broxton, the performance of Unigine Valley 1.0 is improved by
13.3067% +/- 0.144322% (n=40) at 1280x720/QUALITY_LOW, and by
1.68478% +/- 0.484226% (n=3) at 1920x1080/QUALITY_LOW.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:47 -07:00
Matt Turner
ce17d4c5f5 i965: Implement brw_bo_map_unsynchronized() with MAP_ASYNC
This way we can let brw_bo_map() choose the best mapping type.

Part of the patch inlines map_gtt() into brw_bo_map_gtt() (and removes
map_gtt()). brw_bo_map_gtt() just wrapped map_gtt() with locking and a
call to set_domain(). map_gtt() is called by brw_bo_map_unsynchronized()
to avoid the call to set_domain(). With the MAP_ASYNC flag, we now have
the same behavior previously provided by brw_bo_map_unsynchronized().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
68bfc377fb i965: Elide call to set_domain() if MAP_ASYNC
No functional change (no callers currently pass MAP_ASYNC)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
2120cfe1af i965: Add and use brw_bo_map()
We can encapsulate the logic for choosing the mapping type. This will
also help when we add WC mappings.

A few functional changes are made in this patch. On non-LLC, what were
previously WB mappings are now GTT mappings (in the prefilling debug
code in brw_performance_query.c; the shader_time code in brw_program.c;
and in the case of an RW mapping in intel_buffer_objects.c).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
dcb03bf18d i965: Drop MAP_READ from some write-only mappings
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
275401d32b i965: Pass flags to brw_bo_map_*
brw_bo_map_cpu() took a write_enable arg, but it wasn't always clear
whether we were also planning to read from the buffer. I kept everything
semantically identical by passing only MAP_READ or MAP_READ | MAP_WRITE
depending on the write_enable argument.

The other flags are not used yet, but MAP_ASYNC for instance, will be
used in a later patch to remove the need for a separate
brw_bo_map_unsynchronized() function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
925a4222f2 i965: Rename brw_bo_map() -> brw_bo_map_cpu()
I'm going to make a new function named brw_bo_map() in a later patch
that is responsible for choosing the mapping type, so this patch clears
the way.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
3d1530d3e8 i965: Rename *_virtual -> map_*
I think these are better names, and it reduces the delta between
upstream and Chris Wilson's brw-batch branch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Chris Wilson
6aa2e8777b i965: Pass the map-mode along to intel_mipmap_tree_map_raw()
Since we can distinguish when mapping between READ and WRITE, we can
pass along the map mode to avoid stalls and flushes where possible.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-06 11:47:46 -07:00
Matt Turner
47bb498534 i965: Add a cache_coherent field to brw_bo
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
51d714dca6 i965: Remove unused 'use_resource_streamer' field
Missing in the resource streamer removal of commit 951f56cd43.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
5dc35e1664 i965: Remove brw_bo's virtual member
Just return the map from brw_map_bo_*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Matt Turner
d7024a6b3c i965: Remove unused brw_bo_map__* functions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 11:47:46 -07:00
Alex Smith
922b038864 anv: Set better descriptor set limits
Based on discussions with Jason, Ivy Bridge and Bay Trail only actually
support 16 samplers, while newer hardware can support more than the
current limit of 64. Therefore set the lower limit where needed, and
bump up to 128 for everything else. There is also a limit on the total
number of other resources of around 250.

This allows Dawn of War III to render correctly on ANV.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-06 08:20:09 -07:00
Alex Smith
59c1797d56 anv: Set driver version to Mesa version
As already done by RADV.

v2: Move version calculation function to src/vulkan/util to share with
    RADV.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-06 08:20:00 -07:00
Alex Smith
dc6182fa3f radv/vulkan: Move radv_get_driver_version to src/vulkan/util
This means it can be reused for other Vulkan drivers. Also fix up a
typo, need to search for '.' in the version string rather than ','.

v2: Remove unneeded temporary version variable (Emil, Eric)

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-06 08:19:55 -07:00
Alex Smith
621b3410f5 util/vulkan: Move Vulkan utilities to src/vulkan/util
We have Vulkan utilities in both src/util and src/vulkan/util. The
latter seems a more appropriate place for Vulkan-specific things, so
move them there.

v2: Android build system changes (from Tapani Pälli)

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-06 08:17:13 -07:00
Lionel Landwerlin
2ef73473c8 intel: gen-decoder: rework how we handle groups
The current way of handling groups doesn't seem to be able to handle
MI_LOAD_REGISTER_* with more than one register. This change reworks
the way we handle groups by building a traversal list on loading the
GENXML files.

Let's say you have

Instruction {
  Field0
  Field1
  Field2
  Group0 (count=2) {
    Field0-0
    Field0-1
  }
  Group1 (count=4) {
    Field1-0
    Field1-1
  }
}

We build of linked on load that goes :

Instruction -> Group0 -> Group1

All of those are gen_group structures, making the traversal trivial.
We just need to iterate groups for the right number of timers (count
field in genxml).

The more fancy case is when you have only a single group of unknown
size (count=0). In that case we keep on reading that group for as long
as we're within the DWordLength of that instruction.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-06 14:04:37 +01:00
Marek Olšák
6c655cfeb4 radeonsi: fix a GPU hang with tessellation on 2-CU configs
Only harvested Stoney has 2 CUs. Tested on 2-CU Stoney and Fiji forced
to 2 CUs.

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-06-06 13:01:52 +02:00
Samuel Pitoiset
b9f9bad4eb mesa: make use of NewWindowRectangles driver flags
Now, st_update_window_rectangles() won't be called when the
scissor is going to be updated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-06 11:47:31 +02:00
Samuel Pitoiset
035b0176e2 mesa: add new gl_driver_flags::NewWindowRectangles
This new driver flag will replace _NEW_SCISSOR which is
emitted when setting new window rectangles but it actually
triggers useless changes in the state tracker (like scissor
and rasterizer).

EXT_window_rectangles is currently only supported by Nouveau.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-06 11:47:27 +02:00
Samuel Pitoiset
d19d8f5e6b mesa: remove call to Driver.Scissor() in _mesa_WindowRectanglesEXT()
This is actually useless because this driver call is only used
by the classic DRI drivers which don't support that extension
and probably won't never support it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-06 11:47:24 +02:00
Samuel Pitoiset
11c6aab239 mesa: only emit _NEW_MULTISAMPLE when min sample shading changes
We usually check that given parameters are different before
updating the state.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-06 11:47:22 +02:00
Samuel Pitoiset
af9e537be3 mesa: only emit _NEW_MULTISAMPLE when sample mask changes
We usually check that given parameters are different before
updating the state.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-06 11:47:19 +02:00
Samuel Pitoiset
706e31fe5a mesa: only emit _NEW_MULTISAMPLE when coverage parameters change
We usually check that given parameters are different before
updating the state.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-06 11:47:16 +02:00
Kenneth Graunke
9cd69022d5 i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency.
We moved to INTEL_SCALAR_* when we added more than a single stage, but
never went back and converted the VS to work that way.  Be consistent.

Also update the documentation to actually mention these debug variables.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-05 23:32:40 -07:00
Dave Airlie
2890a71158 radv: expose integrated device type for APUs.
This just sets the vulkan device type depending on whether
this is an APU or GPU.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
2017-06-06 12:48:57 +10:00
Bas Nieuwenhuizen
ecdace80f4 ac/surface: Fix HTILE for radv.
We always compute HTILE size using addrlib, even when not TC compatible.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlied <airlied@redhat.com>
2017-06-06 03:17:02 +02:00
Dave Airlie
0e72dea46f radv: fix write event eop on vega.
Typo here, fixes command submission hangs on vega
2017-06-06 10:43:19 +10:00
Dave Airlie
65477bae9c radv: enable GFX9 on radv
I'm open to reverting this closer to release if bad things
happen, but it might be easier to debugging to leave it for now.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:44:26 +10:00
Dave Airlie
c07eb1823f radv: turn off geom/tess for gfx9.
We don't support these yet, and it'll take a bit of work to do so.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:44:18 +10:00
Dave Airlie
348f63623b radv: misc GFX9 changes.
These are just some register changes ported from radeonsi for gfx9.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:44:10 +10:00
Dave Airlie
289de9f945 radv: add some GFX9 specific events.
These are ported from radeonsi, don't know all the rules for
when they should be inserted.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:44:00 +10:00
Dave Airlie
5c8f8cae3e radv: add IA_MULTI_VGT_PARAM support for GFX9.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:55 +10:00
Dave Airlie
67655cb24f radv: add rb+ support for GFX9
This adds some rb+ support, as on GFX9 we have to disable
it as per radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:45 +10:00
Dave Airlie
c2fbeb7ca0 radv: add GFX9 cache flushing support.
GFX9 needs to write event EOP to a fence buffer, allocate some
space for this, and just write an ever increasing number to it,
this isn't exactly what radeonsi does, but it seems to work.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:40 +10:00
Dave Airlie
b11c4a5546 radv: add texture descriptor/fmask/cmask support for GFX9
This adds gfx9 support for the texture descriptor along
with the fmask/cmask allocation routines.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:37 +10:00
Dave Airlie
87b3799493 radv: add GFX9 to initialisation cmd buffer.
This just adds support for initialising some GFX9 registers,
and handles the different init for the VGT reuse reg.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:35 +10:00
Dave Airlie
98f27b9cce radv: don't setup raster_config on gfx9.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:32 +10:00
Dave Airlie
77b8aa4d95 radv: add gfx9 cp dma support.
This adds support to the CP dma code for GFX9, ported from
radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:29 +10:00
Dave Airlie
41eba750ba radv: add gfx9 depth/stencil surface support.
This is ported from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:27 +10:00
Dave Airlie
ac3e18916f radv: add GFX9 support for color surfaces.
This is ported from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:24 +10:00
Dave Airlie
0063da8393 radv: add some misc gfx9 pieces.
This just adds the strings and includes the gfx9 register defs
in some files that we need them in.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:21 +10:00
Dave Airlie
a83f28d536 radv: set offchip hs param like radeonsi.
radeonsi never uses 512 here anymore.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 09:43:18 +10:00
Dave Airlie
04924c09be radv: fix typo in comment. 2017-06-06 08:59:30 +10:00
Dave Airlie
114d29e7fe radv: add a comment from radeonsi before cp dma function.
This is just copied over.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 08:44:01 +10:00
Dave Airlie
da3330662f radv: remove doubled up prototype.
Must have snuck in during a rebase.
2017-06-06 08:27:35 +10:00
Dave Airlie
d1a4d229ec radv: split metadata struct into legacy/gfx9 parts.
This is just ported from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 08:22:45 +10:00
Dave Airlie
d987f90354 radv: refactor some texture descriptor state.
This just splits out some non-gfx9 bits in advance to avoid
regressions.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 08:22:42 +10:00
Dave Airlie
a5d181f60b radv: refactor color surface init before gfx9.
This just moves the code around in preparation for gfx9 support.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 08:22:38 +10:00
Dave Airlie
d3ab239099 radv: refactor depth/stencil state setup
In advance of GFX9 to reduce chances for regression, refactor
this code out so adding the GFX9 changes will be more obvious.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 08:22:33 +10:00
Dave Airlie
b50ab49723 radv: use radv_foreach_stage in a couple of places.
This just collapses a few per-stage things into a loop,
shouldn't affect anything.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-06 08:20:22 +10:00
Emil Velikov
065dea70e5 radeon: remove out of date LLVM_REVISION.txt
The file was introduced to track which LLVM revision was required, yet
that has quickly gone out of shape.

It has seen no updates since 2013.

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
2017-06-05 17:12:36 -05:00
Juan A. Suarez Romero
d1df9a595e docs: update calendar, add news item and link release notes for 17.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-05 21:22:15 +00:00
Juan A. Suarez Romero
3255b9d348 docs: add sha256 checksums for 17.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 4908b1e909)
2017-06-05 21:22:15 +00:00
Juan A. Suarez Romero
373c309c24 docs: add release notes for 17.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 97f6404e50)
2017-06-05 21:22:15 +00:00
Brian Paul
af4017665b gallium/u_threaded: fixes for MSVC
Replace some static assertions with runtime assertions.  The static
asserts don't work/fail on MSVC, despite the offsets being multiples
of 16 (checked with softpipe).

Use correct parameter types for a few gallium context functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-05 15:06:15 -06:00
Dave Airlie
d8212f847a r600: refactor out some compressed resource state code.
This just takes this out to a separate function as it will
get more complex with images.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2017-06-06 06:09:44 +10:00
Dave Airlie
7a26a0bf09 r600: document some of the missing shader constants.
These are used for fragment shader thread calculations.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2017-06-06 06:09:41 +10:00
Dave Airlie
95c1e57a18 r600: add register info for atomic counters.
The atomic counters on evergreen are implemented via append/consume
UAV counters. This just adds the register info for them. The EOS
packets are used to get the atomic totals extracted post shader
execution for storing into a buffer.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2017-06-06 06:09:37 +10:00
Dave Airlie
a6b71f7588 r600: add missing RAT registers and operations.
This just documents in the headers the RAT operation list,
and the RAT encoding for exports.

The immediate registers are used to point to buffers for the
RAT return values (_RTN instructions).

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2017-06-06 06:09:10 +10:00
Dave Airlie
e119c445da r600/sb: fix typo in field definitions
Pointed out by glennk.
2017-06-06 05:46:14 +10:00
Marek Olšák
4b1e6ed49a tgsi/scan: fix scanning fragment shaders with PrimID and Position/Face
Not relevant to radeonsi, because Position/Face are system values
with radeonsi, while this codepath is for drivers where Position and
Face are ordinary inputs.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-05 18:29:42 +02:00
Jason Ekstrand
708664159e i965: Finalize miptrees before prepare_texture
In order to do resolves for texture views with different formats, we
need intel_texture_object::_Format to be valid.  Calling
intel_finalize_mipmap_tree can safely be done multiple times in a row
and should be a fairly cheap operation.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-05 09:26:22 -07:00
Marek Olšák
9275b2233f gallium/u_threaded: remove 16 bytes from tc_batch
All other sentinels occupy what is otherwise unused space.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-05 18:25:57 +02:00
Marek Olšák
3b1ce49bc1 gallium/u_threaded: align batches and call slots to 16 bytes
not sure if this helps

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-05 18:25:57 +02:00
Marek Olšák
2ec50f98a9 st/mesa: don't load cached TGSI shaders on demand
This fixes a performance issue with the shader cache that delayed Gallium
shader create calls until draw calls.

I'd like this in stable, but it's not a showstopper.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-05 18:25:57 +02:00
Chih-Wei Huang
bb0452442a Android: use bionic pthread_barrier_* if possible
The pthread_barrier_* functions were introduced to bionic
since Nougat.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-05 14:06:35 +01:00
Dave Airlie
06f4251925 r600: fix incorrect and missing bit field in register headers.
The compression field was incorrect, and we were missing the
depth before shader field.
2017-06-05 13:19:18 +10:00
Nicolai Hähnle
df30123794 radv: use ac_compute_surface
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:30 +10:00
Dave Airlie
607e61c40e radv: prepare fmask surface creation
The old code copied over all the surface info from the image
surface, we only want some bits of it, and to modify the flags.

This prevents a regression in dEQP-VK.api.copy_and_blit.resolve_image.*
and others in the subsequent switch to ac_compute_surface.

v2:
- also disable opt4Space in radv_amdgpu_surface, so that we can
  apply this patch separately *before* switching to ac_compute_surface
  and hopefully avoid intermittent regressions (Nicolai)

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-05 10:44:24 +10:00
Nicolai Hähnle
8354f287db radv: use amdgpu_addr_create
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:22 +10:00
Nicolai Hähnle
40e94847a5 radv: stop using radv_amdgpu_winsys::family
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:18 +10:00
Nicolai Hähnle
bd4493b169 radv: use ac_gpu_info
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:15 +10:00
Nicolai Hähnle
eeb075d662 radv: remove radeon_info::name
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:13 +10:00
Nicolai Hähnle
dfc06d2fac radv: use ac_surface data structures
This is mostly mechanical changes of renaming types and introducing
"legacy" everywhere.

It doesn't use the ac_surface computation functions yet.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:09 +10:00
Nicolai Hähnle
543de22f4b radv: rename radeon_surf::bo_{size,alignment} to surf_{size,alignment}
To match radeonsi / ac_surface.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:05 +10:00
Nicolai Hähnle
8417c21d0a radv: remove unused RADEON_SURF_HAS_SBUFFER_MIPTREE
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:44:02 +10:00
Nicolai Hähnle
e156eaedb4 radv: remove radeon_surf_level::nblk_z
We're not using thick tiling modes, so we can just derive the value
ourselves.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:43:59 +10:00
Nicolai Hähnle
34b7fb47b6 radv: remove radeon_surf_level::dcc_enabled
Like radeonsi; replace with radeon_surf::num_dcc_levels.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:43:56 +10:00
Nicolai Hähnle
59f72e158a radv: remove radeon_surf_level::pitch_bytes
Like radeonsi. This saves memory, and the information can easily be
recomputed on the fly where necessary.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:43:53 +10:00
Nicolai Hähnle
a12d288bff radv: add surface helper variable in radv_GetImageSubresourceLayout
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:43:50 +10:00
Nicolai Hähnle
388d36dfd1 radv: fewer than 8 RBs are possible
This fixes the subsequent assertion on Bonaire.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:43:47 +10:00
Nicolai Hähnle
e07d5c7296 ac/surface/gfx6: explicitly support S8 surfaces
This is needed by radv for dEQP-VK.renderpass.simple.stencil

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:43:29 +10:00
Dave Airlie
72f0830ecd ac/nir: set workgroup size attribute to correct value.
This ports: 55445ff189 from radeonsi

    radeonsi: tell LLVM not to remove s_barrier instructions

    LLVM 5.0 removes s_barrier instructions if the max-work-group-size
    attribute is not set. What a surprise.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-05 01:37:44 +01:00
Dave Airlie
68c812f699 ac: add new helper function to add a integer target dependent function attr.
This is needed to add the max workgroup size attribute.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-05 01:37:29 +01:00
Dave Airlie
4ba2e6cbfa radv: add external memory support.
This adds support for exporting 2D images, to an
opaque fd.

This implements the:
VK_KHX_external_memory_capabilities
VK_KHX_external_memory
VK_KHX_external_memory_fd

extensions.

These are used by SteamVR, we should work with anv
to decide if we should ship these under an env
var or something.

v2 (Bas): - Don't expose the semaphore ext without implementing it.
          - Only export the capabilities ext as instance ext.
          - Implement radv_GetPhysicalDeviceExternalBufferPropertiesKHX.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2017-06-05 02:26:43 +02:00
Bas Nieuwenhuizen
d515b420dd radv: Add VkPhysicalDeviceIDProperties support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 02:26:43 +02:00
Bas Nieuwenhuizen
d513473cc1 radv: Add support for external queue family.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 02:26:43 +02:00
Dave Airlie
a935cd926b radv/formats: reverse how the image format properties KHR2 is handled
This just aligns with how anv does it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-05 01:03:30 +01:00
Bas Nieuwenhuizen
4415a46be2 radv: Dirty all descriptors sets when changing the pipeline.
Sets could have been ignored during previous descriptor set flush
due to the shader not using them and therefore no SGPR being assigned.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: ae61ddabe8 "radv: move userdata sgpr ownership to compiler side."
2017-06-03 22:24:37 +02:00
Bas Nieuwenhuizen
5fb8bb3065 radv: Set both compute and graphics SGPRS on descriptor set flush.
We clear the descriptors_dirty array afterwards, so the SGPRs for
the other pipeline don't get updated on the flush for that other
draw/dispatch, so we have to make sure we do it immediately.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: ae61ddabe8 "radv: move userdata sgpr ownership to compiler side."
2017-06-03 22:24:37 +02:00
Chris Wilson
8d07cb125c i965: Order write of query availablity with earlier writes
Currently we signal the availabilty of the query result using an
unordered pipe-control write. As it is unordered, it may be executed
before the write of the query result itself - and so an observer may
read the query result too early. Fix this by requesting that the write
of the availablity flag is ordered after earlier pipe control writes.

Testcase: piglit/arb_query_buffer_object-qbo/*async*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-06-03 13:38:45 +01:00
Lyude
98fc0243ef nvc0: Add support for ARB_post_depth_coverage
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-02 23:19:42 -04:00
Lyude
4dafc4c99a st/mesa: Add support for ARB_post_depth_coverage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-02 23:19:39 -04:00
Lyude
467af445a3 gallium: Add a cap to check if the driver supports ARB_post_depth_coverage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-02 23:19:22 -04:00
Lyude
af788a82d5 gallium: Add TGSI shader token for ARB_post_depth_coverage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-02 23:19:22 -04:00
Lyude
245912b684 nvc0: disable BGRA8 images on Fermi
BGRA8 image stores on Fermi don't work, which results in breaking
PBO downloads, such that they always return 0x0. Discovered this
through a glamor bug, and confirmed it does indeed break a good number
of piglit tests such as spec/arb_pixel_buffer_object/pbo-read-argb8888

Fixes: 8e7893eb53 ("nvc0: add support for BGRA8 images")
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-06-02 23:10:36 -04:00
Anuj Phogat
0d576fbfbe i965: Simplify l3 way size computations
By making use of l3_banks field in gen_device_info struct
l3_way_size for gen7+ = 2 * l3_banks.

V2: Keep the get_l3_way_size() function.

Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-02 16:21:56 -07:00
Anuj Phogat
eb23be1d97 i965: Add and initialize l3_banks field for gen7+
This new field helps simplify l3 way size computations
in next patch.

V2: Initialize the l3_banks to 0 in macros.

Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-02 16:21:56 -07:00
Chad Versace
e9f5004d5e i965: Replace 0 with ISL_FORMAT_UNSUPPORTED in format table (v2)
When given an *unsupported* mesa_format,
brw_isl_format_for_mesa_format() returned 0, a *valid* isl_format,
ISL_FORMAT_R32G32B32A32_FLOAT.  The problem is that
brw_isl_format_for_mesa_format's inner table used 0 instead of
ISL_FORMAT_UNSUPPORTED to indicate unsupported mesa formats.

Some callers of brw_isl_format_for_mesa_format() were aware of this
weirdness, and worked around it. This patch removes those workarounds.

v2: Ensure that all array elements are initialized to
  ISL_FORMAT_UNSUPPORTED, even when new formats are added to enum
  mesa_format, by using an designated range initializer.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-02 12:41:30 -07:00
Gurchetan Singh
1fec049850 st/dri: Use fence extension in drisw.c
This is desirable for synchronization in virtual machines.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-02 12:33:42 -07:00
Gurchetan Singh
59dc23bba9 st/dri: move fence implemention into separate file
Since the fence implementation is not dri2.c specific, put
it in a separate file. This way SW implementations can use this
extension too.

v2: Don't depend on dri2.c for extensions (Emil)
v3: Make this patch only move extension into a separate file (Chad).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-02 12:33:21 -07:00
Brian Paul
3ba5b8a560 mesa: document range of SampleCoverageValue, MinSampleShadingValue
Trivial.
2017-06-02 08:23:13 -06:00
Brian Paul
c6ba85a8c0 xlib: fix glXGetCurrentDisplay() failure
glXGetCurrentDisplay() has been broken for years and nobody noticed until
recently.  This change adds a new XMesaGetCurrentDisplay() that the GLX
emulation API can call, just as we did for glXGetCurrentContext().

Tested by hacking glxgears to call glXGetCurrentContext() before and
after glXMakeCurrent() to verify the return value is NULL beforehand and
the same as the opened display afterward.

Also tested by Tom Hudson with his tests programs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100988
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Tom Hudson <tom.hudson.phd@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
2017-06-02 08:22:55 -06:00
Dave Airlie
bcae327469 radv: realign cp dma code with radeonsi
This reworks this code to be like radeonsi, which will make it
easier to add GFX9 support to it in the future.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-02 12:49:11 +10:00
Dave Airlie
745aa17093 radv: bump some base addresses to 64-bits.
For GFX9 these will be needed to be 64-bit, so bump them early,
to avoid it causing any wierdness later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-02 12:49:01 +10:00
Dave Airlie
ad61eac250 radv: factor out eop event writing code. (v2)
In prep for GFX9 refactor some of the eop event writing code
out.

This changes behaviour, but aligns with what radeonsi does,
it does double emits on CIK/VI, whereas previously it only
did this on CIK.

v2: bump the size checks.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-02 12:48:56 +10:00
Dave Airlie
7205431e73 radv: factor out si_emit_wait_fence code.
This code was in a few places, consolidate into one.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-02 12:48:20 +10:00
Jason Ekstrand
1a22c4c960 intel/blorp: Handle gen6 stencil/HiZ offsets in the back-end
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:34:01 -07:00
Jason Ekstrand
d065a9540c intel/isl: Add a helper for getting the byte/tile offset of a subimage
Frequently, get_image_offset_sa is combined with get_intratile_offset_sa
so it makes sense to have a single helper to do both.  If the caller
doesn't want the intratile offsets, it can simply pass NULL and ISL will
assert that they are 0.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:58 -07:00
Jason Ekstrand
b178762d05 intel/isl: Make get_intratile_offset_el take the element size in bits
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:56 -07:00
Jason Ekstrand
757f7087a5 intel/isl: Add a new layout for HiZ and stencil on Sandy Bridge
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:47 -07:00
Jason Ekstrand
cb8cdab8e8 intel/isl: Generate phys_total_el from isl_calc_phys_extent
The only surface layout for which slice0 makes any sense is GEN4_2D.
Move all of the slice0 stuff into isl_calc_phys_total_extent_el_gen4_2d
and make the others trivially return the total size in surface elements.
As a side-effect, array_pitch_el_rows is now returned from these helpers
as well.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:45 -07:00
Jason Ekstrand
918f41bb29 intel/isl: Don't check array pitch for gen4 3D textures
Array pitch doesn't matter in this layout.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:43 -07:00
Jason Ekstrand
044bfb292f intel/isl: Refactor to use a phys_total_el extent.
We've already implicitly been using a physical total size in surface
elements.  This just centralizes things a bit.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:41 -07:00
Jason Ekstrand
1547d133ac intel/isl: Add an isl_assert_div helper
This is a fairly common operation and it's nice to be able to just call
the one little function.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:39 -07:00
Jason Ekstrand
58051ad220 intel/isl: Refactor isl_calc_array_pitch_el_rows
Over 90% of the function only applies to ISL_DIM_LAYOUT_GEN4_2D anyway
so we can just handle the other two as special cases at the top.  The
two "generic" cases below the switch only apply on gen9 and above and
only to 3D or CCS surfaces.  This implies that they only apply to
surfaces with ISL_DIM_LAYOUT_GEN4_2D.  Making them look generic is a
lie.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:37 -07:00
Jason Ekstrand
fe13c59c1b intel/isl: Move isl_calc_array_pitch_el_rows higher up
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:34 -07:00
Jason Ekstrand
c1a70165be intel/isl: Remove the device parameter from isl_tiling_get_info
We were only using it for validating that we don't use Ys/Yf on gen8 and
earlier.  Removing it from isl_tiling_get_info lets us remove it from a
bunch of other things that had no business needing a hardware
generation.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:31 -07:00
Jason Ekstrand
10903d2289 i965: Rework Sandy Bridge HiZ and stencil layouts
Sandy Bridge does not technically support mipmapped depth/stencil.  In
order to work around this, we allocate what are effectively completely
separate images for each miplevel, ensure that they are page-aligned,
and manually offset to them.  Prior to layered rendering, this was a
simple matter of setting a large enough halign/valign.

With the advent of layered rendering, however, things got more
complicated.  Now, things weren't as simple as just handing a surface
off to the hardware.  Any miplevel of a normally mipmapped surface can
be considered as just an array surface given the right qpitch.  However,
the hardware gives us no capability to specify qpitch so this won't
work.  Instead, the chosen solution was to use a new "all slices at each
LOD" layout which laid things out as a mipmap of arrays rather than an
array of mipmaps.  This way you can easily offset to any of the
miplevels and each is a valid array.

Unfortunately, the "all slices at each lod" concept missed one
fundamental thing about SNB HiZ and stencil hardware:  It doesn't just
always act as if you're always working with a non-mipmapped surface, it
acts as if you're always working on a non-mipmapped surface of the same
size as LOD0.  In other words, even though it may only write the
upper-left corner of each array slice, the qpitch for the array is for a
surface the size of LOD0 of the depth surface.  This mistake causes us
to under-allocate HiZ and stencil in some cases and also to accidentally
allow different miplevels to overlap.  Sadly, piglit test coverage
didn't quite catch this until I started making changes to the resolve
code that caused additional HiZ resolves in certain tests.

This commit switches Sandy Bridge HiZ and stencil over to a new scheme
that lays out the non-zero miplevels horizontally below LOD0.  This way
they can all have the same qpitch without interfering with each other.
Technically, the miplevels still overlap, but things are spaced out
enough that each page is only in the "written area" of one LOD.

Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:26 -07:00
Kenneth Graunke
fe14a9a501 i965: Drop duplicate shadow variable.
We already initialized this at the top of the function.

Trivial.
2017-06-01 14:28:12 -07:00
Jose Fonseca
ce5e83b8a0 automake: Link all libGL.so variants with -Bsymbolic.
We were linking src/glx with -Bsymbolic, but not the classic/gallium X11
libGL.so.

But it's always a good idea to build all libGL.so and all DRI drivers
with -Bsymbolic, otherwise they might resolve symbols from the 3rd party
application executable or shared libraries, which is _never_ what we
want.

In particular, this can happen when intercepting OpenGL calls with
apitrace, before
63194b2573

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-01 21:24:38 +01:00
Chad Versace
9d996e94fb i965/dri: Fix bad GL error in intel_create_winsys_renderbuffer()
This function never occurs in the callchain of a GL function. It occurs
only in the callchain of eglCreate*Surface and the analogous paths for
GLX.  Therefore, even if a  thread does have a bound GL context,
emitting a GL error here is wrong. A misplaced GL error, when no GL
call is made, can confuse clients.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-01 12:41:32 -07:00
Chad Versace
a23cabd8ca i965: Cleanup in intel_create_winsys_renderbuffer()
Combine variable declarations and assignments.
Trivial cleanup.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-01 12:41:30 -07:00
Chad Versace
6551655ffd i965: Remove bad assert on isl_format
translate_tex_format() asserted that isl_format != 0. But 0 is a valid
format, ISL_FORMAT_R32G32B32A32_FLOAT.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-01 12:41:26 -07:00
Chad Versace
de69002faa i965: Fix return type of translate_tex_format()
It returns an isl_format, not GLuint BRW_FORMAT.  I updated every
translate_tex_format() found by git-grep.

No change in behavior.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-01 12:41:24 -07:00
Chad Versace
77e3c836f8 i965: Fix return type of brw_isl_format_for_mesa_format() [v2]
It returns an isl_format, not uint32_t BRW_FORMAT.
I updated every brw_isl_format_for_mesa_format() found by git-grep.

No change in behavior.

v2: Rebased atop Anuj's patch, which has some of the same fixes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
2017-06-01 12:39:35 -07:00
Anuj Phogat
84ede214fc i965: Remove an extra semicolon
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-06-01 12:14:58 -07:00
Anuj Phogat
adb449694a i965: Rename brw_format variable names to isl_format
This patch makes non functional changes. Renaming is just to
make the code more readable.

V2: update the types to "enum isl_format"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-01 12:14:13 -07:00
Chad Versace
7a4964ec5c i965: Reject unsupported formats in glEGLImageTargetTexture2D()
If the EGLImage's format is not a supported texture format according to
brw_surface_formats.c, then refuse to create the miptree. This follows
the precedent in glEGLImageRenderbufferStorage (implemented by
intel_image_target_renderbuffer_storage), which rejects the EGLImage's
format if is not renderable.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-01 12:03:33 -07:00
Kenneth Graunke
fe9699dcb4 genxml: Make 3DSTATE_CONSTANT_BODY on Gen7+ use arrays.
This will let us initialize the constant buffers with loops.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:46 -07:00
Kenneth Graunke
12303bd390 genxml: Fix decoder to print the array element on field members.
Previously we'd print things like:

   0xfffbb568:  0x00010000 : Dword 1
       ReadLength: 0
       ReadLength: 1
   0xfffbb568:  0x00000001 : Dword 1
       ReadLength: 1
       ReadLength: 0

instead of the more obvious:

   0xfffbb568:  0x00010000 : Dword 1
       ReadLength[0]: 0
       ReadLength[1]: 1
   0xfffbb568:  0x00000001 : Dword 1
       ReadLength[2]: 1
       ReadLength[3]: 0

(Yes, the ralloc context here is bogus - the decoder leaks just about
everything.  We need to use proper ralloc contexts someday...)

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:46 -07:00
Kenneth Graunke
73c21e69d0 genxml: Fix decoding of array groups.
If you had a group as the first element of a struct, i.e.

  <struct name="3DSTATE_CONSTANT_BODY" length="10">
    <group count="4" start="0" size="16">
      <field name="ReadLength" start="0" end="15" type="uint"/>
    </group>
    ...
  </struct>

we would get a group_offset of 0, causing create_field() to think the
field wasn't in a group, and fail to offset forward for successive array
elements.  So we'd mark all the array elements as offset 0.

Using ctx->group->elem_size is a better check for "are we in a group?".

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:45 -07:00
Kenneth Graunke
d1b949282f genxml: Fix decoder for groups with multiple fields.
If you have something like:

    <group count="0" start="96" size="32">
      <field name="Entry_0" start="0" end="15" type="GATHER_CONSTANT_ENTRY"/>
      <field name="Entry_1" start="16" end="31" type="GATHER_CONSTANT_ENTRY"/>
    </group>

We would reset ctx->group_count to 0 after processing the first field,
so the second would not have a group count.

This is largely untested, as the only groups with multiple fields are
packets we don't emit in Mesa.  Found by inspection.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:45 -07:00
Kenneth Graunke
df2d55ba57 genxml: Fix parsing of address fields in groups.
For example,

    <group count="4" start="64" size="64">
      <field name="Pointer" start="5" end="63" type="address"/>
    </group>

used to generate:

   const uint64_t v2_address =
      __gen_combine_address(data, &dw[2], values->Pointer, 0);
   ...
   const uint64_t v4_address =
      __gen_combine_address(data, &dw[4], values->Pointer, 0);
   ...

but now generates code with proper subscripts:

   const uint64_t v2_address =
      __gen_combine_address(data, &dw[2], values->Pointer[0], 0);
   ...
   const uint64_t v4_address =
      __gen_combine_address(data, &dw[4], values->Pointer[1], 0);
   ...

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:45 -07:00
Eric Engestrom
845d07978f configure.ac: simplify --enable-libunwind=auto check
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-01 16:56:57 +01:00
Nicolas Dechesne
adadadc151 util/rand_xor: add missing include statements
Fixes for:

src/util/rand_xor.c:60:13: error: implicit declaration of function 'open' [-Werror=implicit-function-declaration]
    int fd = open("/dev/urandom", O_RDONLY);
             ^~~~
src/util/rand_xor.c:60:34: error: 'O_RDONLY' undeclared (first use in this function)
    int fd = open("/dev/urandom", O_RDONLY);
                                  ^~~~~~~~

Signed-off-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-06-01 14:26:12 +01:00
Lucas Stach
cab5996c26 etnaviv: always do cpu_fini in transfer_unmap
The cpu_fini() call pushes the buffer back into the GPU domain, which needs
to be done for all buffers, not just the ones with CPU written content. The
etnaviv kernel driver currently doesn't validate this, but may start to do
so at a later point in time. If there is a temporary resource the fini needs
to happen before the RS uses this one as the source for the upload.

Also remove an invalid comment about flushing CPU caches, cpu_fini takes
care of everything involved in this.

Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-06-01 15:20:38 +02:00
Emil Velikov
72011f7a7b docs: update calendar, add news item and link release notes for 17.0.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-01 11:46:39 +01:00
Emil Velikov
0fd1715be1 docs: add sha256 checksums for 17.0.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit bdfd5658e7)
2017-06-01 11:42:46 +01:00
Emil Velikov
29c6a1200b docs: add release notes for 17.0.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 46cc7a1746)
2017-06-01 11:42:45 +01:00
Samuel Pitoiset
1da51ec0f7 glsl: fix a crash in ir_print_visitor() for bindless samplers/images
Bindless samplers/images are represented with 64-bit unsigned
integers and they can be assigned with explicit constructors.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-01 11:54:06 +02:00
Samuel Pitoiset
e4e5562d8a glsl: teach opt_array_splitting about bindless images
Memory/format layout qualifiers shouldn't be lost when arrays
of images are splitted by this pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-01 11:54:06 +02:00
Samuel Pitoiset
678e05cc34 glsl: teach opt_structure_splitting about images in structures
GL_ARB_bindless_texture allows images to be declared inside
structures, but when memory/format qualifiers are used, they
should be propagated when structures are splitted.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-01 11:54:06 +02:00
Samuel Pitoiset
71efec290c glsl: fix broken indentation in do_structure_splitting()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-01 11:54:06 +02:00
Samuel Pitoiset
ad717102d9 glsl: handle format layout qualifiers for struct with array of images
This handles a situation like:

struct {
   layout (r32f) image2D imgs[6];
} s;

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-01 11:54:06 +02:00
Samuel Pitoiset
d9460ad600 glsl: handle memory qualifiers for struct with array of images
This handles a situation like:

struct {
   image2D imgs[6];
} s;

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-06-01 11:54:06 +02:00
Rhys Kidd
e305400443 nvc0: Clean up unnecessary includes from gallium/auxiliary/vl/
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-01 10:16:14 +02:00
Kenneth Graunke
6d60121fa0 i965: Simplify SO_DECL handling.
We can initialize structs directly, avoid some temporaries, and cut out
about half of the skip component handling.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-01 00:08:29 -07:00
Kenneth Graunke
9a690ada94 i965: Make a local for linked_xfb->Outputs[i], to shorten things.
This seems a bit more readable.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-01 00:08:29 -07:00
Kenneth Graunke
65f5f3c85c i965: Move SOL PSIZ hacks from draw time to link time.
We can just update the gl_transform_feedback_info fields at link time
to make the VUE header fields have the right location and component.
Then we don't need to handle them specially at draw time, which is
expensive.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-01 00:08:29 -07:00
Iago Toral Quiroga
3d37cf99c8 mesa/main: replace remaining uses of IROUND() in GetUniform*() by round()
These were correct since they were used only in conversions to signed integers,
however this makes the implementation a bit more is more consistent and reduces
chances of propagating use of these macros to unsigned cases in the future, which
would not be correct.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-01 08:44:34 +02:00
Iago Toral Quiroga
1356b42284 mesa/main: conversion from float in GetUniformi64v requires rounding to nearest
As we do for all other cases of float/double conversions to integers.

v2: use round() instead of IROUND() macros (Iago)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-01 08:44:34 +02:00
Iago Toral Quiroga
c333082483 mesa/main: Add conversion from double to uint64/int64 in GetUniform*i64v()
v2:
  - need unsigned rounding for double->uint64 conversion (Nicolai)
  - use round() instead of IROUND() macros (Iago)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-01 08:44:34 +02:00
Iago Toral Quiroga
cc972c2845 mesa/main: Clamp GetUniformui64v values to be >= 0
Like we do for the 32-bit case.

v2:
  - need unsigned rounding for float->uint64 conversion (Nicolai)
  - use roundf() instead of IROUND() macros (Iago)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-01 08:44:34 +02:00
Kenneth Graunke
83e74d7dc1 mesa/main: Clamp GetUniformuiv values to be >= 0
Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 October 24th 2016 specification says:

"If a command returning unsigned integer data is called, such as
 GetSamplerParameterIuiv, negative values are clamped to zero."

v2: uint to int conversion should clamp to INT_MAX (Nicolai)

v3 (Iago)
  - Add conversions conversions from 64-bit integer paths
  - Rebase on master

v4:
  - need unsigned rounding for float/double->uint conversions (Nicolai)
  - use round{f}() instead of IROUND() macros (Iago)

Fixes:
KHR-GL45.gpu_shader_fp64.state_query

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v2)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-01 08:44:34 +02:00
Iago Toral Quiroga
1020448700 mesa/main: fix indentation in _mesa_get_uniform()
v2: also change the style of the large conditional in that function
    to follow the style from most other parts of Mesa (Nicolai)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-06-01 08:44:34 +02:00
Ian Romanick
779b35bbc6 r100: Silence numerous unused this or that warnings
radeon_fbo.c: In function ‘radeon_map_renderbuffer_s8z24’:
radeon_fbo.c:147:50: warning: unused parameter ‘ctx’ [-Wunused-parameter]
 radeon_map_renderbuffer_s8z24(struct gl_context *ctx,
                                                  ^~~
radeon_fbo.c: In function ‘radeon_map_renderbuffer_z16’:
radeon_fbo.c:186:48: warning: unused parameter ‘ctx’ [-Wunused-parameter]
 radeon_map_renderbuffer_z16(struct gl_context *ctx,
                                                ^~~
radeon_fbo.c: In function ‘radeon_unmap_renderbuffer_s8z24’:
radeon_fbo.c:344:52: warning: unused parameter ‘ctx’ [-Wunused-parameter]
 radeon_unmap_renderbuffer_s8z24(struct gl_context *ctx,
                                                    ^~~
radeon_fbo.c: In function ‘radeon_unmap_renderbuffer_z16’:
radeon_fbo.c:377:50: warning: unused parameter ‘ctx’ [-Wunused-parameter]
 radeon_unmap_renderbuffer_z16(struct gl_context *ctx,
                                                  ^~~
radeon_fbo.c: In function ‘radeon_nop_alloc_storage’:
radeon_fbo.c:624:75: warning: unused parameter ‘rb’ [-Wunused-parameter]
 radeon_nop_alloc_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
                                                                           ^~
radeon_fbo.c:625:12: warning: unused parameter ‘internalFormat’ [-Wunused-parameter]
     GLenum internalFormat, GLuint width, GLuint height)
            ^~~~~~~~~~~~~~
radeon_fbo.c:625:35: warning: unused parameter ‘width’ [-Wunused-parameter]
     GLenum internalFormat, GLuint width, GLuint height)
                                   ^~~~~
radeon_fbo.c:625:49: warning: unused parameter ‘height’ [-Wunused-parameter]
     GLenum internalFormat, GLuint width, GLuint height)
                                                 ^~~~~~
radeon_fbo.c: In function ‘radeon_bind_framebuffer’:
radeon_fbo.c:696:74: warning: unused parameter ‘fbread’ [-Wunused-parameter]
                        struct gl_framebuffer *fb, struct gl_framebuffer *fbread)
                                                                          ^~~~~~
radeon_fbo.c: In function ‘radeon_validate_framebuffer’:
radeon_fbo.c:832:19: warning: unused variable ‘radeon’ [-Wunused-variable]
  radeonContextPtr radeon = RADEON_CONTEXT(ctx);
                   ^~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-31 21:14:44 -07:00
Ian Romanick
303b47f253 r100: Use _mesa_get_format_base_format in radeon_update_wrapper
The wrapper is for a renderbuffer around a texture.  Textures can have
formats (e.g., 3) that aren't valide for API generated renderbuffers.
_mesa_base_fbo_format will return 0, but _mesa_get_format_base_format
will return the base format of RGB.

Fixes a crashes in piglit tests fbo-alphatest-formats (all subtests
pass) and fbo-colormask-formats (some subtests pass, some fail).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-31 21:14:44 -07:00
Ian Romanick
c24881d39c r100,r200: Don't assume glVisual is non-NULL during context creation
Thanks to EGL_MESA_configless_context, the visual pointer can be NULL.

Fixes a segfault (or assertion failure) in piglit's
egl-configless-context test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-31 21:14:44 -07:00
Ian Romanick
2dcec62075 r100: Don't assume that the base mipmap of a texture exists
Fixes crashes in piglit's gl-1.2-texture-base-level.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-31 21:14:44 -07:00
Dave Airlie
f42fb0012a r600/eg: add support for tracing IBs after a hang.
This is a poor man's version of radeonsi ddebug stuff, this
should get hooked into that infrastructure, and grow more stuff,
but for now, just create R600_TRACE var that points to a file
that you want to dump the last IB to.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-01 11:20:11 +10:00
Dave Airlie
55d1550d35 glsl/lower_int64: only set progress when something is lowered.
Otherwise we'd get progress continually set if we had non 64-bit
versions of these ops.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-01 08:46:35 +10:00
Bas Nieuwenhuizen
af2844116f radv: Revert HTILE reset word to 0xFFFFFFFF.
0x30f regressed mad max.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Fixes: df91abfe5a "radv: Use correct clear words for HTILE."
2017-05-31 23:55:13 +02:00
Rob Herring
e8f82bfd52 Android: major/minor/makedev live in <sys/sysmacros.h>
sysmacros.h was getting implicitly included in types.h until recently in
AOSP master. Define MAJOR_IN_SYSMACROS to explicitly include sysmacros.h.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-05-31 16:35:25 -05:00
Chad Versace
22d6b08d2d egl/android: Drop unused 'format' param in get_back_bo()
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-31 10:45:57 -07:00
Chad Versace
0bcdcebc85 egl/android: Align channel masks in HAL_PIXEL_FORMAT table
Improves readability. No change in behavior.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-31 10:45:57 -07:00
Eric Engestrom
11da77e546 egl/drm: remove temporary fd variable
In all codepaths, this var ends up assigned to the struct, except one:
a cleanup codepath, where the `close()` was removed, leading to fd leaks.
Remove the temp fd and assign to the struct field directly instead.

CovID: 1213930
Fixes: 7ec07beedf ("egl/drm: make use of the
                              dri2_display_destroy() helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-31 18:09:27 +01:00
Samuel Pitoiset
c222fa9ada mesa: throw an INVALID_OPERATION error in get_texobj_by_name()
Because get_texobj_by_name() can already throw a INVALID_ENUM
error, it makes more sense to add a check directly there.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-31 12:01:19 +02:00
Samuel Pitoiset
b9c3ce529f mesa: add new 'name' parameter to get_texobj_by_name()
To display better function names when INVALID_OPERATION is
returned. Requested by Timothy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-31 12:01:08 +02:00
Samuel Pitoiset
30a4e375f5 radeonsi: remove unused si_pm4_state::compute_pkt
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-31 09:20:57 +02:00
Samuel Pitoiset
e4b05a50df radeonsi: remove chip_class define from si_pm4.h
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-31 09:20:55 +02:00
Samuel Pitoiset
d90a6c2f23 radeonsi: merge si_pm4_free_state_simple() into si_pm4_free_state()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-31 09:20:53 +02:00
Samuel Pitoiset
d8debc6aad mesa/util: fix arithmetic use of 'void *' in u_vector_foreach
u_vector_foreach is currently only used by the Intel Vulkan
driver but when this macro is used in mesa core, GCC reports
a compile-time error. Probably because some compiler options
are different.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-31 09:19:54 +02:00
Timothy Arceri
4e93da30f0 mesa: remove _mesa from static function names
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-31 11:18:17 +10:00
Timothy Arceri
42fea3622f mesa/st: indentation tidy-up
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-31 11:18:17 +10:00
Rob Clark
45e97c994b freedreno/a5xx: drop WFIs in emit_marker5()
Results in always having at least one WFI between draws, which was
slowing stk down by ~5% and ~10% in xonotic.

(also drop bogus assert while we're at it.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-30 20:40:58 -04:00
Rob Clark
76214b9919 freedreno/a5xx: timestamp / time-elapsed queries
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-30 20:40:58 -04:00
Rob Clark
5ed9e8fd5d freedreno/a5xx: rename query result struct
Going to want the same thing for timestamp queries.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-30 20:40:58 -04:00
Rob Clark
8c65f17c3b freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-30 20:40:58 -04:00
Kenneth Graunke
236ffbc442 i965: Delete dead old-school packing structs.
Trivial.
2017-05-30 16:22:33 -07:00
Tim Rowley
c606edb578 swr/rast: code cleanup (no functional change)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:22:18 -05:00
Tim Rowley
b10c9507ce swr/rast: whitespace changes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:22:12 -05:00
Tim Rowley
ac9d7c3d33 swr/rast: code cleanup (no functional change)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:22:08 -05:00
Tim Rowley
e9e999ae32 swr/rast: allow early-z if shader uses depth value
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:22:02 -05:00
Tim Rowley
628fefc15c swr/rast: move wireframe/point triangle binning after culling
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:57 -05:00
Tim Rowley
3b76dea5d1 swr/rast: remove unused functions
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:52 -05:00
Tim Rowley
d91402fefa swr/rast: code cleanup (no functional change)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:47 -05:00
Tim Rowley
7e271a763e swr/rast: move binner utility functions to binner.h
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:41 -05:00
Tim Rowley
5ea9a30f50 swr/rast: SIMD16 FE - fix/use SIMD16 calcDeterminantIntVertical()
Stop double pumping the SIMD8 version.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:36 -05:00
Tim Rowley
fb9f7bd717 swr/rast: add renderTargetArrayIndex to SWR_PS_CONTEXT
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:30 -05:00
Tim Rowley
2438932b7e swr/rast: make simd16 logicops avx512f safe
Express the simd16 logicops in terms of avx512f instructions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:22 -05:00
Tim Rowley
7be26a2d35 swr/rast: SIMD16 FE - add SIMD16 types to jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:18 -05:00
Tim Rowley
e3c93d8ddf swr/rast: SIMD16 FE - fix PA_STATE_OP::Reset()
Fixes instanced GS.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:12 -05:00
Tim Rowley
fd14c40734 swr/rast: SIMD16 FE - simplify/refactor StreamOut
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:07 -05:00
Tim Rowley
a230af8b44 swr/rast: SIMD16 FE - fix conservative rasterization
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:21:02 -05:00
Tim Rowley
f64aea0959 swr/rast: SIMD16 FE - interleaved simdvertex output in GS
Eliminates conversion copies on GS output from simdvertex to simd16vertex.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:56 -05:00
Tim Rowley
cbd33e71f7 swr/rast: fix _simd16_movemask_(ps,pd) native AVX512 intrinsics
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:51 -05:00
Tim Rowley
9fd68be133 swr/rast: SIMD16 FE - primitive assembly simplification
Reduce/simplify vertex storage usage in PA_STATE_OPT, fix PA
GetNextVSOutput wrap-around behaviour and eliminate unnecessary
SIMDVERTEX copies/storage for tri fan in PA_STATE_OPT

Fixes the OpenGL tri fan test failure under SIMD16 -
triangle-rasterization-overdraw.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:44 -05:00
Tim Rowley
4c23523365 swr/rast: silence write of cfg graph
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:39 -05:00
Tim Rowley
7e35777624 swr/rast: add CreateDirectoryPath to recursively create directories
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:33 -05:00
Tim Rowley
f094d582ec swr/rast: add support for DX1_RGB{_SRGB} formats
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:27 -05:00
Tim Rowley
42b4e7cb25 swr/rast: clean up whitespace
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:21 -05:00
Tim Rowley
5d542b3204 swr/rast: adjust BinPostSetupPoints* function signature
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:20:15 -05:00
Tim Rowley
b714208415 swr/rast: remove extra pixel center adjustment in BinPostSetupPoints
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-30 17:19:51 -05:00
Kenneth Graunke
56535959fd anv: Port over CACHE_MODE_1 optimization fix enables from brw.
Ben and I haven't observed these to help anything, but they enable
hardware optimizations for particular cases.  It's probably best to
enable them ahead of time, before we run into such a case.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-30 14:59:31 -07:00
Kenneth Graunke
53368b008e genxml: Add Gen9 CACHE_MODE_1 definitons.
These were already in gen8.xml but not gen9.xml.  There are a few new
fields and a couple that have changed.  These are all documented in the
Skylake PRM, Volume 2c Command Reference: Registers, Part 1.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-30 14:59:31 -07:00
Kenneth Graunke
a8fde221a8 i965: Set the "Float Blend Optimization Enable" bit on Gen9+.
This is woefully undocumented.  It's some kind of optimization that
avoids unnecessary render target reads when blending with a floating
point render target, using independent alpha blending modes.

The internal documentation indicates that this bit exists on Cherryview
as well, but the other driver doesn't appear to set it on that platform.
There's also some confusing wording that indicates that it may exist on
Broadwell, but the documentation says it's reserved, so who knows.

I was not able to find any workload that benefited from setting this
bit, but it seems like a good idea to set it nonetheless.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-30 14:59:31 -07:00
Chad Versace
9601b41a33 i965: Fix type of brw_context::render_target_format[]
It's an array of isl_format, not uint32_t. This patch updates every
reference to render_target_format[] git-grep.

Trivial cleanup. No change in behavior.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:38 -07:00
Chad Versace
6e325f1203 i965: Move func to right comment block in brw_context.h
brw_init_surface_formats() is defined in brw_surface_formats.c, not
brw_wm_surface_state.c.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:38 -07:00
Chad Versace
f5702230e0 i965: Document type of GLuint __DRIimage::format
It's either a mesa_format or mesa_array_format.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:37 -07:00
Chad Versace
da042d951c i965: Add whitespace in intel_update_image_buffers()
Improve readability.  Add an empty line between two large 'if' blocks.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:37 -07:00
Chad Versace
b86e079ab7 i965: Move an 'i' declaration into its 'for' loop
In intel_update_dri2_buffers().
Trivial cleanup.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:37 -07:00
Chad Versace
a90a15d638 i965: Fix type of intel_update_image_buffers::format
It's a mesa_format, not an unsigned int.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:37 -07:00
Chad Versace
77a1eefa3c i965: Rename intel_create_renderbuffer
The name is misleading because the function is unrelated to GL
renderbuffers. Rename it to intel_create_winsys_renderbuffer.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:37 -07:00
Chad Versace
e8a0a5d7f9 i965/dri: Combine declaration and assignment in intelCreateBuffer
Trivial cleanup.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:37 -07:00
Chad Versace
85dd3e4de1 i965/dri: Rewrite comment for intelCreateBuffer
The old comment pinned this function to X11 windows. In reality, this
function serves more than X11 and more than just windows.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-30 12:01:37 -07:00
Bartosz Tomczyk
fd6c2a3f3e mesa: Avoid leaking surface in st_renderbuffer_delete
v2: add comment in code

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100741
Fixes: a5e733c6b5 mesa: drop current draw/read buffer when ctx is released
Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-30 14:48:32 +01:00
Varad Gautam
4c412293d0 egl: advertise EGL_EXT_image_dma_buf_import_modifiers
v2: check for DRIimageExtension version 15 (Jason Ekstrand)

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-30 13:56:20 +01:00
Varad Gautam
de3c459bbd egl: implement eglQueryDmaBufModifiersEXT
query and return supported dmabuf format modifiers for
EGL_EXT_image_dma_buf_import_modifiers.

v2: move format check to the driver instead of making format queries
   here and then checking.
v3: Check DRIimageExtension version before query (Daniel Stone)
v4:
- move to DRIimageExtension version 15, check queryDmaBufModifiers before
  calling (Jason Ekstrand)
- pass external_only to the driver instead of setting as EGL_TRUE here
  (Emil Velikov, Daniel Stone)

Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-30 13:56:20 +01:00
Varad Gautam
6719e058d6 egl: implement eglQueryDmaBufFormatsEXT
allow egl clients to query the dmabuf formats supported on this platform.

v2: return EGLBoolean.
v3: Check DRIimageExtension version before querying (Daniel Stone).
v4: move to DRIimageExtension version 15, error checking (Jason Ekstrand).

Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-30 13:56:20 +01:00
Varad Gautam
6f10e7c37a egl/dri2: Create EGLImages with dmabuf modifiers
Allow creating EGLImages with dmabuf format modifiers when target is
EGL_LINUX_DMA_BUF_EXT for EGL_EXT_image_dma_buf_import_modifiers.

v2:
- clear modifier assembling and error label name (Eric Engestrom)
v3:
- remove goto jumps within switch-case (Emil Velikov)
- treat zero as valid modifier (Daniel Stone)
- ensure same modifier across all dmabuf planes (Emil Velikov)
v4:
- allow modifiers to add extra planes (Louis-Francis Ratté-Boulianne)
v5:
- fix error checking, some cleanups (Jason Ekstrand)
- pass single copy of the modifier to createImageFromDmaBufs2

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-30 13:56:20 +01:00
Varad Gautam
c5929634a0 dri: introduce dmabuf format modifier related handles
these allow dmabuf import with modifiers, and supported format and
modifier queries, which are used to implement
EGL_EXT_image_dma_buf_import_modifiers.

v2:
- squash dmabuf queries into DRIimage version 15 (Jason Ekstrand).
- add external_only param to queryDmaBufModifiers (Emil, Daniel Stone)
- pass a single modifier form createImageFromDmaBufs2 since all planes have
the same modifier (Jason Ekstrand)

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-30 13:56:20 +01:00
Pekka Paalanen
fb2a1c2327 egl/main: add support for fourth plane tokens
The EGL_EXT_dma_buf_import_modifiers extension adds support for a
fourth plane, just like DRM KMS API does.

Bump maximum dma_buf plane count to four.

v2: prevent attribute tokens from being parsed if
    EXT_image_dma_buf_import_modifiers is not suported. (Emil Velikov)

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-05-30 13:56:20 +01:00
Pekka Paalanen
9434f057c8 egl: introduce DMA_BUF_MAX_PLANES
Rather than hardcoding 3, use a #define. Makes it easier to bump this
later to 4.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-05-30 13:56:20 +01:00
Alexandre Courbot
76aa1bbb89 nvc0: support for GP10B
GP10B uses the same 3D class as GP100.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-30 08:27:00 -04:00
Tomeu Vizoso
106b2786b6 etnaviv: Don't try to use the index buffer if size is zero
If info->index_size is zero, info->index will point to uninitialized
memory.

Fatal signal 11 (SIGSEGV), code 2, fault addr 0xab5d07a3 in tid 20456 (surfaceflinger)

lst: Remove useless indexbuf conditional in the index_size != 0 case.

Fixes: 330d0607ed ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-05-30 11:45:10 +02:00
Kenneth Graunke
d529d5ff16 i965: Always scissor on Gen4-5 instead of disabling guardband.
See commit ece0e535a4.  This makes
Gen4-5 follow the behavior we use on Gen6+.  It seems to have
worked out there.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:48 -07:00
Kenneth Graunke
70be2a96a5 i965: Unify Gen4-5 and Gen6 SF_VIEWPORT/CLIP_VIEWPORT code.
This brings the improved guardbanding we implemented on Gen6+
back to the older Gen4-5 code.  It also deletes piles of code.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:46 -07:00
Kenneth Graunke
01cb6cd473 i965: Make a set_scissor_bits helper function.
Gen4-5 include a single SCISSOR_RECT in SF_VIEWPORT.

Making a helper function will allow us to reuse this code for Gen4-5.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:43 -07:00
Kenneth Graunke
55862ed477 i965: Use GENX(packet_length) rather than hardcoded dword counts.
This is clearer and less likely to break in the future.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:42 -07:00
Kenneth Graunke
c6b623f601 i965: Move the scissoring code up near the viewport code.
These are fairly related.  Gen4-5 combine the scissor rectangle and
SF_VIEWPORT.  Co-locating them will allow me to avoid forward
declarations of helper functions in a few patches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:40 -07:00
Kenneth Graunke
9afe5846d2 genxml: Make a SCISSOR_RECT structure on Gen4-5.
Gen6+ support multiple scissor rectangles, and define a SCISSOR_RECT
structure containing their dimensions.  On Gen4-5, those same fields
exist in SF_VIEWPORT.

This patch extracts the SF_VIEWPORT fields into a SCISSOR_RECT
structure.  Although not a named concept on Gen4-5, it works just
as well, and gives us a consistent SCISSOR_RECT structure across
all generations, making it easier to reuse code.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:37 -07:00
Kenneth Graunke
44309dcea3 i965: Replace brw->gen and devinfo->gen with GEN_GEN.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:36 -07:00
Kenneth Graunke
4ce103e01a i965: Rework Sandybridge 3DSTATE_VIEWPORT_STATE_POINTERS.
On Gen7+ we emit 3DSTATE_VIEWPORT_STATE_POINTERS_{SF_CL,CC} when
emitting a new viewport.

This patch makes us take the same approach on Sandybridge - but because
we have a combined command, we just set the appropriate "change" bits.
This eliminates an atom, some dirty flagging, and some brw->*.vp_offset
writes.  It does mean we'll emit two 3DSTATE_VIEWPORT_STATE_POINTERS
instead of one if both change, but that's probably fine.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:33 -07:00
Kenneth Graunke
7f4645e89c i965: Port CC_VIEWPORT to genxml.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:30 -07:00
Kenneth Graunke
1e3880544e i965: Ignore INTEL_SCALAR_* debug variables on Gen10+.
Scalar mode has been default since Broadwell, and vector mode is getting
increasingly unmaintained.  There are a few things that don't even fully
work in vector mode on Skylake, but we've never cared because nobody
uses it.  There's no point in porting it forward to new platforms.

So, just ignore the debug options to force it on.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-29 21:40:44 -07:00
Timothy Arceri
2c2ea573e5 mesa: add KHR_no_error support for glBindBufferRange()
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
b8174a837f mesa: create bind_buffer_range() helper
This will help us add KHR_no_error support.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
3eb6d34dfc mesa: convert mesa_bind_buffer_range_transform_feedback() to a validate function
This allows some tidy up and also makes it so we can add KHR_no_error
support.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
863b19ae21 mesa: create _mesa_bind_buffer_range_xfb() helper
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
21d9376e71 mesa: split bind_atomic_buffer() in two
This will help us add KHR_no_error support.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
135e5659bd mesa: split bind_buffer_range_shader_storage_buffer() in two
This will help us implement KHR_no_error support.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
cea384fa75 mesa: split bind_buffer_range_uniform_buffer() in two
This will help us implement KHR_no_error support.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
85e891283c mesa: add KHR_no_error support for glVertexArrayVertexBuffer()
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
9d331739ae mesa: add KHR_no_error support for glBindVertexBuffer()
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Timothy Arceri
9db595e0de mesa: split vertex_array_vertex_buffer() in two
This will allow us to skip the error checkes when adding
KHR_no_error support.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-30 08:03:32 +10:00
Bas Nieuwenhuizen
18efb404cf radv: Reserve space for descriptor and push constant user SGPR setting.
flush_compute_state doesn't reserve a large chunk, so these need their own reservation.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
2017-05-29 22:30:39 +02:00
Leo Liu
ea79c0440c amd/common: set vcn dec as hw decode as well
Recommit after issue resolved by the previous patch.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-29 14:32:29 -04:00
Leo Liu
0abc24723c amd/common: add vcn dec ip info query for amdgpu version 3.17
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-29 14:32:29 -04:00
Gregory Hainaut
79f0fe655d glthread/gallium: require safe_glthread to start glthread
Print an error message for the user if the requirement isn't met, or
we're not thread safe.

v2: based on Nicolai feedbacks
Check the DRI extension version

v3: based on Emil feedbacks
improve commit and error messages.
use backgroundCallable variable to improve readability

v5: based on Emil feedbacks
Properly check the function pointer

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 17:07:04 +01:00
Gregory Hainaut
3fde8db53a egl: implement __DRIbackgroundCallableExtension.isThreadSafe
v2:
bump version

v3:
Add code comment
s/IsGlThread/IsThread/ (and variation)
Include X11/Xlibint.h protected by ifdef

v5: based on Daniel feedback
Move non X11 code outside of X11 define
Always return true for Wayland

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 17:06:57 +01:00
Gregory Hainaut
63b78c939b glx: implement __DRIbackgroundCallableExtension.isThreadSafe
v2:
bump version

v3:
Add code comment
s/IsGlThread/IsThread/ (and variation)

v4:
DRI3 doesn't hit X through GL call so it is always safe

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 17:06:49 +01:00
Gregory Hainaut
fa84f6225b dri: Extend __DRIbackgroundCallableExtensionRec to include a callback that checks for thread safety
DRI-drivers could call Xlib functions, for example to allocate a new back
buffer.

When glthread is enabled, the driver runs mostly on a separate thread.
Therefore we need to guarantee the thread safety between libX11 calls
from the applications (not aware of the extra thread) and the ones from
the driver.

See discussion thread:
   https://lists.freedesktop.org/archives/mesa-dev/2017-April/152547.html

Fortunately, Xlib allows to lock display to ensure thread safety but
XInitThreads must be called first by the application to initialize the lock
function pointer. This patch will allow to check XInitThreads was called
to allow glthread on GLX or EGL platform.

Note: a tentative was done to port libX11 code to XCB but it didn't solve fully
thread safety.
See discussion thread:
   https://lists.freedesktop.org/archives/mesa-dev/2017-April/153137.html

Note: Nvidia forces the driver to call XInitThreads. Quoting their manpage:
"The NVIDIA OpenGL driver will automatically attempt to enable Xlib
thread-safe mode if needed. However, it might not be possible in some
situations, such as when the NVIDIA OpenGL driver library is dynamically
loaded after Xlib has been loaded and initialized. If that is the case,
threaded optimizations will stay disabled unless the application is
modified to call XInitThreads() before initializing Xlib or to link
directly against the NVIDIA OpenGL driver library. Alternatively, using
the LD_PRELOAD environment variable to include the NVIDIA OpenGL driver
library should also achieve the desired result."

v2: based on Nicolai and Matt feedback
Use C style comment

v3: based on Emil feedback
split the patch in 3
s/isGlThreadSafe/isThreadSafe/

v5: based on Marek comment
Add a comment that isThreadSafe is supported by extension v2

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 17:06:37 +01:00
Emil Velikov
5cb16e07ab egl/wayland: use the image_driver alongside the image_loader
Analogous to earlier commits - image_driver and image_loader are meant
to be used hand in hand.

v2: Rebase

Cc: Derek Foreman <derekf@osg.samsung.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:47 +01:00
Emil Velikov
429d56693d egl/wayland: set the resize_callback if the flush extension is available
Strictly speaking __DRI_DRI2 implies __DRI2_FLUSH. Although since we're
using the latter in the callback, we want to use the correct guard.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:46 +01:00
Emil Velikov
6ef0fc400c egl/wayland: select the format based on the interface used
Rather than misleadingly depending on DRI2 for the WL_DRM vs WL_SHM
formats, use the wl_drm and wl_shm interface respectively.

Fixes: a1727aa75e ("egl/wayland: Don't use DRM format codes for SHM")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:45 +01:00
Emil Velikov
d6ecd1647f egl/surfaceless: use the image_driver for image_loader
Analogous to previous commit.

Cc: Chad Versace <chadversary@chromium.org>
Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:41 +01:00
Emil Velikov
acf125ed3a egl/android: use the image_driver alongside the image_loader
They are meant to be used together. Otherwise we'll need workarounds
like egl/wayland. Namely register an image_loader_extension even thought
we should be using only DRI2.

v2: Add missing the bracket to fix the build (Tapani).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-29 16:59:39 +01:00
Emil Velikov
6b46854269 egl/x11: flatten codeflow
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:38 +01:00
Emil Velikov
14e51d526f egl/x11: check for dri2_dpy->flush before using the flush extension
Analogous to earlier commit.

Note that the dri2_x11_post_sub_buffer and dri2_x11_swap_buffers_region
paths already implicitly require __DRI2_FLUSH. The corresponding
extensions (NV_post_sub_buffer and NOK_swap_region) are enabled only
with DRI2.

v2: Split cosmetic changes into separate patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:20 +01:00
Emil Velikov
1398ece02c egl/drm: flatten codeflow
Rework the code to return early and drop an indentation level.
It should be easier to read.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:19 +01:00
Emil Velikov
4db5e83227 egl/drm: check for dri2_dpy->flush before using the flush extension
The current __DRI_DRI2 imples __DRI2_FLUSH. At the same time, one can
use __DRI_IMAGE_DRIVER alongside the latter, so the current check is
confusing at best.

Check for what we use.

v2: Split out from whitespace changes

Reviewed-by: Chad Versace <chadversary@chromium.org> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-29 16:59:16 +01:00
Emil Velikov
79d1fb95ee egl: annotate dri2_egl_display_vtbl as const data
With the final place that modifies the vtbl removed as of last commit we
can annotate the symbols accordingly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-29 16:59:15 +01:00
Emil Velikov
83a792cf25 egl/wayland: don't modify the vtbl if an extension is not available
With previous commit we'll error out should one be using the extension
when it's not available. Thus we no longer need to modify the vtbl.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-29 16:59:14 +01:00
Emil Velikov
701311425e egl: error out on eglCreateWaylandBufferFromImageWL
Currently f one does the silly thing by probing the entry point w/o
checking the extension they will attempt to use the extension even
though it cannot work.
That is due our of of an assert which gets removed in release builds.

Simply error out if the extension is not enabled. Thus we can
apply some cleanups with next commits.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-29 16:59:12 +01:00
Emil Velikov
46cc022d5d gbm: manage only the required set of DRI extensions
Currently GBM attempts to know all the extensions that might be required
by EGL/DRM [at some later stage].

That is a bit unclear and we often forget to update GBM as EGL gets
attention.

To avoid that, simply let EGL manage it's own required extensions based
on the base primitive (screen) we provide it.

v2: Rework the approach - GBM should not dive into EGL/DRM.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:50:12 +01:00
Emil Velikov
90d0ad14ca egl/drm: use dri2_setup_extensions() over the extensions provided by GBM
Allows us to keep things in sync easier and lets us simplify the
interface between the two even further.

v2: Don't set GBM's extensions.

Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:50:09 +01:00
Emil Velikov
2c341f2bda egl: refactor dri2_create_screen() into three separate functions
Split the create_screen into:
 - create screen
 - setup/bind extensions
 - setup screen

This will allow us to reuse the latter two on egl/drm. Said platform
does create its own screen and attempts to reinvent the later two
functions itself.

Since the GBM ones tend to get out of sync quite often, and there is no
distinct reason why it does so we'll drop them with latter commits.

v2: disp -> dpy for the Android platform.
v3: use correct goto label (Rob)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:50:06 +01:00
Emil Velikov
ee3b32696f egl/x11: make use of the dri2_display_destroy() helper
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:50:04 +01:00
Emil Velikov
a0163f9284 egl/wayland: make use of the dri2_display_destroy() helper
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:50:02 +01:00
Emil Velikov
c8d366bab2 egl/surfaceless: make use of the dri2_display_destroy() helper
Cc: Chad Versace <chadversary@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:50:00 +01:00
Emil Velikov
7ec07beedf egl/drm: make use of the dri2_display_destroy() helper
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:58 +01:00
Emil Velikov
898d7858f8 egl/android: make use of dri2_display_destroy() helper
v2: disp -> dpy (Tapani)

Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:55 +01:00
Emil Velikov
3e73c0245b egl: split out a dri2_display_destroy() helper
Within dri2_display_release() we already tear down all the display
specifics. Within the platform specific dri initialize however we badly
and partially duplicate that.

Let's stop that by fleshing out the required functionality into a helper
and using it throughout the codebase.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:52 +01:00
Tapani Pälli
12196d1b76 egl: check for driver_configs in dri2_display_release
With later commits we'll split and reuse the destroy side of the
function for the initialize_foo error path.

In such cases, driver_configs may be NULL leading to a crash.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
[Emil Velikov: reword commit message]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:49 +01:00
Emil Velikov
628af2bc96 gbm: remove unneeded gbm_drm_device abstraction
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:47 +01:00
Emil Velikov
e183c55275 gbm: move gbm_drm_device::driver_name to gbm_dri_device
The former already keeps track of the DRI module opened, based on the
driver_name provided. So let's keep them together.

As a nice bonus this Will allows us to remove the gbm_drm_device all
together with next patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:44 +01:00
Emil Velikov
2204ea6464 gbm: remove "struct gbm_drm_bo" abstraction
The struct is a simple wraper around gbm_bo and brings no actual
benefit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:42 +01:00
Emil Velikov
b5ab59ce37 gbm: remove unused gbm_dri_device::loader
Introduced back in 2012 with fd6acb97fb ("gbm: Create hooks for
dri2_loader_extension in dri backend") and hasn't been used since.

Seemingly a copy/paste thinko from development stage.

Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
2017-05-29 16:49:39 +01:00
Emil Velikov
2b6ad89d86 radv: automake: list shared libraries after the static ones
Analogous to previous commit - the compiler can discard xcb + wayland
libs, since there is no user (the static libraries) before it on the
command line.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-05-29 16:42:44 +01:00
Emil Velikov
3e8790bff0 anv: automake: list shared libraries after the static ones
The compiler can discard the shared ones from the link chain, since
there is no user (the static libraries) before it on the command line.

Cc: mesa-stable@lists.freedesktop.org
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-05-29 16:42:41 +01:00
Samuel Pitoiset
55083705cf mesa: add KHR_no_error support for glBindImageTextures()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-29 10:11:44 +02:00
Samuel Pitoiset
def908af6c mesa: add KHR_no_error support for glBindImageTexture()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-29 10:11:43 +02:00
Samuel Pitoiset
3ca5da2704 mesa: add bind_image_texture() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-29 10:11:41 +02:00
Samuel Pitoiset
1f75915e1a mesa: add set_image_binding() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-29 10:11:39 +02:00
Samuel Pitoiset
b12dfb1558 mesa: remove unused layered parameter from validate_bind_image_texture()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-29 10:11:36 +02:00
Samuel Pitoiset
5521dc2477 mesa: add KHR_no_error support for glActiveTexture()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-05-29 10:03:11 +02:00
Marek Olšák
48b91103ce radeonsi: use ac_build_buffer_load for shader buffer loads
and document why we can't use SMEM yet.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Marek Olšák
e019ea8f4b radeonsi: move building llvm.SI.load.const into ac_build_buffer_load
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Marek Olšák
e1942c970f radeonsi: rename readonly_memory -> can_speculate
This is more accurate.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Marek Olšák
24306c0b27 radeonsi: fix a crash in si_destroy_context if we fail early
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Marek Olšák
c70b0604f0 util: slab_destroy_child should check whether it's been initialized
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Bas Nieuwenhuizen
5cd8ab49fd radv: Also signal fence if vkAcquireNextImageKHR returns VK_SUBOPTIMAL_KHR.
It is a successful return.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-29 00:09:45 +02:00
Rob Clark
8fc9702a1b freedreno: fix fence creation fail if no rendering
Android tries to create a FENCE_FD fence without any rendering.  And
then falls over when that fails.  So just always create an initial
batch.

Fixes: e4ad8695 ("freedreno: fix crash when flush() but no rendering")
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-28 14:49:27 -04:00
Samuel Pitoiset
ab8fb5a082 radeonsi: drop useless memcmp() check in si_set_blend_color()
cso_set_blend_color() already checks if the old state is different.
Only Nine uses pipe::set_blend_color() directly but I guess it
should use the cache too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-27 18:00:45 +02:00
Roland Scheidegger
d2724fe5bd llvmpipe: add LP_NEW_GS flag for updating vertex info
The vertex information we compute here is really dependent on the last
stage before FS. It just happened to work most of the time because new
GS tend to come with new VS and/or FS...
(The LP_NEW_GS flag was previously set but never used.)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-05-27 15:49:21 +02:00
Brian Paul
31ff7bff5a svga: document some incorrect VGPU10 shader translation issues
We have a few mistakes in our shader translation code, but the virtual
GPU is forgiving.

Reviewed-by: Michal Krol <michal@vmware.com>
Reviewed-by: Neha Bhende<bhenden@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-05-26 20:05:30 -06:00
Jason Ekstrand
21ddab4a17 i965/copy_image: Use the blitter on gen5
This was just an accidental typo in the refactoring.  The intention was
to try the blitter on gen4-5, not just gen4.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-26 14:44:29 -07:00
Alexandre Demers
a958a30827 osmesa: link with libunwind if enabled (v2)
Fixes linking error in libOSmesa when using libunwind.

CXXLD    libOSMesa.la
src/gallium/auxiliary/.libs/libgallium.a(u_debug_stack.o): In function `symbol_name_cached':
./src/gallium/auxiliary/util/u_debug_stack.c:87: undefined reference to `_ULx86_64_get_proc_name'
src/gallium/auxiliary/.libs/libgallium.a(u_debug_stack.o): In function `debug_backtrace_capture':
./src/gallium/auxiliary/util/u_debug_stack.c:114: undefined reference to `_Ux86_64_getcontext'
./src/gallium/auxiliary/util/u_debug_stack.c:115: undefined reference to `_ULx86_64_init_local'
./src/gallium/auxiliary/util/u_debug_stack.c:117: undefined reference to `_ULx86_64_step'
./src/gallium/auxiliary/util/u_debug_stack.c:123: undefined reference to `_ULx86_64_get_reg'
./src/gallium/auxiliary/util/u_debug_stack.c:124: undefined reference to `_ULx86_64_get_proc_info'
./src/gallium/auxiliary/util/u_debug_stack.c:120: undefined reference to `_ULx86_64_step'
collect2: error: ld returned 1 exit status

v2 : Fixes title and adds the original error it is fixing.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-26 09:29:44 -06:00
Jason Ekstrand
726b68ad82 i965/blorp: Support copyteximage on gen4-5
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
b06c63c782 i965: Use blorp for CopyImageSubData on gen4-5
We keep the blit path because it's probably faster when it works.
However, now that we can use blorp, we can delete that nasty CPU
fall-back path.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
0901d0bc4c i965: Round copy size to the nearest block in intel_miptree_copy
The width and height of the copy don't have to be aligned to the block
size if they specify the right or bottom edges of the image.  (See also
the comment and asserts right above).  We need to round them up when we
do the division in order to get it 100% right.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
79f2a5541f i965: Use BLORP for color clears on gen4-5
We don't support replicated data clears yet.  Those take a bit more work
and enabling replicated data clears in its own commit is probably better
for bisectibility anyway.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
6d11362d8b i965: Use blorp for color blits on gen4-5
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
fa13ef285d intel/blorp: Assert that no one tries to blit combined depth stencil
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
752d7af77a i965: Add blorp support for gen4-5
Due to complications with things such as URB setup on gen4-5, it's
easier to keep gen4 support in blorp completely internal to i965.  This
makes things a bit awkward because that means there's a file in i965
that includes blorp_priv.h but it's either that or have a file in blorp
that includes brw_context.h.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
23125b7102 intel/blorp: Set additional brw_wm_prog_key fields on gen4-5
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
79b486f736 i965/gen4: Expose the guts of URB recalculation as a helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
0ed6f196fc intel/blorp: Add support for gen4-5 SF programs
As part of enabling support for SF programs, we plumb the SF URB size
through to emit_urb_config.  For now, it's always zero but, on gen4, it
may be something larger.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
8bce7bda45 intel/blorp: Make convert_to_single_slice available outside blorp_blit
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
110061afa2 intel/blorp: Use designated initializers to set up VERTEX_ELEMENTS
We also add a slot variable and use it as an iterator.  This will make
it much easier to conditionally put something between the header and the
vertex position.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
ac79806766 intel/blorp: Rename emit_viewport_state to emit_cc_viewport
The real point of this packet is that it sets up CC_VIEWPORT so that
name is a bit better.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
1f2f90be1f intel/blorp: Make the common genX_blorp_exec code gen4-safe
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
a7f5d6df8a intel/blorp: Re-arrange blorp_genX_exec.h
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
302c0488cf intel/blorp: Don't use ffma directly
It isn't supported prior to gen6 and, on gen6+, NIR will fuse the fmul
and fadd into an ffma automatically for us anyway.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
675ec434f3 intel/blorp: Delete isl_to_gen_ds_surfype
It's no longer used.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
e80f0840bf intel/blorp: Pull the pipeline bits of blorp_exec into a helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
3d35e5a51e intel/blorp/blit: Add support for normalized coordinates
Gen5 and earlier can't do non-normalized coordinates so we need to
compensate in the shader.  Fortunately, it's pretty easy plumb through.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
18e18a1863 i965: Move clip program compilation to the compiler
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
9fb8a8775b i965: Move SF compilation to the compiler
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
c30587643e i965/clip: Make brw_clip_prog_key::interp_mode an array
Having it be a pointer means that we end up caching clip programs based
on a pointer to wm_prog_data rather than the actual interpolation modes.
We've been caching one clip program per FS ever since 91d61fbf7c
where Timothy rewrote brw_setup_vue_interpolation().

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
58a57ea7d6 i965/sf: make brw_sf_prog_key::interp_mode an array
Having it be a pointer means that we end up caching clip programs based
on a pointer to wm_prog_data rather than the actual interpolation modes.
We've been caching one clip program per FS ever since 91d61fbf7c
where Timothy rewrote brw_setup_vue_interpolation().

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
21ba2b4bef intel/compiler: Make brw_disasm take const assembly
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
c336c224a6 intel/decoder: Handle the BLT ring in gen_group_get_length
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
9d1001c8e5 intel/decoder: Handle gen4 VF_STATISTICS and PIPELINE_SELECT
These need special handling because they have no "DWord Length"
parameter and they have an unusual bias of 1.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
87588e546e intel/genxml: Rename 3DSTATE_AA_LINE_PARAMS on gen5
All of the other gens use "PARAMETERS".

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
04f6d975e1 intel/genxml: Use the right subtype for VF_STATISTICS on gen4
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
1fcc5e2399 intel/genxml: Iron Lake doesn't support non-normalized sampler coordinates
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
648b618dc5 intel/genxml: Add SAMPLER_STATE to gen 4.5
Somehow this got missed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
3f8ee8c703 intel/genxml: Rename the CC_VIEWPORT pointer on gen4-5
It isn't a pointer to "color calc state", that's the packet it's in.
It's a pointer to the CC viewport state.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
0ee1ef0cbb intel/genxml: Sampler state is a pointer on gen4-5
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
64243d3b8e intel/genxml: Suffix KSP0 fields on Iron Lake
Iron Lake introduced the multiple KSP thing and so you have KSP0-3.
However, the genxml didn't have an index on the first "Kernel Start
Pointer" or "GRF Register Count".  Add one to match gen6+.  While we're
here, we drop the brackets from the other "GRF Register Count" fields.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
7769e448aa intel/genxml: Make a bunch of things offsets on gen4-5
Most things on gen4-5 are addresses because we don't have dynamic state
base address and we don't have instruction state base on gen4.  However,
whoever converted things to addresses got a little over-excited and
converted too much.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
8257fe7b18 intel/isl: Add gen4_filter_tiling
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
332a5d7a3f intel/isl: Add support for setting component write disables
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
8958355549 intel/isl: Add support for gen4 cube maps to get_image_offset_sa
Gen4 cube maps are a 2-D surface with ISL_DIM_LAYOUT_GEN4_3D which is a
bit weird but accurate none the less.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
b9b7792d9a intel/isl: Don't request space for stencil/hiz packets unless needed
On Iron Lake, the packets exist but we never emit them so there's no
need for us to ask the driver to make batch space for them.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
b50b821eb3 i965/blorp: Properly handle mt->first_level
The guts of blorp and ISL don't understand i965's partial miptrees.
Instead, we need to subtract off first_level before we hand anything off
to blorp.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
c16e840f9a i965/miptree: Take first_level into account when converting to ISL
ISL doesn't have a concept of a partial miptree.  Instead, we need to
subtract off first_level.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
554a1731a5 intel/blorp: Move the gen7 stencil format workaround to blorp_blit
It's not needed for blorp_copy because it already overrides formats.
It's also not needed for blorp_clear because it clears stencil as
stencil.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
c19150af5c i965: Use blorp_copy for doing r8 stencil updates on HSW
The blorp_copy entrypoint is designed for doing memcpy like operations
which is what we need to do here while blorp_blit is for handling format
conversion and scaling.  Using blorp_copy is much simpler and prevents
us from getting formats wrong.  While we're here, we get rid of the
layers_per_blit thing since stencil always uses interleaved MSAA.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand
441cd7a81d i965/blorp: Do and end-of-pipe sync on both sides of fast-clear ops
We've discovered in the Vulkan driver that simply doing the end-of-pipe
sync afterwards is insufficient.  The specific requirement stated in the
PRM is that you have to do one every time you transition between the
tree modes of "clear", "render", and "resolve".  This is GL, so we could
track it but any attempt to do so would most likely get it wrong.  For
now, it's easier to just assume that every fast-clear op is an island
and do the sync both before and after.

This also removes the unneeded flush and stall after slow-clear
operations.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
2017-05-26 07:58:01 -07:00
Eric Engestrom
44b29dd7b6 amd/common: add missing libdrm include path
Fixes: de9dd4f9f1 ("ac/radeonsi: move struct radeon_info to ac_gpu_info.h")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-26 15:19:55 +01:00
Andres Gomez
cd8a9d7dfa docs: small release calendar fixes
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-26 15:08:14 +02:00
Dave Airlie
e1409f7302 Revert "amd/common: add vcn dec ip info query"
This reverts commit 524d4fff9e.

This commit breaks amdgpu on kernels with no DEC IP support.

Caught by the airlied CI system.
2017-05-26 16:36:57 +10:00
Dave Airlie
ae1f32915b Revert "amd/common: set vcn dec as hw decode as well"
This reverts commit 50d322be2f.

A previous patch breaks amdgpu on non-vcn decode systems,
but have to revert this first.
2017-05-26 16:36:38 +10:00
Rob Herring
1dc1860602 util: remove unneeded Android ifdef from ralloc.c
SIZE_MAX has been defined in stdint.h on Android since 2013, so this ifdef
is no longer needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-05-25 15:02:12 -05:00
Rob Herring
151bd66080 nouveau: drop Android 4.4 and earlier support
Support for Android 4.4 and earlier has already been removed from mesa.
Remove this remaining piece from nouveau, too.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-05-25 15:02:12 -05:00
Rob Herring
0dabb9d9fa i965: use mmap64 for Android
Simplify the handling of mmap for Android by using mmap64 instead. mmap64
may have not existed for Android when this was written, but it's been
around since 2013.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-05-25 15:01:28 -05:00
Rob Herring
51f9851753 gallium/os: use mmap64 for Android
Simplify the handling of mmap for Android by using mmap64 instead. mmap64
may have not existed for Android when this was written, but it's been
around since 2013.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-05-25 15:00:34 -05:00
Rob Herring
d5a9365d46 Android: generate an error if building on Android 4.4 or earlier
Since commit 7a5b5f5226 ("Android: drop Android 4.4 (KitKat) support"),
Android 4.4 or earlier is no longer supported, so exit with an error if we
try building on it.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-05-25 14:58:49 -05:00
Brian Paul
48faedbc7e st/wgl: whitespace, formatting fixes in stw_device.c
Trivial.
2017-05-25 11:13:40 -06:00
Brian Paul
12dc843367 glsl: Fix g++ initializer order warning
Fixes this warning:
In file included from ../../../src/compiler/glsl/ir.cpp:25:0:
../../../src/compiler/glsl/ir.h: In constructor 'ir_swizzle::ir_swizzle(ir_rvalue*, ir_swizzle_mask)':
../../../src/compiler/glsl/ir.h:1955:20: warning: 'ir_swizzle::mask' will be initialized after [-Wreorder]
    ir_swizzle_mask mask;
                    ^
../../../src/compiler/glsl/ir.h:1954:15: warning:   'ir_rvalue* ir_swizzle::val' [-Wreorder]
    ir_rvalue *val;
               ^
../../../src/compiler/glsl/ir.cpp:1592:1: warning:   when initialized here [-Wreorder]
 ir_swizzle::ir_swizzle(ir_rvalue *val, ir_swizzle_mask mask)
 ^

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-25 10:35:11 -06:00
Leo Liu
f94cfdc5f2 radeonsi: enable vcn decode
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
7ecc244b14 winsys/amdgpu: add vcn dec cs support
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
50d322be2f amd/common: set vcn dec as hw decode as well
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
524d4fff9e amd/common: add vcn dec ip info query
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
c23ffafc50 radeon: rename has_uvd info to has_hw_decode
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
34f7cf49c8 radeon/vcn: add decode message for mpeg4 codec
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
155ca0ca50 radeon/vcn: add decode message for mpeg2 codec
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
9c93c7c0b4 radeon/vcn: add decode message for vc1 codec
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
a55d2659d9 radeon/vcn: add decode message for hevc codec
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
9c21f6abda radeon/vcn: add decode message decode for avc codec
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
949dd66c9e radeon/vcn: add decode message feedback
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
0152a0cf16 radeon/vcn: add decode message destroy
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
a106866962 radeon/vcn: add decode message create
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
ae4faecf66 radeon/vcn: add common decode part
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
f9b8736776 radeon/winsys: add vcn dec ring type
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu
2094b75c68 radeon/winsys: add uvd enc ring type
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:19 -04:00
Leo Liu
71075a8126 radeon/vcn: add vcn decode interface
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:19 -04:00
Leo Liu
e1f7936d05 configure.ac: update libdrm amdgpu version requirement to 2.4.81
VCN decode has a new interface, and that depends on the latest libdrm

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-25 11:40:19 -04:00
Emil Velikov
6a3ffda83a docs: update calendar, add news item and link release notes for 17.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-25 08:52:14 +01:00
Emil Velikov
1e735800a9 docs: add sha256 checksums for 17.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 092c485b8e)
2017-05-25 08:48:20 +01:00
Emil Velikov
e3ba46f6aa docs: add release notes for 17.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ca0a148a4d)
2017-05-25 08:48:19 +01:00
Timothy Arceri
c8a3bac820 mesa: remove unrequired double calc
type_size() will already handle this correctly for us.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-25 12:20:57 +10:00
Timothy Arceri
fd461b22e9 mesa: remove redundant modulus operation
The if check above means we can only get here if size is less than 4.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-25 12:20:49 +10:00
Brian Paul
4a6fdeab05 svga: init svga_screen::swc_mutex with mtx_recursive
If the SVGA3D_BindGBSurface() call in svga_buffer_hw_storage_unmap()
fails, we'll flush and that might involve unmapping other buffers.
That leads to a recursive lock on svga_screen::swc_mutex and causes
a deadlock.  Fix this by initializing the mutex with mtx_recursive.

Note that this only happened on Linux, not Windows.  On Windows, the
mutex functions are implemented with Win32 critical sections which
support recursive locking.

Also add a comment about this.

Fixes VMware bug 1831549 (Unigine Tropics demo freeze on Linux).

Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende<bhenden@vmware.com>
2017-05-24 11:33:47 -06:00
Brian Paul
0c84c395f8 svga: move logging initialization code into new function
Plus a few other minor clean-ups.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2017-05-24 11:33:47 -06:00
Brian Paul
84233ac661 svga: init local vars to silence uninitialized use warnings
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2017-05-24 11:33:47 -06:00
Brian Paul
cf1adb7b1c svga: log the process command line to the vmware.log file
This is useful for Piglit when thousands of tests are run and we want
to determine which test triggered a device error.

v2: only log command line info if the new SVGA_EXTRA_LOGGING env var is set

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-05-24 11:33:47 -06:00
Sinclair Yeh
14d1687229 svga: Limit svga message capability to newer compilers
The assembly code used by the SVGA message feature doesn't
build properly with older compilers, so limit it to only
gcc 5.3.0 and newer.

Also modified the stubs to avoid "unused variable" warnings.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-24 11:33:46 -06:00
Brian Paul
c85a35d465 svga: Fix MSVC build.
This let us compile the code with MSVC, but it no-ops the log function.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-05-24 11:33:46 -06:00
Sinclair Yeh
1ce3a2723f svga: Add the ability to log messages to vmware.log on the host.
For now this capability only exists in the SVGA driver but
can be exported later if other modules, e.g. winsys, wants
to use it for logging.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-24 11:33:46 -06:00
Brian Paul
3ad5325da0 Revert "gallium: remove unused PIPE_CC_GCC_VERSION"
This reverts commit e60928f4c4.

PIPE_CC_GCC_VERSION is used by some of our in-house code which hasn't
been upstreamed yet.
2017-05-24 11:33:46 -06:00
Lionel Landwerlin
359fa0e9a0 aubinator: report error on unknown device id
Since we're going to stop aubinator without a valid device id, better
report an error. This also silences a Coverity warning.

CID: 1405004
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-24 10:50:18 +01:00
Lionel Landwerlin
8f1f1d294d aubinator: be consistent on exit code
We're using both exit(1) & exit(EXIT_FAILURE), settle for one, same
for success.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-24 10:50:18 +01:00
Lionel Landwerlin
6200d835a0 aubinator: fix double free
1;4601;0c
Free previously allocated filename outside the for loop.

CID: 1405014
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-24 10:50:18 +01:00
Christian König
5318870f54 winsys/amdgpu: align VA allocations to fragment size v2
BOs larger than the minimum fragment size should have their VA
alignet to at least the fragment size for optimal performance.

v2: drop unused leftover from initial implementation

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-24 10:32:19 +02:00
Samuel Pitoiset
51dc5e3df3 tgsi: remove unused tgsi_is_passthrough_shader()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2017-05-24 09:52:17 +02:00
Eric Engestrom
338f47b6d8 configure.ac: rephrase 'GLX w/o X11' error message
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
2017-05-24 08:35:44 +01:00
Jason Ekstrand
39adea9330 anv: Require vertex buffers to come from a 32-bit heap
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 17:37:43 -07:00
Jason Ekstrand
50d0eb5096 anv: Advertise both 32-bit and 48-bit heaps when we have enough memory
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 17:37:42 -07:00
Jason Ekstrand
34581fdd4f anv: Refactor memory type setup
This makes us walk over the heaps one at a time and add the types for
LLC and !LLC to each heap.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:42 -07:00
Jason Ekstrand
b83b1af6f6 anv: Make supports_48bit_addresses a heap property
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:40 -07:00
Jason Ekstrand
00df1cd9d6 anv: Stop setting BO flags in bo_init_new
The idea behind doing this was to make it easier to set various flags.
However, we have enough custom flag settings floating around the driver
that this is more of a nuisance than a help.  This commit has the
following functional changes:

 1) The workaround_bo created in anv_CreateDevice loses both flags.
    This shouldn't matter because it's very small and entirely internal
    to the driver.

 2) The bo created in anv_CreateDmaBufImageINTEL loses the
    EXEC_OBJECT_ASYNC flag.  In retrospect, it never should have gotten
    EXEC_OBJECT_ASYNC in the first place.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:38 -07:00
Jason Ekstrand
10fad58b31 anv: Set image memory types based on the type count
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:36 -07:00
Jason Ekstrand
f7736ccf53 anv: Add valid_bufer_usage to the memory type metadata
Instead of returning valid types as just a number, we now walk the list
and check the buffer's usage against the usage flags we store in the new
anv_memory_type structure.  Currently, valid_buffer_usage == ~0.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:34 -07:00
Jason Ekstrand
92325a7efc anv: Determine the type of mapping based on type metadata
Before, we were just comparing the type index to 0.  Now we actually
look the type up in the table and check its properties to determine what
kind of mapping we want to do.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:32 -07:00
Jason Ekstrand
c1f4343807 anv: Set up memory types and heaps during physical device init
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:30 -07:00
Jason Ekstrand
eceaf7e234 anv: Predicate 48bit support on gen >= 8
This doesn't matter right now since it only affects whether or not we
set the kernel bit but, if we ever do anything else based on it, we'll
want it to be correct per-gen.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:27 -07:00
Jason Ekstrand
4eecd534f0 anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hack
Up until now, we've been memsetting the auxiliary surface to 0 at
BindImageMemory time to ensure that it is properly initialized.
However, this isn't correct because apps are allowed to freely alias
memory between different images and buffers so long as they properly
track whether or not a particular image is valid and, if it isn't,
transition from UNINITIALIZED to something else before using it.  We
now implement those transitions so we can drop the hack.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:22 -07:00
Jason Ekstrand
cc45c4bb80 anv: Handle transitioning depth from UNDEFINED to other layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:20 -07:00
Jason Ekstrand
75edecf502 anv: Handle color layout transitions from the UNINITIALIZED layout
This causes dEQP-VK.api.copy_and_blit.resolve_image.partial.* to start
failing due to test bugs.  See CL 1031 for a test fix.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:03 -07:00
Axel Davy
7e04ae74d4 st/nine: Fix a regression and syntax cleanup
A few cleanups and in particular initializing properly
the new pipe_draw_info fields.
This should fix the regression caused by
330d0607ed

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101088

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-24 00:40:43 +02:00
Ian Romanick
7009955281 mesa: Remove GL_APPLE_vertex_array_object stubs
Mark the functions 'exec="skip"' in the XML instead.  libGL will still
have the functions, but the driver won't try to use them.  I verified
that this commit works with piglit's 'object-namespace-pollution glClear
vertex-array' on x64 with a driver built from mesa-12.0.3 tag.

In fairness, this test also works with a libGL built from 7927d03.  I
believe it continues to work because on non-Windows platforms we
generate some extra, dummy dispatch functions that can be used when a
driver requests a function unknown to libGL.  This was done to provide
some "forward" compatibility with drivers that need more functions.
This doesn't work on Windows because the Windows calling convention is
for the callee to clean up the stack.  That's the theory anyway.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-23 15:02:29 -07:00
Marek Olšák
0781b58b3a gallium/radeon: pipe AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS into gallium HUD
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-23 23:29:16 +02:00
Rob Clark
1db28fbbea freedreno/ir3: switch to NIR by default
Now that we lower vars to regs, we no longer regress for anything that
does complex dereferences.  (With tgsi, derefers are already lowered
before tgsi_to_nir, but not with glsl_to_nir.)  In fact it actually
fixes a few things to bypass tgsi.

So make NIR the default (finally!)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
caa64b24ce freedreno/ir3: lower arrays to regs
Instead of using load/store_var intrinsics, which can have complex
derefs in the case of multi-dimensional arrays, lower these to regs
and handle the direct/indirect loads in get_src() and stores in
put_dst().

This should let us switch to using nir by default.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
232fc99544 freedreno/ir3: add put_dst()
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
2bbd425adb freedreno/ir3: code-motion
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
90dade300f freedreno/ir3: fix cmdline compiler
standalone_compiler_cleanup() frees the glsl types, among other things,
so it needs to come after nir->ir3.  But since we exit after dumping the
disassembly, it is easier to just not call it at all.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
1059dc9165 freedreno/ir3: add missing nir_opt_copy_prop_vars() pass
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
c712a637b9 freedreno/ir3: need different compiler options for a5xx
vertex_id_zero_based differs..

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
4531e67c47 freedreno/a5xx: remove copapasta from a4xx
Won't ever hit this w/ a420 gpu, so this is dead code.  Need to get astc
working to know whether to rip this out entirely or not.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
0c2e0f15b8 freedreno: only support SSBOs with nir
tgsi_to_nir does not support them.  Note that compute shaders already
force nir.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
444b4b40f9 freedreno/a5xx: add some missing texture formats
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
6ccbbd8d05 freedreno/a5xx: provoking vertex
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
d7f296de26 freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:35 -04:00
Rob Clark
6f65a1a211 nir/lower-atomics-to-ssbo: remove atomic_uint arrays too
Maybe there is a better way to do this.  But by the time we get to
assigning uniform locs, we want the atomic_uint's to all be gone,
otherwise we assert in st_glsl_attrib_type_size().

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:34 -04:00
Rob Clark
5f6c034f82 nir/lower-atomics-to-ssbo: fix num_components
Fixes some piglits like arb_shader_atomic_counters-active-counters

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-23 12:26:34 -04:00
Timothy Arceri
a363fa0c99 radeon: pass flags that can change shaders to disk_cache_create()
I wasn't sure if I should filter the flags so that we only use
flags that actually change the shader output. To avoid manual
updates we just pass in everything for now.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-23 09:09:43 +10:00
Timothy Arceri
0bbcfbfc0b util/disk_cache: add new driver_flags param to cache keys
This will be used for things such as adding driver specific environment
variables to the key. Allowing us to set environment vars that change
the shader and not have the driver ignore them if it finds existing
shaders in the cache.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-05-23 09:09:43 +10:00
Jose Fonseca
d970f773f4 u_format_test: Ignore S3TC errors.
This prevents spurious failures when libtxc-dxtn-s2tc is installed.

Note: lp_test_format doesn't need any change since we were already
ignoring S3TC failures there.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
2017-05-22 21:00:06 +01:00
Nanley Chery
d132bb36ce docs: Document ASTC extension support for SKL and BXT
v2: Remove the '+' after bxt

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-05-22 11:13:53 -07:00
Nanley Chery
d6150bd764 i965: Enable ASTC HDR for Broxton
This platform passes the following GLES3 tests:
ES3-CTS.functional.texture.compressed.astc.endpoint_value_hdr_cem_*

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-05-22 11:13:53 -07:00
Nanley Chery
52a6fd9871 intel/isl: Add ASTC HDR to format lists and helpers
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-05-22 11:13:53 -07:00
Bas Nieuwenhuizen
b2c5e69942 radv: Add compute HTILE fast clear.
Not really what the fast depth clear does, no matter whether you use
EXPCLEAR or not. Seems the fast clear using the DB HW always touches
the main buffer.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-22 20:07:21 +02:00
Bas Nieuwenhuizen
df91abfe5a radv: Use correct clear words for HTILE.
Did some RE'ing what several HTILE words give when read from a descriptor
with HTILE compression enabled.

Seems to align with -pro usage for D16 too.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-22 20:07:21 +02:00
Bas Nieuwenhuizen
0b26f0ee4f radv: Add queue masks for htile usage determination.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-22 20:07:21 +02:00
Bas Nieuwenhuizen
0628580eff radv: Specify semantics of HTILE layout helpers.
And correct implementation to specify only what we support.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-22 20:07:21 +02:00
Bas Nieuwenhuizen
62e182acd0 radv: Don't use a separate can_expclear.
We never use EXPCLEAR clears.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-22 20:07:21 +02:00
Ian Romanick
7174e3f22b mesa: GL_ARB_shader_subroutine is not optional in core profile
text	   data	    bss	    dec	    hex	filename
7038459	 235248	  37280	7310987	 6f8e8b	32-bit i965_dri.so before
7038227	 235248	  37280	7310755	 6f8da3	32-bit i965_dri.so after
6681438	 303400	  50608	7035446	 6b5a36	64-bit i965_dri.so before
6681254	 303400	  50608	7035262	 6b597e	64-bit i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-22 10:51:26 -07:00
Benedikt Schemmer
b026f45bdd drirc: Add allow_glsl_builtin_variable_redeclaration for Dead Island Riptide Definitive Edition
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-22 19:32:07 +02:00
Marek Olšák
8c069a6a06 gallium/radeon: add a query for monitoring Gallium thread load
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-22 19:23:39 +02:00
Marek Olšák
2beb31bd7c radeonsi/gfx9: compile shaders with +xnack
so that LLVM doesn't allocate SGPRs where XNACK is.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-22 19:23:39 +02:00
Rhys Kidd
499f45163a vc4: Remove dead code in vc4_dump_surface_msaa()
Coverity caught the use of dead code copy-paste for
found_colors[] and num_found_colors.

CID: 1341850
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-22 09:50:22 -07:00
Lionel Landwerlin
30dc56bb5b egl/wayland: verify event queue was allocated
We're already verified that 'window' wasn't NULL, I'm guessing this
allocation error is about the newly created queue.

CID: 1409754
Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-05-22 15:44:38 +01:00
Timothy Arceri
4eb0411ed7 mesa: add APPLE_vertex_array_object stubs
APPLE_vertex_array_object support was removed in 7927d0378f.
However it turns out we can't remove the functions because this
can cause issues when libglapi is used together with DRI
drivers built prior to said commit

Fixes: 7927d0378f ("mesa: drop APPLE_vertex_array_object support")

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-22 14:56:51 +10:00
Timothy Arceri
3ceae88642 glsl: set mask via initialisation list rather than in constructor body
Potentially more efficient as it may avoid the struct being initialised
twice.

Also add var to the initialisation list while we are here.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-22 14:21:55 +10:00
Vladislav Egorov
cf164d9e97 ralloc: Use strnlen() inside of strncat()
If the str is long or isn't null-terminated, strlen() could take a lot
of time or even crash. I don't know why was it used in the first place,
maybe for platforms without strnlen(), but strnlen() is already used
inside of ralloc_strndup(), so this change should not additionally
break anything.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-22 12:34:28 +10:00
Vladislav Egorov
4a47247523 glcpp: Skip unnecessary line continuations removal
Overwhelming majority of shaders don't use line continuations. In my
shader-db only shaders from the Talos Principle and Serious Sam used
them, less than 1% out of all shaders. Optimize for this case, don't
do any copying if no line continuation was found.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-22 12:34:28 +10:00
Vladislav Egorov
b8e792ee25 glcpp: Avoid unnecessary strcmp()
strcmp() is slow. Initiate comparison with "__LINE__" or "__FILE__"
only if the identifier starts with '_', which is rare.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-22 12:34:28 +10:00
Thomas Helland
1575a8146a main: Move hashLockMutex/hashUnlockMutex to header and inline
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-22 09:19:24 +10:00
Thomas Helland
f203a9f7d1 main: Use _mesa_HashLock/UnlockMutex consistently
This is shorter and easier on the eyes. At the same time this
also ensures that we are always asserting that the table pointer
is not NULL. Currently that was not done for all situations.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-22 09:17:37 +10:00
Thomas Helland
90dfcc6b32 util: Change the pointer hashing function
Use our knowledge that pointers are at least 4 byte aligned to remove
the useless digits. Then shift by 6, 10, and 14 bits and add this to
the original pointer, effectively folding in the entropy of the higher
bits of the pointer into a 4-bit section. Stopping at 14 means we can
add the entropy from 18 bits, or at least a 600Kbyte section of memory.
Assuming that ralloc allocates from a linearly allocated heap less than
this we can make a very efficient pointer hashing function for our usecase.
Even if we are not on an architecture that is 4 byte aligned, there is
still a high big chance that the thing we are allocating is at least
8 bytes in size, so even then we will have entropy into the third bit.

The 4 bit increment on the shifts is chosen rather arbitrarily; if we
had chosen a 3 bit increment we would need to add another xor to
cover a decently sized memorypool. Increasing it to 5 bits would
spread our entropy more, possibly hurting us with more collisions on
hash tables of size less than 32. With a hash table of size 16 there
are a max of 11 entries, and we can assume that with such a small table
collisions are not that painfull.

This allows us to hash the whole 32 or 64 bit pointer at once,
instead of running FNV1a, looping through each byte and doing
increments, decrements, muls, and xors on every byte. This cuts
_mesa_hash_data from 1.5 % on profiles, to making _mesa_hash_pointer
show up with a 0.09% share. Collisions on insertion actually seems to be
ever so slightly lower with this hash function, as found by printing
a loop counter and sorting the data.

perf stat shows a 1.5% reduction in instruction count,
and a 5% reduction in stalled cycles. Shader-db runtime goes
from 225 to 220 seconds.

No instruction-count changes in shader-db, but there are some minor
changes in cycle-count that is likely caused by nir walking a set
in some of its passes, and this causing a different ordering.
That might eventually lead to a difference in register allocation.
However, the effect is a net positive;

total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
cycles in affected programs: 374468 -> 373400 (-0.29%)
helped: 178
HURT: 49

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-22 09:17:37 +10:00
Philipp Zabel
1586768e74 vulkan/wsi/wayland: Fix proxy wrappers for swapchain recreation
Before the swapchain event queue is destroyed, all proxy objects that reference
it must be dropped. Otherwise we risk a use-after-free if a frame callback event
or buffer release events are received afterwards.
This happens when an application destroys and recreates a swapchain in FIFO
mode between two frames without using the VkSwapchainCreateInfoKHR::oldSwapchain
mechanism to keep the old swapchain until after the next redraw.

Fixes: 5034c61558 ("vulkan/wsi/wayland: Use proxy wrappers for swapchain")
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2017-05-20 17:00:08 +01:00
John Brooks
2b878cb8fd drirc: Add allow_glsl_builtin_variable_redeclaration for Dying Light and Dead Island Definitive Edition
This fixes the long-standing problem with Dying Light where the game would
produce a black screen when running under Mesa. This happened because the
game's vertex shaders redeclare gl_VertexID, which is a GLSL builtin.
Mesa's GLSL compiler is a little more strict than others, and would not
compile them:

    error: `gl_VertexID' redeclared

The allow_glsl_builtin_variable_redeclaration directive allows the shaders
to compile and the game to render. The game also requires OpenGL 4.4+ (GLSL
440), but does not request it explicitly. It must be forced with an
override, such as MESA_GL_VERSION_OVERRIDE=4.5 and
MESA_GLSL_VERSION_OVERRIDE=450. A compatibility context is *not* required
and forcing one with 4.5COMPAT or allow_higher_compat_version results in
graphical artifacts.

Dead Island Definitive Edition is another Techland port on the same engine
with the same problems, so we set the
allow_glsl_builtin_variable_redeclaration option for that game as well.

v2 (Samuel Pitoiset):
    - Rename allow_glsl_builtin_redeclaration ->
      allow_glsl_builtin_variable_redeclaration

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96449
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-20 17:30:07 +02:00
John Brooks
6e8f34a2de glsl: Conditionally allow redeclaration of built-in variables
Conditional on allow_glsl_builtin_variable_redeclaration driconf option.

v2 (Samuel Pitoiset):
    - Rename allow_glsl_builtin_redeclaration ->
      allow_glsl_builtin_variable_redeclaration
    - style: put spaces after 'if'

Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-20 17:30:05 +02:00
John Brooks
bf4d7671f4 driconf: Add allow_glsl_builtin_variable_redeclaration option
This option will allow GLSL builtins to be redeclared verbatim (e.g.
redeclaring "in int gl_VertexID" in a vertex shader). This is not strictly
valid and would normally fail to compile, but some applications (such as
newer Techland ports) do it and need more leniency.

v2 (Samuel Pitoiset):
    - Rename allow_glsl_builtin_redeclaration ->
      allow_glsl_builtin_variable_redeclaration

Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-20 17:29:55 +02:00
Ilia Mirkin
61d8f3387d nv50,nvc0: clear index buffer bufctx bin unconditionally
The previous condition was to clear it out if it had previously been
set, not what's in the current draw. That information is gone now, so
just clear it unconditionally.

Fixes: 330d0607e ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-20 04:20:11 -04:00
Ilia Mirkin
85d2186326 nv50: fix vtxbuf cleanup
Use a user-buffer-aware cleanup function.

Fixes: c24c3b94ed ("gallium: decrease the size of pipe_vertex_buffer - 24 -> 16 bytes")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-20 04:20:11 -04:00
Kenneth Graunke
e781b9e640 i965: Use the upload BO for push constants on Gen7.5-Gen8.
We can easily use the upload BO for push constants on Gen7.5/Gen8 too,
at the cost of a relocation when emitting 3DSTATE_CONSTANT_XS.  We can
simply switch to using constant buffer pointer 2 instead of pointer 0,
like we do on Gen9+.

Ivybridge and Baytrail can't do this trick because they require the
constant buffers to be enabled in order, starting with 0.  We'd have
to set the INSTPM bit to make the constant buffer pointer not relative
to dynamic state base address, which would need kernel command parser
support.

Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Broadwell GT2: 0.305608% +/- 0.19877% (n = 68)
- Braswell: No difference proven (n = 742)
- Haswell GT3e: 0.180755% +/- 0.0237505% (n = 30)

Reviewed-by: Chris Forbes <chrisforbes@google.com>
2017-05-20 00:23:10 -07:00
Kenneth Graunke
494593e6b2 i965: Use the upload BO for push constants on Gen9+.
Shaders can use quite a bit of uniform data.  Better to put it in the
upload buffers, like we do for client vertex data, rather than the
batch buffer state area, which is primarly used for indirect state.

This should free up batch space, allowing us to emit more commands in a
batch before flushing.  Because BRW_NEW_BATCH also causes a lot of state
to be re-emitted, it may also reduce CPU overhead a little bit.

We took this approach on Gen4-5, but switched to using the batch area
on Gen6+ because buffer 0 is relative to Dynamic State Base Address by
default, which is set to the start of the batch.

On Gen9+, we already use a relocation due to a workaround, so this is
trivial to change and has basically no downside.

Unfortunately we can't change compute shader push constants because
MEDIA_CURBE_LOAD always uses an offset from dynamic state base address.

Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Skylake GT4e: 0.52821% +/- 0.113402% (n = 190)
- Apollolake: 0.510225% +/- 0.273064% (n = 70)

Reviewed-by: Chris Forbes <chrisforbes@google.com>
2017-05-20 00:23:10 -07:00
Kenneth Graunke
731b577cc6 i965: Drop BRW_NEW_PUSH_CONSTANT_ALLOCATION from CS packets.
I don't think CS push constant uploading uses the section of L3
controlled by 3DSTATE_PUSH_CONSTANT_ALLOC_XS.  So I don't think
it needs to be re-emitted when that space is reallocated.

The programming note in gen7_allocate_push_constants doesn't
indicate this is necessary, at least.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
2017-05-20 00:23:10 -07:00
Ilia Mirkin
82e77d4e44 nvc0/ir: SHLADD's middle source must be an immediate
The instruction encodings only allow for immediates. Don't try to
replace a zero (which is dumb to have in that op in any case) with RZ.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-05-20 03:12:40 -04:00
Tapani Pälli
f0051fcf2b android: add -Wl,--build-id=sha1 to LDFLAGS for libvulkan_intel
Just like is done on desktop and what is expected by the build-id code.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-20 08:59:57 +03:00
Emil Velikov
48cd1919ff configure.ac: s/xcb-fixes/xcb-xfixes/
Former is not a thing, even if I have a hacked xcb-fixes.pc on my system.
Thanks for spotting it Mark!

Fixes: 9a90d6a9d4 ("configure.ac: add xcb-fixes to the XCB DRI3 list")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-20 01:15:02 +01:00
Emil Velikov
9a90d6a9d4 configure.ac: add xcb-fixes to the XCB DRI3 list
The XCB module is used by the VL targets. Thus omitting it can lead to
link-time errors due to unresolved symbols.

Other DRI3 users such as the Vulkan WSI and the dri3 loader helper do
not use an update region in their xcb_present_pixmap() call. We will
look into that at a later stage.

Fixes: acf3d2afab ("configure: check once for DRI3 dependencies")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101110
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-20 00:12:56 +01:00
Emil Velikov
5233eaf9ee automake: add SWR LLVM gen_builder.hpp workaround
As gen_builder.hpp file is generated, it contains information that is
specific to the LLVM version it originates from.

As suggested by Tim, the file seems to be forwards compatible. So in
order to produce ship a file which will work everywhere we should be
using earlies supported LLVM - 3.9.

With this we're back on track and can build all of mesa without
python/mako/flex and friends.

In the long term we might want to see if the python generators can be
updated to produce LLVM version agnostic files. At least within the
range supported by SWR.

Cc: <mesa-stable@lists.freedesktop.org>
Cc: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-05-20 00:12:56 +01:00
Timothy Arceri
80e643345e st/mesa: don't mark the program as in cache_fallback when there is cache miss
When we fallback currently the gl_program objects are re-allocated.

This is likely to change when the i965 cache lands, but for now
this fixes a crash when using MESA_GLSL=cache_fb. This env var
simulates the fallback path taken when a tgsi cache item doesn't
exist due to being evicted previously or some kind of error.

Unlike i965 we are always falling back at link time so it's safe to
just re-allocate everything. We will be unnecessarily freeing and
re-allocate a bunch of things here but it's probably not a huge deal,
and can be changed when the i965 code lands.

Fixes: 0e9991f957 ("glsl: don't reference shader prog data during cache fallback")

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-20 08:35:51 +10:00
Timothy Arceri
a74300c7ff mesa: add an env var to force cache fallback
For the gallium state tracker a tgsi binary may have been evicted
from the cache to make space. In this case we would take the
fallback path and recompile/link the shader.

On i965 there are a number of reasons we can get to the program
upload stage and have neither IR nor a valid cached binary.
For example the binary may have been evicted from the cache or
we need a variant that wasn't previously cached.

This environment variable enables us to force the fallback path that
would be taken in these cases and makes it easier to debug these
otherwise hard to reproduce scenarios.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-20 08:35:51 +10:00
Timothy Arceri
8cad301a3e st/mesa: improve shader cache debug info
This will explicitly state that we are following the fallback
path when we find invalid/corrupt cache items. It will also
output the fallback message when the fallback path is forced
via an environment variable, the following patches will allow
this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-20 08:35:51 +10:00
Emil Velikov
552cd5cce5 travis: remove workarounds for the Vulkan target
Previously we required --enable-egl for the platform selection to work.
Additionally due to the broken DRI3 dependency tracking we needed
--enable-glx.

Since both of these are now sorted now we no longer need the
workarounds.

While we're here, explicitly enable dri3.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-19 19:46:55 +01:00
Emil Velikov
5ab6ded0a9 configure: trivial whitespace cleanup
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-19 19:46:55 +01:00
Emil Velikov
b496fc2932 configure: error out if building XVMC w/o supported platform
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-19 19:46:55 +01:00
Emil Velikov
037e9d37b4 configure: error out if building VDPAU w/o supported platform
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-19 19:46:54 +01:00
Emil Velikov
1914c814a6 configure: error out if building OMX w/o supported platform
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-19 19:46:54 +01:00
Emil Velikov
63e11ac2b5 configure: error out if building VA w/o supported platform
A bit pedantic patch to fool proof should someone start thinkering
without knowing what they do.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-19 19:46:54 +01:00
Emil Velikov
912f24fd32 st/xvmc: add DRI3 support
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-19 19:46:54 +01:00
Emil Velikov
fdc90e1286 st/omx: add DRI3 support
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
2017-05-19 19:46:54 +01:00
Emil Velikov
fcbedce310 gallium/targets: link against XCB only as needed
OMX and VA can optionally use the X11 DRI2/DRI3, thus we should link
only as required.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-19 19:46:54 +01:00
Emil Velikov
115cb729d8 st/omx: fix building against X11-less setups
The vl_*_screen_create API properly falls back to a NOP when we're
building without specific platforms. So the only thing we need is to
handle the lack of X11/Xlib.h and provide a dummy Display define.

Cc: <mesa-stable@lists.freedesktop.org>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-19 19:46:49 +01:00
Emil Velikov
d71ce62e84 st/omx: remove unneeded X11 include
En route to a X11-less builds

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-19 19:46:48 +01:00
Emil Velikov
8b9868ad4c st/omx: remove unused drm_driver.h includes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-19 19:46:47 +01:00
Emil Velikov
28703d605d st/va: check if vl_*_screen_create has failed only once
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-19 19:46:46 +01:00
Emil Velikov
aaea53c2c0 st/va: fix misplaced closing bracket
It's been like this since the code was introduced.

Fixes: 86eb4131a9 (st/va: add headless support, i.e. VA_DISPLAY_DRM)
Cc: <mesa-stable@lists.freedesktop.org>
Cc: Julien Isorce <julien.isorce@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-19 19:46:46 +01:00
Emil Velikov
c34a008891 st/va: move variable declaration to where its used
... and make it const, since we shouldn't tinker with it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-19 19:46:46 +01:00
Emil Velikov
369e5dd939 auxiliary/vl: use vl_*_screen_create stubs when building w/o platform
Provide a dummy stub when the user has opted w/o said platform, thus
we can build the binaries without unnecessarily requiring X11/other
headers.

In order to avoid build and link-time issues, we remove the HAVE_DRI3
guards in the VA and VDPAU state-trackers.

With this change st/va will return VA_STATUS_ERROR_ALLOCATION_FAILED
instead of VA_STATUS_ERROR_UNIMPLEMENTED. That is fine since upstream
users of libva such as vlc and mpv do little error checking, let
alone distinguish between the two.

Cc: Leo Liu <leo.liu@amd.com>
Cc: Guttula, Suresh <Suresh.Guttula@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-19 19:46:41 +01:00
Emil Velikov
05043e0e8e configure: error out when building X11 Vulkan without DRI3
Vulkan supports only DRI3 enabled X11 platforms. Make it obvious,
should one consider building without it.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:25 +01:00
Emil Velikov
d80d6d662e loader: build libloader_dri3_helper.la only with HAVE_PLATFORM_X11
Pretty much every other place does the same.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:22 +01:00
Emil Velikov
a24dc36dde vulkan: automake: remove unused VULKAN_LIB_DEPS variable
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:17 +01:00
Emil Velikov
acf3d2afab configure: check once for DRI3 dependencies
Currently we are having the XCB_DRI3 dependencies duplicated,
partially.

Just do a once-off check and add all of the respective CFLAGS/LIBS
where needed.

As a nice side effect this helps us solve a couple of FIXMEs.

DRI3 is not a thing w/o X11 so disable it in such cases.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:15 +01:00
Emil Velikov
8212fc95b5 configure: error out when building GLX w/o the X11 platform
Building EGL/Vulkan/other without X11, while GLX is enabled is confusing
and misleading. In practise anyone aiming at the former will also
disable GLX.

The inverse (some examples below) should still work:
 ./configure --disable-glx --with-platforms=x11 --with-vulkan-drivers=intel
 ./configure --disable-glx --with-platforms=x11 --enable-egl

Keep in mind that the X11 platform is enabled, by default.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:12 +01:00
Emil Velikov
f353f844a0 configure: set HAVE_foo_PLATFORM as applicable
Rather than having multiple places that define the macros, do it just
once in configure. Makes existing code a bit shorter and easier to
manage as we fix the VL targets with follow-up commits.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:09 +01:00
Emil Velikov
2d35773221 configure: enable the surfaceless platform by default
A simple platform that you want to use in a many usecases. See the
spec file details.

It has no special requirements plus it takes less than a second to
build.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:06 +01:00
Emil Velikov
edb5a65f93 configure: loosen --with-platforms heuristics
Remove the enable-egl pre-requirement. Platform selection does not
depend on EGL.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:04 +01:00
Emil Velikov
73682f82bc configure: update remaining --with-egl-platforms references
Rename the remaining references to omit the egl part.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:02 +01:00
Emil Velikov
27737e7e84 configure: rename remaining HAVE_EGL_PLATFORM_* guards
Analogous to others earlier, these will be used to control the platform
for more than the EGL driver.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:00 +01:00
Emil Velikov
3208fd2e46 configure: move platform handling further up
We'll need it for the Vulkan drivers and the VL targets.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:43:51 +01:00
Rob Herring
de6f3cce8c Android: r600: fix build when LLVM is disabled
There's still an error after my recent clean-up if LLVM is not patched to
enable AMDGPU target:

external/mesa3d/src/amd/common/ac_llvm_util.c:38:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetInfo' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTargetInfo();
        ^
external/mesa3d/src/amd/common/ac_llvm_util.c:39:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTarget' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTarget();
        ^
external/mesa3d/src/amd/common/ac_llvm_util.c:40:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetMC' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTargetMC();
        ^
external/mesa3d/src/amd/common/ac_llvm_util.c:41:2: error: implicit declaration of function 'LLVMInitializeAMDGPUAsmPrinter' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUAsmPrinter();
        ^

We need to drop libmesa_amd_common when LLVM is disabled, however there's
still a dependency on include paths for ac_binary.h. So explicitly add the
include path when LLVM is disabled.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-19 19:03:08 +01:00
Rob Herring
5771ecc90e virgl: fix virgl_bo_transfer_{put, get} box struct copy
Commit 3dfe61ed6e ("gallium: decrease the size of pipe_box - 24 -> 16
bytes") changed the size of pipe_box, but the virgl code was relying on
pipe_box and drm_virtgpu_3d_box structs having the same size/layout doing
a struct copy. Copy the fields one by one instead.

Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Fixes: 3dfe61ed6e ("gallium: decrease the size of pipe_box - 24 -> 16 bytes")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-19 19:02:32 +01:00
Emil Velikov
e19ea928b9 egl: add g_egldispatchstubs.h to the release tarball
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-19 19:02:12 +01:00
Tapani Pälli
f347bac30f egl/android: fix segfault within swap_buffers
Function droid_swap_buffers may get called without dri2_surf->buffer set,
in these cases we don't have a back buffer set either. Patch fixes segfault
seen with 3DMark that uses android.opengl.GLSurfaceView for rendering it's UI.

backtrace:
   #00 pc 00013f88  /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104)
   #01 pc 000117b2  /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50)
   #02 pc 000058b2  /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386)
   #03 pc 00011329  /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553)
   #04 pc 000118e7  /system/lib/libEGL.so (eglSwapBuffers+55)
   #05 pc 000754dc  /system/lib/libandroid_runtime.so

Note, this is v1 as v2 caused dEQP regressions.

Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-19 13:57:52 +03:00
Daniel Stone
1f2d0093bf egl/wayland: Ensure we get a back buffer
Commit 9ca6711faa changed the Wayland winsys to only block for the
frame callback inside SwapBuffers, rather than get_back_bo. get_back_bo
would perform a single non-blocking Wayland event dispatch, to try to
find any release events which we had pulled off the wire but not
actually processed. The blocking dispatch was moved to SwapBuffers.

This removed a guarantee that we would've processed all events inside
get_back_bo(), and introduced a failure whereby the server could've sent
a buffer release event, but we wouldn't have read it. In clients
unconstrained by SwapInterval (rendering ~as fast as possible), which
were being displayed directly without composition (buffer release delayed),
this could lead to get_back_bo() failing because there were no free
buffers available to it.

The drawing rightly failed, but this was papered over because of the
path in eglSwapBuffers() which attempts to guarantee a BO, in order to
support calling SwapBuffers twice in a row with no rendering actually
having been performed.

Since eglSwapBuffers will perform a blocking dispatch of Wayland
events, a buffer release would have arrived by that point, and we
could then choose a buffer to post to the server. The effect was that
frames were displayed out-of-order, since we grabbed a frame with random
past content to display to the compositor.

Ideally get_back_bo() failing should store a failure flag inside the
surface and cause the next SwapBuffers to fail, but for the meantime,
restore the correct behaviour such that get_back_bo() no longer fails.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98833
Fixes: 9ca6711faa ("Revert "wayland: Block for the frame callback in get_back_bo not dri2_swap_buffers"")
2017-05-19 09:36:19 +01:00
Daniel Stone
03dd9a88b0 egl/wayland: Use per-surface event queues
During display initialisation, we need a separate event queue to handle
the registry events, which is correctly handled. But we also need
separate per-surface event queues to handle swapchain-related events,
such as surface frame events and buffer release events. This avoids two
surfaces from the same EGLDisplay, both current on separate threads,
dispatching each other's events.

Create separate per-surface event queues, create wl_surface and wl_drm
proxy wrapper objects per surface, so we eliminate the race around
sending events to the wrong queue. swrast buffers do not need a
dedicated proxy wrapper, as the wl_shm_pool used to create the
wl_buffers, being transient, can itself be assigned to a queue.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 36b9976e1f ("egl/wayland: Avoid race conditions when on non-main thread")
Cc: mesa-stable@lists.freedesktop.org
2017-05-19 09:36:15 +01:00
Daniel Stone
8118bc269f egl/wayland: Don't open-code roundtrip
wl_display_roundtrip_queue() exists and can replace roundtrip(). The
API was introduced with wayland 1.6, while we currently require 1.11.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-05-19 09:36:11 +01:00
Daniel Stone
5034c61558 vulkan/wsi/wayland: Use proxy wrappers for swapchain
Though most swapchain operations used a queue, they were racy in that
the object was created with the queue only set later, meaning that its
event could potentially be dispatched from the default queue in between
these two steps.

Use proxy wrappers to avoid this race, also assigning wl_buffers created
for the swapchain to the event queue.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-05-19 09:36:06 +01:00
Daniel Stone
c902a1957d vulkan/wsi/wayland: Use per-display event queue
Calling random callbacks on the display's event queue is hostile, as
we may call into client code when it least expects it. Create our own
event queue, one per wsi_wl_display, and use that for the registry.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-05-19 09:36:03 +01:00
Daniel Stone
afe8c8a299 vulkan/wsi/wayland: Remove roundtrip when creating image
There's no need to call wl_display_roundtrip() after trying to create a
buffer through wl_drm; if it succeeds then everything is fine, and if it
fails, then we get a fatal protocol error so can't recover anyway.

Additionally, doing a roundtrip on the default / main application queue,
is destructive anyway, so would need to be its own queue.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-05-19 09:36:01 +01:00
Daniel Stone
d9a8bba7f4 vulkan: Fix Wayland uninitialised registry
Untangle the exit cleanup paths so we don't try to use the registry
variable before it's been initialised.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-05-19 09:34:52 +01:00
Nanley Chery
688ddb85c8 i965/formats: Update the three-channel DXT1 mappings
The procedure for decompressing an opaque DXT1 OpenGL format is
dependant on the comparison of two colors stored in the first 32 bits of
the compressed block. Here's the specified OpenGL behavior for
reference:

   The RGB color for a texel at location (x,y) in the block is given by:

      RGB0,              if color0 > color1 and code(x,y) == 0
      RGB1,              if color0 > color1 and code(x,y) == 1
      (2*RGB0+RGB1)/3,   if color0 > color1 and code(x,y) == 2
      (RGB0+2*RGB1)/3,   if color0 > color1 and code(x,y) == 3

      RGB0,              if color0 <= color1 and code(x,y) == 0
      RGB1,              if color0 <= color1 and code(x,y) == 1
      (RGB0+RGB1)/2,     if color0 <= color1 and code(x,y) == 2
      BLACK,             if color0 <= color1 and code(x,y) == 3

The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1.
This means that the behavior is incompatible with OpenGL. This is stated
in the SKL PRM, Vol 5: Memory Views:

   Opaque Textures (DXT1_RGB)
      Texture format DXT1_RGB is identical to DXT1, with the exception that the
      One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
      the resulting texel color is derived strictly from the Opaque Color Encoding.
      The alpha channel defaults to 1.0.

      Programming Note
      Context: Opaque Textures (DXT1_RGB)
      The behavior of this format is not compliant with the OGL spec.

The opaque and non-opaque DXT1 OpenGL formats are specified to be
decoded in exactly the same way except the BLACK value must have a
transparent alpha channel in the latter. Use the four-channel BC1 Intel
formats with the alpha set to 1 to provide the behavior required by the
spec. Note that the alpha is already set to 1 for RGB formats in
brw_get_texture_swizzle().

v2: Provide a more detailed commit message (Kenneth Graunke).
v3: Ensure the alpha channel is set to 1 for DXT1 formats.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-05-18 16:46:15 -07:00
Nanley Chery
56458cb168 anv/formats: Update the three-channel BC1 mappings
The procedure for decompressing an opaque BC1 Vulkan format is dependant on the
comparison of two colors stored in the first 32 bits of the compressed block.
Here's the specified OpenGL (and Vulkan) behavior for reference:

   The RGB color for a texel at location (x,y) in the block is given by:

      RGB0,              if color0 > color1 and code(x,y) == 0
      RGB1,              if color0 > color1 and code(x,y) == 1
      (2*RGB0+RGB1)/3,   if color0 > color1 and code(x,y) == 2
      (RGB0+2*RGB1)/3,   if color0 > color1 and code(x,y) == 3

      RGB0,              if color0 <= color1 and code(x,y) == 0
      RGB1,              if color0 <= color1 and code(x,y) == 1
      (RGB0+RGB1)/2,     if color0 <= color1 and code(x,y) == 2
      BLACK,             if color0 <= color1 and code(x,y) == 3

The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1. This
means that the behavior is incompatible with OpenGL and Vulkan. This is stated
in the SKL PRM, Vol 5: Memory Views:

   Opaque Textures (DXT1_RGB)
      Texture format DXT1_RGB is identical to DXT1, with the exception that the
      One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
      the resulting texel color is derived strictly from the Opaque Color Encoding.
      The alpha channel defaults to 1.0.

      Programming Note
      Context: Opaque Textures (DXT1_RGB)
      The behavior of this format is not compliant with the OGL spec.

The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in
exactly the same way except the BLACK value must have a transparent alpha
channel in the latter. Use the four-channel BC1 Intel formats with the alpha
set to 1 to provide the behavior required by the spec.

v2 (Kenneth Graunke):
- Provide a more detailed commit message.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-05-18 16:46:15 -07:00
Jason Ekstrand
c499faebd7 anv: Add an option to abort on device loss
This is mostly for running in our CI system to prevent dEQP from
continuing on to the next test if we get a GPU hang.  As it currently
stands, dEQP uses the same VkDevice for almost all tests and if one of
the tests hangs, we set the anv_device::device_lost flag and report
VK_ERROR_DEVICE_LOST for all queue operations from that point forward
without sending anything to the GPU.  dEQP will happily continue trying
to run tests and reporting failures until it eventually gets crash that
forces the test runner to start over.  This circumvents the problem by
just aborting the process if we ever get a GPU hang.  Since this is not
the recommended behavior most of the time, we hide it behind an
environment variable.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-18 16:32:11 -07:00
Jason Ekstrand
53f997de77 anv: Wrap the device lost error in vk_error in QueueSubmit
We weren't wrapping this before because anv_cmd_buffer_execbuf may throw
a more meaningful error message.  However, we do change the error code
into VK_ERROR_DEVICE_LOST, so we should print a new message.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-18 16:32:11 -07:00
Marek Olšák
807e1d2577 radeonsi/gfx9: use CE RAM optimally
On GFX9 with only 4K CE RAM, define the range of slots that will be
allocated in CE RAM. All other slots will be uploaded directly. This will
switch dynamically according to which slots are used by current shaders.

GFX9 CE usage should now be similar to VI instead of being often disabled.

Tested on VI by taking the GFX9 CE allocation codepath and setting
num_ce_slots = 2 everywhere to get frequent switches between both modes.
CE is still disabled on GFX9.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
1cde473ec0 radeonsi: remove CE offset alignment restriction
This was only needed by LOAD_CONST_RAM, which is now only used to load
whole CE.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
a7f098fb76 radeonsi: only upload (dump to L2) those descriptors that are used by shaders
This decreases the size of CE RAM dumps to L2, or the size of descriptor
uploads without CE.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
53c2ef36da radeonsi: record which descriptor slots are used by shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
38828094e9 radeonsi: update si_ce_needed_cs_space
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
edb59ef2dc radeonsi: do only 1 big CE dump at end of IBs and one reload in the preamble
A later commit will only upload descriptors used by shaders, so we won't do
full dumps anymore, so the only way to have a complete mirror of CE RAM
in memory is to do a separate dump after the last draw call.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
06690e63f7 radeonsi: remove early return in si_upload_descriptors
All updates of descriptors_dirty also set dirty_mask, so the return is
unnecessary. The next commit will want this function to be executed
even if dirty_mask == 0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
b8f8d9e46c radeonsi: clamp indirect index to the number of declared shader resources
We'll do partial uploads of descriptor arrays, so we need to clamp
against what shaders declare.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
f07c15ef80 radeonsi: merge sampler and image descriptor lists into one
Sampler slots: slot[8], .. slot[39] (ascending)
Image slots: slot[7], .. slot[0] (descending)

Each image occupies 1/2 of each slot, so there are 16 images in total,
therefore the layout is: slot[15], .. slot[0]. (in 1/2 slot increments)

Updating image slot 2n+i (i <= 1) also dirties and re-uploads slot 2n+!i.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
5df24c3fa6 radeonsi: merge constant and shader buffers descriptor lists into one
Constant buffers: slot[16], .. slot[31] (ascending)
Shader buffers: slot[15], .. slot[0] (descending)

The idea is that if we have 4 constant buffers and 2 shader buffers, we only
have to upload 6 slots. That optimization is left for a later commit.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
d88ca12350 gallium/u_threaded: add a fast path for unbinding shader buffers
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
d4c8f429d1 gallium/u_threaded: add a fast path for unbinding shader images
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
ae7f7e8162 st/mesa: silence a valgrind warning in u_threaded_context due to st_draw_vbo
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák
767868ff6d glsl_to_tgsi: declare all SSBOs and atomics when indirect indexing is used
Only the first array element was declared, so tgsi_shader_info::
shader_buffers_declared didn't match what the shader was using.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Samuel Pitoiset
1468e29e02 radeonsi: get the sampler view type from inst->Texture for TG4
This will also magically fix this special lowering for
bindless samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 21:48:16 +02:00
Samuel Pitoiset
5cb2eee557 tgsi: store the sampler view type directly in the instruction
RadeonSI needs to do a special lowering for Gather4 with integer
formats, but with bindless samplers we just can't access the index.

Instead, store the return type in the instruction like the target.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 21:48:16 +02:00
Samuel Pitoiset
ac3f6bf608 tgsi: remove some unused OPCODE macros
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 21:48:16 +02:00
Tom Stellard
14e525a4d7 gallivm: Make sure module has the correct data layout when pass manager runs
The datalayout for modules was purposely not being set in order to work around
the fact that the ExecutionEngine requires that the module's datalayout
matches the datalayout of the TargetMachine that the ExecutionEngine is
using.

When the pass manager runs on a module with no datalayout, it uses
the default datalayout which is little-endian.  This causes problems
on big-endian targets, because some optimizations that are legal on
little-endian or illegal on big-endian.

To resolve this, we set the datalayout prior to running the pass
manager, and then clear it before creating the ExectionEngine.

This patch fixes a lot of piglit tests on big-endian ppc64.

Cc: mesa-stable@lists.freedesktop.org
2017-05-18 17:52:47 +00:00
Chad Versace
8f62d21bd7 egl: Partially revert 23c86c74, fix eglMakeCurrent
Fixes regressions in Android CtsVerifier.apk on Intel Chrome OS devices
due to incorrect error handling in eglMakeCurrent. See below on how to
confirm the regression is fixed.

This partially reverts

    commit 23c86c74cc
    Author:  Chad Versace <chadversary@chromium.org>
    Subject: egl: Emit error when EGLSurface is lost

The problem with commit 23c86c74 is that, once an EGLSurface became
lost, the app could never unbind the bad surface. Each attempt to unbind
the bad surface with eglMakeCurrent failed with EGL_BAD_CURRENT_SURFACE.

Specificaly, the bad commit added the error handling below. #2 and #3
were right, but #1 was wrong.

    1. eglMakeCurrent emits EGL_BAD_CURRENT_SURFACE if the calling
       thread has unflushed commands and either previous surface is no
       longer valid.

    2. eglMakeCurrent emits EGL_BAD_NATIVE_WINDOW if either new surface
       is no longer valid.

    3. eglSwapBuffers emits EGL_BAD_NATIVE_WINDOW if the swapped surface
       is no longer valid.

Whe I wrote the bad commit, I misunderstood the EGL spec language
for #1. The correct behavior is, if I understand correctly now, is
below. This patch doesn't implement the correct behavior, though, it
just reverts the broken behavior.

    - Assume a bound EGLSurface is no longer valid.
    - Assume the bound EGLContext has unflushed commands.
    - The app calls eglMakeCurrent. The spec requires eglMakeCurrent to
      implicitly flush. After flushing, eglMakeCurrent emits
      EGL_BAD_CURRENT_SURFACE and does *not* alter the thread's
      current bindings.
    - If the app calls eglMakeCurrent again, and the app inserts no
      commands into the GL command stream between the two eglMakeCurrent
      calls, then this second eglMakeCurrent succeeds without emitting an
      error.

How to confirm this fixes the regression:

    Download android-cts-verifier-7.1_r5-linux_x86-x86.zip from
    source.android.com, unpack, and `adb install CtsVerifier.apk`.
    Run test "Projection Cube". Click the Pass button (a
    green checkmark). Then run test "Projection Widget". Confirm that
    widgets are visible and that logcat does not complain about
    eglMakeCurrent failure.

    Then confirm there are no regressions in the cts-traded module that
    commit 263243b1 fixed:

        cts-tf > run cts --skip-preconditions --skip-device-info \
                 -m CtsCameraTestCases \
                 -t android.hardware.camera2.cts.RobustnessTest

    Tested with Chrome OS board "reef".

Fixes: 23c86c74 (egl: Emit error when EGLSurface is lost)
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Emil Velikov <emil.velikov@collabora.com>
2017-05-18 10:25:52 -07:00
Iago Toral Quiroga
2322ddf548 anv: fix multiview for clear commands
According to the VK_KHX_multiview spec:

"Multiview causes all drawing and clear commands in the subpass to
behave as if they were broadcast to each view, where each view is
represented by one layer of the framebuffer attachments."

This adds support for multiview clears, which were missing in the
initial implementation.

v2 (Jason):
  - split multiview from regular case
  - Use for_each_bit() macro

Fixes new CTS multiview tests:
dEQP-VK.multiview.clear_attachments.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-18 11:53:25 +02:00
Nicolai Hähnle
70215a23c6 ac: add missing extern "C" guards
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:53 +02:00
Nicolai Hähnle
6c01c4b907 ac: add radeon_info::num_{sdma,compute}_rings
Vulkan needs them.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:53 +02:00
Nicolai Hähnle
c488bf24ed ac: add radeon_surf::htile_slice_size
Vulkan needs it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:52 +02:00
Nicolai Hähnle
98a2492290 ac_surface: use radeon_info from ac_gpu_info
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:52 +02:00
Nicolai Hähnle
988c866212 ac/radeonsi: move radeon_info initialization to amd/common
v2: update Android.common.mk (Emil)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:52 +02:00
Nicolai Hähnle
de9dd4f9f1 ac/radeonsi: move struct radeon_info to ac_gpu_info.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:52 +02:00
Nicolai Hähnle
4d6e75776d ac/radeonsi: move some aspects of sanity checking to ac_surface
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:52 +02:00
Nicolai Hähnle
00f466bad9 ac/radeonsi: add ac_compute_surface to automatically switch gfx6 vs. gfx9
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:52 +02:00
Nicolai Hähnle
8aabed64c3 ac/radeonsi: move the bulk of gfx9_surface_init to ac_surface
We can now merge the two *_surface_init functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:51 +02:00
Nicolai Hähnle
db77cd879b ac/radeonsi: move the bulk of gfx6_surface_init to ac_surface
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:51 +02:00
Nicolai Hähnle
f187a49322 ac/radeonsi: move amdgpu_addr_create to ac_surface
v2:
- update Android.common.mk (Emil)
- rebase on top of Raven support

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2017-05-18 11:48:51 +02:00
Nicolai Hähnle
15a844986a ac/radeonsi: move surface definitions to new header ac_surface.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:48:51 +02:00
Nicolai Hähnle
377877ff5f st/mesa: remove an incorrect assertion
There is really no reason why the current DrawBuffer needs to be complete
at this point. In particular, the assertion gets hit on the X server side
in libglx when running .../piglit/bin/glx-get-current-display-ext -auto
(which uses indirect GLX rendering).

Fixes: 19b61799e3 ("st/mesa: don't cast the incomplete framebufer to st_framebuffer")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-18 11:47:27 +02:00
Samuel Iglesias Gonsálvez
e69e5c7006 i965/vec4: load dvec3/4 uniforms first in the push constant buffer
Reorder the uniforms to load first the dvec4-aligned variables in the
push constant buffer and then push the vec4-aligned ones. It takes
into account that the relocated uniforms should be aligned to their
channel size.

This fixes a bug were the dvec3/4 might be loaded one part on a GRF and
the rest in next GRF, so the region parameters to read that could break
the HW rules.

v2:
- Fix broken logic.
- Add a comment to explain what should be needed to optimise the usage
  of the push constant buffer slots, as this patch does not pack the
  uniforms.

v3:
- Implemented the push constant buffer usage optimization.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez
8aa6ada838 i965/vec4: fix swizzle and writemask when loading an uniform with constant offset
It was setting XYWZ swizzle and writemask to all uniforms, no matter if they
were a vector or scalar, so this can lead to problems when loading them
to the push constant buffer.

Moreover, 'shift' calculation was designed to calculate the offset in
DWORDS, but it doesn't take into account DFs, so the calculated swizzle
for the later ones was wrong.

The indirect case is not changed because MOV INDIRECT will write
to all components. Added an assert to verify that these uniforms
are aligned.

v2:
- Fix 'shift' calculation (Curro)
- Set both swizzle and writemask.
- Add assert(shift == 0) for the indirect case.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez
354f7f2cb9 i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution
We are going to add a packing feature to reduce the usage of the push
constant buffer. One of the consequences is that 'nr_params' would be
modified by vec4_visitor's run call, so we need to restore it if one of
them failed before executing the fallback ones. Same thing happens to the
uniforms values that would be reordered afterwards.

Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when
the dvec4 alignment and packing patch is applied.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-05-18 06:49:28 +02:00
Eric Anholt
e8ea42d245 vc4: Don't allocate new BOs to avoid synchronization when they're shared.
If X11 did a software fallback to the entire screen, we would throw out
the BO the screen is scanning out from and allocate a new one.

Cc: mesa-stable@lists.freedesktop.org
2017-05-17 14:18:29 -07:00
Eric Anholt
50e78cd04f vc4: Drop pointless indirections around BO import/export.
I've since found them to be more confusing by adding indirections than
clarifying by screening off resources from the handle/fd import/export
process.
2017-05-17 14:18:26 -07:00
Eric Anholt
76e4ab5715 vc4: Drop the u_resource_vtbl no-op layer.
We only ever attached one vtbl, so it was a waste of space and
indirections.
2017-05-17 14:18:26 -07:00
Marek Olšák
bd4b224fa6 gallium/radeon: use a top-of-pipe timestamp for the start of TIME_ELAPSED
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 20:28:44 +02:00
Marek Olšák
4f50c91c32 mesa: don't check mapped buffers in every draw call if drivers allow it
Before: DrawElements (16 VBOs) w/ no state change: 4.34 million/s
After:  DrawElements (16 VBOs) w/ no state change: 8.80 million/s

This inefficiency was uncovered by Timothy Arceri's no_error work.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 20:28:44 +02:00
Marek Olšák
d02d8ea8b6 mesa: add gl_constants::AllowMappedBuffersDuringExecution
for skipping mapped-buffer checking in every GL draw call

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 20:28:44 +02:00
Marek Olšák
50189379fa gallium: add PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION
for skipping mapped-buffer checking in every GL draw call

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 20:28:44 +02:00
Hans de Goede
84f764a759 glxglvnddispatch: Add missing dispatch for GetDriverConfig
Together with some fixes to xdriinfo this fixes xdriinfo not working
with glvnd.

Since apps (xdriinfo) expect GetDriverConfig to work without going to
need through the dance to setup a glxcontext (which is a reasonable
expectation IMHO), the dispatch for this ends up significantly different
then any other dispatch function.

This patch gets the job done, but I'm not really happy with how this
patch turned out, suggestions for a better fix are welcome.

Cc: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2017-05-17 20:02:18 +02:00
Tim Rowley
dce41f7728 swr: don't use AttributeSet with llvm >= 5
This change fixes the build break with llvm-svn.

r301981 of llvm-svn made add/remove of function attributes
use AttrBuilder instead of AttributeList.

Tested with llvm-3.9, llvm-4.0, llvm-svn.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-17 11:24:46 -05:00
Chih-Wei Huang
bfc0c23843 Android: correct libz dependency
Commit 6facb0c0 ("android: fix libz dynamic library dependencies")
unconditionally adds libz as a dependency to all shared libraries.
That is unnecessary.

Commit 85a9b1b5 introduced libz as a dependency to libmesa_util.
So only the shared libraries that use libmesa_util need libz.

Fix Android Lollipop build by adding the include path of zlib to
libmesa_util explicitly instead of getting the path implicitly
from zlib since it doesn't export the include path in Lollipop.

Fixes: 6facb0c0 "android: fix libz dynamic library dependencies"

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
2017-05-17 14:04:18 +01:00
Timothy Arceri
f96edf72b4 mesa: add KHR_no_error support for glDispatchCompute*()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
d1894c42ef mesa: add DispatchCompute* helpers
These will be used to add KHR_no_error support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
07e14d561c mesa: move FLUSH_CURRENT() calls out of DispatchCompute*() validation
This is required to add KHR_no_error support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
f98411eaad mesa: compute.c C99 tidy up
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
64757e73de mesa: move DispatchCompute() validation to compute.c
This is the only place it is used so there is no reason for it to be
in api_validate.c

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
25591adc28 mesa: add KHR_no_error support for glBlendEquationSeparateiARB()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
9aff3c605b mesa: add blend_equation_separatei() helper
Will be used to add KHR_no_error support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
5c8252ba6f mesa: add KHR_no_error support for glBlendFunc*iARB()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:04 +10:00
Timothy Arceri
b5c67f469a mesa: add blend_func_separatei() helper
This will be used to add KHR_no_error support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
b3888b7a68 mesa: add KHR_no_error support for glBufferSubData()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
7ec12293be mesa: add KHR_no_error support for glNamedBufferSubData()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
cab148c282 mesa: add buffer_sub_data() helper
This will allow us to share code between the dsa, non-dsa and
no_error variants.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
5d29e6aab8 mesa: create validate_buffer_sub_data() helper
This change assumes meta will always pass valid arguments to
_mesa_buffer_sub_data().

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
624dc2833e mesa: add KHR_no_error support for glBufferStorage()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
cdbfb19420 mesa: add KHR_no_error support for glNamedBufferStorage()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
09687c2282 mesa: add inlined_buffer_storage() helper
This will allow us to share code between the dsa, non-dsa and
no_error variants.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
70d4d1164e mesa: add validate_buffer_storage() helper
This will allow use to add KHR_no_error support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
6c8964bf63 mesa: add KHR_no_error support for glCompressedTex*SubImage3D()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
d1033cd1eb mesa: add 3D support to compressed_tex_sub_image() helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
7cc190aae8 mesa: add KHR_no_error support for glCompressedTex*SubImage2D()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
1b36aa02b0 mesa: add 2D support to compressed_tex_sub_image() helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
d5e382e316 mesa: add KHR_no_error support for CompressedTex*SubImage1D()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
cb5627cbac mesa: add compressed_tex_sub_image() helper
This reduces duplication between the dsa and non-dsa function
and will also be used in the following commit to add
KHR_no_error support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
f1e692b452 mesa: make _mesa_compressed_texture_sub_image() static
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
3336d248e8 mesa: add KHR_no_error support for NamedFramebufferTexture
V3: use frame_buffer_texture() helper

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
4e8aa4b9a2 mesa: add KHR_no_error support for FramebufferTexture
V3: use the frame_buffer_texture() helper

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
f6229284e2 mesa: add *FramebufferTexture() support to frame_buffer_texture helper
V2: call check_layered_texture_target() even for no_error

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
4e125c4da6 mesa: add KHR_no_error support for NamedFramebufferTextureLayer
v3: use frame_buffer_texture_layer() helper

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
e75e8d6c94 mesa: add KHR_no_error support for FramebufferTextureLayer
V3: use frame_buffer_texture_layer() helper

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
f6198e9146 mesa: add no error support to frame_buffer_texture_layer() helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
01081c6ee4 mesa: add frame_buffer_texture_layer() helper
To be used to add KHR_no_error support while sharing code between
the DSA and non-DSA OpenGL function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
70aa66f181 mesa: add KHR_no_error support for glUseProgram
V3: use always_inline attribute (Suggested by Nicolai)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Timothy Arceri
35a9b9a70c mesa: move use_program() inside _mesa_use_program()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 10:12:03 +10:00
Jason Ekstrand
e0d6f9afba intel/isl/gen6: Fix combined depth stencil alignment
All combined depth stencil buffers (even those with just stencil)
require a 4x4 alignment on Sandy Bridge.  The only depth/stencil buffer
type that requires 4x2 is separate stencil.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Jason Ekstrand
74d626f383 intel/isl: Refactor gen8_choose_image_alignment_el
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Jason Ekstrand
2486c7dd54 intel/isl: Refactor gen6_choose_image_alignment_el
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Jason Ekstrand
715f47cb34 intel/isl: Refactor gen7_choose_image_alignment_el
The Ivy Bridge PRM provides a nice table that handles most of the
alignment cases in one place.  For standard color buffers we have a
little freedom of choice but for most depth, stencil and compressed it's
hard-coded.  Chad's original functions split halign and valign apart and
implemented them almost entirely based on restrictions and not the
table.  This makes things way more confusing than they need to be.  This
commit gets rid of the split and makes us implement the exact table
up-front.  If our surface isn't one of the ones in the table then we
have to make real choices.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Pohjolainen, Topi
236f17a9f7 intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4
The reasoning Chad gave in the comment for choosing a valign of 4 is
entirely bunk.  The fact that you have to multiply pitch by 2 is
completely unrelated to the halign/valign parameters used for texture
layout.  (Not completely unrelated.  W-tiling is just Y-tiling with a
bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled
chunks so it makes everything easier if miplevels are always aligned to
8x8.)  The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignmet
doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you
can't do stencil texturing anyway.

v2 (Jason Ekstrand):
 - Delete most of Chad's comment and add a more descriptive commit
   message.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Rob Clark
dafc2f1887 freedreno/gmem: fix hw binning hangs with large render targets
On all 3 gens, we have 4 bits for width and height in the VSC pipe
config.  And overflow results in setting width and/or height to zero
which causes hangs.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-16 16:34:21 -04:00
Rob Clark
da9a1cb8a6 freedreno/ir3: fix crash with atomics
Atomics can have a result value.  And sometimes it is even used.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-16 16:34:21 -04:00
Rob Clark
8b4588b090 mesa/st: fix yuv EGLImage's
Don't reject YUV formats that the driver doesn't handle natively, since
mesa/st already knows how to lower this in shader.

Reported-by: Nicolas Dechesne <ndec@linaro.org>
Fixes: 83e9de2 ("st/mesa: EGLImageTarget* error handling")
Cc: 17.1 <mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@gmail.com>
Tested-by: Nicolas Dechesne <ndec@linaro.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-16 16:34:21 -04:00
Rob Clark
12aa1d15d5 ttn: fix dest size for some texture instructions
Some, like lod, don't return 4 components.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-16 16:34:21 -04:00
Rob Clark
2216a95946 ttn: fix txd src sizes
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-16 16:34:21 -04:00
Rob Clark
b00fbb7daf ttn: fix txs dest size
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-16 16:34:21 -04:00
Rob Clark
1303afdd4f freedreno/a5xx: remove unneeded assert
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-16 16:34:21 -04:00
Rob Clark
9235ab6550 freedreno/a5xx: fallback to slow-clear for z32
We probably *could* do this with blit path, but I think it would involve
clobbering settings from batch->gmem (see emit_zs()).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-16 16:34:21 -04:00
Philipp Zabel
cb16d91034 etnaviv: increment the resource seqno in resource_changed
Just increment the resource seqno instead of setting the texture
seqno to be lower by one than the resource seqno.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-16 21:07:56 +02:00
Lucas Stach
ba0b7de7e3 etnaviv: clean up sampler view reference counting
Use the proper pipe_resource_reference function instead of
rolling our own.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-16 21:07:51 +02:00
Lucas Stach
f8a3991458 etnaviv: apply feature overrides in one central location
This way we can just test the feature bits and don't need to spread
the debug overrides to all locations touching a feature.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-16 21:07:46 +02:00
Lucas Stach
20ce6f1361 etnaviv: allow R/B swapped surfaces to be cleared
Fixes: 7f62ffb68a ("etnaviv: add support for rb swap")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-16 21:07:42 +02:00
Lucas Stach
8173d7d9e8 etnaviv: stop oversizing buffer resources
PIPE_BUFFER is a target enum, not a binding. This caused the driver to
up-align the height of buffer resources, leading to largely oversizing
those resources. This is especially bad, as the buffer resources used
by the upload manager are already 1MB in size. Height alignment meant
that those would result in 4 to 8MB big BOs.

Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-16 21:07:37 +02:00
Matt Turner
169e1e26ee i965: Fix test_eu_validate.cpp
Broken by commit a7217e909c ("i965: Pass pointer and end of assembly
to brw_validate_instructions").

Reported-by: Aaron Watry <awatry@gmail.com>
2017-05-16 11:45:07 -07:00
Jason Ekstrand
b5437fc05c anv: Implement VK_KHR_get_surface_capabilities2
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-16 08:38:46 -07:00
Jason Ekstrand
59f75dc2a4 vulkan/wsi/wayland: Add support for VK_KHR_get_surface_capabilities2
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-16 08:38:45 -07:00
Jason Ekstrand
56901c9ea4 vulkan/wsi/x11: Add support for VK_KHR_get_surface_capabilities2
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-16 08:38:43 -07:00
Jason Ekstrand
a28163db05 vulkan/wsi: Add get_capabilities2 and get_formats2d interface pointers
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-16 08:38:39 -07:00
Jason Ekstrand
52e6271ffd vulkan/wsi: Use vk_outarray for surface_get_formats
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-16 08:38:38 -07:00
Jason Ekstrand
c58f8bb56b vulkan: Update registry and headers to 1.0.49
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-16 08:38:34 -07:00
Nicolai Hähnle
c485b47383 radeonsi: extract TGSI memory/texture opcode handling into its own file
It's about time to get the growth of si_shader.c somewhat under control.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:55 +02:00
Nicolai Hähnle
cd9504667b radeonsi: make const_array externally accessible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:54 +02:00
Nicolai Hähnle
f0066eb57e radeonsi: make get_bounded_indirect_index externally accessible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:54 +02:00
Nicolai Hähnle
9252638afa radeonsi: make emit_waitcnt externally accessible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:54 +02:00
Nicolai Hähnle
3811730a37 radeonsi: silence a Coverity warning
Coverity doesn't understand that we'll never pass non-NULL for vertex
shaders.

This is a bit lame, actually. A straightforward cross-procedural analysis
limited to this source file should be enough to prove that there's no
NULL-pointer dereference. Oh well.

CID: 1405999
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:54 +02:00
Nicolai Hähnle
4ea67c1751 radeonsi: rename tcs_tes_uses_prim_id for clarity
What we care about is whether PrimID is used while tessellation is
enabled; whether it's used in TCS/TES or further down the pipeline is
irrelevant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:54 +02:00
Nicolai Hähnle
f4dbe2efb7 radeonsi: fix gl_PrimitiveIDIn in geometry shader when using tessellation
This builds on commit 0549ea15ec ("radeonsi: fix primitive ID in
fragment shader when using tessellation").

Fixes piglit
arb_tessellation_shader/execution/gs-primitiveid-instanced.shader_test

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:53 +02:00
Nicolai Hähnle
3accda4b82 ac/debug: handle index field in SET_*_REG correctly
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:53 +02:00
Samuel Pitoiset
0ca5bdb330 glsl: simplify link_assign_uniform_storage() a bit
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-16 09:33:06 +02:00
Samuel Pitoiset
8080082ad7 mesa: unify _mesa_uniform() for image uniforms
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-16 09:33:04 +02:00
Samuel Pitoiset
3c95a4fd4c mesa: fix indentation in _mesa_uniform()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-16 09:33:02 +02:00
Samuel Pitoiset
989def73ec mesa: fix indentation in _mesa_associate_uniform_storage()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-16 09:32:59 +02:00
Timothy Arceri
25bb02d7a0 mesa: replace _mesa_problem() with unreachable() in pack.c
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-16 12:25:50 +10:00
Timothy Arceri
51486d3369 mesa: replace _mesa_problem() with unreachable() in mipmap.c
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-16 12:25:50 +10:00
Timothy Arceri
1bd692b946 mesa: replace _mesa_problem() with unreachable() in _mesa_convert_colors()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-16 12:25:49 +10:00
Timothy Arceri
4c1664ff08 mesa: replace _mesa_problem() with unreachable() in _mesa_light()
All drivers but the old nouveau dri driver return after this anyway.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-16 12:25:49 +10:00
Timothy Arceri
59b9544fa7 mesa: replace _mesa_problem() with assert() in hash table
There should be no way the OpenGL test suites don't hit the assert()
should we do something to cause this code path to be taken.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-16 12:25:49 +10:00
Timothy Arceri
f25b2f76b0 mesa: don't crash in KHR_no_error uniform variants when location == -1
From Seciton 7.6 (UNIFORM VARIABLES) of the OpenGL 4.5 spec:

  "If the value of location is -1, the Uniform* commands will
  silently ignore the data passed in, and the current uniform values
  will not be changed.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-16 11:53:16 +10:00
Matt Turner
b1af896853 intel/aubinator_error_decode: Disassemble shader programs
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 12:04:04 -07:00
Matt Turner
23685f07d1 intel/aubinator_error_decode: Stop decoding after MI_BATCH_BUFFER_END
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 11:43:20 -07:00
Matt Turner
8e7221fa5a intel/tools: Refactor gen_disasm_disassemble() to use annotations
Which will allow us to print validation errors found in shader assembly
in GPU hang error states.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 11:43:14 -07:00
Matt Turner
aaa0329b5f intel/decoder: Fix indentation
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-15 11:43:13 -07:00
Matt Turner
3443bd45a3 genxml: Remove brackets from kernel start pointer names
Newer Gens' names don't have the brackets. Having common names will make
some later patches simpler.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-15 11:43:11 -07:00
Matt Turner
aae2626be8 i965: Add a weak no-op nir_print_instr() symbol
intel_asm_annotation.c is part of libintel_compiler.la, which contains
code for disassembling and validating shaders that we want to call in
aubinator_error_decode.

dump_assembly() calls nir_print_instr() to print annotations, and
although dump_assembly() is not called by aubinator_error_decode (nor is
any function in intel_asm_annotation.c) it causes undefined references
to nir_print_instr().

To work around, provide a no-op weak symbol to resolve against.
2017-05-15 11:43:01 -07:00
Matt Turner
d98e82c772 i965: Allow brw_eu_validate to handle compact instructions
This will allow the validator to run on shader programs we find in the
GPU hang error state.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 11:42:56 -07:00
Matt Turner
a7217e909c i965: Pass pointer and end of assembly to brw_validate_instructions
This will allow us to more easily run brw_validate_instructions() on
shader programs we find in GPU hang error states.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-15 11:42:47 -07:00
Matt Turner
8ca8ebbf78 i965: Mark shader programs for capture in the error state.
When the GPU hangs, the kernel saves some state for us. Until now it has
not included the shader programs, which are very often the reason the
GPU hang occurred. With the programs saved in the error state, we should
be more capable of debugging hangs.

Thanks to Chris Wilson and Ben Widawsky who provided the kernel support
for this feature ("drm/i915: Copy user requested buffers into the error
state"), which will be in kernel v4.13.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 11:42:41 -07:00
Tapani Pälli
0aa578714e egl: fix android logger compilation
1ce5853 broken compilation since LOG_ERROR is not defined and also
macro expansion won't work as planned (expands to 'ANDROID_egl2alog[level]')

v2: append 'ANDROID' to egl2alog table and use LOG_PRI
    (suggested by Chih-Wei Huang)

Fixes: 1ce5853 ("egl: simplify the Android logger")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-15 16:03:51 +01:00
Lionel Landwerlin
6d2e912cdb i965: perf: fix pointer to integer cast
v2: Just use cast to uintptr_t (Chris)

Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-15 14:06:15 +01:00
Lionel Landwerlin
55be6653e0 intel: gen-decoder: fix xml parser leak
In the unlikely case the parsing of genxml files fails, we were
leaking an xml parser object.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-15 14:06:11 +01:00
Marek Olšák
1c8f7d3be6 radeonsi: enable threaded_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
e24d094d70 gallium/u_threaded: drop and ignore all non-async debug callbacks
This is necessary to comply with OpenGL.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-15 13:01:33 +02:00
Marek Olšák
4c98afb241 gallium/radeon: add threaded context counter monitoring for HUD
"tc" will be initialized by the next commit.

v2: rename stuff according to v2 changes in u_threaded_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
7166773f90 radeonsi: implement replace_buffer_storage for the threaded context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
04299f7e5d gallium/radeon: subclass and handle threaded_query
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
b40d8026fa gallium/radeon: subclass threaded_transfer
v2: use assert on rtransfer->b.staging

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
b4fc399c08 gallium/radeon: subclass threaded_resource
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
93d549b2af gallium/radeon: handle other map buffer flags from the threaded context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
e11f7e1d59 gallium/radeon: handle TC_TRANSFER_MAP_THREADED_UNSYNC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
8b5485957e gallium/radeon: unwrap a context if we get a wrapped one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
42fe45b451 gallium/radeon: require both WRITE and FLUSH_EXPLICIT in buffer_flush_region
spotted randomly.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
b8e552424e gallium/util: add threaded_context as a pipe_context wrapper
v2: - rename num_calls -> num_call_slots (for tc_call)
    - rename num_calls -> num_total_call_slots (for tc_batch)
    - rename num_offloaded/direct_calls -> num_offloaded/direct_slots
    - declare slot[0] instead of slot[1]
    - remove no-op leftover code from tc_draw_vbo
    - use tc_set_resource_reference to fill threaded_transfer
    - fix map flags for sparse buffers
    - cosmetic changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
dca19b1d42 gallium/u_upload: add u_upload_clone
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
8559fa505d gallium: add flag PIPE_CONTEXT_PREFER_THREADED
State trackers can set this to tell the driver when u_threaded_context is
desirable.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 13:01:33 +02:00
Marek Olšák
7622181cad radeonsi/gfx9: add support for Raven
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-15 13:00:26 +02:00
Marek Olšák
efdb378c36 amd/addrlib: import Raven support
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-15 13:00:26 +02:00
Eric Anholt
c98f03c6eb renderonly: Initialize fields of struct winsys_handle.
vc4 was rejecting renderonly's import, because the offset field was
nonzero.

Fixes: 848b49b288 ("gallium: add renderonly library")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-15 06:38:45 +02:00
Rob Clark
12f9fa564a Revert "freedreno: use bypass if only clears"
Causing issues with stk on a4xx.. still probably a good idea, but seems
some debugging is needed first.

This reverts commit 3ab072d3c8.
2017-05-14 15:10:08 -04:00
Rob Clark
e4ad86952a freedreno: fix crash when flush() but no rendering
If we haven't created a batch, just bail in pipe->flush(), since there
is nothing to do.

Fixes crash in warsow, which creates a whole bunch of contexts used for
nothing but texture uploads.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-14 15:10:08 -04:00
Rob Clark
06a51fb4e5 freedreno: fix indexbuffer upload
My fault for not having time to test Marek's patches while they were on
list.

Fixes: 330d0607 ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-14 15:10:08 -04:00
Bas Nieuwenhuizen
d4e4c36c7c radv: Save descriptor set even if vertex buffers are not saved.
Totally independent.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 0e6d532d32 "radv/meta: add support for save/restore meta without vertex data."
2017-05-13 23:05:25 +02:00
Rob Clark
8efaae3e19 freedreno/a5xx: hw binning support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-13 13:25:26 -04:00
Rob Clark
c61417e8be freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-13 13:12:36 -04:00
Rob Clark
3ab072d3c8 freedreno: use bypass if only clears
Some things trigger batches that only contain a clear (like glmark2
startup).  No point to use GMEM for this.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-13 13:12:36 -04:00
Pierre Moreau
840f6beb81 nv50/ir: Report wrong prog types using proper var
Coverity caught the use of the uninitialised variable `type`.
However, it was `info->type`, which is initialised, which was meant to
be used.

CID: 1406000
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: b490ca9a38 ("nv50/ir: Fail if encountering unknown shader type")
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-13 16:26:09 +02:00
Timothy Arceri
812ff333bf mesa: fix KHR_no_error SSO support
Fixes: 00c5119a5e ("mesa: add KHR_no_error support for glUseProgramStages()")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-13 20:56:34 +10:00
Andres Gomez
752a6384af docs: update calendar, add news item and link release notes for 17.0.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-05-13 02:20:32 +03:00
Andres Gomez
05e833dfda docs: add sha256 checksums for 17.0.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 6a680243fc)
2017-05-13 02:16:11 +03:00
Andres Gomez
a2ca97eaca docs: add release notes for 17.0.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 08abf3a2a2)
2017-05-13 02:16:08 +03:00
Andres Gomez
b7af0ddfef bin/get-fixes-pick-list.sh: bring back the warning
We warn again if there are more than one line with the "fixes:" tag.

The warning is silenced when the commit has already landed or each
fixes tag reference a commit that is in branch.

v2:
 - Warn if any of the fixes tags has not landed (Emil)

v3:
 - Remove unnecessary head command
 - Clarify commit message (Emil)
 - Skip already picked commits sooner (Emil)

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-13 00:52:34 +03:00
Andres Gomez
0dead448dd docs: extend until the end of August
Completed the 17.1 cycle and added the beginning of the 17.2 one.

Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-13 00:00:41 +03:00
Andres Gomez
19db5072aa docs: update "Release manager" column
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-13 00:00:34 +03:00
Nicolai Hähnle
5c92b1bf07 glsl: include image qualifiers when printing IR
v2:
- fix copy&paste errors noted by Samuel
- rebase

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-12 10:46:07 +02:00
Nicolai Hähnle
a16ae77185 radeonsi: get rid of secondary input/output word
By keeping track of fewer generics, everything can fit into 64 bits.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-12 10:46:06 +02:00
Nicolai Hähnle
0dd8aa44b3 radeonsi: reduce the number of generics for shader IO unique indices
This is a high as possible while still allowing to merge the bitfields
with the next commit.

For OpenGL, 32 would be sufficient. Nine apparently uses (much!) higher
indices than. Indices that are out of bound don't hurt for VS-PS
pipelines, except that the VS output kill optimization is not applied.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-12 10:46:06 +02:00
Nicolai Hähnle
90339fabd7 radeonsi: at most 8 sets of texture coordinates are supported
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-12 10:46:05 +02:00
Nicolai Hähnle
cfe6e30f1b radeonsi: skip generic out/in indices without a shader IO index
OpenGL uses at most 32 generic outputs/inputs in any stage, and they always
have a shader IO index and therefore fit into the outputs_written/
inputs_read/kill_outputs fields.

However, Nine uses semantic indices more liberally. We support that
in VS-PS pipelines, except that the optimization of killing outputs
must be skipped.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-12 10:46:05 +02:00
Nicolai Hähnle
7091fe887b radeonsi: use SI_MAX_IO_GENERIC instead of magic values
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-12 10:46:04 +02:00
Samuel Pitoiset
4aa4e17f4e glsl: order indices for images inside a struct array
ARB_bindless_texture allows images to be declared inside
structures. This is similar to samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-12 10:30:37 +02:00
Samuel Pitoiset
f87416f62d glsl: add parcel_out_uniform_storage::set_opaque_indices() helper
In order to sort indices for images inside a struct array we
need to do something similar to samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-12 10:30:30 +02:00
Rafael Antognolli
70251e3631 i965: Port 3DSTATE_VF_TOPOLOGY on gen8+ to genxml.
With this last state ported, we can get rid of gen8_draw_upload.c.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-11 21:27:38 -07:00
Rafael Antognolli
5bbcbabd86 i965: Port 3DSTATE_INDEX_BUFFER to genxml.
Also make the brw_get_index_type() function not shift its return, since that
is genxml's job now.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-11 21:27:38 -07:00
Rafael Antognolli
71bfb44005 i965: Port brw_cs_state tracked state to genxml.
Emit the respective commands using genxml code.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-11 21:27:38 -07:00
Rafael Antognolli
d9b4a81672 genxml: Add alias for MOCS.
Use an alias for this field on 3DSTATE_INDEX_BUFFER on gen6+, so we can set
the same value as the defines.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-11 21:27:38 -07:00
Rafael Antognolli
c93b17be19 i965/genxml: Mostly style fixes for emit_vertices code.
Several issues were caught on review after the original patch landed.
This commit fixes them.

v2:
   - Fix padding (Topi)
   - Remove .DestinationElementOffset change from this patch (Topi)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-11 21:27:38 -07:00
Glenn Kennard
fa105214d3 r600g: Add defines for per-shader engine settings
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
2017-05-12 12:20:04 +10:00
Glenn Kennard
123ae18f29 r600g: Add instruction encoding defines for MEM_RD
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
2017-05-12 12:19:55 +10:00
Glenn Kennard
8260c4648a r600g: Add scratch ring register defines
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
2017-05-12 12:19:46 +10:00
Kenneth Graunke
b361af6bbe i965: Drop brw_context::viewport_transform_enable.
This was used by the meta fast clear code.  Now that we've switched
back to BLORP, it's always true.

We might want it back when we add a RECTLIST extension to GL, but
that's someday in the future...

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2017-05-11 16:53:28 -07:00
Kenneth Graunke
f790d6e0b4 i965: Port Gen4-5 VS_STATE to genxml.
It's actually not that much code.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-11 16:52:59 -07:00
Kenneth Graunke
4933c3d16e i965: Change GEN_GEN < 7 to GEN_GEN == 6 in 3DSTATE_VS code.
This whole code is surrounded in #if GEN_GEN >= 6, and this code only
applies on Sandybridge.  So, use GEN_GEN == 6 to reduce the delta in
the next patch, when we add Gen4-5 support.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-11 16:34:04 -07:00
Kenneth Graunke
d65e19f5c6 genxml: Fix KSPs on Ironlake to be offsets, not pointers.
We use Instruction State Base Address on Ironlake, so we want KSP to be
an offset not an actual pointer.  Gen4/G45 use pointers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-11 16:33:48 -07:00
Samuel Pitoiset
4ad5fa617c glsl: simplify set_opaque_binding()
While we are at it, update the GLSL spec comment.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-11 21:08:32 +02:00
Samuel Pitoiset
a29810e27c glsl: add missing check for samplers in set_opaque_binding()
Like images, this prevents out-of-bound access when the explicit
binding layout qualifier is used with an array which contains
too much samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-11 21:08:28 +02:00
Samuel Pitoiset
be7a9066d9 mesa: remove useless get_uniform_parameter() declaration
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-11 21:08:24 +02:00
Samuel Pitoiset
7a37f5ade6 mesa: remove unused gl_program_parameter::Initialized
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-11 21:08:21 +02:00
Marek Olšák
479e76bc1f gallium/tests: fix build after index buffer changes
for some reason, only scons can build these.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-11 17:11:22 +02:00
Emil Velikov
709468a808 configure: remove unneeded bits around libunwind handling
If libunwind is not found we'll fail at PKG_CHECK_MODULES, so the
follow-up check will be false. Additionally the AM_CONDITIONAL is not
used, so we can drop it.

Fixes: 3bcef6aa24 ("configure.ac: honour --disable-libunwind if the .pc file is present")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 14:02:07 +01:00
Emil Velikov
a68d306a1d docs/releasing: don't forget to update the calendar
Suggested-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 14:01:37 +01:00
Emil Velikov
832180554a docs: remove released versions from the calendar
v2: Remove Mesa 17.1.0 as well

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v1)
2017-05-11 14:00:48 +01:00
Emil Velikov
4588912425 virgl: remove unused draw include
Driver does not use the gallium draw module.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-05-11 13:58:20 +01:00
Emil Velikov
88b8aaea3b radeon: automake: remove unneeded elf Cflags/Libs
No longer required as of commit d90bf4ef3e ("radeon: remove unused
radeon_elf_util.{c,h}")

v2: Add the required libelf link in src/amd/Makefile.common.am

Fixes: d90bf4ef3e ("radeon: remove unused  radeon_elf_util.{c,h}")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2017-05-11 13:58:20 +01:00
Emil Velikov
4c22b99953 anv: document that anv_gem_mmap returns MAP_FAILED on error
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-11 13:58:20 +01:00
Emil Velikov
1ce58536ec egl: simplify the Android logger
Drop the unsupported pre-JellyBean macros and use a simple egl2android
mapping. With this we loose the explicit abort() provided by LOG_FATAL,
although Mesa already already calls exit(1) in case of a fatal errors.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rob Herring <robh@kernel.org>
2017-05-11 13:58:20 +01:00
Rob Herring
bec1c13be2 Android: Drop linking libgcc
Including libgcc breaks on Android O (master). This doesn't appear to be
needed any more as both Android M and N have also been built w/o libgcc.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
d31a2b4d49 Android: Add LLVM support for Android O
Android O moves to LLVM 3.9 and also has some differences in header
dependencies as LLVM has moved to blueprint files. It seems libLLVMCore
was only needed for header dependencies, so we can drop that for O.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
26aee6f4d5 Android: rework LLVM build support
Currently, building with "mmma external/mesa3d" which builds all targets
and dependencies is broken for targets that require LLVM. This is due to
the build settings depending on MESA_ENABLE_LLVM. Instead of using a
conditional in the global Android.common.mk, make all the components that
need LLVM explicitly include the necessary build settings.

GALLIVM_CPP_SOURCES doesn't exist anymore, so remove that as well.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
e2ff12e919 Android: rework libelf dependencies
Add libelf as a library dependency rather than explicitly listing its
include paths. This should work for Android M and later which have the
necessary exported directories in libelf.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
06260da16e Android: drop LLVM support on Lollipop
Mesa no longer supports LLVM 3.5 for any targets we support.
Android-x86 adds support for llvmpipe which could work, but android-x86
for L is using mesa 11.0 anyway.

Dropping this support enables clean-up of libelf dependencies.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
4c0c3719dc Android: Add driver "all" option to enable all drivers
Add a driver string "all" so that if BOARD_GPU_DRIVERS is set to "all",
all the drivers are enabled in the build. This makes build testing all
drivers easier to maintain.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
3f097396a1 Android: push driver build details to driver makefiles
src/gallium/targets/dri/Android.mk contains lots of conditional for
individual drivers. Let's move these details into the individual driver
makefiles.

In the process, align the make driver conditionals with automake
(i.e. HAVE_GALLIUM_*).

Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: add the radeon winsys for radeonsi]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
2a2dabe1c3 Android: remove needless conditional including of child makefiles
It is not necessary to filter driver and winsys directories based on the
list of enabled drivers. Selecting the included driver libraries or not is
sufficient to control what is built.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
88014bc023 Android: Fix swrast only build
A build of only swrast is broken as the Android EGL now depends on
libdrm as does GBM. While we could make EGL conditionally depend on
libdrm, we probably want to enable kms_dri winsys as well and that will
need libdrm enabled. So just always enable libdrm and simplify the
Android makefiles a bit.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
[Emil Velikov: drop related inline comment]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
1ef913aacf Android: amd/common: fix dependency on libmesa_nir
Building libmesa_amd_common fails with:

external/mesa/src/amd/common/ac_shader_info.c:23:10: fatal error: 'nir/nir.h' file not found
         ^

external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found
         ^

libmesa_amd_common now depends on libmesa_nir, so add it as a dependency
and export the necessary directories.

Fixes: 224cf29 "radv/ac: add initial pre-pass for shader info gathering"
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
1082501979 Android: amd: use exported include dirs instead of explicit includes
Add exported include paths rather than explicitly adding the includes
in each user of the common AMD libs.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:21 +01:00
Rob Herring
4eec1cfa8e Android: remove remaining explicit libcxx includes
Explicitly including libcxx includes is not necessary at least on
Android M and later. It appears that libc++ was made the default in
commit "Make libc++ the default STL." in Android build system post L.
However, if L support is still needed, using "LOCAL_CXX_STL=libc++" is
the preferred way.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:20 +01:00
Mauro Rossi
f21454eaa5 Android: define required __STDC* macros as cflags
Necessary to fix the following radeonsi building errors:

In file included from external/mesa/src/gallium/drivers/radeonsi/si_blit.c:24:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:29:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:71:
In file included from external/llvm/include/llvm-c/Core.h:18:
In file included from external/llvm/include/llvm-c/ErrorHandling.h:17:
In file included from external/llvm/include/llvm-c/Types.h:17:
external/llvm/include/llvm/Support/DataTypes.h:49:3: error: "Must #define __STDC_LIMIT_MACROS before #including Support/DataTypes.h"
  ^
external/llvm/include/llvm/Support/DataTypes.h:53:3: error: "Must #define __STDC_CONSTANT_MACROS before "         "#including Support/DataTypes.h"
  ^
2 errors generated.

[Emil Velikov: add inline comment about the defines]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:20 +01:00
Mauro Rossi
7e907d8f7f Android: drop static linking of R600 LLVM libraries
Inspired by Chih-Wei Huang and Zhen Wu similar patches

Linking against llvm with both static and shared may be avoided,
provided that libLLVM shared library for device supports
whole static R600/AMDGPU libraries, necessary for radeonsi/amdgpu.

Complementary changes, limited to android external/llvm project
are necessary to correclty build libLLVM

Tested with marshmallow-x86 and nougat-x86 builds

Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-11 13:52:20 +01:00
Philipp Zabel
0b31c3adc1 configure.ac: Fix help string for --disable-pwr8 configure option
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-11 09:43:38 +01:00
Timothy Arceri
6d7660cf4b mesa: remove _CurrentFragmentProgram from gl_pipeline_object
This was added in b527dd65c8 as a work around because fixed function
fragment shaders were tracked in ctx->FragmentProgram._Current as
a gl_program rather than gl_shader_program.

However after my refactoring of the program and shader structs
at the end of 2016 which culminated in c505d6d852, we no longer
need gl_shader_program to track the current program making
_CurrentFragmentProgram obsolete.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-11 14:46:39 +10:00
Timothy Arceri
276166c45b mesa: add KHR_no_error support for FramebufferTexture*D functions
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
20cabc2ac0 mesa: add no error version of framebuffer_texture_with_dims()
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
304058a1fb mesa: add error version of get_texture_for_framebuffer()
This is a step towards KHR_no_error support.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
69ca1ef683 mesa: pass rb attachment to _mesa_framebuffer_texture()
This change will help us add KHR_no_error support to the caller.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
d90ced445c mesa: add _mesa_get_and_validate_attachment() helper
Will be used to add KHR_no_error support. We make this available
external so it can be called from meta.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
786b9ad95b mesa: remove _mesa_problem() from a few locations
_mesa_problem() is still useful in some places such as is if a backend
compile fails, but for the majority of cases we should be able to
remove it.

OpenGL test suites are becoming very mature, we should place more
trust in debug builds picking up missed cases.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
8b00630c4d mesa: make _mesa_get_framebuffer_attachment_parameter() static
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
a754e4ca38 mesa: fix indentation
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Timothy Arceri
e618761233 mesa: remove _mesa from static framebuffer object function
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 13:53:39 +10:00
Michel Dänzer
0c67aa8456 gallivm: Fix build against LLVM SVN >= r302589
deregisterEHFrames doesn't take any parameters anymore.

Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-11 11:00:42 +09:00
Timothy Arceri
bdaff25c20 mesa: small _mesa_UseProgram() tidy up
Makes the code easier to follow.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 10:56:09 +10:00
Timothy Arceri
244cef1694 mesa: add KHR_no_error support for glBindProgramPipeline()
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 10:56:08 +10:00
Timothy Arceri
0bca4784c2 mesa: add KHR_no_error support for glActiveShaderProgram()
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 10:56:08 +10:00
Timothy Arceri
00c5119a5e mesa: add KHR_no_error support for glUseProgramStages()
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 10:56:08 +10:00
Timothy Arceri
ea4c606441 mesa: create use_program_stages() helper
This will be used to create a KHR_no_error version of
glUseProgramStages().

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-11 10:56:08 +10:00
Dave Airlie
fe6c407a33 radv: handle fragment shader srgb resolve pass better
Bas pointed out the fs key doesn't take srgb into account,
since there is just one srgb variant, just create a separate
pipeline for it. This also uses dest format to be more consistent
on when srgb matters.

Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-11 10:36:06 +10:00
Kenneth Graunke
32f0dc3a29 i965: Make INTEL_DEBUG=bat decode VS/CLIP/GS/SF/WM/CC_STATE on Gen4-5.
This is something the original decoder did, but I didn't bother with
until now.  I recently had to debug an Ironlake issue, and wanted to
inspect VS_STATE.  So, now it's back.

The other packets in the switch statement are all Gen6/7+, where we
use offsets from dynamic state base address, so we don't need the
gtt_offset subtraction introduced here.  We might want to make a
helper for this hack at some point - perhaps when we introduce the
next occurance.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-10 11:58:20 -07:00
Kenneth Graunke
0f34b674ed i965: Switch BRW_NEW_CURBE_OFFSETS to BRW_NEW_PUSH_CONSTANT_ALLOCATION.
The BRW_NEW_CURBE_OFFSETS dirty bit is signalled when changing the
partitioning of the Constant Buffer URB section between the various
shader stages, on Gen4-5.

BRW_NEW_PUSH_CONSTANT_ALLOCATION is basically the same thing on Gen7+.

So, save a bit, and use the new name.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-10 11:41:58 -07:00
Kenneth Graunke
608a65ebca i965: Drop BRW_NEW_PUSH_CONSTANT_ALLOCATION from Gen6 code.
Gen6 doesn't have a configurable push constant region.  This is only
used on Gen7+.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-10 11:41:50 -07:00
Kenneth Graunke
3d70e00c62 i965: Only #if...#endif a single function or related section at a time.
Previously we guarded large swathes of code with #if GEN ... #endif
blocks.  This made it difficult to see which generations include what.

This patch splits up the #if..#endif sections so they surround a small
section of code - usually a single function/atom, or sometimes a group
of related functions.  It should make the code easier to work on.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-10 11:41:46 -07:00
Kenneth Graunke
774db15aaf i965: Turn brw_get_line_width_float() into brw_get_line_width().
Drop the old brw_get_line_width() helper which return the unsigned
fixed-point encoding of the line width - it's been dead since the
conversion to GENXML (which does the encoding for us).

Then rename brw_get_line_width_float() to the shorter name.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-10 11:41:42 -07:00
Kenneth Graunke
620f12a53f i965: Drop INTEL_DEBUG=stats.
For whatever reason, we had an INTEL_DEBUG=stats option that enabled
various statistics counters on Gen4-5 systems.  It's been around
forever, though I can't think of a single time that it's been useful.

On Gen6+, we enable statistics all the time because they're necessary
to support various query object targets.  Turning them off would break
those queries.

Gen4-5 don't support those queries, so the statistics counters generally
aren't useful; we disabled them by default.  This patch disables them
altogether.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-10 11:37:19 -07:00
Kenneth Graunke
31abfd2d35 i965: Disable ARB_pipeline_statistics_query on Gen4-5.
We apparently enabled this on all platforms in Mesa 10.6.  However, it
was only ever implemented for Gen6+.  The Gen4-5 query code goes up in
flames with an "Unrecognized query target" unreachable() error if you
even attempt to use any of the new functionality.

This wasn't caught because the Piglit tests require OpenGL 3.0, which
Gen4-5 cannot support.  The extension spec does say 3.0 is required,
though I'm not sure why - it seems like 2.1 would work fine.

We could implement it anyway, but it's a little bit of a pain due to the
lack of hardware contexts (so we have to snapshot around batches).

Given that it's been 100% broken for two years and I haven't seen a bug
report about it, I'm not terribly inclined to care.  So, let it go.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-10 11:37:19 -07:00
Alex Deucher
2f0450c627 radeonsi: add new vega10 pci ids
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-10 13:41:38 -04:00
Marek Olšák
49c326420e st/mesa: move the logic of all_varyings_in_vbos into st_update_array
The function was pretty slow. This brings a substantial decrease in draw
call overhead when min/max index bounds are not needed:

Before:  DrawElements (1 VBO) w/ no state change:          5.75 million
After:   DrawElements (1 VBO) w/ no state change:          7.03 million

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
94506e5642 st/mesa: unify common code in st_draw_vbo functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
f60f14bdb3 st/mesa: make st_draw_vbo static
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
740ef228f7 radeonsi: remove upload code for zero-stride vertex attribs
st/mesa takes care of it now.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
17f776c27b st/mesa: upload zero-stride vertex attributes here
This is the best place to do it. Now drivers without u_vbuf don't have to
do it.

v2: use correct upload size and optimal alignment

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
70dcb7377d gallium: add PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX
The next patch will use it. This is really for svga and GL2-level drivers.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
9db1f9bcd1 st/mesa: simplify the signature of get_client_array
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
e8b2274592 st/mesa: remove vpv->num_inputs dereferences in st_update_array
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
71fde49059 st/mesa: fold error handling into setup_(non_)interleaved_attribs
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
f4d272f6f6 st/mesa: fold cso calls into setup_(non_)interleaved_attribs
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 19:29:08 +02:00
Marek Olšák
c334c7dd75 st/mesa: don't call util_draw_init_info in st_draw_vbo 2017-05-10 19:00:16 +02:00
Marek Olšák
330d0607ed gallium: remove pipe_index_buffer and set_index_buffer
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.

Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.

pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.

v2: fixes for nine, svga
2017-05-10 19:00:16 +02:00
Marek Olšák
22f6624ed3 gallium: separate indirect stuff from pipe_draw_info - 80 -> 56 bytes
For faster initialization of non-indirect draws.
2017-05-10 19:00:16 +02:00
Marek Olšák
c24c3b94ed gallium: decrease the size of pipe_vertex_buffer - 24 -> 16 bytes 2017-05-10 19:00:16 +02:00
Emil Velikov
fe437882ea docs: add news item and link release notes for 17.1.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-10 15:24:03 +01:00
Emil Velikov
419be3a61f docs: add sha256 checksums for 17.1.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 806f802e7b)
2017-05-10 15:21:41 +01:00
Emil Velikov
c816c848e5 docs: Update 17.1.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 15a38605fc)
2017-05-10 15:21:39 +01:00
Samuel Pitoiset
de97e38290 st/glsl_to_tgsi: make sure resource file for samplers is PROGRAM_SAMPLER
Similar to how image resources are handled. That way we are sure
that inst->resource.file is PROGRAM_SAMPLER for "bound" samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 14:02:21 +02:00
Samuel Pitoiset
169888b55e radeonsi: silent a compiler warning
This fixes:

si_shader.c: In function ‘si_shader_dump_stats’:
si_shader.c:6704:31: warning: passing argument 1 of ‘si_get_max_workgroup_size’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
     si_get_max_workgroup_size(shader);
                               ^~~~~~
si_shader.c:5832:17: note: expected ‘struct si_shader *’ but argument is of type ‘const struct si_shader *’
 static unsigned si_get_max_workgroup_size(struct si_shader *shader)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 14:02:17 +02:00
Samuel Pitoiset
820966f9bc mesa: use u_bit_scan() in update_program_texture_state()
The check in update_single_program_texture() can also be
removed.

v2: - remove unused 's' variable

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 12:14:17 +02:00
Samuel Pitoiset
6a1f324e4a mesa: remove never used gl_shader_compiler_options::EmitNoFunctions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2017-05-10 12:10:50 +02:00
Nicolai Hähnle
362f8f6798 radeonsi: dump compute descriptor lists
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:53 +02:00
Nicolai Hähnle
30267256df radeonsi: dump both enabled and required descriptor slots
This allows a meaningful dump with info == NULL (for compute shaders).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:50 +02:00
Nicolai Hähnle
571597bf47 radeonsi: dump compute shader as part of debug dump
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:48 +02:00
Nicolai Hähnle
fbb2886634 radeonsi: move struct si_compute into a header
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:46 +02:00
Nicolai Hähnle
1a3bedd4b7 radeonsi: split descriptor list dumping
Prepare for dumping CS descriptor list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:44 +02:00
Nicolai Hähnle
83f56e531d radeonsi: split shader dumping
Prepare for dumping compute shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:41 +02:00
Nicolai Hähnle
0282214c72 radeonsi: more const qualifiers in shader dump functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:39 +02:00
Nicolai Hähnle
db3559da12 ddebug: implement dd_dump_launch_grid
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:37 +02:00
Nicolai Hähnle
bf4ecfec4b ddebug: extract dd_dump_shader
Will be re-used for compute shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:34 +02:00
Nicolai Hähnle
fa1519d0c9 gallium/util: dump tokens in util_dump_shader_state only if type is TGSI
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:32 +02:00
Nicolai Hähnle
bcc37711cd gallium/util: add util_dump_grid_info
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-10 08:58:23 +02:00
Grazvydas Ignotas
45ccb661d8 radv: always free nir shaders from modules on stack
valgrind reports them as leaked, and I could not find anything making a
copy of the nir pointer. Also, radv_device_init_meta_blit_color() is
already freeing them unconditionally like this.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-10 01:13:44 +03:00
Grazvydas Ignotas
0ef302638f anv: don't leak DRM devices
After successful drmGetDevices2() call, drmFreeDevices() needs to be
called.

Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # radv version
2017-05-10 01:13:44 +03:00
Grazvydas Ignotas
e0aee8b667 anv: fix possible stack corruption
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.

Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-10 01:13:44 +03:00
Jason Ekstrand
037ce253b1 i965/vec4: Delete the system value infastructure
The only thing still using it is INVOCATION_ID for geometry shaders.
That's easily enough inlined into the nir_intrinsic_load_invocation_id
handling code.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:07 -07:00
Jason Ekstrand
2e9916ea04 i965/vec4: Use NIR to do GS input remapping
We're already doing this in the FS back-end.  This just does the same
thing in the vec4 back-end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:07 -07:00
Jason Ekstrand
e31042ab40 i965/fs: Move remapping of gl_PointSize to the NIR level
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand
5b00c3cc05 i965/nir: Inline remap_inputs_with_vue_map
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand
0d5f89cdc3 i965/vec4: Use NIR remapping for VS attributes
The NIR pass already handles remapping system values to attributes for
us so we delete the system value code as part of the conversion.

We also change nir_lower_vs_inputs to take an explicit inputs_read
bitmask and pass in the inputs_read from prog_data instead from pulling
it out of NIR.  This is because the version in prog_data may get
EDGEFLAG added to it on some old platforms.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand
80aa6e9d32 intel/compiler/vs: Move inputs_read handling to generic code
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:03 -07:00
Jason Ekstrand
d2fe804d18 i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE map
We also add a nice little comment to make it more clear exactly what
happens with the edge flag copy.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
ca4d192802 i965/fs: Lower gl_VertexID and friends to inputs at the NIR level
NIR calls these system values but they come in from the VF unit as
vertex data.  It's terribly convenient to just be able to treat them as
such in the back-end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
24e6fba500 i965/vs: Set uses_vertexid and friends from brw_compile_vs
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
5e832302dc i965: Move multiply by 4 for VS ATTR setup into the scalar backend.
The vec4 backend will want to count in units of vec4s, not scalar
components.  The simplest solution is to move the multiplication by 4
into the scalar backend.  This also improves consistency with how we
count varyings.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
36764b6923 i965/nir: Inline remap_vs_attrs
Now that we have nice block iterators, there's no good reason for this
to be off on it's own.  While we're here, we convert to using the NIR
const index getters/setters instead of whacking const_index values
directly.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
b86dba8a0e nir: Embed the shader_info in the nir_shader again
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer.  The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct.  This, however, has
caused a few problems:

 1) There are many things which generate NIR without GLSL.  This means
    we have to support both NIR shaders which come from GLSL and ones
    that don't and need to have an info elsewhere.

 2) The solution to (1) raises all sorts of ownership issues which have
    to be resolved with ralloc_parent checks.

 3) Ever since 00620782c9, we've been
    using nir_gather_info to fill out the final shader_info.  Thanks to
    cloning and the above ownership issues, the nir_shader::info may not
    point back to the gl_shader anymore and so we have to do a copy of
    the shader_info from NIR back to GLSL anyway.

All of these issues go away if we just embed the shader_info in the
nir_shader.  There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Kenneth Graunke
d4fa0a0fa6 mesa: Make _mesa_primitive_restart_index a static inline in the header.
It's now basically a single expression, so it probably makes sense to
have it inlined into the callers.

Suggested by Marek.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-09 12:07:34 -07:00
Rob Herring
db6f38cb6a freedreno: fix clang error in fd_get_compute_param
With commit 10c17f23b7 ("freedreno: core compute state support"),
Android builds fail with the following error:

external/mesa3d/src/gallium/drivers/freedreno/freedreno_screen.c:610:17: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
                        sprintf(ret, ir);
                                     ^~

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-09 14:20:54 -04:00
Rob Clark
c42952ea90 mesa/vbo: fix invalid min/max indexes
Fixes: c3f37e9b ("st/mesa: use min_index and max_index directly from vbo")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-09 14:20:24 -04:00
Lionel Landwerlin
32f14332f5 intel: compiler: prevent integer overflow
CID: 1399477, 1399478 (Integer handling issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-09 13:56:17 +01:00
Lionel Landwerlin
85182e490c intel: compiler: remove duplicated code
CID: 1399470: (Control flow issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-09 13:56:17 +01:00
Lionel Landwerlin
4201b7d1bf intel: gen decoder: don't check for size_t negative values
We should get either 0 or 1 here.

CID: 1373562 (Control flow issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2017-05-09 13:54:08 +01:00
Andres Gomez
bac80635af bin/*py: honor editorconfig formatting
Replace the two stray tabs with respective space.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-09 14:06:52 +03:00
Andres Gomez
d823440fed bin: use tabs for coding style on *.sh files
v2: Instead of changing *.sh, adapt the editorconfig file (Emil).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-09 14:05:00 +03:00
Mauro Rossi
7993823d38 android: i965: add per-gen libmesa_i965_gen{4,45,5} static
Needed to fix android building errors:

external/mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:148: error: undefined reference to 'gen5_init_atoms'
external/mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:150: error: undefined reference to 'gen45_init_atoms'
external/mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:152: error: undefined reference to 'gen4_init_atoms'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: 5a19d0b ("i965: Get real per-gen atom lists")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-09 08:04:09 +03:00
George Kyriazis
909f72e0a2 swr: fix polygonmode for front==back
Rasterizer core only supports polygonmode front==back.  Add logic for
populating fillMode for the rasterizer only for that case correctly.
Provide enum conversion between mesa enums and core enums.

The core renders lines/points as tris. Previously, code would enable
stipple for polygonmode != FILL.  Modify stipple enable logic so that
this works correctly.

No regressions in vtk tests.
Fixes the following piglit tests:
	pointsprite
	gl-1.0-edgeflag-const

v2: remove cc stable, and remove "not implemented" assert
v3: modified commit message

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-08 21:28:53 -05:00
George Kyriazis
26a9ed6f0f swr/rast: support polygonmode point
Add support for polygonmode point in the binner.  This is done by
splitting BinPostSetupPoints from BinPoints, so the earlier call can be
called from BinTriangles.  Setup has already been done at the time
BinPostSetupPoints needs to be called.

This checkin just adds support in the rasterizer.  A separate checkin
will add the appropriate driver support.

v2: remove cc stable
v3: modified commit message and subject line

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-05-08 21:28:53 -05:00
Timothy Arceri
34c5e58a68 util: move ALWAYS_INLINE macro to util/macro.h
Also added clang check.

macro.h is include by p_compiler.h so no other change is needed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-09 11:21:03 +10:00
Bruce Cherniak
f52e63069a swr: move msaa resolve to generalized StoreTile
v3: list piglit tests fixed by this patch. Fixed typo Tim pointed out.
v2: Reword commit message to more closely adhere to community
guidelines.

This patch moves msaa resolve down into core/StoreTiles where the
surface format conversion routines are available.  The previous
"experimental" resolve was limited to 8-bit unsigned render targets.

This fixes a number of piglit msaa tests by adding resolve support for
all the render target formats we support.

Specifically:
layered-rendering/gl-layer-render: fail->pass
layered-rendering/gl-layer-render-storage: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg: crash->pass
multisample-formats *[2,4,8,16] gl_ext_texture_snorm: crash->pass
multisample-formats *[2,4,8,16] gl_arb_texture_float: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg-float: fail->pass

MSAA is still disabled by default, but can be enabled with
"export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
The default is 0, which is disabled.

This patch improves the number of multisample-formats supported by swr,
and fixes several crashes currently in the 17.1 branch.  Therefore, it
should be considered for inclusion in the 17.1 stable release.  Being
disabled by default, it poses no risk to most users of swr.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
cc: mesa-stable@lists.freedesktop.org
2017-05-08 14:45:40 -05:00
Eric Anholt
0ffa06a19b glsl: Don't allow redefining builtin functions on GLSL 1.00.
The spec text cited above says you can't, but only the GLSL 3.00 (redefine
or overload) case was implemented.

Fixes dEQP scoping.invalid.redefine_builtin_fragment/vertex.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
2017-05-08 12:15:49 -07:00
Eric Anholt
79da0ed2fc glsl: Restrict func redeclarations (not just redefinitions) on GLSL 1.00.
Fixes DEQP's scoping.invalid.redeclare_function_fragment/vertex.

v2: Fix accidental rejection of prototype+decl.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
Tested-by: Matt Turner <mattst88@gmail.com>
2017-05-08 12:15:49 -07:00
Eric Anholt
e5ade7db73 glsl: Ban #undefining __LINE__ and friends on GLES2.
Fixes deqp_gles2 undefine_invalid_object_* failures.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
2017-05-08 12:15:49 -07:00
Eric Anholt
efa9750e96 glsl: Restrict functions to not return arrays or SOAs in GLSL 1.00.
From the spec,

    Arrays are allowed as arguments, but not as the return type. [...] The
    return type can also be a structure if the structure does not contain
    an array.

Fixes DEQP shaders.functions.invalid.return_array_in_struct_fragment.

v2: Spec cite wording change

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
2017-05-08 12:15:49 -07:00
Rob Clark
ae7aa8dbaf nir: fix (hopefully) windows build
Fixes: 53aa109b ("nir: add pass to lower atomic counters to SSBO")
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-08 13:41:16 -04:00
Marek Olšák
25d246f454 radeonsi: rename si_eliminate_const_vs_outputs -> si_optimize_vs_outputs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 19:18:29 +02:00
Marek Olšák
34bc470fa6 ac: fix broken elimination of duplicated VS exports
The renumbering code didn't take into account that multiple VS exports
can have the same PARAM index. This also significantly simplifies
the renumbering. Thankfully, we have piglits for this:

    spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatcentroid-packing
    spec@glsl-1.50@execution@interface-blocks-complex-vs-fs

Reported by Michel Dänzer.

Fixes: b08715499e ("ac: eliminate duplicated VS exports")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 19:18:29 +02:00
Chad Versace
0160fb1d50 egl: Fix -Wint-to-pointer-cast
main/egldisplay.c: In function '_eglParseX11DisplayAttribList':
main/egldisplay.c:491:38: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
          display->Options.Platform = (void *)value;

The fix: cast to uinptr_t before void*.
                                      ^
Fixes: ddb99127 egl/x11: Honor the EGL_PLATFORM_X11_SCREEN_EXT attribute
Cc: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-08 13:10:16 -04:00
Marek Olšák
b84979d6c7 st/mesa: remove unused st parameter in init_velement_lowered
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:33:46 +02:00
Marek Olšák
d801247cec st/mesa: use PIPE_MAX_ATTRIBS as the max number of vertex buffers
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
57fc8ae61a st/mesa: simplify code due to unification to st_common_program
v2: use the st_common_program() helper

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
a159d6ed20 st/mesa: simplify update_constants functions
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
bb6e851a1e st/mesa: unify TCS, TES, GS st_*_program structures
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
7ca8b86cb9 st/mesa: decrease the size of remaining st_translate_program array params
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
88d46ac184 st/mesa: remove unused outputSlotToAttr
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
7c44810cc0 st/mesa: remove st_context::vertex_result_to_slot
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
d947e3e2c8 st/mesa: decrease the size of st_vertex_program
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-08 18:32:00 +02:00
Marek Olšák
d1ee2b37ff st/mesa: remove struct st_tracked_state
It contains only one member: the update function. Let's use the update
function directly.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 18:32:00 +02:00
Nicolai Hähnle
cb2ac69628 radeonsi: split per-patch from per-vertex indices
Make it a bit clearer that the index spaces are logically seperate by
having them defined in different functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-08 17:42:17 +02:00
Nicolai Hähnle
707df19451 radeonsi: clarify documentation of existing SI workaround
Limiting LS-HS to a single wave is required on all SI chips due to an
issue with a power management feature.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-08 17:42:17 +02:00
Nicolai Hähnle
f16b755863 radeonsi: fix gl_PrimitiveID in tessellation with instanced draws on SI
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-08 17:42:17 +02:00
Nicolai Hähnle
b84b631c63 radeonsi: load patch_id for TES-as-ES when exporting for PS
For some reason, this change is only necessary on SI.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-08 17:42:17 +02:00
Nicolai Hähnle
0549ea15ec radeonsi: fix primitive ID in fragment shader when using tessellation
In a VS->TCS->TES->PS pipeline, the primitive ID is read from TES exports,
so it is as if TES were using the primitive ID.

Specifically, this fixes a bug where the primitive ID is not reset at
the start of a new instance.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-08 17:42:17 +02:00
Nicolai Hähnle
854ed47f3e radeonsi: mark fast-cleared textures as compressed when dirtying
There are a bunch of piglit fast clear tests that regressed on SI, for
example ./bin/ext_framebuffer_multisample-fast-clear single-sample.

The problem is that a texture is bound as a framebuffer, cleared, and
then rendered from in a loop that loops through different clear colors.
The texture is never rebound during all this, so the change to
tex->dirty_level_mask during fast clear was not taken into account
when checking for compressed textures.

I have considered simply reverting the problematic commit. However,
I think this solution is better. It does require looping through all
bound textures after a fast clear, but the alternative would require
visiting more textures needless on every draw. Draws are much more
common than clears.

Note that the rendering feedback loop rules do not apply here, because
the framebuffer binding is changed between the glClear and the draw
that samples from the texture that was cleared.

Fixes: bdd6449769 ("radeonsi: don't mark non-dirty textures with CMASK as compressed")
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-08 17:42:16 +02:00
Emil Velikov
f12fcb1c9d egl: use designated initializers
All the compilers used to build Mesa support them.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-08 15:34:21 +01:00
Emil Velikov
54f619fb9b egl: drop unneeded sentinel from level_strings[]
The array is local so we already know its size.

v2: Correct loop condition (Bartosz)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-08 15:34:09 +01:00
Emil Velikov
239e7ee91b egl: remove suprous header eglcompiler.h
The header is used only to provide STATIC_ASSERT. The latter is already
available in utils/macros.h so use that instead and kill of the header.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-08 15:33:59 +01:00
Emil Velikov
8d6f92313d egl: remove unneeded else statement in _eglInitLogger
The variable level is already initialized to -1 which is already
interpreted as FALLBACK_LOG_LEVEL.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-08 15:33:56 +01:00
Emil Velikov
1dd038e988 egl: remove no longer needed logger infra
As of last commit nobody requires anything else but the
_eglDefaultLogger(). As such use it directly and simplify the
implementation.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-08 15:33:54 +01:00
Emil Velikov
0372097eec egl: fold Android logger into main/
Will allow us to greatly simplify a lot of the code in egllog.c

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-08 15:33:51 +01:00
Emil Velikov
716e5db610 egl: remove unused _eglSetLogLevel()
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-08 15:33:12 +01:00
Samuel Pitoiset
a3996590b8 glsl: apply the image format for members of structures
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 16:04:05 +02:00
Samuel Pitoiset
8a6ecde9c1 glsl: store the image format in glsl_struct_field
ARB_bindless_texture allows to declare image types inside
structures, which means we need to keep track of the format.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-08 16:04:05 +02:00
Samuel Pitoiset
14187e1e9e st/glsl_to_tgsi: don't use rzalloc_array() when it's unnecessary
When the arrays are initialized later on with -1, that's useless
to use rzalloc_array().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-08 16:04:05 +02:00
Lionel Landwerlin
e3a5ab2d66 anv: check return value of anv_execbuf_add_bo
CID: 1405919 (Error handling issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-08 14:38:27 +01:00
Lionel Landwerlin
6247b8b413 anv: avoid null pointer dereference
The application might not give an output structure.

CID: 1405765 (Null pointer dereferences)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-08 14:38:27 +01:00
Eric Engestrom
dc795f85a5 egl: avoid dereferencing a null display
Fixes: ddb99127a6 ("egl/x11: Honor the EGL_PLATFORM_X11_SCREEN_EXT attribute")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-08 11:43:36 +01:00
Andres Gomez
9c70537a52 docs/releasing: added relevant people for build/check with MacOSX
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Jeremy Sequoia <jeremyhu@apple.com>
2017-05-08 11:39:46 +03:00
Andres Gomez
2be0a99052 docs/releasing: added relevant people for build/check with Android
v2: Tapani as main contact and Mauro just for help with
    debugging/building (Mauro).

v3: Mauro my provide feedback for android-x86 only (Mauro).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-08 11:39:22 +03:00
Andres Gomez
029d7bebed docs/releasing: added relevant people for build/check with Windows
v2: Brian Paul as main contact point and Jose Fonseca as
    fallback (Vinson, Jose)

Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Cc: Brian Paul <brianp@vmware.com>
Cc: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-05-08 11:39:20 +03:00
Andres Gomez
fcdc96d1fc docs/releasing: if possible, do some every day use on the RC
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-08 11:39:11 +03:00
Andres Gomez
8a3e33ae5d docs/releasing: further explain the build/check testing process
The build/check test should be done with an appropriate combination of
flags, depending on the changes introduced by the patch set.

Also, mention to cross compile with mingw-w64 for Windows.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-08 11:39:01 +03:00
Andres Gomez
8058707395 docs/releasing: check in master for forgotten nomination candidates
The maintanier should not just rely on the mesa-stable@ mailing list
but actually check the master branch in search for suitable nomination
candidates.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-08 11:38:53 +03:00
Andres Gomez
e0f7d25cf0 docs/releasing: format/style homogenization
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-08 11:38:39 +03:00
Andres Gomez
77306e2afc bin/get-fixes-pick-list.sh: don't warn if more than one, go over them
If an identified commit was having more than one fix, we would warn
about that and only treat the first.

Now, we don't warn but treat all of them.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-08 11:28:17 +03:00
Rafael Antognolli
df3b221016 i965: Update gen6_depth_stencil_state to use genX macro.
While moving depth stencil state to use genxml, this one was left
behind.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-07 21:06:05 -07:00
Rafael Antognolli
592d4387a3 i965: Move MOCS macros to brw_state.h.
brw_state.h is a better place to keep them, instead of brw_context.h.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-07 21:02:44 -07:00
Kenneth Graunke
bc074a4518 i965: Don't try to unmap NULL program cache BO.
When running shader-db with intel_stub and recent Mesa, context creation
fails when making a logical hardware context.  In this case, we call
intelDestroyContext(), which gets here and tries to unmap the cache BO.

But there isn't one - we haven't made it yet.  So we try to unmap a
NULL pointer, which used to be safe (it did nothing), but crashes
after commit 7c3b8ed878.

The result is that we crash rather than failing context creation with
a nice message.  Either way nothing works, but this is more polite.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-07 20:58:44 -07:00
Kenneth Graunke
1456da91c8 Revert "mesa: Require mipmap completeness for glCopyImageSubData(), sometimes."
This reverts commit c5bf7cb529.

This broke rendering in "Total War: WARHAMMER", which uses a single
level RGBA_UINT32 texture and the default filter modes of GL_LINEAR
and GL_NEAREST_MIPMAP_LINEAR.  However, the texture max level is 0,
so it is actually mipmap complete - it's the integer + linear rule
that causes the error.

I'm working with Khronos to find a real solution.  However it turns
out, this patch is not correct and breaks real programs, so let's
revert it for now.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100690
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16224
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-07 20:55:31 -07:00
Grazvydas Ignotas
1f743a0edf glsl: destroy function and subroutine hash tables
Just like other type hash tables are destroyed in
_mesa_glsl_release_types(), also destroy the ones for function and
subroutine types.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-08 13:18:23 +10:00
Dave Airlie
d71ca40a18 radv: fix regression in blit2d push constant change.
These were being fed to the shader as floats via the vertex
path, so also push them as floats here.

This fixes missing overlay in Sascha Willems demos.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 00:54:49 +01:00
Dave Airlie
bcf705b62e radv/meta: cleanup some unused code path
After moving everything to using push constants,
these paths are no longer needed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 08:56:55 +10:00
Dave Airlie
387fdf84c5 radv/meta: port blit to using push constants
Remove use of vertex buffer.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 08:56:52 +10:00
Dave Airlie
7c8bfb95c6 radv/meta: move blit2d to using push constants
This allows us to drop the vertex buffer.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 08:56:49 +10:00
Dave Airlie
b29ea49e8e radv/meta: move clear color to using push constants
The color clear value is uniform and needs only to be emitted from
the frag shader, so just push it down via a push constant,
and remove the vertex buffer completely.

The depth clear value needs to be emitted from the vertex
shader, but is only a single value.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 08:56:45 +10:00
Dave Airlie
3b85b630ee radv/meta: use novertex save path for resolve pass.
This was missing in the original change.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 08:56:42 +10:00
Dave Airlie
eb2a833679 radv: set base/ranges for push constant loads.
This isn't necessary yet but I'd like to use the range in
some future patches.

[airlied: add new resolve pass]
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 08:56:36 +10:00
Dave Airlie
823e9ea8a1 radv: drop resolve hack workarounds
This drops the resolve workarounds that change an image
tiling mode behinds it's back, this is horrible and breaks
the image_view->image relationship. Remove all this.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 23:41:39 +01:00
Dave Airlie
2a04f5481d radv/meta: select resolve paths
There are 3 resolve paths, the fastest being the hw resolver
but it has restriction on tile modes and can't do subresolves,
the compute resolver is next speed wise, but can't handle DCC
destinations, the fragment resolver handles that case.

This will end up with a slow down as currently we hack the
hw resolver paths when they shouldn't work, but we shouldn't
keep doing that.

The next patch removes the hacks.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 23:41:39 +01:00
Dave Airlie
69136f4e63 radv/meta: add resolve pass using fragment/vertex shaders
In order to resolve into DCC enabled dests we need to use
the fragment shader. This reuses the code from the compute
path and implements a resolve path in vertex/fragment shader.

This code isn't used until later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 23:41:39 +01:00
Dave Airlie
19be95f71e radv: add subpass resolve compute path
This adds a path to allow compute resolves to be used
for subpass resolves.

This isn't used yet, but will be later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 23:41:39 +01:00
Dave Airlie
c573076d4a radv/resolve: split resolve emission out for compute
This will allow to add a subpass compute resolve path.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 23:41:38 +01:00
Dave Airlie
ff47866107 radv/meta: split out core part of resolve shader
I want to reuse the same code for the fragment shader
version of the resolve shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 23:41:38 +01:00
Dave Airlie
588185eb6b radv/meta: add srgb conversion to end of resolve shader.
If we are resolving into an srgb dest, we need to convert
to linear so the store does the conversion back.

This should fix some wierdness seen when we subresolves
hit the compute path.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 23:41:38 +01:00
Jose Fonseca
dab6a2dfd9 nir: Fix missing snprintf symbol on Windows.
Copy nir_print.c's snprintf definition for now, to unbreak Windows
builds.

We can and should cleanup all snprintf definitions in a follow up
change, but I rather not leave Windows build broken any further.

Trivial.
2017-05-07 19:23:07 +01:00
Pierre Moreau
27ad060c6e nv50/ir: Replace NV50_PROGRAM_IR_* by PIPE_SHADER_IR_*
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-07 10:26:37 -04:00
Pierre Moreau
8fe5949b08 nv50/ir: Remove unused translation methods
This code was merged commented out, and has stayed that way ever since.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-07 10:26:36 -04:00
Pierre Moreau
dd7ab4dcb4 nv50/ir: Free target if we failed to create a program
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-07 10:26:36 -04:00
Pierre Moreau
b490ca9a38 nv50/ir: Fail if encountering unknown shader type
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-07 10:26:36 -04:00
Dave Airlie
c297e68828 radv: set PERF_MOD in sample state like radeonsi.
This just aligns the code with radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 11:19:01 +01:00
Dave Airlie
2add79a732 radv: apply the tess+GS hang workaround to Polaris12 as well
As I pointed out for radeonsi, and AMD confirmed, so fix this
in radv as well.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-07 11:17:48 +01:00
Timothy Arceri
ccf9669cc1 mesa: small texture targetIndex tidy up
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:34 +10:00
Timothy Arceri
68cd0e2000 mesa: fix broken indentation
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:34 +10:00
Timothy Arceri
084fec0e77 mesa: some C99 tidy ups
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
f9e6820652 mesa: add KHR_no_error support to copy buffer subdata functions
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
5e86bfaee3 mesa: remove _mesa from static function
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
2a305fee1b st/mesa: stop calling _mesa_init_buffer_object_functions()
After calling this we were then overriding all the functions with
st versions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
123b113f95 mesa: make _mesa_buffer_storage() static
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
681647eca8 mesa: make _mesa_copy_buffer_sub_data() static
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
f9c28b9f87 mesa: make _mesa_clear_buffer_sub_data() static
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
426e4765d2 mesa: add KHR_no_error support for flush mapped buffer functions
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
30d8dea602 mesa: make _mesa_flush_mapped_buffer_range() static
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
bbae62c714 mesa: add KHR_no_error support for unmap buffer functions
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
0b2e4da80a mesa: split unmap_buffer() in two
This will allow us to implement KHR_no_error support for unmap
functions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
6c3768692e mesa: make _mesa_unmap_buffer() static
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
9d010f57db mesa: add KHR_no_error support for some map buffer functions
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
e83b0a4103 mesa: split out validation from map_buffer_range()
This will allow us to add KHR_no_error support for *BufferRange
functions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Timothy Arceri
8a1c36015b mesa: make map_buffer_range() static
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-07 15:29:33 +10:00
Kenneth Graunke
1151349c2a i965: Drop BRW_NEW_BLORP from 3DSTATE_VF atom.
BLORP doesn't program 3DSTATE_VF, since it doesn't use index buffers,
making the setting irrelevant.  So there's no need to re-emit it after
a BLORP operation - the old setting will still be in place.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-06 15:43:43 -07:00
Kenneth Graunke
6e2c39f562 i965: Port 3DSTATE_VF to genxml and simplify the implementation.
The whole "it might be used for non-indexed draws" thing is no longer
true - it turns out this was a mistake, and removed in OpenGL 4.5.
(See Marek's commit 96cbc1ca29e0b1f4f4d6c868b8449999aecb9080.)  So
we can simplify this and just program 0 for non-indexed draws.

We can also use #if blocks to remove the atom on Ivybridge/Baytrail,
now that they have a separate atom list from Haswell.  No more runtime
checks.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-06 15:43:43 -07:00
Kenneth Graunke
8c5a938171 mesa: Simplify _mesa_primitive_restart_index().
We can use a simple shift equation rather than a switch statement.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-06 15:43:43 -07:00
Marek Olšák
314657dc11 Revert "radeonsi: constify a bunch of the perfcounter structs."
This reverts commit 7088b655e8.

It breaks performance counters. If you use them with this commit, they hang
the machine hard. Sysrq and ssh don't work.
2017-05-06 21:17:52 +02:00
Marek Olšák
b0d01bd303 Revert "radeonsi: fix build with GCC 4.8"
This reverts commit 485ece83ac.

It's needed to revert 7088b655e8.
2017-05-06 21:17:52 +02:00
Rob Clark
6050d5bf3d freedreno/a3xx: fix hang w/ large render targets and small gmem
Possibly other gen's have a similar limit.  Fixes glmark2 -b shadow
with larger resolutions on devices with small gmem (for example,
fullscreen 1080p on 8x16/db410c).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-06 14:16:33 -04:00
Rob Clark
4fadfbf176 freedreno/ir3: add macro to declare variable length arrays
We have enough of these, that we should stop open coding this.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-06 14:15:42 -04:00
Nicolai Hähnle
b738fae4b9 glsl: skip tree grafting for sampler and image types
v2: - use is_sampler()/is_image() instead (Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
d04b0f31d3 glsl: teach lower_ubo_reference about samplers inside structures
In a situation like:

(tex vec4 (record_ref (var_ref f)  tex)  (constant vec2 (0.000000 0.000000))  0 1 () )

The sampler needs to be lowered, otherwise this ends up with
"ir_dereference_variable @ 0x229a100 specifies undeclared variable
`ubo_load_temp' @ 0x2290440"

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
d550024a7e glsl: link bindless layout qualifiers
From section 4.4.6 of the ARB_bindless_texture spec:

   "If both bindless_sampler and bound_sampler, or bindless_image
    and bound_image, are declared at global scope in any
    compilation unit, a link- time error will be generated."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
d6810ea286 glsl: do not count bindless samplers/images when linking uniforms
From section 2.14.8 of the ARB_bindless_texture spec:

    "(modify second paragraph, p. 126) ... against the
     MAX_COMBINED_TEXTURE_IMAGE_UNITS limit.  Samplers accessed
     using texture handles (section 3.9.X) are not counted against
     this limit."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
8b4c48673a glsl: lower bindless sampler/image packed varyings
v3: - rebase (and remove (sampler) ? 1 : vector_elements)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
3cdcc5f02f glsl: implement ARB_bindless_texture conversions
From section 5.4.1 of the ARB_bindless_texture spec:

   "In the following four constructors, the low 32 bits of the
    sampler type correspond to the .x component of the uvec2 and
    the high 32 bits correspond to the .y component."

    uvec2(any sampler type)     // Converts a sampler type to a
                                //   pair of 32-bit unsigned integers
    any sampler type(uvec2)     // Converts a pair of 32-bit unsigned integers to
                                //   a sampler type
    uvec2(any image type)       // Converts an image type to a
                                //   pair of 32-bit unsigned integers
    any image type(uvec2)       // Converts a pair of 32-bit unsigned integers to
                                //   an image type

v4: - fix up comment style
v3: - rebase (and remove (sampler) ? 1 : vector_elements)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
95c83aba71 glsl: allow bindless samplers/images to be used with constructors
For the explicit conversions.

From section 4.1.7 of the ARB_bindless_texture spec:

   "Samplers are represented using 64-bit integer handles, and
    may be converted to and from 64-bit integers using constructors."

From section 4.1.X of the ARB_bindless_texture spec:

   "Images are represented using 64-bit integer handles, and
    may be converted to and from 64-bit integers using constructors."

v3: - add spec comment
    - update the glsl error message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v2)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
b98542588c glsl: add is_valid_constructor() helper function
This will help for the explicit conversions for sampler and
image types as specified by ARB_bindless_texture.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
1eff26f02d glsl: add ARB_bindless_texture operations
For the explicit pack/unpack conversions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
35c8e727a5 glsl: allow bindless samplers/images to be initialized
From section 4.1.7 of the ARB_bindless_texture spec:

   "Samplers may be declared as shader inputs and outputs, as uniform
    variables, as temporary variables, and as function parameters."

From section 4.1.X of the ARB_bindless_texture spec:

   "Images may be declared as shader inputs and outputs, as uniform
    variables, as temporary variables, and as function parameters."

v3: - add spec comment
    - update the glsl error message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
efb668fb29 glsl: allow bindless samplers/images to be l-values
From section 4.1.7 of the ARB_bindless_texture spec:

   "Samplers can be used as l-values, so can be assigned into and
   used as "out" and "inout" function parameters."

From section 4.1.X of the ARB_bindless_texture spec:

   "Images can be used as l-values, so can be assigned into and
    used as "out" and "inout" function parameters."

v4: - invert the logic
v3: - update spec comment formatting
    - keep the read_only check

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
fa4ebf6b8d glsl: add _mesa_glsl_parse_state object to is_lvalue()
Yes, this is a bit hacky but we don't really have the choice.
Plain GLSL doesn't accept bindless samplers/images as l-values
while it's allowed when ARB_bindless_texture is enabled.

The default NULL parameter is because we can't access the
_mesa_glsl_parse_state object in few places in the compiler.
One is_lvalue(NULL) call is for IR validation but other checks
happen elsewhere, should be safe.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
42a2fe25f4 glsl: relax bindless sampler arrays indexing
From section 4.1.7 of the ARB_bindless_texture spec:

   "Samplers aggregated into arrays within a shader (using square
    brackets []) can be indexed with arbitrary integer expressions."

v3: - update spec comment formatting

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
ece1c04e8e glsl: reject bindless samplers/images frag inputs without 'flat'
From section 4.3.4 of the ARB_bindless_texture spec

   "(modify last paragraph, p. 35, allowing samplers and images as
    fragment shader inputs) ... Fragment inputs can only be signed
    and unsigned integers and integer vectors, floating point scalars,
    floating-point vectors, matrices, sampler and image types, or
    arrays or structures of these.  Fragment shader inputs that are
    signed or unsigned integers, integer vectors, or any
    double-precision floating- point type, or any sampler or image
    type must be qualified with the interpolation qualifier "flat"."

v3: - update spec comment formatting

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
8834c74fef glsl: allow bindless samplers/images as vertex shader inputs
From section 4.3.4 of the ARB_bindless_texture spec:

   "(modify third paragraph of the section to allow sampler and
    image types) ...  Vertex shader inputs can only be float,
    single-precision floating-point scalars, single-precision
    floating-point vectors, matrices, signed and unsigned integers
    and integer vectors, sampler and image types."

v3: - update spec comment formatting

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
015c0b4a34 glsl: allow bindless samplers/images as varying variables
From section 4.3.4 of the ARB_bindless_texture spec:

   "(modify third paragraph of the section to allow sampler and image
    types) ...  Vertex shader inputs can only be float,
    single-precision floating-point scalars, single-precision
    floating-point vectors, matrices, signed and unsigned integers
    and integer vectors, sampler and image types."

From section 4.3.6 of the ARB_bindless_texture spec:

   "Output variables can only be floating-point scalars,
    floating-point vectors, matrices, signed or unsigned integers or
    integer vectors, sampler or image types, or arrays or structures
    of any these."

v3: - add spec comment

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
89e37f9703 glsl: allow input memory qualifiers for images
ARB_bindless_texture spec allows images to be declared as
shader inputs.

v2: - put the */ on the following line (Timothy Arceri)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
242964ca5c glsl: allow image qualifiers inside structures
ARB_bindless_texture allows to declare images inside structures
which means that qualifiers like writeonly should be allowed.

I have a got a confirmation from Jeff Bolz (one author of the spec),
because the spec doesn't clearly explain this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
48b7882200 glsl: allow bindless images to be declared inside structures
The spec doesn't clearly state this, but I have got clarification
from the spec authors.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
e1eb30975a glsl: allow bindless samplers/images inside interface blocks
From section 4.3.7 of the ARB_bindless_texture spec:

   "(remove the following bullet from the last list on p. 39, thereby
    permitting sampler types in interface blocks; image types are also
    permitted in blocks by this extension)"

    * sampler types are not allowed

v3: - update the spec comment
    - update the glsl error message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
75cc83747e glsl: allow bindless samplers/images as function return
The ARB_bindless_texture spec doesn't clearly state this, but as
it says "Replace Section 4.1.7 (Samplers), p. 25" and,
"Replace Section 4.1.X, (Images)", this should be allowed.

v3: - add spec comment
    - update the glsl error message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
cb405f170b glsl: allow bindless samplers/images as out and inout parameters
From section 4.1.7 of the ARB_bindless_texture spec:

   "Samplers can be used as l-values, so can be assigned into and used
    as "out" and "inout" function parameters."

From section 4.1.X of the ARB_bindless_texture spec:

   "Images can be used as l-values, so can be assigned into and used as
    "out" and "inout" function parameters."

v3: - add spec comment
    - update the glsl error message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
4c084f18fd glsl: allow to declare bindless samplers/images as non-uniform
From section 4.1.7 of the ARB_bindless_texture spec:

   "Samplers may be declared as shader inputs and outputs, as uniform
    variables, as temporary variables, and as function parameters."

From section 4.1.X of the ARB_bindless_texture spec:

   "Images may be declared as shader inputs and outputs, as uniform
    variables, as temporary variables, and as function parameters."

v3: - add validate_storage_for_sampler_image_types()
    - update spec comment
    - update the glsl error message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
115d938cea glsl: process bindless/bound layout qualifiers
This adds bindless_sampler and bound_sampler (and respectively
bindless_image and bound_image) to the parser.

v3: - add an extra space in apply_bindless_qualifier_to_variable()
    - fix indentation in merge_qualifier()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
cf52b8cd21 glsl: do not make sampler/image types readonly variables
In plain GLSL, sampler and image types can only be declared
uniform-qualified global variables or 'in' function parameters.

Setting the read_only flag seems quite useless because other
checks will prevent sampler/image variables to be assigned and
also because the flag is not set for atomic_uint types which are
opaque types.

This will also help for ARB_bindless_texture because samplers
and images can be assigned when they are considered bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
c618f31065 glsl: make sampler/image scalar types
As a side effect, this will magically fix std140/std430 interfaces
for bindless samplers/images and will help for implementing the
explicit conversions with constructors.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
33931e4062 glsl: make count_attribute_slots() returns 1 for samplers/images
For packed varyings.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
1f40343e9a glsl: make component_slots() returns 2 for samplers/images
Bindless samplers/images are 64-bit unsigned integers, which
means they consume two components as specified by
ARB_bindless_texture.

It looks like we are not wasting uniform storage by changing
this because default-block uniforms are not packed. So, if
we use N uint uniforms, they occupy N * 16 bytes in the
constant buffer. This is something that could be improved.

Though, count_uniform_size needs to be adjusted to not count
a sampler (or image) twice.

As a side effect, this will probably break the cache if you
have one because it will consider sampler/image types as
two components.

v3: - update the comments

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
becc87b84a glsl: make sampler/image types as 64-bit
The ARB_bindless_texture spec says:

   "Samplers are represented using 64-bit integer handles."

and,

   "Images are represented using 64-bit integer handles."

It seems simpler to always consider sampler and image types
as 64-bit unsigned integer.

This introduces a temporary workaround in _mesa_get_uniform()
because at this point no flag are used to distinguish between
bound and bindless samplers. This is going to be removed in a
separate series. This avoids breaking arb_shader_image_load_store-state.

v3: - update the comment slightly

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
042eee2067 glsl: add ARB_bindless_texture enable
This also adds the extension to the standalone GLSL compiler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-06 16:40:19 +02:00
Samuel Pitoiset
b08a9bf791 mesa: add ARB_bindless_texture to the extensions list
This is required for the following GLSL bits.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-05-06 16:40:19 +02:00
Fredrik Höglund
5ff4858111 radv/meta: fix restoring a push descriptor set
radv_bind_descriptor_set cannot be used to bind a push descriptor set
since a push descriptor set does not have a buffer list. However,
there is no need to add the buffers again when restoring a set, so
this fix is also an optimization.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-06 01:46:18 +02:00
Nicolas Boichat
f6ac3d0db6 configure.ac: Also match -androideabi tuple
On ARM Android platforms, the host_os tuple should be linux-androideabi,
so let's match both -android and -androideabi (or any other
-android* tuple) to determine if we should do an Android build.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-05 15:39:38 -07:00
Jason Ekstrand
e05e3e07ab anv/allocator: Only write to _vg_ptr if we have valgrind
This fixes the build when not building against valgrind headers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100945
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-05 12:49:51 -07:00
Daniel Stone
d4342b1398 i915: Fix build break with empty unreachable()
Actually put something in unreachable(), so as not to break the build on
a Friday evening.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Mark Janes <mark.a.janes@intel.com>
2017-05-05 18:24:44 +01:00
Marek Olšák
ee5908396e radeonsi: apply the tess+GS hang workaround to Polaris12 as well
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 18:55:03 +02:00
Daniel Stone
8b8af19065 i965: Set modifier for imported and duplicated images
When a buffer is being created from FD or GEM flink import, the current
API makes no provision for passing modifier information along with this.
Set the modifier for such images to DRM_FORMAT_MOD_INVALID.

Also preserve the modifier when duplicating an image, as will be done by
GBM when importing from a wl_buffer.

This doubly tripped up Wayland, as the images would first have been
created (as wl_buffers) with a 0 modifier, and then lost what modifier
they would've had when being duplicated into gbm_bos.

Fixes: d78a36ea62 ("i965/dri: Handle the linear fb modifier")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-05 17:34:10 +01:00
Daniel Stone
467332a0ab i965: Use helper function for modifier -> tiling
Use a helper function and struct to convert between a modifier and
tiling mode, so we can use it later for a tiling -> modifier lookup.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-05 17:34:10 +01:00
Samuel Pitoiset
485ece83ac radeonsi: fix build with GCC 4.8
Fixes: 7088b655e8 ("radeonsi: constify a bunch of the perfcounter structs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100937
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-05 18:29:30 +02:00
Samuel Pitoiset
92ab06e782 st/glsl_to_tgsi: fix renumber_registers() in presence of dead code
The TGSI DCE pass doesn't eliminate dead assignments like
MOV TEMP[0], TEMP[1] in presence of loops because it assumes
that the visitor doesn't emit dead code. This assumption is
actually wrong and this situation happens.

However, it appears that the merge_registers() pass accidentally
takes care of this for some weird reasons. But since this pass has
been disabled for RadeonSI and Nouveau, the renumber_registers()
pass which is called *after*, can't do its job correctly.

This is because it assumes that no dead code is present. But if
there is still a dead assignment, it might re-use the TEMP
register id incorrectly and emits wrong code.

This patches fixes the issue by recording writes instead of reads,
and this has the advantage to be faster.

This should fix Unigine Heaven on RadeonSI and Nouveau.

shader-db results with RadeonSI:

47109 shaders in 29632 tests
Totals:
SGPRS: 1923308 -> 1923316 (0.00 %)
VGPRS: 1133843 -> 1133847 (0.00 %)
Spilled SGPRs: 2516 -> 2518 (0.08 %)
Spilled VGPRs: 65 -> 65 (0.00 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
Code Size: 60095968 -> 60096256 (0.00 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 431889 -> 431889 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

It's still interesting to disable the merge_registers() pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 09:48:01 +02:00
Iago Toral Quiroga
7761cf6d01 anv/query: handle more cases of 'out of host memory'
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-05 08:53:33 +02:00
Nicolas Boichat
63b12b0c77 egl/android: Set EGLSurface.Lost to EGL_TRUE/EGL_FALSE
Lost is an EGLBoolean, so we should assign it to EGL_TRUE/EGL_FALSE,
not true/false.

Fixes: e5eace5868 ("egl/android: Mark surface as lost when dequeueBuffer fails")
Fixes: 0212db3504 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-04 20:09:10 -07:00
Jason Ekstrand
98cd512089 anv/allocator: Improve block pool growing asserts
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
24827fdf50 anv: Drop the instruction pool block size
Now that we can allocate states larger than the block size, we no longer
need a block size of 1MB which can be rather wasteful.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
955127db93 anv/allocator: Add support for large stream allocations
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
f82d3d38b6 anv/allocator: Allow state pools to allocate large states
Previously, the maximum size of a state that could be allocated from a
state pool was a block.  However, this has caused us various issues
particularly with shaders which are potentially very large.  We've also
hit issues with render passes with a large number of attachments when we
go to allocate the block of surface state.  This effectively removes the
restriction on the maximum size of a single state.  (There's still a
limit of 1MB imposed by a fixed-length bucket array.)

For states larger than the block size, we just grab a large block off of
the block pool rather than sub-allocating.  When we go to allocate some
chunk of state and the current bucket does not have state, we try to
pull a chunk from some larger bucket and split it up.  This should
improve memory usage if a client occasionally allocates a large block of
state.

This commit is inspired by some similar work done by Juan A. Suarez
Romero <jasuarez@igalia.com>.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
8c079b566e anv/allocator: Support pushing multiple blocks onto a free list at once
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
8769fb48fb anv/allocator: Add helpers for dealing with bucket sizes
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
12043ca696 anv/allocator: Add the capability to allocate blocks of different sizes
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
01170df262 anv/allocator: Rework a comment
This commit just fixes up the English a bit and re-flows the comment.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
bcc5d0defb anv/allocator: Tweak the block pool growing algorithm
The old algorithm worked fine assuming a constant block size.  We're
about to break that assumption so we need an algorithm that's a bit more
robust against suddenly growing by a huge amount compared to the
currently allocated quantity of memory.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
d3ed72e2c2 anv/allocator: Embed the block_pool in the state_pool
Now that the state stream is allocating off of the state pool, there's
no reason why we need the block pool to be separate.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
bb2a3f0df8 anv/allocator: Get rid of the ability to free blocks
Now that everything is going through the state pools, the block pool no
longer needs to be able to handle re-use.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
08413a81b9 anv: Allocate binding table blocks through the state pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
55f49e6b7e anv/allocator: Add support for "back" allocations to state_pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
49ecaf88d1 anv/allocator: Drop the block_size field from block_pool
Since the state_stream is now pulling from a state_pool, the only thing
pulling directly off the block pool is the state pool so we can just
move the block_size there.  The one exception is when we allocate
binding tables but we can just reference the state pool there as well.

The only functional change here is that we no longer grow the block pool
immediately upon creation so no BO gets allocated until our first state
allocation.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
30d63ffe26 anv/allocator: Pull the userptr part of block_pool_grow into a helper
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
c73ce41a48 anv/allocator: Roll fixed_size_state_pool into state_pool
The helper functions aren't really gaining us as much as they claim and
are actually about to be in the way.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
6d02ef011e anv/allocator: Remove the state_size field from fixed_size_state_pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
367031a5c8 anv: Get rid of a bunch of uses of size_t
We should only use size_t when referring to sizes of bits of CPU memory.
Anything on the GPU or just a regular array length should be a type that
has the same size on both 32 and 64-bit architectures.  For state
objects, we use a uint32_t because we'll never allocate a piece of
driver-internal GPU state larger than 2GB (more like 16KB).

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
e86aeecb6a anv/allocator: Convert the state stream to pull from a state pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
e049dea5b2 anv/allocator: Return a null state for zero-size allocations
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand
45e1829274 anv/allocator: Add no-valgrind versions of state_pool_alloc/free
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Dave Airlie
a096d8d3f7 radv: enable POLARIS12 support.
This just adds the chip in the right places.

We don't set the partial_vs_wave workaround, as radeonsi
doesn't, but have to confirm it's not required.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-05 11:07:40 +10:00
Chad Versace
e5eace5868 egl/android: Mark surface as lost when dequeueBuffer fails
This ensures that future calls to eglSwapBuffers and eglMakeCurrent emit
an error.

This patch is part of a series for fixing
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface
on Chrome OS x86 devices.

Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 17:46:34 -07:00
Chad Versace
0212db3504 egl/android: Cancel any outstanding ANativeBuffer in surface destructor
That is, call ANativeWindow::cancelBuffer in droid_destroy_surface().

This should prevent application deadlock when the app destroys the
EGLSurface after EGL has acquired a buffer from SurfaceFlinger
(ANativeWindow::dequeueBuffer) but before EGL has released it
(ANativeWindow::enqueueBuffer).

This patch is part of a series for fixing
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface
on Chrome OS x86 devices.

Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 17:46:33 -07:00
Chad Versace
23c86c74cc egl: Emit error when EGLSurface is lost
Add a new bool, _EGLSurface::Lost, and check it in eglMakeCurrent and
eglSwapBuffers. The EGL 1.5 spec says that those functions emit errors
when the native surface is no longer valid.

This patch just updates core EGL. No driver sets _EGLSurface::Lost yet.

I discovered that Mesa failed to detect lost surfaces while debugging an
Android CTS camera test,
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface.
This patch doesn't fix the test though, though, because the test expects
EGL_BAD_SURFACE when the surface becomes lost, and this patch actually
complies with the EGL spec. If I interpreted the EGL spec correctly,
EGL_BAD_NATIVE_WINDOW or EGL_BAD_CURRENT_SURFACE is the correct error.

Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 17:46:33 -07:00
Marek Olšák
69e6eab653 winsys/amdgpu: fix Polaris12 (RX 550) breakage
reported by Greg White.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100892
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
2017-05-05 01:21:32 +02:00
Kenneth Graunke
9377801fbd anv: Simplify Cherryview line handling.
We can just use the new CHVLineWidth field rather than an entirely
different generation's packing function.

v2: Inline the function (requested by Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-04 16:17:34 -07:00
Kenneth Graunke
31f094e691 i965: Fix line width on Cherryview.
We just add another field to gen8.xml for the Cherryview line width,
rather than trying to replicate the gymnastics done in the Vulkan
driver to use gen9 SF pack functions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-04 16:17:34 -07:00
Marek Olšák
194d9b27cc radeonsi/gfx9: allow the scratch buffer in HS and GS
It works now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
8ac4923a67 radeonsi: prevent race conditions when doing scratch patching
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
9dfc030b48 radeonsi: separate scratch state patching code into its own function
Picked from a different branch. When we stop using the scratch patching,
this function will not be called.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
1b01014cbf radeonsi/gfx9: also apply scratch relocations to the 1st shader of merged shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
e107c5a426 radeonsi/gfx9: set correct LLVM calling conventions for merged shaders
for scratch support

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
a47289f8fc radeonsi: remove unused parameters from si_shader_apply_scratch_relocs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
2d662c0cba radeonsi: inline si_llvm_shader_type into si_llvm_create_func
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
4c0e68dfe5 radeonsi: don't use util_memcpy_cpu_to_le32 for shader uploads
at least I think this is correct.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
7660c9ee4e radeonsi: make si_compile_llvm static
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
f8f8242e8b radeonsi: fold surrounding code into si_llvm_finalize_module
and rename to si_llvm_optimize_module.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
5dad0c3477 radeonsi: don't call eliminate_const_vs_outputs in shaders without VS exports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
12beef0374 radeonsi: drop support for LLVM 3.8
LLVM 3.8:
- had broken indirect resource indexing
- didn't have scratch coalescing
- was the last user of problematic v16i8
- only supported OpenGL 4.1

This leaves us with LLVM 3.9 and LLVM 4.0 support for Mesa 17.2.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
4d32b4ac99 radeonsi: stop using v16i8
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Marek Olšák
283a1d1e27 radeonsi/gfx9: make some PA & DB registers match the closed Vulkan driver
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-05 00:23:44 +02:00
Dave Airlie
efa19f5a54 radv: don't advertise transfer props unless we can do anything else
There is no reason to advertise transfer ability for formats we can't
use for anything else. This stops some CTS tests hitting internal
error for 64-bit types when they see the transfer flags.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-05 05:46:02 +10:00
Rob Clark
7b55a05159 freedreno/a5xx: compute shader support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Rob Clark
10c17f23b7 freedreno: core compute state support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Rob Clark
2ce449fa7d freedreno/ir3: compute shader support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Rob Clark
39c5a46a7a freedreno/a5xx: SSBO support
To simplify things for now, since all the gfx shader stages share a
single SSBO state block, only advertise SSBO support for fragment shader
(and compute when we have that).  We could possibly use a fixed-
partitioning of the SSBO index space to support SSBOs on other stages
without having to resort to shader variants.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Rob Clark
edde00f5f1 freedreno/ir3: SSBO/atomic support
TODO cwabbott pointed out a write-after-read hazzard, which effects both
this and arrays.  A write needs to depend on *all* reads since the last
write, not just the last read.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Rob Clark
4d841fbaae freedreno: core SSBO support
The generation-independent support for binding shader buffer objects.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Rob Clark
fd6ed7b562 freedreno/ir3: resync instr-a3xx.h/disasm-a3xx.c
Sync to the same files from freedreno.git to correct decoding of ldgb/
stgb instructions.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Rob Clark
5f7e55582e mesa/st: compute support for glsl_to_nir
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-04 13:48:06 -04:00
Rob Clark
53aa109ba2 nir: add pass to lower atomic counters to SSBO
This is equivalent to what mesa/st does in glsl_to_tgsi.  For most hw
there isn't a particularly good reason to treat these differently.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-04 13:48:06 -04:00
Rob Clark
fd500cc10b nir: add a C wrapper for glsl_type::get_interface_instance()
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04 13:48:06 -04:00
Emil Velikov
d230ef842c mapi_abi.py: remove no longer used --mode option
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-04 18:17:06 +01:00
Emil Velikov
3698fe295a mapy_abi.py: remove dead output_for_app generator
Used by the OpenVG codebase.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 18:17:06 +01:00
Emil Velikov
4562d88c1d mapi: replace mapi_table abstraction
Replace all instances of mapi_table with the actual struct _glapi_table.
The former may have been needed when the OpenVG was around. But since
that one is long gone, there' no point in having the current confusing
mix of the two.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-04 18:17:03 +01:00
Emil Velikov
424cb9d3ea mesa/tests: remove no longer needed HAVE_SHARED_GLAPI define
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:12:11 +01:00
Emil Velikov
edb7165b25 gl_table.py: always regenerate the complete struct _glapi_table
Currently we would generate a partial one as we do non-shared glapi.
At the same time since it's local, we don't care that much if we have a
few extra bytes of space in the table.

Drop the guard, which allows us to simplify both build system and code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:12:07 +01:00
Emil Velikov
6d6913ba5a glx/apple: remove empty variable SHARED_GLAPI_CFLAGS
Cc: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:12:05 +01:00
Emil Velikov
94d48864ea glx/windows: remove empty variable SHARED_GLAPI_CFLAGS
Cc: Jon Turney <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:12:02 +01:00
Emil Velikov
27a4fd5047 glx: automake: scons: remove unneeded GLX_SHARED_GLAPI define
There's no users in-tree that use it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:59 +01:00
Emil Velikov
4752ae876a targets/libgl-xlib: remove unneeded GLX_SHARED_GLAPI define
There's no users in-tree that use it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:56 +01:00
Emil Velikov
f885f1ae14 drivers/x11: remove unneeded GLX_SHARED_GLAPI define
There's no users in-tree that use it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:53 +01:00
Emil Velikov
6177d60a37 glx: glX_proto_send.py: use correct compile guard GLX_INDIRECT_RENDERING
The code itself has nothing to do with shared glapi, thus having it
behind GLX_SHARED_GLAPI is misleading. Use GLX_INDIRECT_RENDERING
instead.

The latter macro is set at global scope by the Autotools and Scons build
systems.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:50 +01:00
Emil Velikov
123c1f69c0 mapi/es*api: remove unneeded HAVE_SHARED_GLAPI guard
Always true, since GLES* requires shared glapi.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:47 +01:00
Emil Velikov
3186d33ae9 mesa/dri: remove unneeded HAVE_SHARED_GLAPI guard
Always true, since the dri modules required shared glapi.

With earlier commit (da410e6afa "configure: explicitly require shared
glapi for enable-dri") we even made that explicit during the configure
stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:44 +01:00
Emil Velikov
4ea5f4b74c gallium/dri: remove unneeded HAVE_SHARED_GLAPI guard
Always true, since the dri modules required shared glapi.

With earlier commit (da410e6afa "configure: explicitly require shared
glapi for enable-dri") we even made that explicit during the configure
stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:40 +01:00
Emil Velikov
51accecce7 mesa/dri: always link against shared glapi
Analogous to previous commit. Check with the extensive commit
description and bug report referenced.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:37 +01:00
Emil Velikov
79a26b663a gallium/dri: always link against shared glapi
In the early days of Xorg and Mesa we had multiple providers of the
GLAPI. All of those were the ones responsible for dlopening the DRI
module. Hence it was perfectly fine, and actually expected, for the DRI
modules to have unresolved symbols.

Since then we've moved the API to a separate shared library and no other
libraries provide the symbols.

Here comes the picky part:
It's possible that one uses old Xorg (where libglx.so provides the
GLAPI) and new Mesa (with DRI modules linking against libglapi.so).

That should still work, since the the libglx.so symbols will take
precedence over the libglapi.so ones.

I've verified this while running 1.14 series Xorg alongside this (and
next) patch.

It may seem a bit fragile, but that's of reasonably OK since all of the
affected Xorg versions have been EOL for years.

The final one being the 1.14 series, which saw its final bug fix release
1.14.7 in June 2014.

To ensure that the binaries do not have unresolved symbols add
-no-undefined and $(LD_NO_UNDEFINED), just like we do everywhere else
throughout mesa.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98428
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 18:11:29 +01:00
Emil Velikov
9d2aa6e506 anv: fix anv_gem_mmap comment to not mention NULL
The function cannot return NULL, update the comment accordingly.

Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-04 18:06:18 +01:00
Emil Velikov
b6643095ba eg: explicitly size dri2_to_egl_attribute_map[]
This way we'll get an implicit zero initialization of the remaining
members, as required by dri2_add_config().

Fixes: e5efaeb85c ("egl: polish dri2_to_egl_attribute_map[]")
Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 18:05:47 +01:00
Emil Velikov
2c60bf093e dri_interface.h: define __DRI_ATTRIB_MAX
Thus we can use the value to explicitly size arrays, instead of
__DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE + 1.

The latter seems magical and is error prone, as we add more dri
attributes.

v2: Fix off by one error (Tomasz)

Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 18:05:25 +01:00
Ben Boeckel
58f51f0754 scons: update for LLVM 4.0
LLVMDemangle, LLVMGlobalISel, and LLVMDebugInfoMSF are new.

Also update the comment to add irreader to the list of components.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-05-04 18:05:04 +01:00
Emil Velikov
c9c8e1c84d c11/threads: rework Windows thrd_current() comment
Drop the misleading "will not match the one returned by thread_create"
hunk and provide more clarity as to what/why GetCurrentThread() isn't
the solution we're looking for.

v2: Places brackets after function names (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-05-04 18:00:23 +01:00
Adam Jackson
f258815c7d egl/platform/drm: Don't take display ownership until gbm is initialized
If the gbm_create_device() call here actually did fail, any subsequent
eglTerminate on the display would segfault.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2017-05-04 12:52:18 -04:00
Adam Jackson
ddb99127a6 egl/x11: Honor the EGL_PLATFORM_X11_SCREEN_EXT attribute
Introduce _egl_display::Options::Platforms for private storage.
For X11 platforms we can use it for the screen number as set by
EGL_PLATFORM_X11_SCREEN_EXT.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2017-05-04 12:52:18 -04:00
Samuel Iglesias Gonsálvez
939b015736 anv: vkBindImageMemory() should return VK_ERROR_OUT_OF_{HOST,DEVICE}_MEMORY on failure
According to the spec we get VK_ERROR_OUT_OF_HOST_MEMORY or
VK_ERROR_OUT_OF_DEVICE_MEMORY on vkBindImageMemory failure.

Fixes returned value changed by b546c9d.

Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 15:13:08 +02:00
Samuel Pitoiset
9db9b2e8cd glsl: reject memory qualifiers with uniform blocks
The spec allows memory qualifiers to be used with image variables,
buffers variables and shader storage blocks. This patch also fixes
validate_memory_qualifier_for_type().

Fixes the following ARB_uniform_buffer_object test:

uniform-block-memory-qualifier.frag

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 14:01:59 +02:00
Samuel Pitoiset
f8003d2516 glsl: reject format qualifiers with non-image types everywhere
Including structures, interfaces and uniform blocks.

Fixes the following ARB_shader_image_load_store test:

format-layout-with-non-image-type.frag

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 14:01:56 +02:00
Samuel Pitoiset
9efea874b9 glsl: rework validate_image_qualifier_for_type()
It makes more sense to have two separate validate functions,
mainly because memory qualifiers are allowed with members of
shader storage blocks.

validate_memory_qualifier_for_type() will be fixed in a
separate patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 14:01:47 +02:00
Samuel Pitoiset
a5f82db380 glsl: rename image_* qualifiers to memory_*
It doesn't make sense to prefix them with 'image' because
they are called "Memory Qualifiers" and they can be applied
to members of storage buffer blocks.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-05-04 09:51:25 +02:00
Samuel Iglesias Gonsálvez
b546c9d318 anv: anv_gem_mmap() returns MAP_FAILED as mapping error
Take it into account when checking if the mapping failed.

v2:
- Remove map == NULL and its related comment (Emil)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

Fixes: 6f3e3c715a ("vk/allocator: Add a BO pool")
Fixes: 9919a2d34d ("anv/image: Memset hiz surfaces to 0 when binding memory")
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
2017-05-04 08:56:36 +02:00
Johnson Lin
a6fb943f3e nir/lower_tex: Fix minor error in YUV color conversion matrix
The matrix used for YCbCr to RGB is listed in:

    https://en.wikipedia.org/wiki/YCbCr

There was an error in converting the offsets from integers to unorm
values: 0.0625=16/256 should be 16.0/255,and 0.5=128.0/256 should be
128.0/255.  With this fix, the CSC result is bit aligned with wikipedia's
conversion result and FFMPeg's result.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100854
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2017-05-03 23:44:59 -07:00
Rafael Antognolli
da665d22f5 i965: Port gen4+ state emitting code to genxml.
On this patch, we port:
   - brw_polygon_stipple
   - brw_polygon_stipple_offset
   - brw_line_stipple
   - brw_drawing_rect

v2:
   - Also emit states for gen4-5 with this code.
v3:
   - Style fixes and remove excessive checks (Ken).

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 20:40:20 -07:00
Rafael Antognolli
c85b217ab0 i965: Port gen6+ 3DSTATE_CC_STATE_POINTERS state to genxml.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 20:40:09 -07:00
Rafael Antognolli
b47b845574 i965: Port gen6+ multisample state emitting code to genxml.
Emit 3DSTATE_MULTISAMPLE using brw_batch_emit.

v3:
   - Remove dead code (Ken)
   - Simplify #if/#endif (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 20:40:03 -07:00
Rafael Antognolli
158dcd8659 i965: Port gen4+ emit vertices code to genxml.
Some code that was placed in brw_draw_upload.c and exported to be used
by gen8+ was also moved to genX_state_upload, and the respective symbols
are not exported anymore.

v2:
   - Remove code from brw_draw_upload too
   - Emit vertices for gen4-5 too.
   - Use helper to setup brw_address (Kristian)
   - Use macros for MOCS values.
   - Do not use #ifndef NDEBUG on code that is actually used (Ken)
v3:
   - Style and code clenup (Ken)
   - Keep some of the common code inside brw_draw_upload.c (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 20:39:48 -07:00
Rafael Antognolli
46d8f9454f i965: Port push constant code to genxml.
The following states are ported on this patch:
   - gen6_gs_push_constants
   - gen6_vs_push_constants
   - gen6_wm_push_constants
   - gen7_tes_push_constants

v2:
   - Use helper to setup brw_address (Kristian)
v3:
   - Do not use macro for upload_constant_state (Ken)
   - Do not re-declare MOCS macro (Ken)
v4: (by Ken)
   - Drop more dead code, change brw->gen checks to GEN_GEN, style nits

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 20:38:20 -07:00
Rafael Antognolli
d729936c5e i965: Port gen6+ 3DSTATE_SCISSOR_STATE_POINTERS to use genxml.
Emit 3DSTATE_SCISSOR_STATE_POINTERS using brw_batch_emit, and pack the
scissor states using GENX(SCISSOR_RECT_pack), generated from genxml.

v3:
   - Remove old code (Ken)
   - Style fixes (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:52 -07:00
Rafael Antognolli
c5ddd4782c i965: Port gen7+ 3DSTATE_TE to genxml.
Emit 3DSTATE_TE on Gen7+ using brw_batch_emit helper.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:52 -07:00
Rafael Antognolli
98cce55317 i965: Port gen6+ blend state code to genxml.
Upload blend states using GENX(BLEND_STATE_ENTRY_pack), generated from
genxml.

v3:
   - style fixes (Ken)
   - cleanup to remove excessive #ifdef's (Ken)
   - remove memset (Ken)
   - disable blend.AlphaToCoverageDitherEnable on gen6 (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:52 -07:00
Rafael Antognolli
bc1ff4509d i965: Port gen6+ state emitting code to genxml.
Ported in this patch:
   - 3DSTATE_DS
   - 3DSTATE_GS
   - 3DSTATE_HS
   - 3DSTATE_VIEWPORT_STATE_POINTERS_SF_CL

v3:
   - Remove NEW_TRANSFORM blocks (Ken)
   - Bring back some comments and workaround for Ivybridge (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:52 -07:00
Rafael Antognolli
689a46f30e i965: Port gen6+ 3DSTATE_VS to genxml.
Emit 3DSTATE_VS on Gen6+ using brw_batch_emit helper, that uses pack
structs from genxml.

v2:
   - Use render_bo helper to setup brw_address (Kristian)
v3:
   - Bring back some comments for gen6 and remove _NEW_TRANSFORM blocks
   from gen7+.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
11ee4ac5e5 i965: Port gen8+ 3DSTATE_PS_EXTRA to genxml.
Emit 3DSTATE_PS_EXTRA on Gen8+ using brw_batch_emit helper, that uses
pack structs from genxml.

v3:
   - Style fixes and moving code around to be cleaner (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
46934d9594 i965: Port gen6+ 3DSTATE_WM to genxml.
Emit 3DSTATE_WM on Gen6+ using brw_batch_emit helper, that uses pack
structs from genxml.

v2:
   - Use render_bo helper to setup brw_address (Kristian)
   - Remove TODO and use BRW_PSCDEPTH_OFF.
v3:
   - A couple of style fixes (Ken)
   - Enable RASTRULE_UPPER_RIGHT on gen6+ instead of gen8+ (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
23f69dfc0f i965: Port gen7+ 3DSTATE_PS to genxml.
Emit 3DSTATE_PS on Gen7+ using brw_batch_emit helper, that uses pack
structs from genxml.

v2:
   - Use render_bo helper to setup brw_address (Kristian)
v3:
   - Style fixes and code cleanup (Ken)
v4:
   - More style fixes and code cleanup missed in v3

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
ddc6f4d069 i965: Port gen7+ 3DSTATE_SOL to genxml.
Emit 3DSTATE_SOL on Gen7+ using brw_batch_emit helper, that uses pack
structs from genxml.

v2:
   - Add helpers to assign struct brw_address (Kristian)
v3:
   - Rename MOCS -> SOBufferMOCS
   - Do not re-declare MOCS macros (Ken).
   - Style and code reorganization (Ken).

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
c5d6ee6ccb i965: Remove calculate_attr_overrides.
This function now lives inside genX_state_upload.c.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
072bcb8edc i965: Port Gen7+ 3DSTATE_SBE state to genxml.
Emit 3DSTATE_SBE on Gen7+ using brw_batch_emit helper, that uses pack
structs from genxml.

v2: - Use ACTIVE_COMPONENT_XYZW from gen9.xml.
v3: - Style fixes (Ken)
v4: #undef unconditionally (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
9f12d9166b i965: Port gen6+ 3DSTATE_SF to genxml.
Emit sf state on Gen6+ using brw_batch_emit helper, using pack structs
from genxml.

v3:
   - Reorganize code and reduce #if/#endif's (Ken)
   - Style fixes (Ken)
   - Always set AALINEDISTANCE_TRUE (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
e00d159f4d i965: Add brw_get_line_width_float.
That helper function returns the line width as a float, and is then used
by brw_get_line_width to return the fixed point width.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
13ac46557a i965: Port Gen8+ 3DSTATE_RASTER state to genxml.
Emits 3DSTATE_RASTER from genX_state_upload.c using pack structs from
genxml.

v3:
   - Style fixes (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
36c02ce448 i965: Port Gen6+ 3DSTATE_CLIP state to genxml.
Emit clip state on Gen6+ using brw_batch_emit helper, using pack structs
from genxml.

v3:
   - Lots style fixes (Ken)
   - Do not set CullTestEnableBitMask on Gen8+ (Ken)
v4:
   - Do not include brw_defines_common.h.
v5 (Ken): s/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Kenneth Graunke
dae5cc79c6 i965: Port Gen6+ DEPTH_STENCIL state to genxml.
This emits 3DSTATE_WM_DEPTH_STENCIL on Gen8+ or DEPTH_STENCIL_STATE
(and the relevant pointer packets) on Gen6-7.5 from a single function.

v3:
   - Watch for BRW_NEW_BATCH too on gen < 8 (Ken)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Kenneth Graunke
5a19d0bcec i965: Get real per-gen atom lists
Make atoms initalization compile conditionally based on the target
platform.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-03 18:57:51 -07:00
Kenneth Graunke
9afb98c429 i965: Add genxml related plumbing in a new genX_state_upload.c file.
v3 (Rafael): Drop aub parameter
v4 (Ken): Squash in gen4/g45 automake fixes

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-03 18:57:51 -07:00
Kenneth Graunke
3ae99de2e8 i965: Drop "Destination Element Offset" from Ironlake SGVs.
The Ironlake documentation is terrible, so it's unclear whether or not
this field exists there.  It definitely doesn't exist on Sandybridge
and later.  It definitely does exist on G45.

We haven't been setting it for our normal vertex attributes - just
the SGVs (VertexID, InstanceID, BaseVertex, BaseInstance, DrawID).
We should be consistent.  My guess is that it isn't necessary and
doesn't exist - this patch drops it from the SGVs elements, making
them follow the behavior of most attributes.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-03 18:57:51 -07:00
Rafael Antognolli
f321f695d3 genxml: Fix 3DSTATE_DEPTH_BUFFER length on gen5.
The hardware docs are wrong, but the length used in the xml is also
wrong.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Dave Airlie
7088b655e8 radeonsi: constify a bunch of the perfcounter structs.
This moves the structs from the data segment to the rodata segment,
which seems like the more correct place for them.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-04 11:52:47 +10:00
Timothy Arceri
ad282c0b9e st/glsl_to_tgsi: remove unrequired tgsi_get_opcode_info() call
This is already set for the instruction at initialisation.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-04 11:42:34 +10:00
Timothy Arceri
e2f3007665 mesa: make _mesa_accum() static
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-04 11:35:37 +10:00
Timothy Arceri
b549f054a6 mesa: tidy up accum.h
These were unused.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-05-04 11:35:37 +10:00
Timothy Arceri
e473fdcdab mesa/varray: make use of dispatch KHR_no_error support
Make use of dispatch KHR_no_error support for varray functions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 11:35:37 +10:00
Timothy Arceri
2f541f63ea glapi: add KHR_no_error support to dispatch table generation
This will allows us to create no error versions of functions
noted by a _no_error suffix. We also need to set a no_error
attribute equal to "true" in the xml.

V3: stop the no_error attribute being overwritten when functions
    alias another.
V2: tidy up suggested by Nicolai.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-04 11:35:36 +10:00
Bas Nieuwenhuizen
33ad6226a0 radv: Don't use FLAT_SHADE for constants.
Setting both offset to 0x20 and flat shade results in passthrough
mode instead of the constant.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: f205e19e4f "radv/ac: eliminate unused vertex shader outputs. (v2)"
2017-05-04 10:38:14 +10:00
Rafael Antognolli
91ab1ccbfe i965: Move MOCS macros to brw_context.h.
These macros are defined in brw_defines.h, which contains a lot of
macros that conflict with autogenerated code from genxml. But we need to
use them (the MOCS macros) in some of that same genxml code.

Moving them to brw_context.h solves that problem and we don't have to
include brw_defines.h.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:59:22 -07:00
Rafael Antognolli
2e5d65ccb6 anv: Use BRW_BARYCENTRIC_NONPERSPECTIVE_BITS from common header.
In a previous patch some enums were split out from brw_eu_defines.h, so
they could be used by genxml based code. anv can also benefit from this.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:58:55 -07:00
Rafael Antognolli
8fa8abef4b i965: Move enums to brw_compiler.h.
These enums live inside struct brw_wm_prog_data, so it makes sense to
keep them in the same header. It also allows to use them without
including brw_eu_defines.h.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:55:58 -07:00
Rafael Antognolli
a66743ce8d genxml: Update 3DSTATE_LINE_STIPPLE xml on gen6.
From the PRM, Line Stipple Inverse Repeat Count is on dw2, bits 31:16,
format U1.13.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:14 -07:00
Rafael Antognolli
7d5cc5b954 genxml: Normalize xml for 3DSTATE_CC_STATE_POINTERS.
- "COLOR_CALC_STATE Change" -> "Color Calc State Pointer Valid"
   - "Pointer to COLOR_CALC_STATE" -> "Color Calc State Pointer"
   - "BackFace" -> "Backface"

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli
b89805a7bc genxml: Normalize xml for 3DSTATE_MULTISAMPLE.
Name the options to "Pixel Location":
   - PIXLOC_CENTER -> CENTER
   - PIXLOC_UL_CORNER -> UL_CORNER

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli
c032cae9ff genxml: Rename "Function Enable" to "Enable".
Rename that field name on genxml for:
   - 3DSTATE_GS - gen6+
   - 3DSTATE_DS - gen7+
   - 3DSTATE_HS - gen7+

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli
5b4223dc8e genxml: Clip guardbands are float, not int.
This makes genxml create the right struct types, and generate the right
batch commands.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli
4266c372d9 genxml: 3DSTATE_VS rename Function Enable to Enable.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Kenneth Graunke
da299b7df3 genxml: Make "Reorder Mode" fields consistent.
Both GS and SOL have these fields.  Some were ReorderEnable = true,
some were ReorderMode = REORDER_TRAILING, and some were just TRAILING.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-03 16:41:07 -07:00
Rafael Antognolli
872ffb2221 genxml: Add alias for MOCS.
Use an alias, so we can set the same value as the #define's.

v3:
   - Call it "SO Buffer MOCS" to follow the most common naming scheme.
   - Add alias for gen7 and gen75 too (Ken).

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Rafael Antognolli
b5e652fc83 genxml: Add missing field values to 3DSTATE_SBE.
Fill out "Attribute Active Component Format" possible values.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Rafael Antognolli
273a10b3f1 genxml: Update xml for 3DSTATE_SF.
- Normalize "Anti-Aliasing Enable"
 - Add "Multisample Rasterization Mode" constants
 - Rename "Use Point Width on Vertex" to "Vertex"
 - Rename "Use Point Width from State" to "State"

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Rafael Antognolli
3f155ab290 genxml: Rename clip enable property.
There are two variants:
   - Clip Enable
   - CLIP Enable (on gen6)

Rename everything to Clip Enable.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Louis-Francis Ratté-Boulianne
e0aa2bd9cb genxml: Fill out Gen4, Gen45 and Gen5 XML
Add some more details to Gen4 and Gen45 and add what is needed
in Gen5 XML. This commit overwrite the previous work done on Gen4
and Gen45 as it contains more instructions and fixes some mistakes.
However, comments (dword boundaries) are lost in the process.

v3:
   - Set the type of some fields, instead of prefix. Also fix the
     SAMPLER_BORDER_COLOR_STATE fields of gen5.xml.

Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:40:52 -07:00
Jason Ekstrand
4201cc2dd3 anv: Implement VK_KHX_external_semaphore_fd
This implementation allocates a 4k BO for each semaphore that can be
exported using OPAQUE_FD and uses the kernel's already-existing
synchronization mechanism on BOs.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand
ef2e427d78 anv: Pull the guts of cmd_buffer_execbuf into a helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand
975c0f339f anv: Implement VK_KHX_external_semaphore
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand
298e054d0c anv: Implement VK_KHX_external_semaphore_capabilities
This just stubs things out.  Real external semaphore support will come
with VK_KHX_external_semaphore_fd.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand
65aa89e75f anv: Add a real semaphore struct
It's just a dummy for now, but we'll flesh it out as needed for external
semaphores.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Marek Olšák
f466683cb0 radeonsi/gfx9: fix gl_ViewportIndex
v2: remove unnecessary LLVMBuildAnd calls

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-03 22:58:27 +02:00
Marek Olšák
ec34632859 radeonsi/gfx9: set VGT_REUSE_OFF = 0
same as Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-03 22:58:27 +02:00
Christian Gmeiner
a8007ed687 etnaviv: add L8A8_UNORM texture format
No piglit regressions.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2017-05-03 22:43:10 +02:00
Andres Gomez
e4ae4d2789 glsl: Corrected some typos and error messages
v2: left code style/formatting corrections out.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-05-03 23:18:00 +03:00
Grazvydas Ignotas
8aab792e92 radv: don't leak DRM devices
After successful drmGetDevices2() call, drmFreeDevices() needs to be called.

Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-03 22:04:52 +03:00
Grazvydas Ignotas
898cbb491b radv: fix possible stack corruption
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.

Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-03 22:02:45 +03:00
Marek Olšák
b08715499e ac: eliminate duplicated VS exports
Only very few shaders have them (from 48486 shaders):

shaders/private/left_4_dead_2/765.shader_test - ac: 1 matches 2
shaders/private/left_4_dead_2/877.shader_test - ac: 1 matches 6
shaders/private/left_4_dead_2/2141.shader_test - ac: 1 matches 6
shaders/private/ue4_effects_cave/11.shader_test - ac: 4 matches 5
shaders/private/ue4_effects_cave/14.shader_test - ac: 5 matches 6
shaders/private/ue4_effects_cave/46.shader_test - ac: 5 matches 6
shaders/private/ue4_effects_cave/42.shader_test - ac: 4 matches 5
shaders/private/ue4_effects_cave/104.shader_test - ac: 4 matches 5
shaders/private/f1-2015/336.shader_test - ac: 3 matches 4
shaders/private/f1-2015/948.shader_test - ac: 6 matches 7
shaders/private/f1-2015/602.shader_test - ac: 0 matches 3
shaders/private/f1-2015/600.shader_test - ac: 0 matches 3
shaders/private/f1-2015/1214.shader_test - ac: 0 matches 1
shaders/private/f1-2015/988.shader_test - ac: 4 matches 5
shaders/private/ue4_elemental/149.shader_test - ac: 3 matches 4
shaders/private/ue4_elemental/346.shader_test - ac: 4 matches 5
shaders/private/ue4_elemental/178.shader_test - ac: 3 matches 4
shaders/private/ue4_elemental/136.shader_test - ac: 4 matches 5
shaders/private/ue4_elemental/168.shader_test - ac: 4 matches 5
shaders/private/ue4_elemental/690.shader_test - ac: 3 matches 4
shaders/private/ue4_elemental/19.shader_test - ac: 5 matches 6
shaders/private/dota2/1901.shader_test - ac: 0 matches 5
shaders/private/dota2/1357.shader_test - ac: 0 matches 5
shaders/private/dota2/1375.shader_test - ac: 0 matches 5
shaders/private/dota2/1369.shader_test - ac: 0 matches 5
shaders/private/dota2/1583.shader_test - ac: 0 matches 5
shaders/private/dota2/1811.shader_test - ac: 0 matches 5
shaders/private/dota2/1893.shader_test - ac: 0 matches 5
shaders/private/dota2/1533.shader_test - ac: 0 matches 5
shaders/private/dota2/1951.shader_test - ac: 0 matches 5
shaders/private/dota2/1361.shader_test - ac: 0 matches 5
shaders/private/mad_max/2792.shader_test - ac: 0 matches 1
shaders/private/mad_max/2794.shader_test - ac: 0 matches 1
shaders/private/mad_max/2780.shader_test - ac: 0 matches 1
shaders/private/mad_max/2902.shader_test - ac: 0 matches 1
shaders/private/bioshock-infinite/3050.shader_test - ac: 3 matches 7
shaders/private/bioshock-infinite/2544.shader_test - ac: 3 matches 6
shaders/private/bioshock-infinite/3062.shader_test - ac: 3 matches 8
shaders/private/bioshock-infinite/2012.shader_test - ac: 3 matches 7
shaders/private/bioshock-infinite/3058.shader_test - ac: 3 matches 7
shaders/private/bioshock-infinite/3270.shader_test - ac: 3 matches 7
shaders/private/bioshock-infinite/732.shader_test - ac: 3 matches 7
shaders/private/bioshock-infinite/3026.shader_test - ac: 3 matches 7
shaders/private/bioshock-infinite/3258.shader_test - ac: 3 matches 6
shaders/private/bioshock-infinite/3198.shader_test - ac: 3 matches 6
shaders/private/bioshock-infinite/3046.shader_test - ac: 3 matches 7
shaders/private/bioshock-infinite/3168.shader_test - ac: 3 matches 6
shaders/private/bioshock-infinite/2550.shader_test - ac: 3 matches 6
shaders/private/bioshock-infinite/3210.shader_test - ac: 3 matches 6
shaders/private/bioshock-infinite/3032.shader_test - ac: 3 matches 6
shaders/private/bioshock-infinite/668.shader_test - ac: 3 matches 7

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-03 20:55:00 +02:00
Marek Olšák
7647e90b15 ac: rename ac_eliminate_const_vs_outputs -> ac_optimize_vs_outputs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-03 20:55:00 +02:00
Marek Olšák
faa37475e9 ac: first parse VS exports before eliminating constant ones
A later commit will make use of this.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-03 20:55:00 +02:00
Jason Ekstrand
f8d7c23e1f anv: Trivially implement multiDrawIndirect
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
272b7e7d25 anv: Enable VK_KHX_multiview and SPV_KHR_multiview
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
3dbd7737d4 anv/cmd_buffer: Emit instanced draws for multiple views
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
32abb0e13c anv/cmd_buffer: Pull indirect draw parameter loading into a helper
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
0db7070330 anv/pipeline: Add shader lowering for multiview
v2 (Jason Ekstrand):
 - Take a view_mask rather than a whole subpass
 - Build the view mask into the VS shader key

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
ca5bdfdfc6 anv/pipeline: Add a subpass field to anv_pipeline
This simplifies the code a variety of places.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
c4549e05aa anv/pipeline: Call nir_gather_info later
We want to insert more lowering code that may insert system values and
we need to gather info after that lowering.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
dcb6a68bb4 anv: Move shader hashing to anv_pipeline
Shader hashing is very closely related to shader compilation.  Putting
them right next to each other in anv_pipeline makes it easier to verify
that we're actually hashing everything we need to be hashing.  The only
real change (other than the order of hashing) is that we now hash in the
shader stage.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
d6b8106eea anv/pass: Store the per-subpass view mask
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
e997f548de anv: Add the KHX_multiview boilerplate
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
0bed97006f anv/nir: Delete the apply_dynamic_offsets prototype
That pass hasn't existed since dd4db84640
but the prototype stuck around for no reason.

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
f903f78b72 spirv: Add support for SPV_KHR_multiview
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
99d0709553 spirv: Bump the SPIR-V header to the latest public version
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
bb41d9a1d3 compiler: Add a system value and varying for ViewIndex
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-03 11:25:46 -07:00
Bartosz Tomczyk
fcf941068e mesa/vbo: reduce prim array size
We always use only single element.

v2: Change single element arrays to variables

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-03 18:22:58 +02:00
Brian Paul
a30313abf6 mesa: add const qualifier on _mesa_valid_to_render()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-03 08:48:46 -06:00
Samuel Iglesias Gonsálvez
f57e234fdd i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions
The regioning parameters are now properly set by convert_to_hw_regs()
and we don't need to fix them in the generator. That latter fix
previously done in the generator was strictly speaking wrong for any
non-identity regions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez
aaeb1c99be i965/vec4: fix register width for DF VGRF and UNIFORM
On gen7, the swizzles used in DF align16 instructions works for element
size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that
in the rest of the code and prepare the instructions for this (scalarize_df()),
we need to set it to two again.

However, for DF align1 instructions, a width of 2 is wrong as we are not
reading the data we want. For example, an uniform would have a region of
<0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access
to the first 4.

This patch sets the default one to 4 and then modifies the width of
align16 instruction's DF sources when we translate the logical swizzle
to the physical one.

v2:
- Remove conditional (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez
7f728bce81 i965/vec4: fix vertical stride to avoid breaking region parameter rule
From IVB PRM, vol4, part3, "General Restrictions on Regioning
Parameters":

  "If ExecSize = Width and HorzStride ≠ 0, VertStride must
   be set to Width * HorzStride."

In next patch, we are going to modify the region parameter for
uniforms and vgrf. For uniforms that are the source of
DF align1 instructions, they will have <0, 4, 1> regioning and
the execsize for those instructions will be 4, so they will break
the regioning rule. This will be the same for VGRF sources where
we use the vstride == 0 exploit.

As we know we are not going to cross the GRF boundary with that
execsize and parameters (not even with the exploit), we just fix
the vstride here.

v2:
- Move is_align1_df() (Curro)
- Refactor exec_size == width calculation (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00
Dave Airlie
3bf3f9866c radv/ac: canonicalize the output for 32-bit float min/max.
This fixes:
dEQP-VK.glsl.builtin.precision.min.*
dEQP-VK.glsl.builtin.precision.max.*
dEQP-VK.glsl.builtin.precision.clamp.*

The problem is the hw doesn't compare denorms properly,
so we have to flush them, even though the spec says
flushing is optional, if you don't flush the results
should be correct.

The -pro driver changes the shader float mode,
it would be nice if llvm could grow that perhaps.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 12:55:34 +10:00
Dave Airlie
83e58b036e radv: flush f32->f16 conversion denormals to zero. (v2)
SPIR-V defines the f32->f16 operation as flushing denormals to 0,
this compares the class using amd class opcode.

Thanks to Matt Arsenault for figuring it out.

This fix is VI+ only, add a TODO for SI/CIK.

This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 12:55:34 +10:00
Bas Nieuwenhuizen
eeff7e1154 radv: Add userspace fence buffer per context.
Having it in the winsys didn't work when multiple devices use
the same winsys, as we then have multiple contexts per queue,
and each context counts separately.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 7b9963a28f "radv: Enable userspace fence checking."
2017-05-03 03:10:12 +02:00
Dave Airlie
2a2a21450b radv: enable lower_sub to fix loop unrolling.
Loop unroll asserts if it hits a sub, we don't really want
to lower subs as llvm handles these things, but do this for
now, until we can fix loop unroll to work with subs.

Fixes: 14ae0bfa5 (radv: Add NIR loop unrolling)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 09:03:43 +10:00
Bas Nieuwenhuizen
9e847eedd5 radv: Don't set dynamic state for pipelines with rasterizer dicard.
All of the dynamic states apply to rasterization & fragment processing,
so we don't need to set them if we don't rasterize.

We don't clear the dirty flags for them though, so we don't miss any
updates for the next pipeline with rasterization.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
2017-05-03 00:12:56 +02:00
Dave Airlie
a524704025 radv: flush more stages when semaphore are waiting.
This still doesn't give us complete pWaitDstStageMask support,
but it should provide enough to be correct if not as efficent as
possible.

If we have wait semaphores we must flush between submits and
flush the shaders as well.

This fixes the remaining fails in:
dEQP-VK.synchronization.op.single_queue.semaphore.*ssbo*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 07:21:31 +10:00
Samuel Pitoiset
e0e01895b0 glsl: set vector_elements to 1 for samplers
I don't see any reasons why vector_elements is 1 for images and
0 for samplers. This increases consistency and allows to clean
up some code a bit.

This will also help for ARB_bindless_texture.

No piglit regressions with RadeonSI.

This time the Intel CI system doesn't report any failures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-02 22:40:45 +02:00
Eric Anholt
ece06defe7 vc4: Use runtime CPU detection for whether NEON is available.
This will allow Raspbian's ARMv6 builds to take advantage of the new NEON
code, and could prevent problems if vc4 ends up getting used on a v7 CPU
without NEON.

v2: Drop dead NEON_SUFFIX (noted by Erik Faye-Lund)
2017-05-02 13:35:23 -07:00
Eric Anholt
a373f77662 vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.
Android.mk was setting the flag across the entire driver, so we didn't
have non-NEON versions getting built.  This was going to be a problem with
the next commit, when I start auto-detecting NEON support and use the
non-NEON version when appropriate.

Reviewed-by: Rob Herring <robh@kernel.org>
2017-05-02 13:35:23 -07:00
Eric Anholt
463b7d0332 gallium: Enable ARM NEON CPU detection.
I wrote this code with reference to pixman, though I've only decided to
cover Linux (what I'm testing) and Android (seems obvious enough).  Linux
has getauxval() as a cleaner interface to the /proc entry, but it's more
glibc-specific and I didn't want to add detection for that.

This will be used to enable NEON at runtime on ARMv6 builds of vc4.

v2: Actually initialize the temp vars in the Android path (noticed by
    daniels)
v3: Actually pull in the cpufeatures library (change by robher).
    Use O_CLOEXEC.  Break out of the loop when we find our feature.
v4: Drop VFP code, which was confused about what it was detecting and not
    actually used yet.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-05-02 13:35:23 -07:00
Dave Airlie
3c73063974 radv: fix stencil only clears.
If we are clearing stencil only, we still need to provide a
a valid Z output from the vertex shader, we can't rely
on the depth clear value having any meaning, as we use this
for the position output, and it could get clipped, so we
don't end up clearing anything.

Fixes:
dEQP-VK.renderpass.simple.stencil
since I added S8 support.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:31:20 +10:00
Philipp Zabel
b539335e50 renderonly: use drmIoctl
To restart interrupted system calls, use drmIoctl.

Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-02 22:22:53 +02:00
Philipp Zabel
cd8ee259c8 renderonly: drop resources on destroy
The renderonly_scanout holds a reference on its prime pipe resource,
which should be released when it is destroyed. If it was created by
renderonly_create_kms_dumb_buffer_for_resource, the dumb BO also has
to be destroyed.

Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-02 22:19:23 +02:00
Philipp Zabel
ab51cd2f26 renderonly: close transfer prime_fd
prime_fd is only used to transfer the scanout buffer to the GPU inside
renderonly_create_kms_dumb_buffer_for_resource. It should be closed
immediately to avoid leaking the DMA-BUF file handle.

Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-05-02 22:19:19 +02:00
Dave Airlie
09034aab64 radv/wsi: report presentation error per image request
This ports
0fcb92c17d
anv: wsi: report presentation error per image request

This fixes:
dEQP-VK.wsi.xlib.incremental_present.scale_none.*

Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:11:19 +10:00
Dave Airlie
ce0f692528 radv: minor pahole related improvements.
This just reduces the structs by 4-8 bytes each.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:03:07 +10:00
Dave Airlie
9399870ef0 radv/image: resize some surface members.
Oops meant to be part of previous series.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:03:02 +10:00
Dave Airlie
fe6d9c0825 radv: drop unused surface level members.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:00:42 +10:00
Dave Airlie
5d0f792f06 radv/image: drop blk_d
This was pretty much unused.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:00:38 +10:00
Dave Airlie
052487be4c radv: remove some members of radeon surface.
We would be storing this info twice per image, no need to,
remove it from the surface struct.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:00:35 +10:00
Dave Airlie
7e8d0a402b radv: move some image info into a separate struct.
This is to rework the surface code like radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 06:00:17 +10:00
Dave Airlie
d5400a5ec2 radv: provide a helper for comparing an image extents.
This just makes it easier to do the follow in cleanups of the surface.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-03 05:59:52 +10:00
Daniel Stone
80ac89a952 gbm/dri: Fix sign-extension in modifier query
When we were assembling the unsigned 64-bit query return from its
two signed 32-bit component parts, the lower half was getting
sign-extended into the top half. Be more explicit about what we want to
do.

Fixes gbm_bo_get_modifier() returning ((1 << 64) - 1) rather than
((1 << 56) - 1), i.e. DRM_FORMAT_MOD_INVALID.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2017-05-02 19:55:13 +01:00
Eric Anholt
fba6559a1e nir: Pick just the channels we want for bitmap and drawpixels lowering.
NIR now validates that SSA references use the same number of channels as
are in the SSA value.

v2: Reword commit message, since the commit didn't land before the
    validation change did.

Fixes: 370d68babc ("nir/validate: Validate that bit sizes and components always match")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Cc: <mesa-stable@lists.freedesktop.org>
2017-05-02 10:24:40 -07:00
Jason Ekstrand
6ef1bd4fa5 anv/tests: Create a dummy instance as well as device
This fixes crashes caused by 35e626bd0e
which made us start referencing the instance in the allocators.  With
this commit, the tests now happily pass again.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100877
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-05-01 17:06:40 -07:00
Bas Nieuwenhuizen
6681ab1f97 radv: Use correct stage for ready bit.
Set the bit in the same stage as the timestamp, instead always at top of pipe.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
2017-05-02 00:54:44 +02:00
Bas Nieuwenhuizen
568aec29d9 radv: Add top of pipe timestamp queries.
Does not fix brokenness with the ready bit.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-02 00:54:18 +02:00
Bas Nieuwenhuizen
14ae0bfa54 radv: Add NIR loop unrolling.
Not much effect on dota2/talos, but positive on deferred.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@itsqueeze.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-02 00:09:42 +02:00
Randy Xu
6f21b5601c i965: Solve Android native fence fd double close
The Android native fence in i965 has two fds: _EGLSync::SyncFd and
brw_fence::sync_fd.

The semantics of __DRI2fenceExtensionRec::create_fence_fd are unclear on
whether the DRI driver takes ownership of the incoming fd (which is the
same incoming fd from eglCreateSync).  i965 did take ownership, but all
other Mesa drivers do not; instead, they dup the incoming fd. As
a result, _EGLSync::SyncFd and brw_fence::sync_fd were the same fd, and
both egl_dri2 and i965 believed they owned it. On eglDestroySync, that
led to a double-close.

Fix the double-close by making brw_dri_create_fence_fd dup the incoming
fd, just like the other drivers do.

Signed-off-by: Randy Xu <randy.xu@intel.com>
Test: Run Vulkan and GLES stress test and no crash.
Fixes: 6403e37651 ("i965/sync: Implement fences based on Linux sync_file")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
[chadv: Polish the commit message]
Cc: mesa-stable@lists.freedesktop.org
2017-05-01 14:46:50 -07:00
Eric Anholt
d884d1a654 vc4: Only build the NEON code on arm32.
NEON is sufficiently different on arm64 that we can't just reuse this
code.  Disable it on arm64 for now.

v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build
    for a v8 CPU.

Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
2017-05-01 13:27:39 -07:00
Samuel Pitoiset
dec5b27b1b gm107/ir: add a missing assertion in emitISCADD()
For consistency, similar to the other emitters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-01 11:56:49 +02:00
Timothy Arceri
de8e01698f i965: Don't allocate uniform space for samplers
Samplers are encoded into the instruction word, so there's no need to
make space in the uniform file.

Previously matrix_columns and vector_elements were set to 0, making this
else case a no-op. Commit 75a31a20af changed that, causing malloc
corruption in thousands of tests on i965.

Fixes: 75a31a20af ("glsl: set vector_elements to 1 for samplers")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100871
2017-05-01 07:54:18 +10:00
Emil Velikov
a5c6ca9602 egl: initialise dummy_thread via _eglInitThreadInfo
Considering we cannot make dummy_thread a constant we might as well,
initialise by the same function that handles the actual thread info.

This way we don't need to worry about mismatch between the initialiser
and initialising function.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-04-29 14:40:53 +01:00
Emil Velikov
e5efaeb85c egl: polish dri2_to_egl_attribute_map[]
Annotate the array as static const and use C99 initialiser to populate
it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-29 14:40:09 +01:00
Ilia Mirkin
6af14778a3 gallium/targets: fix bool setting on BE architectures
val_bool and val_int are in a union. val_bool gets the first byte, which
happens to work on LE when setting via the int, but breaks on BE. By
setting the value properly, we are able to use DRI3 on BE architectures.
Tested by running glxgears with a NV34 in a G5 PPC.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: squash the vmwgfx hunk]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2017-04-29 14:32:20 +01:00
Emil Velikov
e5c24adc22 docs: add release calendar page and references to it
Add a page that has information which release is expected when and
associated information.

Reference to it from the "Releasing process" and "Release notes" pages.

v2:
 - Add Andres for 17.0.5
 - Rework table format to include the branch (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-04-29 13:43:06 +01:00
Emil Velikov
b1d45c3366 travis: bump MAKEFLAGS to -j4
The instance should have 2 cores, yet bumping the jobs to 4 should give
us a minor speed improvement.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:39:40 +01:00
Emil Velikov
27a0b383b9 travis: enable wayland support
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:39:40 +01:00
Emil Velikov
0e6a36cd3f travis: add Gallium state-tracker targets
Split into OpenCL and others, since the former is quite time consuming.

v2:
 - explicitly enable/disable components
 - build libvdpau 1.1 requirement
 - enable st/vdpau
 - build libva 1.6.2 (API 0.38) requirement

v3: Drop ubuntu-toolchain-r-test from sources (Andres)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:39:40 +01:00
Emil Velikov
b3f2076549 travis: model scons check target like the make one
Should make things a bit more consistent across the board.

Cc: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:39:40 +01:00
Emil Velikov
7e2af37474 travis: split the make target to three separate ones
Split the target to allow faster builds for each run.

The overall build time will be more, yet Travis runs multiple builds in
parallel so we're limited by the slowest one.

Things are split roughly as:
 - DRI loaders, classic DRI drivers, classic OSMesa, make check
 - All Gallium drivers (minus the SWR) alongside st/dri (mesa)
 - The Vulkan drivers - ANV and RADV, make check (anv)

v2:
 - rework RUN_CHECK to MAKE_CHECK_COMMAND
 - explicitly disable DRI loaders
 - generate linux/memfd.h locally and enable ANV
 - add libedit-dev

v3: Use printf to create the header (Andres).
v4: Really add the libedit + printf hunks.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:38:11 +01:00
Emil Velikov
8479fd8a10 travis: add "make swr" to the build matrix
v2: Quote OVERRIDE variables.
v3: Add missplaced libedit-dev hunk (Andres).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:35:17 +01:00
Emil Velikov
f55d98ac85 travis: add "scons swr" to the build matrix
Requires GCC 5.0 (due to the C++14 requirement) and LLVM 3.9.

v2: Enable the target, add libedit-dev, rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS, quote OVERRIDE
variables.
v4: Keep check target as-is (Andres)

Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:35:17 +01:00
Emil Velikov
85ee2c6cfc travis: add separate "scons" and "scons llvm" targets
The former does not require any LLVM, while the latter uses LLVM 3.3.

This way we'll quickly catch any LLVM 3.3+ functionality that gets
introduced where it shouldn't.

Add the full list of addons for each build permutation.

v2: Keep libedit-dev, rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4:
 - Remove llvm-toolchain-trusty-3.3 source (Andres)
 - Keep check target as-is (Andres)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:35:17 +01:00
Emil Velikov
56ba252e23 travis: split out matrix from env
With next commits we'll add a couple of more options.

v2: Rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4: Keep check target as-is, will rework with later patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:35:17 +01:00
Emil Velikov
abcfea23ad travis: rework "if test" blocks in the script section
Split the "if test" blocks so that we get more sensible output in case
of a failure.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:35:17 +01:00
Emil Velikov
ae713a7b79 travis: remove unused -dev packages
We effectively override libdrm-dev and libxcb-dri2-0-dev since we build
and install the package locally.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:35:17 +01:00
Emil Velikov
6431b98c54 travis: automatically manage ccache caching
According to the manual

"If you are using ccache, use:

  language: c # or other C/C++ variants

  cache: ccache

to cache $HOME/.ccache and automatically add /usr/lib/ccache to your
$PATH."

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:35:17 +01:00
Emil Velikov
486f28ba88 travis: enable apt cache
Provides a small, but consistent improvement.
Example numbers of the jobs added later in the series.

"make loaders/classic DRI" - 1s
"scons SWR" - 6s

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:34:55 +01:00
Andres Gomez
29322daef2 travis: add the possibility of using the txc-dxtn library
The txc-dxtn library implements the patented S3 Texture Compression
algorithm.

By default it won't be used but we add the possibility of setting the
USE_TXC_DXTN variable to yes in the travis web UI so it will be
installed and used for the scons tests.

Cc: Eric Anholt <eric@anholt.net>
Cc: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov: keep the LIB prefix, drop the LD_LIBRARY_PATH, fold URL]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-29 13:34:53 +01:00
Andres Gomez
7819d265c7 travis: replace Trusty-based LLVM toolchain apt-get with apt addon
Trusty's LLVM toochain repository was whitelisted some time ago. See:
479067c5e7

Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov]
 - set sudo to false
 - reference the Trusty change (Rhys)
 - keep libedit-dev
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-29 13:34:53 +01:00
Emil Velikov
cb820daa3f travis: explicitly LD_LIBRARY_PATH the local libraries
Some of the libraries may be dlopened, which may not always work due to
the non-standard prefix that we're using.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-04-29 13:34:53 +01:00
Brian Paul
52d69c2e8d st/wgl: whitespace, formatting fixes in stw_pixelformat.c
Trivial.
2017-04-28 22:01:34 -06:00
Charmaine Lee
ba8e2ea19a st/wgl: allow WGL_BIND_TO_TEXTURE_RGB_ARB for RGBA visuals
We do not need to restrict WGL_BIND_TO_TEXTURE_RGB_ARB to
RGB visuals only. It can be supported with RGBA visuals as well.

This fixes the early exit of cinebench-r15-test trace.

Tested with cinebench-r15, piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-28 22:01:24 -06:00
Brian Paul
d06045dfdd st/wgl: use ARRAY_SIZE() macro in wglChoosePixelFormatARB()
Trivial.
2017-04-28 21:37:07 -06:00
Brian Paul
394f8dacbc st/wgl: whitespace/formatting fixes in stw_ext_pixelformat.c
Trivial.
2017-04-28 21:37:06 -06:00
Neha Bhende
197907c926 svga: implement sRGB rendering for imported surfaces
If texture is imported and templ format is sRGB, use compatible sRGB format
to the imported texture format while creating surface view.

tested with MTT piglit, glretrace, viewperf and conform

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-28 21:03:06 -06:00
Neha Bhende
1b415a5b28 svga: add function svga_linear_to_srgb()
This function will return compatible svga srgb format for corresponding
linear format

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-28 21:03:06 -06:00
Neha Bhende
6e06e281c6 glx: add missing sRGB attribute check in fbconfigs_compatible()
This patch will allow driver to choose srgb capable FBconfig
if GLX_FRAMEBUFFER_SRGB_CAPABLE_ARB attribute is 1

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-28 21:03:06 -06:00
Thomas Hellstrom
ca59fd1706 svga: Add a more elaborate format compatibility determination v2
dri3 is a bit sloppy about its format compatibility requirements, so add
a possibility to import xrgb surfaces as argb textures and vice versa.

At the same time, make the svga_texture_from_handle() function a bit more
readable and fix the error path where we leaked a winsys surface.

v2: Addressed review comments by Brian.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-28 21:03:06 -06:00
Tim Rowley
18d5c452d0 swr/rast: add memory api to SwrGetInterface()
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:57:09 -05:00
Tim Rowley
a46539af11 swr/rast: use gather instruction for odd format fetch
Small fetch performance optimization - use gather instruction
for odd format fetch instead of slow emulated code.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:57:02 -05:00
Tim Rowley
eff909de7d swr/rast: enable SIMD16 8x2 tile backend
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:56:56 -05:00
Tim Rowley
5fde2ae533 swr/rast: add SwrInit() to init backend/memory tables
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:56:50 -05:00
Tim Rowley
e8d58049f6 swr/rast: increment depth/stencil tile pointer in SIMD16 BE
Misplaced #endif preventing depth and stencil hot tile pointers
from incrementing in SIMD16 8x2 configuration of BackendPixelRate.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:56:42 -05:00
Tim Rowley
d4c1486737 swr/rast: add SwrGetInterface() function to return api
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:56:34 -05:00
Tim Rowley
dabd0499a6 swr/rast: enable per-warp scratch space for CS
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:56:28 -05:00
Tim Rowley
0424e6249a swr/rast: reduce simd{16}vertex stack for VS output
Frontend - reduce simdvertex/simd16vertex stack usage for VS output in
ProcessDraw, fixes stack overflow in some of the deeper call stacks under
SIMD16.

1. Move the vertex store out of PA_FACTORY, and off the stack
2. Allocate the vertex store out of the aligned heap (pointer is
   temporarily stored in TLS, but will be migrated to thread pool
   along with other frontend temporary buffers).
3. Grow the vertex store as necessary for the number of verts per
   primitive, in chunks of 8/4 simdvertex/simd16vertex

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:56:17 -05:00
Tim Rowley
536baf507e swr/rast: remove default argument from SwrSync()
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:56:11 -05:00
Tim Rowley
145bf5aa5b swr/rast: remove unused variables in the SIMD16 FE
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:55:57 -05:00
Tim Rowley
20f3a30219 swr/rast: move construction of const above goto
Fixes gcc error for SIMD16 FE.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:55:50 -05:00
Tim Rowley
feefd3ef4e swr/rast: name threads to aid debugging
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:55:40 -05:00
Tim Rowley
9b907599b6 swr/rast: disable buffer overrun warning for Assemble()
Disabling buffer overrun warning for Assemble(uint32_t slot,
simdvector *verts) due to what looks like a MSVC compiler bug
when compiling the SIMD16 FE.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:55:33 -05:00
Tim Rowley
d523b82498 swr/rast: clean up clipper comments
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:55:26 -05:00
Tim Rowley
8c0e0bf141 swr/rast: add SIMDAPI decorators in binner/clipper
Fixes MSVC errors with SIMD16 FE.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:55:20 -05:00
Tim Rowley
42d804b2a3 swr/rast: add additional jit utility functions
Not used yet.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:55:02 -05:00
Tim Rowley
a373f1f27a swr/rast: more flexible max attribute slots
Ability to allocate space for an arbitrary number (at compile time)
of positions in the vertex layout.

Removes KNOB_NUM_ATTRIBUTES from knobs.h, replaces the VTX slot
number #defines with the SWR_VTX_SLOTS enum (which contains
replacement for NUM_ATTRIBUTES: SWR_VTX_NUM_SLOTS)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-28 19:53:39 -05:00
Kenneth Graunke
54d42cd976 i965: Drop BRW_NEW_CONTEXT from 3DSTATE_DS/GS on Gen7-7.5.
We already have BRW_NEW_BATCH, which completely covers all the cases
that BRW_NEW_CONTEXT would handle.  Drop it.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-28 17:03:33 -07:00
Kenneth Graunke
1d0e974406 i965: Drop _NEW_TRANSFORM from 3DSTATE_DS/GS on Gen7-7.5.
There's no reason for this as far as I can tell.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-28 17:03:33 -07:00
Kenneth Graunke
a1f12574b0 i965: Set point rasterization rule to UPPER_RIGHT on Gen6-7.5.
Gen4-5 and Gen8+ already set this, but Gen6-7.5 did not.  We ought to
be consistent - the answer depends on the API, not the hardware generation.

The Sandybridge PRM says about RASTRULE_UPPER_RIGHT:

   "To match OpenGL point rasterization rules (round to +infinity, where
    this is the upper right direction wrt OpenGL screen origin of lower
    left).

So this is likely the one we should use.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-04-28 17:03:33 -07:00
Kenneth Graunke
4878ab9bd4 i965: Always set AALINEDISTANCE_TRUE on Sandybridge.
We set this unconditionally on every other platform.  Zero (Manhattan)
isn't even listed as an option in the Sandybridge docs - only "true".

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-28 17:03:33 -07:00
Kenneth Graunke
b625bcc601 i965: Use true AA line distance on G45/Ironlake.
The original Broadwater and Crestline platforms computed antialiased
line distances using "manhattan" distance, aka a + b = c.  Eaglelake
and Cantiga added "true" distance, which apparently does something
like max(a, b) + min(a, b) / 4.  Not exactly "true", but at least
more accurate.

The G45 documentation indicates that the old manhattan distance setting
is "only for debug purposes" and should never be used.  The Ironlake
documentation no longer mentions AALINEDISTANCE_MANHATTAN, though it
does still contain the narrative about the feature.

At any rate, we should use the more accurate mode.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-28 17:03:33 -07:00
Andres Gomez
81149c8f52 docs: add news item and link release notes for 17.0.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-04-29 01:21:17 +03:00
Andres Gomez
e06aec99f2 docs: add sha256 checksums for 17.0.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 6cb65ce2d3)
2017-04-29 01:20:51 +03:00
Andres Gomez
0ad8c4f375 docs: add release notes for 17.0.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 61b134a862)
2017-04-29 01:19:51 +03:00
Marek Olšák
7a515a607c radeonsi: don't load unused compute shader input SGPRs and VGPRs
Basically, don't load GRID_SIZE or BLOCK_SIZE if they are unused, determine
whether to load BLOCK_ID for each component separately, and set the number
of THREAD_ID VGPRs to load. Now we should get the maximum CS launch wave
rate in most cases.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:57:44 +02:00
Marek Olšák
46e48d4044 tgsi/scan: record compute shader system value usage
v2: just do indexing with swizzle[i]

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
fa15436e63 radeonsi: add a HUD query for draw calls with primitive restart
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
55445ff189 radeonsi: tell LLVM not to remove s_barrier instructions
LLVM 5.0 removes s_barrier instructions if the max-work-group-size
attribute is not set. What a surprise.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
0490074cab radeonsi: fix tess offchip offset for per-patch attributes
We need 4 more bits there. I don't know what is fixed by this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
4e50062028 radeonsi: pass tessellation ring addresses via user SGPRs
This removes s_load_dword latency for tess rings.

We need just 1 SGPR for the address if we use 64K alignment. The final asm
for recreating the descriptor is:

    // s2 is (address >> 16)
    s_mov_b32 s3, 0
    s_lshl_b64 s[4:5], s[2:3], 16
    s_mov_b32 s6, -1
    s_mov_b32 s7, 0x27fac

v2: bitcast the descriptor type from v2i64 to v4i32

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
2823e15f60 radeonsi: use si_insert_input_ret in si_llvm_emit_tcs_epilogue
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
9fd9a7d0ba radeonsi: remove VS epilog code, compile VS with PrimID export on demand
The use of PrimID in the pixel shader is too rare to deserve such
a sizable support code.

The initial idea of the VS epilog was to move the clipping code there and
remove it based on states, but optimized variants are now used to do that
and are easier to support, so the VS epilog has turned out to be not so
useful.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
3b2e93e472 radeonsi: get InstanceID from VGPR1 (or VGPR2 for tess) instead of VGPR3
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
678d568c7b radeonsi: don't load PrimID in TES if it's not used
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
808c33f6f0 radeonsi: explain (non-)monolithic shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
fc478248f3 radeonsi/gfx9: enable OpenGL 4.5
Tentatively enable it, expecting the scratch buffer support to be done before
the next Mesa release.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
ed9a51cd3b radeonsi/gfx9: 2nd shader of merged shaders should hold a reference of the 1st
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
ef40937854 radeonsi: add reference counting for shader selectors
The 2nd shader of merged shaders should take a reference of the 1st shader.
The next commit will do that.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
6c15e15af4 radeonsi/gfx9: set VGT_VERTEX_REUSE for ES in ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
887ef1de34 radeonsi/gfx9: set TES registers for merged ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
49cd0cbfd5 radeonsi/gfx9: disallow scratch buffer for LS-HS and ES-GS
not implemented yet

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
2857b14bba radeonsi/gfx9: always compile monolithic ES-GS (asynchronously)
In addition to the non-monolithic variant.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
a82398a8f5 radeonsi/gfx9: add support for monolithic ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
6a9c20fdd5 radeonsi/gfx9: make sure the 1st shader's main part exists for merged shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
7df682c291 radeonsi/gfx9: select shader parts for non-monolithic ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
cd99c442c4 radeonsi/gfx9: add GS prolog support for merged ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
e0570bc283 radeonsi/gfx9: add VS prolog support for merged ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
6b93452b24 radeonsi/gfx9: pass GS input SGPRs and VGPRs from the ES part to GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
37e22ab65e radeonsi/gfx9: store ES outputs to LDS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
d616c57342 radeonsi/gfx9: load GS inputs from LDS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
fc781fa0ab radeonsi/gfx9: get GS wave ID from the correct input
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
bcaf905129 radeonsi/gfx9: add the function signature of merged ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
8b220877ad radeonsi/gfx9: set registers and shader key for merged ES-GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
ab197ad8d1 radeonsi/gfx9: add GS user SGPRs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
b2f5d03152 radeonsi: rename declare_tess_lds -> declare_lds_as_pointer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
e3caa1cd36 radeonsi: simplify some shader type conditions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
021e65640e radeonsi: rename the swizzle parameter of lds_store
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
dcea7e5d19 radeonsi: add si_shader::prolog2
For a GS prolog in merged ES-GS.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
eb35238ffe radeonsi/gfx9: move RW_BUFFERS to s[0:1] for merged shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
0af00f179e radeonsi/gfx9: add support for monolithic merged LS-HS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
0d6d25475d radeonsi/gfx9: set EXEC for non-mono merged shaders, add a barrier between them
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
a84a6feac9 radeonsi/gfx9: don't store the HS control word
GFX9 doesn't have it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
1d90ecd3a5 radeonsi/gfx9: pass inputs from LS to TCS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
cbd1bc2e3e radeonsi/gfx9: add TCS epilog support for merged LS-HS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
f11ced475e radeonsi/gfx9: add VS prolog support for merged LS-HS
HS input VGPRs must be reserved.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
82a0e4f658 radeonsi/gfx9: merged shaders have scratch offset at the beginning
also, screen wasn't initialized for compute shaders

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
0c253557b2 radeonsi/gfx9: define LS-HS main shader function prototype
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
852ea69a2d radeonsi: assign VS/TCS/TES/GS shader input parameter locations dynamically
They will vary with merged stages.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
067dacd1b1 radeonsi/gfx9: define and set LS-HS user SGPRs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
0588146cb0 radeonsi/gfx9: set up shader registers for merged LS-HS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
62abdb17bb radeonsi/gfx9: add initial code generation for non-monolithic merged LS-HS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
c73d9bd643 radeonsi: separate out code for selecting the VS prolog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
a98c9ba580 radeonsi/gfx9: add si_shader::previous_stage for merged shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
cfb0798bb3 radeonsi/gfx9: enlarge num_input_sgprs in shader keys due to higher hw limit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
4ab36e0ebc radeonsi/gfx9: update the summary of shader stage configs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
9d6ed572d9 radeonsi: adjust the signature of si_get_vs_prolog_key
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
b1ed3ffc56 radeonsi: separate out VS prolog key generation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
e4542f00ce radeonsi: separate out VS prolog key printing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
983d7e743e radeonsi: code shuffling in si_emit_derived_tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
130e198c49 radeonsi: separate out TGSI initialization of si_shader_context
so that we can put multiple different TGSI shaders into one module.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák
c3f37e9b50 st/mesa: use min_index and max_index directly from vbo
also remove the incorrect comment about primitive restart.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:46:44 +02:00
Marek Olšák
53cd67859d vbo: set min_index = 0 so gallium can use the value directly
We could also remove index_bounds_valid and use max_index != ~0 instead.
Opinions on that are welcome.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:46:44 +02:00
Matt Turner
ee70937d15 Revert "glsl: reject image qualifiers with non-image types inside uniform blocks"
This reverts commit 24011ead71.

This causes lots of ES 3.1 CTS tests to fail to compile a bit of code
like:

   layout(binding = 0) buffer InOut
   {
        highp uint inputValues[384];
        highp uint outputValues[384];
        coherent highp uint groupValues[64];      <-----
   } sb_inout;

   error: memory qualifiers may only be applied to images
2017-04-28 12:31:20 -07:00
Brian Paul
27469aa72e st/mesa: add more fallback gallium formats for GL integer formats
The VMware driver has a limited set of integer texture formats.  We
often have to fall back to 4-component formats when 1- or 2-component
formats are missing.

This fixes about 8 integer texture Piglit tests with the VMware driver
on Linux.  We've had this code in-house for a long time but I guess it
was never up-streamed to Mesa master.

This shouldn't regress any other drivers since we're either choosing
an earlier format in the list, or failing anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-28 13:12:31 -06:00
Brian Paul
6b60153f04 mesa: optimize color_buffer_writes_enabled()
Return as soon as we find an existing color channel that's enabled for
writing.  Typically, this allows us to return true on the first loop
iteration intead of doing four iterations.

No piglit regressions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-28 13:12:31 -06:00
Brian Paul
054fb129e1 st/mesa: whitespace clean-ups in st_manager.c
Trivial.
2017-04-28 13:12:31 -06:00
Matt Turner
b64da3d14e Revert "glsl: set vector_elements to 1 for samplers"
This reverts commit 75a31a20af.

This breaks thousands of tests on i965 with malloc corruption.
2017-04-28 11:48:57 -07:00
Chad Versace
85ca563b58 anv: Drop 'x11' prefix from non-X11 WSI funcs
Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The
functions are used by Wayland WSI too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2017-04-28 08:54:45 -07:00
Jason Ekstrand
ebd1bd6998 anv: Alphabetize KHR extensions
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-28 07:41:03 -07:00
Emil Velikov
c0139955fa ac: automake: sort sources list alphabetically
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-28 14:13:01 +01:00
Emil Velikov
ecc39b6650 ac: include all sources in the tarball
Fixes: e2659176ce ("radeonsi/ac: move vertex export remove to common code.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-28 14:13:00 +01:00
Nicolai Hähnle
9d346af322 st/mesa: remove redundant stfb->iface checks
stfb->iface is always non-NULL for an st_framebuffer. These checks
were incorrect, relying on out-of-bounds memory access in the
surface-less case of EGL_KHR_surfaceless_context.

v2: remove redundant stread check (Marek)

Reviewed-by: Marek Olšák <marek@olsak@amd.com> (v2)
2017-04-28 11:34:00 +02:00
Nicolai Hähnle
19b61799e3 st/mesa: don't cast the incomplete framebufer to st_framebuffer
The incomplete framebuffer is set for a surfaceless context. This leads to
the following error in piglit spec@egl_khr_surfaceless_context@viewport:

==26703==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7f6886e43240 at pc 0x7f68854db0fd bp 0x7ffca404b3b0 sp 0x7ffca404b3a0
READ of size 8 at 0x7f6886e43240 thread T0
    #0 0x7f68854db0fc in st_viewport ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57
    #1 0x556840176cdb in main tests/egl/spec/egl_khr_surfaceless_context/viewport.c:101
    #2 0x7f688edcf3f0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x203f0)
    #3 0x556840176e19 in _start (/home/nha/amd/piglit/bin/egl-surfaceless-context-viewport+0xe19)

0x7f6886e43240 is located 32 bytes to the left of global variable 'DummyRenderbuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:69:31' (0x7f6886e43260) of size 112
0x7f6886e43240 is located 8 bytes to the right of global variable 'IncompleteFramebuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:73:30' (0x7f6886e42de0) of size 1112
SUMMARY: AddressSanitizer: global-buffer-overflow ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57 in st_viewport

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek@olsak@amd.com>
2017-04-28 11:34:00 +02:00
Nicolai Hähnle
28ec0fc7b8 st/glsl_to_tgsi: make undef_src and undef_dst const 2017-04-28 11:34:00 +02:00
Nicolai Hähnle
6cbb8f99d2 st/glsl_to_tgsi: cleanup using visit_generic_intrinsic
It turns out that explicitly setting the writemask isn't actually
needed; emit_asm does the right thing based on looking at the types.
2017-04-28 11:34:00 +02:00
Nicolai Hähnle
ce55afc4d6 glsl: remove the shader_group_vote and shader_ballot expression ops
They are now no longer used.
2017-04-28 11:33:59 +02:00
Nicolai Hähnle
0aef96e00c glsl: implement arb_shader_ballot builtins using intrinsics 2017-04-28 11:33:59 +02:00
Nicolai Hähnle
2c30ea3fcd glsl: implement arb_shader_group_vote builtins via intrinsics
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-28 11:33:59 +02:00
Nicolai Hähnle
944455217b st/glsl_to_tgsi: implement shader_group_vote and shader_ballot intrinsics
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-28 11:33:59 +02:00
Nicolai Hähnle
99941a9724 glsl: add intrinsics for ARB_shader_group_vote and ARB_shader_ballot
These operations are currently implemented as IR expressions. However,
they cannot be transformed and moved in the way that other IR
expressions can because they have non-trivial interactions with
control-flow.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-28 11:33:58 +02:00
Samuel Pitoiset
24011ead71 glsl: reject image qualifiers with non-image types inside uniform blocks
Fixes the following ARB_shader_image_load_store tests:

format-layout-with-non-image-type.frag
memory-qualifier-with-non-image-type.frag

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-28 10:43:53 +02:00
Samuel Pitoiset
edb4a1ab2d glsl: introduce validate_image_qualifier_for_type() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-28 10:43:13 +02:00
Samuel Pitoiset
80738425e4 glsl: fix error when using format qualifiers with non-image types
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-28 10:43:04 +02:00
Timothy Arceri
22fa3d90a9 util/disk_cache: remove percentage based max cache limit
The more I think about it the more this seems like a bad idea.
When we were deleting old cache dirs this wasn't so bad as it
was unlikely we would ever hit the actual limit before things
were cleaned up. Now that we only start cleaning up old cache
items once the limit is reached the a percentage based max
cache limit is more risky.

For the inital release of shader cache I think its better to
stick to a more conservative cache limit, at least until we
have some way of cleaning up the cache more aggressively.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2017-04-28 14:35:27 +10:00
Jason Ekstrand
032861693e anv: Move queues, events, and semaphores to their own file
Things are about to get more complicated, especially as far as
semaphores are concerned.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
9bd1f03487 anv: Implement VK_KHX_external_memory_fd
This commit just exposes the memory handle type.  There's interesting we
need to do here for images.  So long as the user doesn't set any crazy
environment variables such as INTEL_DEBUG=nohiz, all of the compression
formats etc. should "just work" at least for opaque handle types.

v2 (chadv):
  - Rebase.
  - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
    handleType == 0.
  - Move handleType-independency comments out of handleType-switch, in
    vkGetPhysicalDeviceExternalBufferPropertiesKHX.  Reduces diff in
    future dma_buf patches.

Co-authored-with: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
818b857914 anv: Use the BO cache for DeviceMemory allocations
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
494d6f65a7 anv/allocator: Add a BO cache
This cache allows us to easily ensure that we have a unique anv_bo for
each gem handle.  We'll need this in order to support multiple-import of
memory objects and semaphores.

v2 (Jason Ekstrand):
 - Reject BO imports if the size doesn't match the prime fd size as
   reported by lseek().

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
5d25ac6a4b anv: Implement VK_KHX_external_memory
This is the trivial implementation that just exposes the extension
string but exposes zero external handle types.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Chad Versace
354ca7a1d4 anv: Implement VK_KHX_external_memory_capabilities
This is a complete but trivial implementation. It's trivial becasue We
support no external memory capabilities yet.  Most of the real work in
this commit is in reworking the UUIDs advertised by the driver.

v2 (chadv):
  - Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR.
    Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of
    input structs, not the chain of output structs.
  - In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the
    input chain and the output chain separately. Reduces diff in future
    dma_buf patches.

Co-authored-with: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
d4d9258b61 anv/physical_device: Rename uuid to pipeline_cache_uuid
We're about to have more UUIDs for different things so this one really
needs to be properly labeled.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
02767cb4ff anv: Refactor device_get_cache_uuid into physical_device_init_uuids
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
35e626bd0e anv: Set EXEC_OBJECT_ASYNC when available
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
bd3a9813b9 anv/cmd_buffer: Use the device allocator for QueueSubmit
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>
2017-04-27 20:08:46 -07:00
Timothy Arceri
2bc06767e1 mesa: remove wip framebuffer code
This was added in 34b3b40af9 back in 2006. Seems it wasn't
needed.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-28 10:19:59 +10:00
Samuel Pitoiset
75a31a20af glsl: set vector_elements to 1 for samplers
I don't see any reasons why vector_elements is 1 for images and
0 for samplers. This increases consistency and allows to clean
up some code a bit.

This will also help for ARB_bindless_texture.

No piglit regressions with RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-27 22:52:21 +02:00
Jan Vesely
b295a52836 clover: Fix build since clang r301442
v2: rename default_ik -> ik_opencl

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-27 12:52:25 -04:00
Timothy Arceri
4e1f3afea9 disk_cache: use block size rather than file size
The majority of cache files are less than 1kb this resulted in us
greatly miscalculating the amount of disk space used by the cache.

Using the number of blocks allocated to the file is more
conservative and less likely to cause issues.

This change will result in cache sizes being miscalculated further
until old items added with the previous calculation have all been
removed. However I don't see anyway around that, the previous
patch should help limit that problem.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-04-27 20:44:00 +10:00
Timothy Arceri
ce41237151 disk_cache: reduce default cache size to 5% of filesystem
Modern disks are extremely large and are only going to get bigger.
Usage has shown frequent Mesa upgrades can result in the cache
growing very fast i.e. wasting a lot of disk space unnecessarily.

5% seems like a more reasonable default.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2017-04-27 20:43:50 +10:00
Dave Airlie
f4743763ce radeon/ac: remove assert causing regression
This assert wasn't in the original radeonsi code but I added
it without totally understanding the original code, it caused
some regressions in variable-indexing tessellation shaders.

Fixes: e2659176 radeonsi/ac: move vertex export remove to common code.
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-27 11:38:54 +01:00
Dave Airlie
550281f934 radeon/ac: fix build on llvm 3.8.1
Add missing include to fix build.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-27 11:22:12 +01:00
Boyan Ding
63df869f08 nvc0: Enable compute support for Pascal
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-27 11:11:15 +02:00
Boyan Ding
d03bfb078b nvc0: Add new launch descriptor format for GP100
v2:
Also handle the the new format in indirect dispatch
Use compute class check instead of chipset check

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-27 11:11:12 +02:00
Boyan Ding
2e35bd964e nvc0: Fix index of unk fields in nve4_cp_launch_desc
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-27 11:11:10 +02:00
Boyan Ding
4a9f7bfe90 nouveau: Fix indentation of maxwell compute class definitions
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-27 11:11:07 +02:00
Jason Ekstrand
c43b4bc85e anv: Don't place scratch buffers above the 32-bit boundary
This fixes rendering corruptions in DOOM.  Hopefully, it will also make
Jenkins a bit more stable as we've been seeing some random failures and
GPU hangs ever since turning on 48bit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620
Fixes: 651ec926fc "anv: Add support for 48-bit addresses"
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-04-27 02:04:57 -07:00
Dave Airlie
f205e19e4f radv/ac: eliminate unused vertex shader outputs. (v2)
This is ported from radeonsi, and I can see at least one
Talos shader drops an export due to this, and saves some
VGPR usage.

v2: use shared code.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-27 05:18:52 +01:00
Dave Airlie
e2659176ce radeonsi/ac: move vertex export remove to common code.
This code can be shared by radv, we bump the max to
VARYING_SLOT_MAX here, but that shouldn't have too
much fallout.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-27 05:17:47 +01:00
Dave Airlie
9da1045933 radv: fix regression in descriptor set freeing.
Since the host pool changes,

Fixes:
dEQP-VK.api.descriptor_pool.out_of_pool_memory

Fixes: 126d5ad "radv: Use host memory pool for non-freeable descriptors."
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-27 10:50:46 +10:00
Timothy Arceri
f8a2d00046 glsl: remove duplicate validation
Varying types have already been validated in
apply_type_qualifier_to_variable() by this point.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-27 08:21:28 +10:00
Timothy Arceri
52c76dbad3 glsl: use without_array() rather than get_scalar_type()
Here get_scalar_type() was just being use to remove the array
after that we converted it back to base_type anyway so just
use the without_array() helper.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-27 08:21:21 +10:00
Brian Paul
28feb63580 svga: fix vertex buffer binding issue
When we ran Viewperf11's Maya-03 test 3 we saw warnings about flushing
the command buffer with mapped buffers.  This happened when transitioning
from hardware rendering to a 'draw' fallback path.

The problem is the util_set_vertex_buffers_count() function doesn't do
exactly what we want in svga_hwtnl_vertex_buffers().  In a case such as
dst_count=2, dst={bufA, bufB}, count=1 and src={bufC}, when the function
returns we'll have dst_count=2 and dst={bufC, bufB}.  What we really want
is dst_count=1 and dst={bufC, NULL}.  As it was, we were telling the svga
device that there were two vertex buffers when in fact we really only
needed one for the subsequent drawing command.

In this particular case, we first did hardware drawing with {bufA, bufB}
then we transitioned to the 'draw' module, consuming vertex data from
bufA and bufB and writing the new vertex data to bufC.  bufA and bufB are
mapped for reading when we flush the command buffer but should not be
referenced by the command buffer.  The above change fixes that.

No Piglit regressions.  Also tested with Viewperf, Google Earth, Heaven,
etc.

VMware bug 1842059

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-26 11:38:00 -06:00
Brian Paul
a36a1ea80a gallium/util: reduce util_snprintf() calls in debug_flush_might_flush_cb()
We only need to construct the debug message if the mapped_sync flag is set.
This should make the function faster since the flag is usually false.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-26 11:38:00 -06:00
Brian Paul
495840658e gallium/util: add some comments in u_debug_flush.c
Trivial.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-26 11:37:59 -06:00
Charmaine Lee
fbda9b905a svga: Removed the unused label 'done' in svga_validate_surface_view()
Trivial fix
2017-04-26 11:37:59 -06:00
Charmaine Lee
019d5d5346 svga: use the winsys interface to invalidate surface
Instead of directly sending the InvalidateGBSurface command,
this patch uses the invalidate_surface interface.

Fixes Linux VM piglit failures including
   ext_texture_array-gen-mipmap, fbo-generatemipmap-array S3TC_DXT1

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-26 11:37:59 -06:00
Charmaine Lee
5bd5ec6a0f svga: fix format for screen target
This patch revises the fix in commit 606f13afa31c9f041a68eb22cc32112ce813f944
to properly translate the surface format for screen target.
Instead of changing the svga format for PIPE_FORMAT_B5G6R5_UNORM
to SVGA3D_R5G6B5 for all texture surfaces, this patch only restricts
SVGA3D_R5G6B5 for screen target surfaces. This avoids rendering
failures when specify a non-vgpu10 format in a vgpu10 context with
software renderer.

Fixes piglit failures spec@!opengl 1.1@draw-pixels,
                      spec@!opengl 1.1@teximage-colors gl_r3_g3_b2
                      spec@!opengl 1.1@texwrap formats

Tested Xorg with 16bits depth.
Also tested with MTT piglit, MTT glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-26 11:37:59 -06:00
Charmaine Lee
3626112214 svga: cache the backing surface handle in the texture object
CinebenchR15 not only binds the same texture for rendering and sampling,
it actually changes the framebuffer buffer attachment very often, causing
a lot of backed surface view to be created and a lot of surface copies
to be done. This patch caches the backed surface handle
in the texture resource and allows the backed surface view to
reuse the backed surface handle.  With this patch, the number of
backed surface view reduces from 1312 to 3. Unfortunately, this
does not eliminate all the surface copies. There are still surface
copies involved when we switch from original to backed surface handle
for rendering.

Tested with CinebenchR15, NobelClinicianViewer, Turbine, Lightsmark2008,
            MTT glretrace, MTT piglit.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-26 11:37:59 -06:00
Charmaine Lee
7f2f695d4d svga: Update the backing resource only if needed
This patch adds a timestamp in svga_surface structure to keep track
of when the backing surface is last sync with the original resource.
This helps to avoid unnecessary surface copy from the original
resource to the backing surface if the original resource has not
since been modified.

This reduces the amount of surface copy with CinebenchR15.

Tested with CinebenchR15, mtt glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-26 11:37:59 -06:00
Charmaine Lee
c6576461f5 svga: Set the surface dirty bit for the right surface view
For VGPU10, we will render to a backed surface view when
the same resource is used for rendering and sampling.
In this case, we will mark the dirty bit for the backed surface view.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-26 11:37:59 -06:00
Charmaine Lee
dc30ac5c24 svga: Move rendertarget view related fields to hw_clear state
This patch moves the rendertarget view related fields from
svga_hw_draw_state to svga_hw_clear_state where all the hw
framebuffer related state resides.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-26 11:37:59 -06:00
Charmaine Lee
f482493dcf svga: Move setting the rendered_to flags to framebuffer emit time
Instead of setting the rendered_to flags at set time, this patch
moves the setting of the flags to framebuffer emit time.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-26 11:37:59 -06:00
Brian Paul
1ee181b354 svga: add const qualifiers on svga_check_sampler_view_resource_collision()
We don't change any of the argument objects.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-26 11:37:59 -06:00
Brian Paul
0f236ea785 svga: improve surface view debug messages
The old ones were somewhat cryptic.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-26 11:37:59 -06:00
Brian Paul
943f4f47e0 svga: add DEBUG_SAMPLERS
The debug output in svga_create_sampler_state() was controlled by
DEBUG_VIEWS but that's not consistent with the other debug output for
sampler views.  Create/use a new debug flag just for this.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-26 11:37:59 -06:00
Brian Paul
577e114e46 svga: fail screen creation if HW version is too old
Tested by verifying 3D acceleration works with HWv8 but not earlier.
For HWv7 and older we get the GDI Generic renderer.

Reviewed-by: Neha Bhende<bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-26 11:37:59 -06:00
Deepak Rawat
8de0452ec4 winsys/svga: fix error path when kernel is not able to create surface
If for some reason kernel is not able to create surface,
when no buffer was provided the function
vmw_svga_winsys_surface_create should return NULL.

This patch fixes the issue where the code was not following the
clean up path in case of error, which used to cause SIGSEGV.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2017-04-26 11:37:59 -06:00
Brian Paul
75be43ed33 draw: whitespace fixes in draw_pipe_vbuf.c
Remove trailing whitespace, fix formatting, etc.  Trivial.
2017-04-26 11:37:59 -06:00
Brian Paul
4bb19a1514 st/mesa: minor clean-ups in st_update_renderbuffer_surface()
Remove unneeded parens.  Add const qualifiers.  Move var decls closer
to where they're used.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Neha Bhende<bhenden@vmware.com>
2017-04-26 11:37:59 -06:00
Samuel Pitoiset
00b5044740 nv50,nvc0: disable the TGSI merge registers pass
shader-db results on GK106 (Thanks Karol):

total instructions in shared programs : 3931608 -> 3929463 (-0.05%)
total gprs used in shared programs    : 481255 -> 479014 (-0.47%)
total local used in shared programs   : 27481 -> 27381 (-0.36%)
total bytes used in shared programs   : 36031256 -> 36011120 (-0.06%)

                local        gpr       inst      bytes
    helped          14        1471        1309        1309
      hurt           1          88         384         384

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-26 19:15:54 +02:00
Samuel Pitoiset
0bceefc295 radeonsi: disable the TGSI merge registers pass
47109 shaders in 29632 tests
Totals:
SGPRS: 1917364 -> 1916620 (-0.04 %)
VGPRS: 1165802 -> 1165202 (-0.05 %)
Spilled SGPRs: 1880 -> 1843 (-1.97 %)
Spilled VGPRs: 70 -> 65 (-7.14 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1312 -> 1308 (-0.30 %) dwords per thread
Code Size: 60211356 -> 60192268 (-0.03 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 428597 -> 428674 (0.02 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 238173 -> 237429 (-0.31 %)
VGPRS: 149556 -> 148956 (-0.40 %)
Spilled SGPRs: 1263 -> 1226 (-2.93 %)
Spilled VGPRs: 25 -> 20 (-20.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 20 -> 16 (-20.00 %) dwords per thread
Code Size: 10457904 -> 10438816 (-0.18 %) bytes
LDS: 50 -> 50 (0.00 %) blocks
Max Waves: 41283 -> 41360 (0.19 %)
Wait states: 0 -> 0 (0.00 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-26 19:15:40 +02:00
Samuel Pitoiset
066a572955 st/glsl_to_tgsi: disable the merge registers pass conditionally
The main goal of this pass to merge temporary registers in order
to reduce the total number of registers and also to produce
optimal TGSI code.

In fact, compilers seem to be confused when temporary variables
are already merged, maybe because it's done too early in the
process.

Skipping the pass, reduce both the register pressure and the code
size, at least for Nouveau and RadeonSI because they have a real
backend compiler.

Found by luck while fixing an issue in the TGSI dead code elimination
pass which affects tex instructions with bindless samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-26 19:15:37 +02:00
Samuel Pitoiset
3a927e0aa3 gallium: add PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-26 19:15:34 +02:00
Samuel Pitoiset
ec301497b8 radeonsi: use unsynchronized transfers for shader binary uploads
Because the buffer is new, it can't be referenced by any CS.

This can save few CPU cycles by skipping the whole
PIPE_TRANSFER_UNSYNCHRONIZED if in amdgpu_bo_map().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-26 19:15:22 +02:00
Marek Olšák
96b0cfc82e radeonsi: turn si_shader_key::mono into a non-union
A merged LS-HS shader needs both fix_fetch and inputs_to_copy
for compilation.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Marek Olšák
3f2a0649ab radeonsi: adjust ESGS ring buffer size computation on VI
Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Marek Olšák
80814819c2 radeonsi/gfx9: don't set deprecated field PARTIAL_ES_WAVE_ON
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Marek Olšák
60a20e6879 radeonsi/gfx9: set MAX_PRIMGRP_IN_WAVE in the correct register
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Marek Olšák
8e8570a9e8 radeonsi/gfx9: add a workaround for viewing a slice of 3D as a 2D image
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Marek Olšák
482e6b07cc radeonsi/gfx9: fix 1D array shader images
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Marek Olšák
5c94779585 radeonsi/gfx9: fix most things wrong with shader images
There are 2 major hw changes:
- The address must always point to the address of level 0. GFX9 tiling
  modes don't allow binding to a non-0 level.
- 3D must always be bound as 3D, because 2D and 3D use entirely different
  tiling modes, and the texture target determines which set of modes is
  used.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Marek Olšák
65e0c3fba7 radeonsi/gfx9: fix texture buffer objects and image buffers with IDXEN==0
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 13:08:05 +02:00
Eric Engestrom
9d1dbf2aa1 configure: print LDFLAGS alongside CFLAGS & co.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 10:27:17 +01:00
Timothy Arceri
2895d96a05 mesa: tidy up left over APPLE_vertex_array_object semantics
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 10:03:06 +10:00
Timothy Arceri
f38845b9cb mesa: inline bind_vertex_array() helper
The previous commit removed the only other user of this function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 10:03:06 +10:00
Timothy Arceri
7927d0378f mesa: drop APPLE_vertex_array_object support
Shared context support for VAOs was dropped in 0b2750620b.

From the ARB_vertex_array_object spec:

   "This extension differs from GL_APPLE_vertex_array_object
   in that client memory cannot be accessed through a
   non-zero vertex array object.  It also differs in that
   vertex array objects are explicitly not sharable between
   contexts."

Nobody should be using this extension over
ARB_vertex_array_object anymore so just drop it rather than
adding locking back just for VAOs created from these
functions.

For reference the Nvidia blob doesn't expose this extension.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 10:03:06 +10:00
Bas Nieuwenhuizen
7b9963a28f radv: Enable userspace fence checking.
v2: - Added some error handling.
    - memset the buffer to 0.

v3: Added assert for buffer size.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-26 01:32:41 +02:00
Matt Turner
ee5f96581a i965: Remove unused variable 'options'
Should have been removed in commit ad55b1a770
2017-04-25 15:28:33 -07:00
Matt Turner
71d11f3998 glsl: Initialize current_var
CID: 1324644 (Uninitialized pointer field)
2017-04-25 15:28:33 -07:00
Dave Airlie
7f77554b5b radv/ac: setup mrt exports then export them in one go. (v2)
Noticed while looking at Sascha Willems deferred shaders.

This is a bit of an llvm workaround, llvm was producing this:
        v_cvt_pkrtz_f16_f32_e64 v4, v7, v8                       ; D2960004 00021107
        v_cvt_pkrtz_f16_f32_e64 v6, v9, 1.0                      ; D2960006 0001E509
        s_waitcnt vmcnt(0)                                       ; BF8C0F70
        exp mrt0 v4, v4, v6, v6 compr                            ; C400040F 00000604
        s_waitcnt expcnt(0)                                      ; BF8C0F0F
        v_cvt_pkrtz_f16_f32_e64 v4, v12, v5                      ; D2960004 00020B0C
        v_cvt_pkrtz_f16_f32_e64 v5, v14, 1.0                     ; D2960005 0001E50E
        exp mrt1 v4, v4, v5, v5 compr                            ; C400041F 00000504
        s_waitcnt expcnt(0)                                      ; BF8C0F0F
        v_cvt_pkrtz_f16_f32_e64 v0, v0, v1                       ; D2960000 00020300
        v_cvt_pkrtz_f16_f32_e64 v1, v2, v3                       ; D2960001 00020702
        exp mrt2 v0, v0, v1, v1 done compr vm                    ; C4001C2F 00000100

After this change:
        v_cvt_pkrtz_f16_f32_e64 v4, v7, v8                       ; D2960004 00021107
        s_waitcnt vmcnt(0)                                       ; BF8C0F70
        v_cvt_pkrtz_f16_f32_e64 v0, v0, v1                       ; D2960000 00020300
        v_cvt_pkrtz_f16_f32_e64 v6, v9, 1.0                      ; D2960006 0001E509
        v_cvt_pkrtz_f16_f32_e64 v5, v12, v5                      ; D2960005 00020B0C
        v_cvt_pkrtz_f16_f32_e64 v7, v14, 1.0                     ; D2960007 0001E50E
        exp mrt0 v4, v4, v6, v6 compr                            ; C400040F 00000604
        v_cvt_pkrtz_f16_f32_e64 v1, v2, v3                       ; D2960001 00020702
        exp mrt1 v5, v5, v7, v7 compr                            ; C400041F 00000705
        exp mrt2 v0, v0, v1, v1 done compr vm                    ; C4001C2F 00000100

No waitcnt for exports are emitted.

v2: fixup index->mrt mapping (Bas).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-25 23:26:11 +01:00
Dave Airlie
b2cedb3ea9 radv/ac: overhaul vs output/ps input routing
In order to cleanly eliminate exports rewrite the
code first to mirror how radeonsi works for now.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-25 23:24:39 +01:00
Dave Airlie
b858cb4df8 radv/ac: move point coord after layer/viewport.
These need to be ordered as per shader enum ordering, I'll
rewrite this soon, but this is a bug fix.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-25 23:24:21 +01:00
Samuel Pitoiset
1c66522ecc gallium: remove u_caps.c/h interface
No longer used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-25 23:26:44 +02:00
Marek Olšák
04d7978b8c ddebug: implement get_query_result_resource
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-25 22:39:31 +02:00
Marek Olšák
231dfa5a02 trace: don't trace resource_destroy
due to the lack of pipe_resource wrapping, we can get this call from inside
of driver calls, which would try to lock an already-locked mutex.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-25 22:39:31 +02:00
Marek Olšák
2c1ec23a06 gallium/util: add debugging helpers printing pipeline statistics
typically useful for hw bring-up

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-25 22:39:31 +02:00
Rob Herring
26a36c1af7 Android: fix r300g only build
If r300g is the only radeon driver built, the Android build fails to
build:

ninja: error:
'out/target/product/linaro_x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_radeon_intermediates/export_includes',
needed by
'out/target/product/linaro_x86_64/obj/SHARED_LIBRARIES/gallium_dri_intermediates/import_includes',
missing and no known rule to make it

This is because the path to build libmesa_pipe_radeon was only getting
added for r600g and radeonsi, but the library dependency was added for
all radeon drivers. As libmesa_pipe_radeon is not needed for r300g, drop
the library dependency.

Cc: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-25 17:08:06 +01:00
Timothy Arceri
347fe24f82 mesa: use locked version of HashWalk for xfb objects
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:

   "Objects which contain references to other objects include
   framebuffer, program pipeline, query, transform feedback,
   and vertex array objects.   Such objects are called container
   objects and are not shared"

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-25 09:58:47 +10:00
Timothy Arceri
a82d6a307d mesa: create locked version of HashWalk
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-25 09:58:39 +10:00
Rafael Antognolli
6a40ccec4b genxml: Fix gen_pack_header.py crash when field type is invalid.
Just return earlier in that case. Also set prefix to an empty string, so
we don't get to use it undefined.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:12 -07:00
Rafael Antognolli
9670124e31 genxml: Make BLEND_STATE command support variable length array.
We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.

For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.

With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.

v2:
   - Use designated initializers on blorp and remove 0 from
   initialization (Jason)
   - Default entries to disabled on Vulkan (Jason)
   - Rebase code.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:10 -07:00
Rafael Antognolli
4ace73b1f6 genxml: Fix python crash when no dwords are found.
If the 'dwords' dict is empty, max(dwords.keys()) throws an exception.
This case could happen when we have an instruction that is only an array
of other structs, with variable length.

v2:
   - Add another clause for empty dwords and make it work with python 3
   (Dylan)
   - Set the length to 0 if dwords is empty, and do not declare dw

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:08 -07:00
Rafael Antognolli
19720405d5 genxml: Remove unused parameter.
'start' parameter from Group.emit_pack_function() is useless.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:05 -07:00
Rafael Antognolli
1ea41163eb intel/aubinator: Correctly read variable length structs.
Before this commit, when a group with count="0" is found, only one field
is added to the struct representing the instruction. This causes only
one entry to be printed by aubinator, for variable length groups.

With this commit we "detect" that there's a variable length group
(count="0") and store the offset of the last entry added to the struct
when reading the xml. When finally reading the aubdump file, we check
the size of the group and whether we have variable number of elements,
and in that case, reuse the last field to add the remaining elements.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:13:51 -07:00
Nanley Chery
50134cede1 isl/format: Update the R16G16B16X16_FLOAT entry
The section of the PRM mentioned in the code comment above this table
says that this format supports the render target write message. Internal
documentation says that this format also supports alpha blending. As a
side effect, this allows CCS_D buffers to be created for images with
this format.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-24 13:30:50 -07:00
Nanley Chery
b1066f7365 anv/pass: Delete anv_pass::subpass_attachments
This field has no users.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-24 13:30:50 -07:00
Francisco Jerez
58324389be intel/fs: Take into account amount of data read in spilling cost heuristic.
Until now the spilling cost calculation was neglecting the amount of
data read from the register during the spilling cost calculation.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.

Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%.  In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 11:01:40 -07:00
Francisco Jerez
ecc19e12dc intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 10:59:56 -07:00
Kenneth Graunke
6b10c37b9c i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.
Curro pointed out that I should not just check for MACH, but use
the reads_accumulator_implicitly() helper, which would also prevent
the same bug with MAC and SADA2 (if we ever decide to use them).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-24 10:53:49 -07:00
Mauro Rossi
11db3d10bb android: radv/ac: Fix nir.h include
Fixes following building errors due to missing include paths:

external/mesa/src/amd/common/ac_shader_info.c:23:10: fatal error: 'nir/nir.h' file not found
         ^

external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found
         ^

Fixes: 224cf29 "radv/ac: add initial pre-pass for shader info gathering"
Acked-by: Dave Airlie <Airlied@redhat.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-24 18:01:03 +01:00
Vinson Lee
b81d85f175 configure.ac: Fix typos.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-23 22:23:22 -07:00
Dave Airlie
fed740eafe radv/ac: copy llvm machine feature flags from radeonsi.
This just updates this to use the same flags as radeonsi
for consistency.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-24 05:55:44 +01:00
Timothy Arceri
794ae44095 i965: remove now unused GLSL IR optimisations
These are no longer used since the previous commit.

Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri
ad55b1a770 i965: remove GLSL IR optimisation loop
IVB is running into some spilling issues in piglit with the
loop removed. However those tests are not really reflective
of a real world use case, also fp64 is brand new to IVB
so we leave the spilling issues to be resolved at a later
time.

Run time for shader-db on my machine goes from ~795 seconds to
~665 seconds.

shader-db results BDW:

total instructions in shared programs: 12969459 -> 12968891 (-0.00%)
instructions in affected programs: 1463154 -> 1462586 (-0.04%)
helped: 3622
HURT: 3326

total cycles in shared programs: 246453572 -> 246504318 (0.02%)
cycles in affected programs: 208842622 -> 208893368 (0.02%)
helped: 24029
HURT: 35407

total loops in shared programs: 2931 -> 2931 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 14560 -> 14498 (-0.43%)
spills in affected programs: 2270 -> 2208 (-2.73%)
helped: 17
HURT: 2

total fills in shared programs: 19671 -> 19632 (-0.20%)
fills in affected programs: 2060 -> 2021 (-1.89%)
helped: 17
HURT: 2

LOST:   17
GAINED: 40

Most of the hurt shaders are 1-2 instructions, with what looks like a max of 7.

I've looked at the worst cycles regressions and as far as I can tell its just
a scheduling difference.

Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri
21173194db glsl: use ARB_enhahnced_layouts for packing where possible
If packing doesn't cross locations we can easily make use of
ARB_enhanced_layouts to do packing rather than using the GLSL IR
lowering pass lower_packed_varyings().

Shader-db Broadwell results:

total instructions in shared programs: 12977822 -> 12977819 (-0.00%)
instructions in affected programs: 1871 -> 1868 (-0.16%)
helped: 4
HURT: 3

total cycles in shared programs: 246567288 -> 246567668 (0.00%)
cycles in affected programs: 1370386 -> 1370766 (0.03%)
helped: 592
HURT: 733

Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri
eb8aa93c03 glsl: disable varying packing for varying used by interpolateAt*
Currently the NIR backends depend on GLSL IR copy propagation to
fix up the interpolateAt* function params after varying packing
changes the shader input to a global. It's possible copy propagation
might not always do what we need it too, and we also shouldn't
depend on optimisations to do this type of thing for us.

I'm not sure if the same is true for TGSI, but the following
commit should re-enable packing for most cases in a safer way,
so we just disable it everywhere.

No change in shader-db for i965 (BDW)

Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri
aa021d50c0 glsl_to_nir: skip ir_var_shader_shared variables
These should be lowered away in GLSL IR but if we don't get dead
code to clean them up it causes issues in glsl_to_nir.

We wan't to drop as many GLSL IR opts in future as we can so this
makes glsl_to_nir just ignore the vars if it sees them.

In future we will want to just use the nir lowering pass that
Vulkan currently uses.

Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri
7a7ee40c2d nir/i965: add before ffma algebraic opts
This shuffles constants down in the reverse of what the previous
patch does and applies some simpilifications that may be made
possible from doing so.

Shader-db results BDW:

total instructions in shared programs: 12980814 -> 12977822 (-0.02%)
instructions in affected programs: 281889 -> 278897 (-1.06%)
helped: 1231
HURT: 128

total cycles in shared programs: 246562852 -> 246567288 (0.00%)
cycles in affected programs: 11271524 -> 11275960 (0.04%)
helped: 1630
HURT: 1378

V2: mark float opts as inexact

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri
fb2269fed1 nir: shuffle constants to the top
V2: mark float opts as inexact

If one of the inputs to an mul/add is the result of another
mul/add there is a chance that we can reuse the result of that
mul/add in other calls if we do the multiplication in the right
order.

Also by attempting to move all constants to the top we increase
the chance of constant folding.

For example it is a fairly common pattern for shaders to do something
similar to this:

  const float a = 0.5;
  in vec4 b;
  in float c;

  ...

  b.x = b.x * c;
  b.y = b.y * c;

  ...

  b.x = b.x * a + a;
  b.y = b.y * a + a;

So by simply detecting that constant a is part of the multiplication
in ffma and switching it with previous fmul that updates b we end up
with:

  ...

  c = a * c;

  ...

  b.x = b.x * c + a;
  b.y = b.y * c + a;

Shader-db results BDW:

total instructions in shared programs: 13011050 -> 12967888 (-0.33%)
instructions in affected programs: 4118366 -> 4075204 (-1.05%)
helped: 17739
HURT: 1343

total cycles in shared programs: 246717952 -> 246410716 (-0.12%)
cycles in affected programs: 166870802 -> 166563566 (-0.18%)
helped: 18493
HURT: 7965

total spills in shared programs: 14937 -> 14560 (-2.52%)
spills in affected programs: 9331 -> 8954 (-4.04%)
helped: 284
HURT: 33

total fills in shared programs: 20211 -> 19671 (-2.67%)
fills in affected programs: 12586 -> 12046 (-4.29%)
helped: 286
HURT: 33

LOST:   39
GAINED: 33

Some of the hurt will go away when we shuffle things back down to the
bottom in the following patch. It's also noteworthy that almost all of the
spill changes are in Deus Ex both hurt and helped.

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri
83f7fdf83a nir: add flt comparision simplification
Didn't turn out as useful as I'd hoped, but it will help alot more on
i965 by reducing regressions when we drop brw_do_channel_expressions()
and brw_do_vector_splitting().

I'm not sure how much sense 'is_not_used_by_conditional' makes on
platforms other than i965 but since this is a new opt it at least
won't do any harm.

shader-db BDW:

total instructions in shared programs: 13029581 -> 13029415 (-0.00%)
instructions in affected programs: 15268 -> 15102 (-1.09%)
helped: 86
HURT: 0

total cycles in shared programs: 247038346 -> 247036198 (-0.00%)
cycles in affected programs: 692634 -> 690486 (-0.31%)
helped: 183
HURT: 27

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Bas Nieuwenhuizen
18947fde7a radv: Enable lowering fdiv in nir.
Results in faster code than the lowering by LLVM.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-23 20:38:06 +02:00
Rob Clark
0012a98c0e freedreno/a5xx: hack for r8g8b8a8_snorm
Blob won't render to this format, and sampling from it it uses the same
fmt value for r8g8b8_snorm and r8g8b8a8_snorm.  But this is what is what
blocks us from jumping from gl30/gles20 to gl31/gles30.  So a hack it
is!

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-23 13:03:25 -04:00
Rob Clark
c21fc881ed freedreno/a5xx: rgtc formats
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-23 13:03:25 -04:00
Marek Olšák
070072ad43 mesa: replace _mesa_index_buffer::type with index_size
This avoids repeated translations of the enum.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-22 22:51:15 +02:00
Bas Nieuwenhuizen
e137b9eed9 radv: Use the correct pipeline for dispatches.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: ec15e0d30 "radv: optimise compute shader grid size emission."
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-22 20:26:59 +01:00
Wladimir J. van der Laan
9da0cd56c3 etnaviv: Supertiled texture support on gc3000
Support supertiled textures on hardware that has the appropriate
feature flag SUPERTILED_TEXTURE.

Most of the scaffolding was already in place in etna_layout_multiple:

   case ETNA_LAYOUT_SUPER_TILED:
      *paddingX = 64;
      *paddingY = 64;
      *halign = TEXTURE_HALIGN_SUPER_TILED;

So this is just a matter of allowing it.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-22 17:49:29 +02:00
Fabio Estevam
53e39f6df4 etnaviv: etnaviv_fence: Simplify the return code logic
The return code can be simplified by using the logical not operator.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-22 17:48:35 +02:00
Rob Clark
e769349fc6 freedreno/a5xx: occlusion query
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-22 10:03:02 -04:00
Rob Clark
52d2fa37f5 freedreno: drop ring arg from _set_stage()
It is always the draw ring.  Except for a5xx queries like time-elapsed,
where we will eventually want to emit cmds into both binning and draw
rings.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-22 10:03:02 -04:00
Rob Clark
5923780b2a freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-22 10:03:02 -04:00
Rob Clark
d310ea0f32 freedreno: add support for hw accumulating queries
Some queries on a4xx and all queries on a5xx can do result accumulation
on CP so we don't need to track per-tile samples.  We do still need to
handle pausing/resuming while switching batches (in case the query is
active over multiple draws which are executed out of order).

So introduce new accumulated-query helpers for these sorts of queries,
since it doesn't really fit in cleanly with the original query infra-
structure.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-22 10:03:02 -04:00
Rob Clark
935623af14 freedreno: a bit of query refactor
Move a bit more of the logic shared by all query types (active tracking,
etc) into common code.  This avoids introducing a 3rd copy of that logic
for a5xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-22 10:03:02 -04:00
Rob Clark
df63ff4d82 freedreno: make hw-query a helper
For a5xx (and actually some queries on a4xx) we can accumulate results
in the cmdstream, so we don't need this elaborate mechanism of tracking
per-tile query results.  So make it into vfuncs so generation specific
backend can use it when it makes sense.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-22 10:03:01 -04:00
Kenneth Graunke
2faf227ec2 i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().
opt_register_coalesce() was optimizing sequences such as:

   mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
   mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
   mov(8) m4.zw:F, vgrf5.xxxy:F

into:

   mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
   mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D

This doesn't work - if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well.  Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy.  But the MACH instruction expects
.z to contain attr18.x * attr19.x.  Bogus results ensue.

No change in shader-db on Haswell.  Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-22 00:01:16 -07:00
Timothy Arceri
d682f8aa8e mesa: validate sampler type across the whole program
Currently we were only making sure types were the same within a
single stage. This looks to have regressed with 953a0af8e3.

Fixes: 953a0af8e3 ("mesa: validate sampler uniforms during gluniform calls")

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=97524
2017-04-22 10:01:15 +10:00
Timothy Arceri
918cec8cbe mesa: don't lock hashtables that are not shared across contexts
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:

   "Objects which contain references to other objects include
   framebuffer, program pipeline, query, transform feedback,
   and vertex array objects.   Such objects are called container
   objects and are not shared"

For we leave locking in place for framebuffer objects because
the EXT fbo extension allowed sharing.

We could maybe just replace the hash with an ordinary hash table
but for now this should remove most of the unnecessary locking.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-22 10:01:15 +10:00
Matt Turner
ef6af0d5f7 mesa: Remove deleteFlag pattern from container objects.
This pattern was only useful when we used mutex locks, which the previous
commit removed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-22 10:01:15 +10:00
Matt Turner
0b2750620b mesa: Remove unnecessary locking from container objects.
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:

   "Objects which contain references to other objects include
   framebuffer, program pipeline, query, transform feedback,
   and vertex array objects.   Such objects are called container
   objects and are not shared"

For we leave locking in place for framebuffer objects because
the EXT fbo extension allowed sharing.

V2: (Timothy Arceri)
 - rebased and dropped changes to framebuffer objects

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-22 10:01:15 +10:00
Timothy Arceri
622a68ed3e mesa: remove fallback RefCount == 0 pattern
We should never get here if this is 0 unless there is a
bug. Replace the check with an assert.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-22 10:01:15 +10:00
Elie TOURNIER
0cc8c81902 egl: add gitignore
Since commit ce562f9e3f, two new files are generated.
We don't want to track them.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-22 00:42:38 +01:00
Samuel Pitoiset
a7bc51aef8 glsl: make use of glsl_type::is_float()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:34:15 +02:00
Samuel Pitoiset
cacc823c39 glsl: make use of glsl_type::is_double()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:34:12 +02:00
Samuel Pitoiset
100721959b glsl: make use of glsl_type::is_integer_64()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:57 +02:00
Samuel Pitoiset
362d9de29c glsl: simplify glsl_type::is_integer_32_64()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:42 +02:00
Samuel Pitoiset
87be9faa78 glsl: add glsl_type::is_integer_64()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:40 +02:00
Samuel Pitoiset
60caca3019 glsl: make use of glsl_type::is_boolean()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:38 +02:00
Samuel Pitoiset
64db02b5fa glsl: make use of glsl_type::is_record()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:36 +02:00
Samuel Pitoiset
cd78ab55d0 glsl: make use of glsl_type::is_interface()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:34 +02:00
Samuel Pitoiset
0c8898dc34 glsl: make use of glsl_type::is_array()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:32 +02:00
Samuel Pitoiset
053912382e glsl: make use glsl_type::is_atomic_uint()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:29 +02:00
Samuel Pitoiset
993a05f0eb glsl: add glsl_type::is_atomic_uint() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-21 19:33:27 +02:00
Emil Velikov
52df318d61 mesa/glthread: correctly compare thread handles
As mentioned in the manual - comparing pthread_t handles via the C
comparison operator is incorrect and pthread_equal() should be used
instead.

Cc: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: d8d81fbc31 ("mesa: Add infrastructure for a worker thread to process GL commands.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-04-21 13:39:57 +01:00
Emil Velikov
dd6ec78b4f st/clover: add space between < and ::
As pointed out by compiler

./llvm/codegen.hpp:52:22: error: ‘<::’ cannot begin a template-argument list [-fpermissive]
./llvm/codegen.hpp:52:22: note: ‘<:’ is an alternate spelling for ‘[’. Insert whitespace between ‘<’ and ‘::’

Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2017-04-21 13:39:57 +01:00
Samuel Pitoiset
862361c4f5 glsl: get rid of values_for_type()
This function is actually a wrapper for component_slots()
and it always returns 1 (or N) for samplers. Since
component_slots() now return 1 for samplers, it can go.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-21 10:08:32 +02:00
Samuel Pitoiset
4a0aa0b3b3 glsl: make component_slots() returns 1 for sampler types
It looks inconsistent to return 1 for image types and 0 for
sampler types. Especially because component_slots() is mostly
used by values_for_type() which always returns 1 for samplers.

For bindless, this value will be bumped to 2 because the
ARB_bindless_texture states that bindless samplers/images
should consume two components.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-21 10:08:04 +02:00
Kai Wasserbäch
29582dd20c docs/features: mark KHR_no_error as started
The OpenGL extension KHR_no_error is exposed since commit
d42d150ad2 by Timothy Arceri. Therefore it
should be marked as "started" in the features.txt

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-21 09:39:38 +02:00
Tapani Pälli
ae6cbdede0 Revert "android: fix segfault within swap_buffers"
This reverts commit 4d4558411d.

This was a wrong call, while it fixed issue with 3DMark it
actually introduced regression elsewhere.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
2017-04-21 10:03:58 +03:00
Ilia Mirkin
da0a80804c nvc0: Add support for setting viewport index/layer from VS/TES
This enables support on GM200+ for:
 - GL_AMD_vertex_shader_layer
 - GL_AMD_vertex_shader_layer_viewport_index
 - GL_ARB_shader_viewport_layer_array

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[lyude: add relnotes/TES cap]
Signed-off-by: Lyude <lyude@redhat.com>
[imirkin: move relnotes to right place, add features.txt]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-20 23:24:06 -04:00
Lyude
214f96c1e7 nvc0/ir: Only store viewport in scratch register for GP
EMIT only applies to geometry shaders. For everything else, we want to
export the viewport normally.

Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-20 23:24:06 -04:00
Bas Nieuwenhuizen
0e91d8f38c radv: Prefetch compute shader too.
For consistency, doesn't really impact performance.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-21 00:59:02 +02:00
Jason Ekstrand
1e21d4227e anv/query: Use genxml for MI_MATH
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
2017-04-20 15:24:06 -07:00
Jason Ekstrand
e23129ac0c genxml: Add better support for MI_MATH
This breaks the guts of MI_MATH (the instruction part) out into its own
structure with proper named values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
2017-04-20 15:24:06 -07:00
Jason Ekstrand
b7a2af8e38 genxml/pack: Allow hex values in the XML
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-04-20 15:24:06 -07:00
Dave Airlie
35ea0c07a1 radv/ac: use tex_lz if we can.
Looking at some Talos shaders vs radeonsi, I noticed they use
tex_lz in a few places, so we should be able to as well.

Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-20 22:00:13 +01:00
Marek Olšák
d1608d6982 st/mesa: use one big translation table in st_pipe_vertex_format
for lower overhead.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-20 20:11:35 +02:00
Marek Olšák
86f99c1e4c st/mesa: check in advance in st_draw_vbo whether the bitmap cache is empty
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-20 20:11:35 +02:00
Marek Olšák
1fb5bc83f1 st/mesa: put the bitmap_cache structure inside st_context
This is nicer on caches, and the next commit will need to access
the structure from a different place.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-20 20:11:35 +02:00
Marek Olšák
69423dcf23 st/mesa: inline and optimize st_invalidate_readpix_cache
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-20 20:11:35 +02:00
Marek Olšák
7cd6e2df65 st/mesa: invalidate the readpix cache in st_indirect_draw_vbo
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-20 20:11:35 +02:00
Marek Olšák
4219e09343 gallium/util: remove util_draw_range_elements helper
min/max_index are typically hints for the u_vbuf module, not the driver.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-20 20:11:35 +02:00
Marek Olšák
707d2e8b3e gallium: fold u_trim_pipe_prim call from st/mesa to drivers
Most drivers don't need it and shouldn't need it because it can't be used
in some cases (indirect draws, primitive restart, count from streamout).

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-20 20:11:35 +02:00
Samuel Iglesias Gonsálvez
2beff74314 docs/envvars: sort INTEL_DEBUG envvar options by name
It helps to find the envvar option you are looking for.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-04-20 16:27:31 +02:00
Christoph Haag
a9d27c8a33 ac: fix build after LLVM 5.0 SVN r300718
v2: previously getWithDereferenceableBytes() exists, but addAttr() doesn't take that type

Signed-off-by: Christoph Haag <haagch+mesadev@frickel.club>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-and-reviewed-by: Mike Lothian <mike@fireburn.co.uk>
2017-04-20 10:58:19 +02:00
Juan A. Suarez Romero
3af7f8275b bin/get-{extra,fixes}-pick-list.sh: improve output
Show the commit hash and the title in a way that it is easier to copy
and paste in the bin/.cherry-ignore-extra file if we want to ignore
those commits for the future.

v2:
- Use printf instead echo (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-20 10:28:54 +02:00
Juan A. Suarez Romero
99b41631bb bin/get-{extra,fixes}-pick-list.sh: add support for ignore list
Both scripts does not use a file with the commits to ignore. So if we
have handled one of the suggested commits and decided we won't pick it,
the scripts will continue suggesting them.

v2:
- Mark the candidates in bin/get-extra-pick-list.sh (Juan A. Suarez)
- Use bin/.cherry-ignore to store rejected patches (Emil)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-20 10:28:21 +02:00
Brian Paul
8a7e3693c8 mesa: print target string in glBindTexture() error message
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-19 19:57:32 -06:00
Brian Paul
9bfecb03c5 mesa: fix Windows build error related to getuid()
getuid() and geteuid() are not present on Windows.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-19 19:55:29 -06:00
Tim Rowley
dd4488ea6c swr: simd16 vs work
Build VS with alternating output for the current simd16 fe double-pump
of a simd8 shader.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-19 19:01:48 -05:00
Bas Nieuwenhuizen
6bb1ed6bcc radv: Set variant code_size when created from the cache.
Signed-off-by: Bas Nieeuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-20 01:01:49 +02:00
Bas Nieuwenhuizen
1e1165389c radv: Add shader prefetch.
Gives me approximately a 2% perf increase in bot dota2 & talos.

Having descriptors (both sets and vertex buffers) prefetched
didn't help so I didn't include that.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-19 23:47:27 +02:00
Bas Nieuwenhuizen
74d92e547c radv: Remove binding buffer count.
In cases where it is used it is always 1.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
2017-04-19 20:37:57 +02:00
Bas Nieuwenhuizen
f7b14ff4be radv: Don't try to find gaps for non-freeable descriptors.
With this we don't have any operations on a pool with non-freeable
descriptors left that have O(#descriptors) complexity.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
2017-04-19 20:37:57 +02:00
Bas Nieuwenhuizen
126d5adb11 radv: Use host memory pool for non-freeable descriptors.
v2: Handle out of pool memory error.
v3: Actually use VK_ERROR_OUT_OF_POOL_MEMORY_KHR for the error condition.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
2017-04-19 20:37:57 +02:00
Bas Nieuwenhuizen
39644fa40a radv: Don't allocate dynamic descriptors separately.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
2017-04-19 20:37:57 +02:00
Emil Velikov
51c0c213b7 st/mesa: automake: honour the vdpau header install location
If VDPAU is installed in the non-default location, we'll fail to find
the headers and error at build time.

../../src/gallium/include/state_tracker/vdpau_dmabuf.h:37:25: fatal error: vdpau/vdpau.h: No such file or directory
 #include <vdpau/vdpau.h>
                         ^

Fixes: faba96bc60 ("st/vdpau: add new interop interface")
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 12:19:46 +01:00
Emil Velikov
309f4067a7 winsys/sw/dri: don't use GNU void pointer arithmetic
Resolves build issues like the following:

src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:31: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
        data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
                               ^
src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:62: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
        data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
                                                              ^

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 12:19:38 +01:00
Emil Velikov
4516bfbd30 configure.ac: check require_basic_egl only if egl enabled
Fixes: 1ac40173c2 ("configure.ac: simplify EGL requirements for drivers dependent on EGL")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 12:19:24 +01:00
Emil Velikov
179e21a720 configure.ac: manually expand PKG_CHECK_VAR
The macro is introduced with pkgconfig v0.28 which isn't universally
available. Thus it will error at configure stage.

Reported-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-19 12:18:29 +01:00
Timothy Arceri
1787a3163f mesa: add KHR_no_error support to glVertexAttribDivisor()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
f27f699672 mesa/vbo: add KHR_no_error support to DrawElements*() functions
V2: move MESA_VERBOSE checks back into the common code path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
3d08e18731 mesa/vbo: add KHR_no_error support to vbo_exec_DrawArrays*()
V2: add missing FLUSH_CURRENT() to no_error path

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
4df2931a87 mesa/vbo: move some Draw checks out of validation
These checks do not generate any errors. Move them so we can add
KHR_no_error support and still make sure we do these checks.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
63a14e9e14 mesa/varray: add KHR_no_error support to *Pointer() functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
d86dd5963e mesa/varray: add KHR_no_error support to some callers of validate_array_format()
The only caller we don't update is update_arrays(), we leave that to the
following commit.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
c495c2398c mesa/varray: rename update_array_format() -> validate_array_format()
We also move _mesa_update_array_format() into the caller.

This gets these functions ready for KHR_no_error support.

V2: Updated function comment as suggested by Brian.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
9e60742ddc mesa/varray: create get_array_format() helper
This will help us split array validation from array update.

V2: add const to ctx param

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
d0608c43c5 mesa/varray: split update_array() into validate_array() and update_array()
This will be used for adding KHR_no_error support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
bd2662bfa1 mesa: add KHR_no_error support to glUniform*() functions
V2: restore lost comment, add static to validate_uniform(),
    simplify array offset logic.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
2c9ac0bc63 mesa: always return GL_OUT_OF_MEMORY or GL_NO_ERROR when KHR_no_error enabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
3ff1fce6c9 mesa: add _mesa_is_no_error_enabled() helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:25 +10:00
Timothy Arceri
a0ed0eb342 mesa: add env var to force enable the KHR_no_error ctx flag
V2: typo know -> known
V3: add security check (Suggested by Nicolai)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:24 +10:00
Timothy Arceri
d42d150ad2 mesa: expose KHR_no_error
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 16:53:24 +10:00
Constantine Kharlamov
2a8a569276 r600g: update dirty_level_mask after the 1-st draw after FB change
Ported from radeonsi. Testing with Kane&Lynch2 shows ≈1k skipped updates per
frame on average.

No piglit changes with tests/gpu.py, gbm mode.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-19 08:15:22 +02:00
Nicolai Hähnle
51deba0eb3 vbo: fix gl_DrawID handling in glMultiDrawArrays
Fixes a bug in
KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-19 08:11:07 +02:00
Nicolai Hähnle
42d5465b9b mesa: move glMultiDrawArrays to vbo and fix error handling
When any count[i] is negative, we must skip all draws.

Moving to vbo makes the subsequent change easier.

v2:
- provide the function in all contexts, including GLES
- adjust validation accordingly to include the xfb check
v3:
- fix mix-up of pre- and post-xfb prim count (Nils Wallménius)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-19 08:10:19 +02:00
Nicolai Hähnle
756e9ebbdd mesa: extract need_xfb_remaining_prims_check
The same logic needs to be applied to glMultiDrawArrays.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-19 08:09:57 +02:00
Nicolai Hähnle
ea9a8940ca mesa: fix remaining xfb prims check for GLES with multiple instances
Found by inspection.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-19 08:09:53 +02:00
Mike Lothian
2284d6bf7a radv/meta: Fix nir_builder.h include
This fixes the build after:

commit 399ebd2a84
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Apr 19 06:18:23 2017 +1000

    radv/meta: add common shader vertex generation function

Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 12:25:18 +10:00
Mike Lothian
709ed1fa9f radv/ac: Fix nir.h include
This fixes the build after:

commit 224cf2906a
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Apr 17 13:01:52 2017 +1000

    radv/ac: add initial pre-pass for shader info gathering

Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 12:25:18 +10:00
Dave Airlie
03a2ca6356 radv/meta: refactor out some common shaders.
The vs vertex generate and fs noop shaders are used in a few places,
so refactor them out.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:03:05 +10:00
Dave Airlie
bdd98d950f radv/meta: generate position for blit shaders.
This generates the position info using the vertex shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:03:01 +10:00
Dave Airlie
922f44d1ab radv/meta: reduce vertex buffer in blit2d.
Generate the position vertices.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:02:58 +10:00
Dave Airlie
dd17e4ceb4 radv/meta: reduce vertex buffer usage in clear shaders
For depth clears we have to pass the depth in the 2nd
component, we can use push constants for some of this
later to drop the vertex buffer completely

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:02:53 +10:00
Dave Airlie
84b9e3a831 radv/meta: avoid using vertex buffer for resolve shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:02:50 +10:00
Dave Airlie
3a7fd0c4db radv/meta: move depth decompress to using inline vertex data
This removes the vertex buffer, and just generates the values
in the shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:02:47 +10:00
Dave Airlie
90ed2872bc radv/meta: move fast clear to generate vertices in shader.
Avoids having to setup vertex buffers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:02:43 +10:00
Dave Airlie
399ebd2a84 radv/meta: add common shader vertex generation function
Instead of passing in the same 1.0, -1.0 combinations via
vertex buffers, we can just use vertex id to have the vertex
shader build them. This function introduces the generator
code needed, later patches will use this.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:02:39 +10:00
Dave Airlie
0e6d532d32 radv/meta: add support for save/restore meta without vertex data.
Some of the shaders could just generate the vertex data in the
shader, so add helpers to allow us to move to doing that.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 10:02:23 +10:00
Dave Airlie
60a93e11ba radv: drop debugging leftovers code in descriptor set patches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:31:14 +10:00
Dave Airlie
fd420a7417 radv: add support for 32 descriptor sets.
This bumps the limit to the number of sets to 32, now that
we have proper support for it. It also uses 1u in a few places
to make things a bit safer.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:43 +10:00
Dave Airlie
25a5ee391d radv/ac: add support for indirect access of descriptor sets.
We want to expose more descriptor sets to the applications,
but currently we have a 1:1 mapping between shader descriptor
sets and 2 user sgprs, limiting us to 4 per stage. This commit
check if we don't have enough user sgprs for the number of
bound sets for this shader, we can ask for them to be indirected.

Two sgprs are then used to point to a buffer or 64-bit pointers
to the number of allocated descriptor sets. All shaders point
to the same buffer.

We can use some user sgprs to inline one or two descriptor sets
in future, but until we have a workload that needs this I don't
 think we should spend too much time on it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:43 +10:00
Dave Airlie
d0991b135b radv: start allocating user sgprs
This adds an initial implementation to allocate the user
sgprs and make sure we don't run out if we try to bind
a bunch of descriptor sets.

This can be enhanced further in the future if we add
support for inlining push constants.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:43 +10:00
Dave Airlie
4087eaecd0 radv/ac: mark used descriptor sets in shader info.
This pre calculates the used descriptor sets.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:43 +10:00
Dave Airlie
0b62669c8d radv/ac: frag shader only needs ring offsets if sample positions enabled
mostly documenting things, since with modern llvm we always have the
spill enabled.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
ec4785afb7 radv/ac: move needs_push_constants to shader info.
First step to optimising push constants.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
ec15e0d301 radv: optimise compute shader grid size emission.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
31174069d2 radv: start conditionalising vertex inputs. (v2)
In practice this will probably just drop draw id in a few places.

v2: just do draw_id for now. (Bas)
it might be possible to do something more if we need it in the
future. (nha)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
224cf2906a radv/ac: add initial pre-pass for shader info gathering
There is some radv specific info we need to gather from shaders
before we get into converting nir->llvm, so we can make
better decisions especially around user sgpr allocation.

This is just an initial placeholder to gather if sample positions
are required in the frag shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Rob Clark
4299849ec7 freedreno: refactor dirty state handling
In particular, move per-shader-stage info out to a seperate array of
enum's indexed by shader stage.  This will make it easier to add more
shader stages as well as new per-stage state (like SSBOs).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
d7fa7f5e7e freedreno: move clear path dirty state hack to a2xx backend
a3xx/a4xx use the generic u_blitter path, which will make state dirty
bits be set appropriately thanks to the automagic of generic code
setting generic state in the driver.  And a5xx has a blit/dma engine
(actually, two) so it doesn't need these extra dirty bits set.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
b662f71d9c freedreno/ir3: split out per-stage emit_consts fxns
This makes it easier to deal with adding additional stages which have
their own driver-params.  The duplicated code this introduces can be
refactored out after a later patch moves to per-shader-stage dirty
flags.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
df37902e34 freedreno: add helper to mark all state clean
Note that this involves juggling around a bit when we emit and clear
texture state.  So split out from the patch that adds the helper to set
all state dirty.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
71f9e03d21 freedreno: add helper to mark all state dirty
This will simplify things when we break out per-shader-stage dirty bits.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
248a508f24 freedreno: move a2xx specific hack out of core
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
0cc23ae779 freedreno: make texture state an array
Make this an array indexed by shader stage, as is done elsewhere for
other per-shader-stage state.  This will simplify things as more shader
stages are eventually added.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
5845b20455 freedreno/ir3: refactor out helpers for comparing shader keys
Each of the ir3 users has *basically* the same logic for comparing the
previous and current shader key, to see which, if any, shader state
needs to be marked dirty due to shader variant change.

The difference between gen's was just that some lowering flags never get
set on certain generations.  But it doesn't really hurt to include the
extra checks (because both keys would have false).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-18 16:32:00 -04:00
Rob Clark
6fb7935ded util/queue: don't hang at exit
So atexit() is horrible and 4aea8fe7 is probably not a good idea.  But
add an extra layer of duct-tape to the problem.  Otherwise we hit a
situation where app using an atexit() handler that runs later than ours
doesn't hang when trying to tear down a context.

 (gdb) bt
 #0  util_queue_killall_and_wait (queue=queue@entry=0x52bc80) at ../../../src/util/u_queue.c:264
 #1  0x0000007fb6c380c0 in atexit_handler () at ../../../src/util/u_queue.c:51
 #2  0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
 #3  0x0000007fb7730e5c in exit () from /lib64/libc.so.6
 #4  0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
 #5  0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
 #6  0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
 #7  0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
 #8  0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
 #9  0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
 (gdb) c
 Continuing.
 [Thread 0x7fb67580c0 (LWP 3471) exited]
 ^C
 Thread 1 "drawbuffer-mode" received signal SIGINT, Interrupt.
 0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
 (gdb) bt
 #0  0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
 #1  0x0000007fb6c38304 in cnd_wait (mtx=0x5bdc90, cond=0x5bdcc0) at ../../../include/c11/threads_posix.h:159
 #2  util_queue_fence_wait (fence=0x5bdc90) at ../../../src/util/u_queue.c:106
 #3  0x0000007fb6daac70 in fd_batch_sync (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:233
 #4  batch_reset (batch=batch@entry=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:183
 #5  0x0000007fb6daa5e0 in batch_flush (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:290
 #6  fd_batch_flush (batch=0x5bdc70, sync=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:308
 #7  0x0000007fb6daba2c in fd_bc_flush (cache=0x461220, ctx=0x52b920) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch_cache.c:141
 #8  0x0000007fb6dac954 in fd_context_flush (pctx=0x52b920, fence=0x0, flags=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_context.c:54
 #9  0x0000007fb6b43294 in st_glFlush (ctx=<optimized out>) at ../../../src/mesa/state_tracker/st_cb_flush.c:121
 #10 0x0000007fb69a84e8 in _mesa_make_current (newCtx=newCtx@entry=0x0, drawBuffer=drawBuffer@entry=0x0, readBuffer=readBuffer@entry=0x0) at ../../../src/mesa/main/context.c:1654
 #11 0x0000007fb6b7ca58 in st_api_make_current (stapi=<optimized out>, stctxi=0x0, stdrawi=0x0, streadi=0x0) at ../../../src/mesa/state_tracker/st_manager.c:827
 #12 0x0000007fb6cc87e8 in dri_unbind_context (cPriv=<optimized out>) at ../../../../../src/gallium/state_trackers/dri/dri_context.c:217
 #13 0x0000007fb6cc80b0 in driUnbindContext (pcp=0x5271e0) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:591
 #14 0x0000007fb7d1da08 in MakeContextCurrent (dpy=0x433380, draw=0, read=0, gc_user=0x0) at ../../../src/glx/glxcurrent.c:214
 #15 0x0000007fb7a8d5e0 in glx_platform_make_current () from /lib64/libwaffle-1.so.0
 #16 0x0000007fb7a894e4 in waffle_make_current () from /lib64/libwaffle-1.so.0
 #17 0x0000007fb7ef8c60 in piglit_wfl_framework_teardown (wfl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_wfl_framework.c:628
 #18 0x0000007fb7ef939c in piglit_winsys_framework_teardown (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:238
 #19 0x0000007fb7ef9c30 in destroy (gl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:212
 #20 0x0000007fb7edb7c4 in destroy () at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:184
 #21 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
 #22 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
 #23 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
 #24 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
 #25 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
 #26 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
 #27 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
 #28 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
 (gdb) r

Fixes: 4aea8fe7 ("gallium/u_queue: fix random crashes when the app calls exit()")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-18 16:32:00 -04:00
Eric Anholt
c1362e78ad vc4: Enable V3D 2.6.
This version of the chip is present on the Cygnus-based 911360 enterprise
phone platform.  It appears to be completely backwards compatible.
2017-04-18 13:21:40 -07:00
Samuel Pitoiset
a18ff34452 st/mesa: add st_convert_sampler()
Similar to st_convert_image(), will be useful for bindless. While
we are at it, rename convert_sampler() to convert_sampler_from_unit()
and make 'st' a const argument.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-18 21:42:01 +02:00
Bartosz Tomczyk
ca41ecf838 mesa/glthread: add async support to ARB_viewport_array functions
v2: fix attribute name, it is count_scale not scale_count

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-18 12:19:12 +02:00
Timothy Arceri
a63919f848 mesa: rename _mesa_add_renderbuffer* functions
These names make it easier to understand what is going on in
regards to references.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-18 10:01:55 +10:00
Nanley Chery
d9d793696b anv/cmd_buffer: Disable CCS on BDW input attachments
The description under RENDER_SURFACE_STATE::RedClearColor says,

   For Sampling Engine Multisampled Surfaces and Render Targets:
    Specifies the clear value for the red channel.
   For Other Surfaces:
    This field is ignored.

This means that the sampler on BDW doesn't support CCS.

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-17 16:47:38 -07:00
Lionel Landwerlin
d71efbe5f2 anv: blorp: flush memory after copy
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-17 14:45:57 -07:00
Grazvydas Ignotas
ba6c451390 radv: enable timestampComputeAndGraphics
Commit bfee9866 "radv: Use RELEASE_MEM packet for MEC timestamp query."
added WriteTimestamp handling for compute queues but forgot to flip
the flag.

Tested with DOOM (by me) and CTS (by Bas), but without verification
that these tests actually use timestamps on compute queues.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-17 21:21:35 +03:00
Rob Clark
d4601b0efc freedreno: fix crash if ctx torn down with no rendering
In this case, ctx->flush_queue would not have been initialized.

Fixes: 0b613c20 ("freedreno: enable draw/batch reordering by default")
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-17 14:00:05 -04:00
Rob Clark
15fe9b2347 freedreno/ir3: add 'high' register class
For compute shaders, we need to be able to allocate some "high"
registers (r48.x to r55.w).  (Possibly these are global to all threads
in a warp?)  Add a new register class to handle this.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-17 14:00:05 -04:00
Rob Clark
3c5d309477 freedreno: extract helper for stage->sb for a4xx+
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-17 14:00:05 -04:00
Rob Clark
9567beab36 freedreno/{a4xx,a5xx}: switch to CP_LOAD_STATE4
The layout of CP_LOAD_STATE packet is slightly different on a4xx+.
Switch to the a4xx+ specific CP_LOAD_STATE4 to get the correct encoding.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-17 14:00:05 -04:00
Rob Clark
dfdb1fed78 freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-17 14:00:05 -04:00
Emil Velikov
9915753e63 configure.ac: print deprecation warning as needed
The warning should be printed only when one explicitly uses the
deprecated configure toggle.

Fixes: 7748c3f5eb ("configure.ac: deprecate --with-egl-platforms over
--with-platforms")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-04-17 15:07:44 +01:00
Emil Velikov
19aec22c75 docs: add news item and link release notes for 17.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 14:44:35 +01:00
Emil Velikov
89ef8750f0 docs: add sha256 checksums for 17.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 12434966eb)
2017-04-17 14:43:27 +01:00
Emil Velikov
d271401d61 docs: add release notes for 17.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 367bafc7c1)
2017-04-17 14:43:26 +01:00
Emil Velikov
36aea77cd7 docs: add 17.2.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 14:31:41 +01:00
1466 changed files with 181085 additions and 47541 deletions

View File

@@ -1,24 +1,11 @@
language: c language: c
sudo: required sudo: false
dist: trusty dist: trusty
cache: cache:
directories: apt: true
- $HOME/.ccache ccache: true
addons:
apt:
packages:
- libdrm-dev
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libxcb-dri2-0-dev
- libx11-xcb-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- libelf-dev
- scons
env: env:
global: global:
@@ -32,17 +19,259 @@ env:
- XCBPROTO_VERSION=xcb-proto-1.11 - XCBPROTO_VERSION=xcb-proto-1.11
- LIBXCB_VERSION=libxcb-1.11 - LIBXCB_VERSION=libxcb-1.11
- LIBXSHMFENCE_VERSION=libxshmfence-1.2 - LIBXSHMFENCE_VERSION=libxshmfence-1.2
- LLVM_VERSION=3.9 - LIBTXC_DXTN_VERSION=libtxc_dxtn-1.0.1
- LLVM_PACKAGE="llvm-${LLVM_VERSION} llvm-${LLVM_VERSION}-dev" - LIBVDPAU_VERSION=libvdpau-1.1
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}" - LIBVA_VERSION=libva-1.6.2
- LIBWAYLAND_VERSION=wayland-1.11.1
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig - PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig
- MAKEFLAGS=-j2 - LD_LIBRARY_PATH="$HOME/prefix/lib:$LD_LIBRARY_PATH"
matrix:
- BUILD=make matrix:
- BUILD=scons include:
- env:
- LABEL="make loaders/classic DRI"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="make check"
- DRI_LOADERS="--enable-glx --enable-gbm --enable-egl --with-platforms=x11,drm,surfaceless,wayland --enable-osmesa"
- DRI_DRIVERS="i915,i965,radeon,r200,swrast,nouveau"
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS=""
- VULKAN_DRIVERS=""
addons:
apt:
packages:
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libxdamage-dev
- libxfixes-dev
- env:
# NOTE: Building SWR is 2x (yes two) times slower than all the other
# gallium drivers combined.
# Start this early so that it doesn't hunder the run time.
- LABEL="make Gallium Drivers SWR"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.9
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- OVERRIDE_CC="gcc-5"
- OVERRIDE_CXX="g++-5"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS="swr"
- VULKAN_DRIVERS=""
addons:
apt:
sources:
- ubuntu-toolchain-r-test
- llvm-toolchain-trusty-3.9
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- g++-5
- llvm-3.9-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="make Gallium Drivers Other"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.9
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS="i915,nouveau,pl111,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,etnaviv,imx"
- VULKAN_DRIVERS=""
addons:
apt:
sources:
- llvm-toolchain-trusty-3.9
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- llvm-3.9-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
# NOTE: Analogous to SWR above, building Clover is quite slow.
- LABEL="make Gallium ST Clover"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.6
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- OVERRIDE_CC=gcc-4.7
- OVERRIDE_CXX=g++-4.7
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
# i915 most likely doesn't work with OpenCL.
# Regardless - we're doing a quick build test here.
- GALLIUM_DRIVERS="i915"
- VULKAN_DRIVERS=""
addons:
apt:
sources:
- llvm-toolchain-trusty-3.6
packages:
- libclc-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- g++-4.7
# From sources above
- llvm-3.6-dev
- clang-3.6
- libclang-3.6-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="make Gallium ST Other"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --enable-xa --enable-nine --enable-xvmc --enable-vdpau --enable-va --enable-omx --enable-gallium-osmesa"
# We need swrast for osmesa and nine.
# i915 most likely doesn't work with most ST.
# Regardless - we're doing a quick build test here.
- GALLIUM_DRIVERS="i915,swrast"
- VULKAN_DRIVERS=""
addons:
apt:
packages:
# Nine requires gcc 4.6... which is the one we have right ?
- libxvmc-dev
# Build locally, for now.
#- libvdpau-dev
#- libva-dev
- libomxil-bellagio-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="make Vulkan"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="make -C src/gtest check && make -C src/intel check"
- LLVM_VERSION=3.9
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl --with-platforms=x11,wayland"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --enable-dri3 --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
- GALLIUM_DRIVERS=""
- VULKAN_DRIVERS="intel,radeon"
addons:
apt:
sources:
- llvm-toolchain-trusty-3.9
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- llvm-3.9-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="scons"
- BUILD=scons
- SCONSFLAGS="-j4"
# Explicitly disable.
- SCONS_TARGET="llvm=0"
# Keep it symmetrical to the make build.
- SCONS_CHECK_COMMAND="scons llvm=0 check"
addons:
apt:
packages:
- scons
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="scons LLVM"
- BUILD=scons
- SCONSFLAGS="-j4"
- SCONS_TARGET="llvm=1"
# Keep it symmetrical to the make build.
- SCONS_CHECK_COMMAND="scons llvm=1 check"
- LLVM_VERSION=3.3
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
packages:
- scons
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- llvm-3.3-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="scons SWR"
- BUILD=scons
- SCONSFLAGS="-j4"
- SCONS_TARGET="swr=1"
- LLVM_VERSION=3.9
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
# Keep it symmetrical to the make build. There's no actual SWR, yet.
- SCONS_CHECK_COMMAND="true"
- OVERRIDE_CC="gcc-5"
- OVERRIDE_CXX="g++-5"
addons:
apt:
sources:
- ubuntu-toolchain-r-test
- llvm-toolchain-trusty-3.9
packages:
- scons
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- g++-5
- llvm-3.9-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
install: install:
- export PATH="/usr/lib/ccache:$PATH"
- pip install --user mako - pip install --user mako
# Since libdrm gets updated in configure.ac regularly, try to pick up the # Since libdrm gets updated in configure.ac regularly, try to pick up the
@@ -90,25 +319,64 @@ install:
- tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2 - tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2
- (cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install) - (cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install)
# Install LLVM directly via apt-get (not Travis-CI's apt addon) # libtxc-dxtn uses the patented S3 Texture Compression
# See https://github.com/travis-ci/apt-source-whitelist/pull/205#issuecomment-216054237 # algorithm. Therefore, we don't want to use this library but it is
# still possible through setting the USE_TXC_DXTN variable to yes in
# the travis web UI.
#
# According to Wikipedia, the patent expires on October 2, 2017:
# https://en.wikipedia.org/wiki/S3_Texture_Compression#Patent
- if test "x$USE_TXC_DXTN" = xyes; then
wget https://people.freedesktop.org/~cbrill/libtxc_dxtn/$LIBTXC_DXTN_VERSION.tar.bz2;
tar -jxvf $LIBTXC_DXTN_VERSION.tar.bz2;
(cd $LIBTXC_DXTN_VERSION && ./configure --prefix=$HOME/prefix && make install);
fi
- wget -nv -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add - - wget http://people.freedesktop.org/~aplattner/vdpau/$LIBVDPAU_VERSION.tar.bz2
- sudo apt-add-repository -y 'deb http://llvm.org/apt/trusty llvm-toolchain-trusty-3.9 main' - tar -jxvf $LIBVDPAU_VERSION.tar.bz2
- sudo apt-add-repository -y 'deb http://llvm.org/apt/trusty llvm-toolchain-trusty main' - (cd $LIBVDPAU_VERSION && ./configure --prefix=$HOME/prefix && make install)
- sudo apt-get update -qq
- sudo apt-get install -qq -y $LLVM_PACKAGE - wget http://www.freedesktop.org/software/vaapi/releases/libva/$LIBVA_VERSION.tar.bz2
- tar -jxvf $LIBVA_VERSION.tar.bz2
- (cd $LIBVA_VERSION && ./configure --prefix=$HOME/prefix --disable-wayland --disable-dummy-driver && make install)
- wget http://wayland.freedesktop.org/releases/$LIBWAYLAND_VERSION.tar.xz
- tar -axvf $LIBWAYLAND_VERSION.tar.xz
- (cd $LIBWAYLAND_VERSION && ./configure --prefix=$HOME/prefix --enable-libraries --without-host-scanner --disable-documentation --disable-dtd-validation && make install)
# Generate the header since one is missing on the Travis instance
- mkdir -p linux
- printf "%s\n" \
"#ifndef _LINUX_MEMFD_H" \
"#define _LINUX_MEMFD_H" \
"" \
"#define __NR_memfd_create 319" \
"#define SYS_memfd_create __NR_memfd_create" \
"" \
"#define MFD_CLOEXEC 0x0001U" \
"#define MFD_ALLOW_SEALING 0x0002U" \
"" \
"#endif /* _LINUX_MEMFD_H */" > linux/memfd.h
script: script:
- if test "x$BUILD" = xmake; then - if test "x$BUILD" = xmake; then
test -n "$OVERRIDE_CC" && export CC="$OVERRIDE_CC";
test -n "$OVERRIDE_CXX" && export CXX="$OVERRIDE_CXX";
export CC="$CC -isystem`pwd`";
./autogen.sh --enable-debug ./autogen.sh --enable-debug
--with-platforms=x11,drm $DRI_LOADERS
--with-dri-drivers=i915,i965,radeon,r200,swrast,nouveau --with-dri-drivers=$DRI_DRIVERS
--with-gallium-drivers=i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,etnaviv,imx $GALLIUM_ST
--with-vulkan-drivers=radeon --with-gallium-drivers=$GALLIUM_DRIVERS
--with-vulkan-drivers=$VULKAN_DRIVERS
--disable-llvm-shared-libs --disable-llvm-shared-libs
; &&
make && make check; make && eval $MAKE_CHECK_COMMAND;
elif test x$BUILD = xscons; then fi
scons llvm=1 && scons llvm=1 check;
- if test "x$BUILD" = xscons; then
test -n "$OVERRIDE_CC" && export CC="$OVERRIDE_CC";
test -n "$OVERRIDE_CXX" && export CXX="$OVERRIDE_CXX";
scons $SCONS_TARGET && eval $SCONS_CHECK_COMMAND;
fi fi

View File

@@ -37,11 +37,18 @@ LOCAL_CFLAGS += \
-Wno-missing-field-initializers \ -Wno-missing-field-initializers \
-Wno-initializer-overrides \ -Wno-initializer-overrides \
-Wno-mismatched-tags \ -Wno-mismatched-tags \
-DVERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \ -DPACKAGE_VERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" -DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"
# XXX: The following __STDC_*_MACROS defines should not be needed.
# It's likely due to a bug elsewhere, but let's temporarily add them
# here to fix the radeonsi build.
LOCAL_CFLAGS += \ LOCAL_CFLAGS += \
-DANDROID_API_LEVEL=$(PLATFORM_SDK_VERSION) \
-DENABLE_SHADER_CACHE \ -DENABLE_SHADER_CACHE \
-D__STDC_CONSTANT_MACROS \
-D__STDC_LIMIT_MACROS \
-DHAVE___BUILTIN_EXPECT \ -DHAVE___BUILTIN_EXPECT \
-DHAVE___BUILTIN_FFS \ -DHAVE___BUILTIN_FFS \
-DHAVE___BUILTIN_FFSLL \ -DHAVE___BUILTIN_FFSLL \
@@ -59,6 +66,7 @@ LOCAL_CFLAGS += \
-DHAVE_PTHREAD=1 \ -DHAVE_PTHREAD=1 \
-DHAVE_DLOPEN \ -DHAVE_DLOPEN \
-DHAVE_DL_ITERATE_PHDR \ -DHAVE_DL_ITERATE_PHDR \
-DMAJOR_IN_SYSMACROS \
-fvisibility=hidden \ -fvisibility=hidden \
-Wno-sign-compare -Wno-sign-compare
@@ -81,28 +89,10 @@ LOCAL_CFLAGS += \
endif endif
endif endif
ifeq ($(MESA_ENABLE_LLVM),true)
ifeq ($(MESA_ANDROID_MAJOR_VERSION),5)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0305 -DMESA_LLVM_VERSION_PATCH=2
ELF_INCLUDES := external/elfutils/0.153/libelf
endif
ifeq ($(MESA_ANDROID_MAJOR_VERSION),6)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0
ELF_INCLUDES := external/elfutils/src/libelf
endif
ifeq ($(MESA_ANDROID_MAJOR_VERSION),7)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0
ELF_INCLUDES := external/elfutils/libelf
endif
endif
ifneq ($(LOCAL_IS_HOST_MODULE),true) ifneq ($(LOCAL_IS_HOST_MODULE),true)
# add libdrm if there are hardware drivers
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SHARED_LIBRARIES += libdrm LOCAL_SHARED_LIBRARIES += libdrm
endif endif
endif
LOCAL_CFLAGS_32 += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\" LOCAL_CFLAGS_32 += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\"
LOCAL_CFLAGS_64 += -DDEFAULT_DRIVER_DIR=\"/system/lib64/$(MESA_DRI_MODULE_REL_PATH)\" LOCAL_CFLAGS_64 += -DDEFAULT_DRIVER_DIR=\"/system/lib64/$(MESA_DRI_MODULE_REL_PATH)\"
@@ -116,7 +106,3 @@ endif
# Quiet down the build system and remove any .h files from the sources # Quiet down the build system and remove any .h files from the sources
LOCAL_SRC_FILES := $(patsubst %.h, , $(LOCAL_SRC_FILES)) LOCAL_SRC_FILES := $(patsubst %.h, , $(LOCAL_SRC_FILES))
ifneq ($(LOCAL_IS_HOST_MODULE),true)
LOCAL_SHARED_LIBRARIES += libz
endif

View File

@@ -24,7 +24,7 @@
# BOARD_GPU_DRIVERS should be defined. The valid values are # BOARD_GPU_DRIVERS should be defined. The valid values are
# #
# classic drivers: i915 i965 # classic drivers: i915 i965
# gallium drivers: swrast freedreno i915g nouveau r300g r600g radeonsi vc4 virgl vmwgfx # gallium drivers: swrast freedreno i915g nouveau pl111 r300g r600g radeonsi vc4 virgl vmwgfx
# #
# The main target is libGLES_mesa. For each classic driver enabled, a DRI # The main target is libGLES_mesa. For each classic driver enabled, a DRI
# module will also be built. DRI modules will be loaded by libGLES_mesa. # module will also be built. DRI modules will be loaded by libGLES_mesa.
@@ -32,6 +32,9 @@
MESA_TOP := $(call my-dir) MESA_TOP := $(call my-dir)
MESA_ANDROID_MAJOR_VERSION := $(word 1, $(subst ., , $(PLATFORM_VERSION))) MESA_ANDROID_MAJOR_VERSION := $(word 1, $(subst ., , $(PLATFORM_VERSION)))
ifneq ($(filter 2 4, $(MESA_ANDROID_MAJOR_VERSION)),)
$(error "Android 4.4 and earlier not supported")
endif
MESA_DRI_MODULE_REL_PATH := dri MESA_DRI_MODULE_REL_PATH := dri
MESA_DRI_MODULE_PATH := $(TARGET_OUT_SHARED_LIBRARIES)/$(MESA_DRI_MODULE_REL_PATH) MESA_DRI_MODULE_PATH := $(TARGET_OUT_SHARED_LIBRARIES)/$(MESA_DRI_MODULE_REL_PATH)
@@ -40,19 +43,37 @@ MESA_DRI_MODULE_UNSTRIPPED_PATH := $(TARGET_OUT_SHARED_LIBRARIES_UNSTRIPPED)/$(M
MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk
MESA_PYTHON2 := python MESA_PYTHON2 := python
classic_drivers := i915 i965 # Lists to convert driver names to boolean variables
gallium_drivers := swrast freedreno i915g nouveau r300g r600g radeonsi vmwgfx vc4 virgl # in form of <driver name>.<boolean make variable>
classic_drivers := i915.HAVE_I915_DRI i965.HAVE_I965_DRI
gallium_drivers := \
swrast.HAVE_GALLIUM_SOFTPIPE \
freedreno.HAVE_GALLIUM_FREEDRENO \
i915g.HAVE_GALLIUM_I915 \
nouveau.HAVE_GALLIUM_NOUVEAU \
pl111.HAVE_GALLIUM_PL111 \
r300g.HAVE_GALLIUM_R300 \
r600g.HAVE_GALLIUM_R600 \
radeonsi.HAVE_GALLIUM_RADEONSI \
vmwgfx.HAVE_GALLIUM_VMWGFX \
vc4.HAVE_GALLIUM_VC4 \
virgl.HAVE_GALLIUM_VIRGL
MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS)) ifeq ($(BOARD_GPU_DRIVERS),all)
MESA_BUILD_CLASSIC := $(filter HAVE_%, $(subst ., , $(classic_drivers)))
# warn about invalid drivers MESA_BUILD_GALLIUM := $(filter HAVE_%, $(subst ., , $(gallium_drivers)))
invalid_drivers := $(filter-out \ else
$(classic_drivers) $(gallium_drivers), $(MESA_GPU_DRIVERS)) # Warn if we have any invalid driver names
ifneq ($(invalid_drivers),) $(foreach d, $(BOARD_GPU_DRIVERS), \
$(warning invalid GPU drivers: $(invalid_drivers)) $(if $(findstring $(d).,$(classic_drivers) $(gallium_drivers)), \
# tidy up , \
MESA_GPU_DRIVERS := $(filter-out $(invalid_drivers), $(MESA_GPU_DRIVERS)) $(warning invalid GPU driver: $(d)) \
) \
)
MESA_BUILD_CLASSIC := $(strip $(foreach d, $(BOARD_GPU_DRIVERS), $(patsubst $(d).%,%, $(filter $(d).%, $(classic_drivers)))))
MESA_BUILD_GALLIUM := $(strip $(foreach d, $(BOARD_GPU_DRIVERS), $(patsubst $(d).%,%, $(filter $(d).%, $(gallium_drivers)))))
endif endif
$(foreach d, $(MESA_BUILD_CLASSIC) $(MESA_BUILD_GALLIUM), $(eval $(d) := true))
# host and target must be the same arch to generate matypes.h # host and target must be the same arch to generate matypes.h
ifeq ($(TARGET_ARCH),$(HOST_ARCH)) ifeq ($(TARGET_ARCH),$(HOST_ARCH))
@@ -61,23 +82,27 @@ else
MESA_ENABLE_ASM := false MESA_ENABLE_ASM := false
endif endif
ifneq ($(filter $(classic_drivers), $(MESA_GPU_DRIVERS)),) ifneq ($(filter true, $(HAVE_GALLIUM_RADEONSI)),)
MESA_BUILD_CLASSIC := true MESA_ENABLE_LLVM := true
else
MESA_BUILD_CLASSIC := false
endif endif
ifneq ($(filter $(gallium_drivers), $(MESA_GPU_DRIVERS)),) define mesa-build-with-llvm
MESA_BUILD_GALLIUM := true $(if $(filter $(MESA_ANDROID_MAJOR_VERSION), 4 5), \
else $(warning Unsupported LLVM version in Android $(MESA_ANDROID_MAJOR_VERSION)),) \
MESA_BUILD_GALLIUM := false $(if $(filter 6,$(MESA_ANDROID_MAJOR_VERSION)), \
endif $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_STATIC_LIBRARIES += libLLVMCore) \
MESA_ENABLE_LLVM := $(if $(filter radeonsi,$(MESA_GPU_DRIVERS)),true,false) $(eval LOCAL_C_INCLUDES += external/llvm/include external/llvm/device/include),) \
$(if $(filter 7,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_STATIC_LIBRARIES += libLLVMCore) \
$(eval LOCAL_C_INCLUDES += external/llvm/include external/llvm/device/include),) \
$(if $(filter O,$(MESA_ANDROID_MAJOR_VERSION)), \
$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_PATCH=0) \
$(eval LOCAL_HEADER_LIBRARIES += llvm-headers),)
endef
# add subdirectories # add subdirectories
ifneq ($(strip $(MESA_GPU_DRIVERS)),)
SUBDIRS := \ SUBDIRS := \
src/gbm \ src/gbm \
src/loader \ src/loader \
@@ -92,11 +117,5 @@ SUBDIRS := \
src/vulkan src/vulkan
INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS)) INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))
ifeq ($(strip $(MESA_BUILD_GALLIUM)),true)
INC_DIRS += $(call all-named-subdir-makefiles,src/gallium) INC_DIRS += $(call all-named-subdir-makefiles,src/gallium)
endif
include $(INC_DIRS) include $(INC_DIRS)
endif

View File

@@ -43,7 +43,7 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-llvm-shared-libs \ --enable-llvm-shared-libs \
--with-platforms=x11,wayland,drm,surfaceless \ --with-platforms=x11,wayland,drm,surfaceless \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \ --with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr,etnaviv,imx \ --with-gallium-drivers=i915,nouveau,r300,pl111,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr,etnaviv,imx \
--with-vulkan-drivers=intel,radeon --with-vulkan-drivers=intel,radeon
ACLOCAL_AMFLAGS = -I m4 ACLOCAL_AMFLAGS = -I m4

View File

@@ -1 +1 @@
17.1.0-devel 17.2.0-devel

View File

@@ -1,3 +1,2 @@
[*.sh] [*.sh]
indent_style = space indent_style = tab
indent_size = 2

View File

@@ -30,7 +30,15 @@ do
if grep -q ^$candidate already_picked ; then if grep -q ^$candidate already_picked ; then
continue continue
fi fi
echo Commit $candidate references $sha # Or if it isn't in the ignore list.
if [ -f bin/.cherry-ignore ] ; then
if grep -q ^$candidate bin/.cherry-ignore ; then
continue
fi
fi
printf "Commit \"%s\" references %s\n" \
"`git log -n1 --pretty=oneline $candidate`" \
"$sha"
done done
done done

View File

@@ -24,35 +24,55 @@ git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |
git log --reverse --pretty=%H -i --grep="fixes:" $latest_branchpoint..origin/master |\ git log --reverse --pretty=%H -i --grep="fixes:" $latest_branchpoint..origin/master |\
while read sha while read sha
do do
# For each one try to extract the tag # Check to see whether the patch is on the ignore list ...
fixes_count=`git show $sha | grep -i "fixes:" | wc -l` if [ -f bin/.cherry-ignore ] ; then
if [ "x$fixes_count" != x1 ] ; then if grep -q ^$sha bin/.cherry-ignore ; then
echo WARNING: Commit $sha has more than one Fixes tag continue
fi
fi fi
fixes=`git show $sha | grep -i "fixes:" | head -n 1`
# The following sed/cut combination is borrowed from GregKH
id=`echo ${fixes} | sed -e 's/^[ \t]*//' | cut -f 2 -d ':' | sed -e 's/^[ \t]*//' | cut -f 1 -d ' '`
# Bail out if we cannot find suitable id. # Skip if it has been already cherry-picked.
# Any specific validation the $id is valid and not some junk, is if grep -q ^$sha already_picked ; then
# implied with the follow up code
if [ "x$id" = x ] ; then
continue continue
fi fi
# Check if the offending commit is in branch. # Place every "fixes:" tag on its own line and join with the next word
# on its line or a later one.
fixes=`git show -s $sha | tr -d "\n" | sed -e 's/fixes:[[:space:]]*/\nfixes:/Ig' | grep "fixes:" | sed -e 's/\(fixes:[a-zA-Z0-9]*\).*$/\1/'`
# Be that cherry-picked ... # For each one try to extract the tag
# ... or landed before the branchpoint. fixes_count=`echo "$fixes" | wc -l`
if grep -q ^$id already_picked || warn=`(test $fixes_count -gt 1 && echo $fixes_count) || echo 0`
grep -q ^$id already_landed ; then while [ $fixes_count -gt 0 ] ; do
# Treat only the current line
id=`echo "$fixes" | tail -n $fixes_count | head -n 1 | cut -d : -f 2`
fixes_count=$(($fixes_count-1))
# Finally nominate the fix if it hasn't landed yet. # Bail out if we cannot find suitable id.
if grep -q ^$sha already_picked ; then # Any specific validation the $id is valid and not some junk, is
# implied with the follow up code
if [ "x$id" = x ] ; then
continue continue
fi fi
echo Commit $sha fixes $id # Check if the offending commit is in branch.
# Be that cherry-picked ...
# ... or landed before the branchpoint.
if grep -q ^$id already_picked ||
grep -q ^$id already_landed ; then
printf "Commit \"%s\" fixes %s\n" \
"`git log -n1 --pretty=oneline $sha`" \
"$id"
warn=$(($warn-1))
fi
done
if [ $warn -gt 0 ] ; then
printf "WARNING: Commit \"%s\" has more than one Fixes tag\n" \
"`git log -n1 --pretty=oneline $sha`"
fi fi
done done

View File

@@ -133,7 +133,7 @@ class PerfParser(LineParser):
def __init__(self, infile, symbol): def __init__(self, infile, symbol):
LineParser.__init__(self, infile) LineParser.__init__(self, infile)
self.symbol = symbol self.symbol = symbol
def readline(self): def readline(self):
# Override LineParser.readline to ignore comment lines # Override LineParser.readline to ignore comment lines
@@ -155,7 +155,7 @@ class PerfParser(LineParser):
addresses.sort() addresses.sort()
total_samples = 0 total_samples = 0
sys.stdout.write('%s:\n' % self.symbol) sys.stdout.write('%s:\n' % self.symbol)
for address, instr in asm: for address, instr in asm:
try: try:
sample = samples.pop(address) sample = samples.pop(address)

View File

@@ -74,7 +74,7 @@ AC_SUBST([OPENCL_VERSION])
# in the first entry. # in the first entry.
LIBDRM_REQUIRED=2.4.75 LIBDRM_REQUIRED=2.4.75
LIBDRM_RADEON_REQUIRED=2.4.71 LIBDRM_RADEON_REQUIRED=2.4.71
LIBDRM_AMDGPU_REQUIRED=2.4.79 LIBDRM_AMDGPU_REQUIRED=2.4.81
LIBDRM_INTEL_REQUIRED=2.4.75 LIBDRM_INTEL_REQUIRED=2.4.75
LIBDRM_NVVIEUX_REQUIRED=2.4.66 LIBDRM_NVVIEUX_REQUIRED=2.4.66
LIBDRM_NOUVEAU_REQUIRED=2.4.66 LIBDRM_NOUVEAU_REQUIRED=2.4.66
@@ -97,13 +97,13 @@ XSHMFENCE_REQUIRED=1.1
XVMC_REQUIRED=1.0.6 XVMC_REQUIRED=1.0.6
PYTHON_MAKO_REQUIRED=0.8.0 PYTHON_MAKO_REQUIRED=0.8.0
LIBSENSORS_REQUIRED=4.0.0 LIBSENSORS_REQUIRED=4.0.0
ZLIB_REQUIRED=1.2.8 ZLIB_REQUIRED=1.2.3
dnl LLVM versions dnl LLVM versions
LLVM_REQUIRED_GALLIUM=3.3.0 LLVM_REQUIRED_GALLIUM=3.3.0
LLVM_REQUIRED_OPENCL=3.6.0 LLVM_REQUIRED_OPENCL=3.6.0
LLVM_REQUIRED_R600=3.8.0 LLVM_REQUIRED_R600=3.9.0
LLVM_REQUIRED_RADEONSI=3.8.0 LLVM_REQUIRED_RADEONSI=3.9.0
LLVM_REQUIRED_RADV=3.9.0 LLVM_REQUIRED_RADV=3.9.0
LLVM_REQUIRED_SWR=3.9.0 LLVM_REQUIRED_SWR=3.9.0
@@ -269,7 +269,7 @@ DEFINES="-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS"
AC_SUBST([DEFINES]) AC_SUBST([DEFINES])
android=no android=no
case "$host_os" in case "$host_os" in
*-android) *-android*)
android=yes android=yes
;; ;;
linux*|*-gnu*|gnu*|cygwin*) linux*|*-gnu*|gnu*|cygwin*)
@@ -455,7 +455,7 @@ int main () {
CFLAGS=$save_CFLAGS CFLAGS=$save_CFLAGS
AC_ARG_ENABLE(pwr8, AC_ARG_ENABLE(pwr8,
[AS_HELP_STRING([--disable-pwr8-inst], [AS_HELP_STRING([--disable-pwr8],
[disable POWER8-specific instructions])], [disable POWER8-specific instructions])],
[enable_pwr8=$enableval], [enable_pwr8=auto]) [enable_pwr8=$enableval], [enable_pwr8=auto])
@@ -724,7 +724,7 @@ dnl Arch/platform-specific settings
dnl dnl
AC_ARG_ENABLE([asm], AC_ARG_ENABLE([asm],
[AS_HELP_STRING([--disable-asm], [AS_HELP_STRING([--disable-asm],
[disable assembly usage @<:@default=enabled on supported plaforms@:>@])], [disable assembly usage @<:@default=enabled on supported platforms@:>@])],
[enable_asm="$enableval"], [enable_asm="$enableval"],
[enable_asm=yes] [enable_asm=yes]
) )
@@ -766,6 +766,13 @@ if test "x$enable_asm" = xyes; then
;; ;;
esac esac
;; ;;
powerpc64le)
case "$host_os" in
linux*)
asm_arch=ppc64le
;;
esac
;;
esac esac
case "$asm_arch" in case "$asm_arch" in
@@ -781,6 +788,10 @@ if test "x$enable_asm" = xyes; then
DEFINES="$DEFINES -DUSE_SPARC_ASM" DEFINES="$DEFINES -DUSE_SPARC_ASM"
AC_MSG_RESULT([yes, sparc]) AC_MSG_RESULT([yes, sparc])
;; ;;
ppc64le)
DEFINES="$DEFINES -DUSE_PPC64LE_ASM"
AC_MSG_RESULT([yes, ppc64le])
;;
*) *)
AC_MSG_RESULT([no, platform not supported]) AC_MSG_RESULT([no, platform not supported])
;; ;;
@@ -837,6 +848,11 @@ dnl is not valid for that platform.
if test "x$android" = xno; then if test "x$android" = xno; then
test -z "$PTHREAD_LIBS" && PTHREAD_LIBS="-lpthread" test -z "$PTHREAD_LIBS" && PTHREAD_LIBS="-lpthread"
fi fi
dnl According to the manual when using pthreads, one should add -pthread to
dnl both compile and link-time arguments.
dnl In practise that should be sufficient for all platforms, since any
dnl platforms build with GCC and Clang support the flag.
PTHREAD_LIBS="$PTHREAD_LIBS -pthread"
dnl pthread-stubs is mandatory on BSD platforms, due to the nature of the dnl pthread-stubs is mandatory on BSD platforms, due to the nature of the
dnl project. Even then there's a notable issue as described in the project README dnl project. Even then there's a notable issue as described in the project README
@@ -851,8 +867,6 @@ esac
if test "x$pthread_stubs_possible" = xyes; then if test "x$pthread_stubs_possible" = xyes; then
PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs >= 0.4) PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs >= 0.4)
AC_SUBST(PTHREADSTUBS_CFLAGS)
AC_SUBST(PTHREADSTUBS_LIBS)
fi fi
dnl SELinux awareness. dnl SELinux awareness.
@@ -1066,27 +1080,18 @@ AC_SUBST([LLVM_INCLUDEDIR])
dnl dnl
dnl libunwind dnl libunwind
dnl dnl
PKG_CHECK_EXISTS(libunwind, [HAVE_LIBUNWIND=yes], [HAVE_LIBUNWIND=no])
AC_ARG_ENABLE([libunwind], AC_ARG_ENABLE([libunwind],
[AS_HELP_STRING([--enable-libunwind], [AS_HELP_STRING([--enable-libunwind],
[Use libunwind for backtracing (default: auto)])], [Use libunwind for backtracing (default: auto)])],
[LIBUNWIND="$enableval"], [LIBUNWIND="$enableval"],
[LIBUNWIND="auto"]) [LIBUNWIND="$HAVE_LIBUNWIND"])
PKG_CHECK_EXISTS(libunwind, [HAVE_LIBUNWIND=yes], [HAVE_LIBUNWIND=no])
if test "x$LIBUNWIND" = "xauto"; then
LIBUNWIND="$HAVE_LIBUNWIND"
fi
if test "x$LIBUNWIND" = "xyes"; then if test "x$LIBUNWIND" = "xyes"; then
PKG_CHECK_MODULES(LIBUNWIND, libunwind) PKG_CHECK_MODULES(LIBUNWIND, libunwind)
if test "x$HAVE_LIBUNWIND" != "xyes"; then
AC_MSG_ERROR([libunwind requested but not installed.])
fi
AC_DEFINE(HAVE_LIBUNWIND, 1, [Have libunwind support]) AC_DEFINE(HAVE_LIBUNWIND, 1, [Have libunwind support])
fi fi
AM_CONDITIONAL(HAVE_LIBUNWIND, [test "x$LIBUNWIND" = xyes])
dnl Options for APIs dnl Options for APIs
AC_ARG_ENABLE([opengl], AC_ARG_ENABLE([opengl],
@@ -1245,7 +1250,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
AC_ARG_WITH([gallium-drivers], AC_ARG_WITH([gallium-drivers],
[AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@], [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
[comma delimited Gallium drivers list, e.g. [comma delimited Gallium drivers list, e.g.
"i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,swr,vc4,virgl,etnaviv,imx" "i915,nouveau,r300,r600,radeonsi,freedreno,pl111,svga,swrast,swr,vc4,virgl,etnaviv,imx"
@<:@default=r300,r600,svga,swrast@:>@])], @<:@default=r300,r600,svga,swrast@:>@])],
[with_gallium_drivers="$withval"], [with_gallium_drivers="$withval"],
[with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"]) [with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
@@ -1367,7 +1372,7 @@ if test "x$enable_libglvnd" = xyes ; then
esac esac
PKG_CHECK_MODULES([GLVND], libglvnd >= 0.2.0) PKG_CHECK_MODULES([GLVND], libglvnd >= 0.2.0)
PKG_CHECK_VAR(LIBGLVND_DATADIR, libglvnd, datadir) LIBGLVND_DATADIR=`$PKG_CONFIG --variable=datadir libglvnd`
AC_SUBST([LIBGLVND_DATADIR]) AC_SUBST([LIBGLVND_DATADIR])
DEFINES="${DEFINES} -DUSE_LIBGLVND=1" DEFINES="${DEFINES} -DUSE_LIBGLVND=1"
@@ -1541,15 +1546,10 @@ xdri)
PKG_CHECK_MODULES([DRI2PROTO], [dri2proto >= $DRI2PROTO_REQUIRED]) PKG_CHECK_MODULES([DRI2PROTO], [dri2proto >= $DRI2PROTO_REQUIRED])
GL_PC_REQ_PRIV="$GL_PC_REQ_PRIV libdrm >= $LIBDRM_REQUIRED" GL_PC_REQ_PRIV="$GL_PC_REQ_PRIV libdrm >= $LIBDRM_REQUIRED"
if test x"$enable_dri" = xyes; then if test x"$enable_dri" = xyes; then
dri_modules="$dri_modules xcb-dri2 >= $XCBDRI2_REQUIRED" dri_modules="$dri_modules xcb-dri2 >= $XCBDRI2_REQUIRED"
fi fi
if test x"$enable_dri3" = xyes; then
PKG_CHECK_EXISTS([xcb >= $XCB_REQUIRED], [], AC_MSG_ERROR([DRI3 requires xcb >= $XCB_REQUIRED]))
dri3_modules="xcb xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
PKG_CHECK_MODULES([XCB_DRI3], [$dri3_modules])
fi
fi fi
if test x"$dri_platform" = xapple ; then if test x"$dri_platform" = xapple ; then
DEFINES="$DEFINES -DGLX_USE_APPLEGL" DEFINES="$DEFINES -DGLX_USE_APPLEGL"
@@ -1638,6 +1638,111 @@ if test "x$enable_glx_read_only_text" = xyes; then
DEFINES="$DEFINES -DGLX_X86_READONLY_TEXT" DEFINES="$DEFINES -DGLX_X86_READONLY_TEXT"
fi fi
dnl
dnl DEPRECATED: EGL Platforms configuration
dnl
AC_ARG_WITH([egl-platforms],
[AS_HELP_STRING([--with-egl-platforms@<:@=DIRS...@:>@],
[DEPRECATED: use --with-platforms instead@<:@default=auto@:>@])],
[with_egl_platforms="$withval"],
[with_egl_platforms=auto])
if test "x$with_egl_platforms" = xauto; then
with_egl_platforms="x11,surfaceless"
if test "x$enable_gbm" = xyes; then
with_egl_platforms="$with_egl_platforms,drm"
fi
else
AC_MSG_WARN([--with-egl-platforms is deprecated. Use --with-platforms instead.])
fi
dnl
dnl Platforms configuration
dnl
AC_ARG_WITH([platforms],
[AS_HELP_STRING([--with-platforms@<:@=DIRS...@:>@],
[comma delimited native platforms libEGL/Vulkan/other supports, e.g.
"x11,drm,wayland,surfaceless..." @<:@default=auto@:>@])],
[with_platforms="$withval"],
[with_platforms=auto])
# Reuse the autodetection rather than duplicating it.
if test "x$with_platforms" = xauto; then
with_platforms=$with_egl_platforms
fi
PKG_CHECK_MODULES([WAYLAND_SCANNER], [wayland-scanner],
WAYLAND_SCANNER=`$PKG_CONFIG --variable=wayland_scanner wayland-scanner`,
WAYLAND_SCANNER='')
if test "x$WAYLAND_SCANNER" = x; then
AC_PATH_PROG([WAYLAND_SCANNER], [wayland-scanner], [:])
fi
# Do per platform setups and checks
platforms=`IFS=', '; echo $with_platforms`
for plat in $platforms; do
case "$plat" in
wayland)
PKG_CHECK_MODULES([WAYLAND], [wayland-client >= $WAYLAND_REQUIRED wayland-server >= $WAYLAND_REQUIRED])
if test "x$WAYLAND_SCANNER" = "x:"; then
AC_MSG_ERROR([wayland-scanner is needed to compile the wayland platform])
fi
DEFINES="$DEFINES -DHAVE_WAYLAND_PLATFORM"
;;
x11)
PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED xcb-xfixes])
DEFINES="$DEFINES -DHAVE_X11_PLATFORM"
;;
drm)
test "x$enable_gbm" = "xno" &&
AC_MSG_ERROR([EGL platform drm needs gbm])
DEFINES="$DEFINES -DHAVE_DRM_PLATFORM"
;;
surfaceless)
DEFINES="$DEFINES -DHAVE_SURFACELESS_PLATFORM"
;;
android)
PKG_CHECK_MODULES([ANDROID], [cutils hardware sync])
DEFINES="$DEFINES -DHAVE_ANDROID_PLATFORM"
;;
*)
AC_MSG_ERROR([platform '$plat' does not exist])
;;
esac
case "$plat" in
wayland|drm|surfaceless)
require_libdrm "Platform $plat"
;;
esac
done
if test "x$enable_glx" != xno; then
if ! echo "$platforms" | grep -q 'x11'; then
AC_MSG_ERROR([Building GLX without the x11 platform is not supported])
fi
fi
if test x"$enable_dri3" = xyes; then
DEFINES="$DEFINES -DHAVE_DRI3"
dri3_modules="x11-xcb xcb >= $XCB_REQUIRED xcb-dri3 xcb-xfixes xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
PKG_CHECK_MODULES([XCB_DRI3], [$dri3_modules])
fi
AM_CONDITIONAL(HAVE_PLATFORM_X11, echo "$platforms" | grep -q 'x11')
AM_CONDITIONAL(HAVE_PLATFORM_WAYLAND, echo "$platforms" | grep -q 'wayland')
AM_CONDITIONAL(HAVE_PLATFORM_DRM, echo "$platforms" | grep -q 'drm')
AM_CONDITIONAL(HAVE_PLATFORM_SURFACELESS, echo "$platforms" | grep -q 'surfaceless')
AM_CONDITIONAL(HAVE_PLATFORM_ANDROID, echo "$platforms" | grep -q 'android')
dnl dnl
dnl More DRI setup dnl More DRI setup
dnl dnl
@@ -1680,10 +1785,6 @@ if test "x$enable_dri" = xyes; then
# Platform specific settings and drivers to build # Platform specific settings and drivers to build
case "$host_os" in case "$host_os" in
linux*) linux*)
if test "x$enable_dri3" = xyes; then
DEFINES="$DEFINES -DHAVE_DRI3"
fi
case "$host_cpu" in case "$host_cpu" in
powerpc* | sparc*) powerpc* | sparc*)
# Build only the drivers for cards that exist on PowerPC/sparc # Build only the drivers for cards that exist on PowerPC/sparc
@@ -1740,12 +1841,11 @@ if test -n "$with_dri_drivers"; then
xi915) xi915)
require_libdrm "i915" require_libdrm "i915"
HAVE_I915_DRI=yes HAVE_I915_DRI=yes
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED]) PKG_CHECK_MODULES([I915], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
;; ;;
xi965) xi965)
require_libdrm "i965" require_libdrm "i965"
HAVE_I965_DRI=yes HAVE_I965_DRI=yes
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
;; ;;
xnouveau) xnouveau)
require_libdrm "nouveau" require_libdrm "nouveau"
@@ -1839,6 +1939,14 @@ AC_ARG_WITH([vulkan-icddir],
[VULKAN_ICD_INSTALL_DIR='${datarootdir}/vulkan/icd.d']) [VULKAN_ICD_INSTALL_DIR='${datarootdir}/vulkan/icd.d'])
AC_SUBST([VULKAN_ICD_INSTALL_DIR]) AC_SUBST([VULKAN_ICD_INSTALL_DIR])
require_x11_dri3() {
if echo "$platforms" | grep -q 'x11'; then
if test "x$enable_dri3" != xyes; then
AC_MSG_ERROR([$1 Vulkan driver requires DRI3 when built with X11])
fi
fi
}
if test -n "$with_vulkan_drivers"; then if test -n "$with_vulkan_drivers"; then
if test "x$ac_cv_func_dl_iterate_phdr" = xno; then if test "x$ac_cv_func_dl_iterate_phdr" = xno; then
AC_MSG_ERROR([Vulkan drivers require the dl_iterate_phdr function]) AC_MSG_ERROR([Vulkan drivers require the dl_iterate_phdr function])
@@ -1849,13 +1957,14 @@ if test -n "$with_vulkan_drivers"; then
case "x$driver" in case "x$driver" in
xintel) xintel)
require_libdrm "ANV" require_libdrm "ANV"
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED]) require_x11_dri3 "ANV"
HAVE_INTEL_VULKAN=yes HAVE_INTEL_VULKAN=yes
;; ;;
xradeon) xradeon)
require_libdrm "radv" require_libdrm "radv"
PKG_CHECK_MODULES([AMDGPU], [libdrm >= $LIBDRM_AMDGPU_REQUIRED libdrm_amdgpu >= $LIBDRM_AMDGPU_REQUIRED]) PKG_CHECK_MODULES([AMDGPU], [libdrm >= $LIBDRM_AMDGPU_REQUIRED libdrm_amdgpu >= $LIBDRM_AMDGPU_REQUIRED])
radeon_llvm_check $LLVM_REQUIRED_RADV "radv" radeon_llvm_check $LLVM_REQUIRED_RADV "radv"
require_x11_dri3 "radv"
HAVE_RADEON_VULKAN=yes HAVE_RADEON_VULKAN=yes
;; ;;
*) *)
@@ -1961,23 +2070,47 @@ if test "x$enable_xa" = xyes; then
fi fi
AM_CONDITIONAL(HAVE_ST_XA, test "x$enable_xa" = xyes) AM_CONDITIONAL(HAVE_ST_XA, test "x$enable_xa" = xyes)
if echo $platforms | grep -q "x11"; then
have_xvmc_platform=yes
else
have_xvmc_platform=no
fi
if echo $platforms | grep -q "x11"; then
have_vdpau_platform=yes
else
have_vdpau_platform=no
fi
if echo $platforms | grep -q "x11\|drm"; then
have_omx_platform=yes
else
have_omx_platform=no
fi
if echo $platforms | grep -q "x11\|drm\|wayland"; then
have_va_platform=yes
else
have_va_platform=no
fi
dnl dnl
dnl Gallium G3DVL configuration dnl Gallium G3DVL configuration
dnl dnl
if test -n "$with_gallium_drivers" -a "x$with_gallium_drivers" != xswrast; then if test -n "$with_gallium_drivers" -a "x$with_gallium_drivers" != xswrast; then
if test "x$enable_xvmc" = xauto; then if test "x$enable_xvmc" = xauto -a "x$have_xvmc_platform" = xyes; then
PKG_CHECK_EXISTS([xvmc >= $XVMC_REQUIRED], [enable_xvmc=yes], [enable_xvmc=no]) PKG_CHECK_EXISTS([xvmc >= $XVMC_REQUIRED], [enable_xvmc=yes], [enable_xvmc=no])
fi fi
if test "x$enable_vdpau" = xauto; then if test "x$enable_vdpau" = xauto -a "x$have_vdpau_platform" = xyes; then
PKG_CHECK_EXISTS([vdpau >= $VDPAU_REQUIRED], [enable_vdpau=yes], [enable_vdpau=no]) PKG_CHECK_EXISTS([vdpau >= $VDPAU_REQUIRED], [enable_vdpau=yes], [enable_vdpau=no])
fi fi
if test "x$enable_omx" = xauto; then if test "x$enable_omx" = xauto -a "x$have_omx_platform" = xyes; then
PKG_CHECK_EXISTS([libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED], [enable_omx=yes], [enable_omx=no]) PKG_CHECK_EXISTS([libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED], [enable_omx=yes], [enable_omx=no])
fi fi
if test "x$enable_va" = xauto; then if test "x$enable_va" = xauto -a "x$have_va_platform" = xyes; then
PKG_CHECK_EXISTS([libva >= $LIBVA_REQUIRED], [enable_va=yes], [enable_va=no]) PKG_CHECK_EXISTS([libva >= $LIBVA_REQUIRED], [enable_va=yes], [enable_va=no])
fi fi
fi fi
@@ -1995,23 +2128,24 @@ if test "x$enable_xvmc" = xyes -o \
"x$enable_vdpau" = xyes -o \ "x$enable_vdpau" = xyes -o \
"x$enable_omx" = xyes -o \ "x$enable_omx" = xyes -o \
"x$enable_va" = xyes; then "x$enable_va" = xyes; then
if test x"$enable_dri3" = xyes; then PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
PKG_CHECK_MODULES([VL], [xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED
xcb-xfixes x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
else
PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
fi
need_gallium_vl_winsys=yes need_gallium_vl_winsys=yes
fi fi
AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes) AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes)
if test "x$enable_xvmc" = xyes; then if test "x$enable_xvmc" = xyes; then
if test "x$have_xvmc_platform" != xyes; then
AC_MSG_ERROR([XVMC requires the x11 platforms])
fi
PKG_CHECK_MODULES([XVMC], [xvmc >= $XVMC_REQUIRED]) PKG_CHECK_MODULES([XVMC], [xvmc >= $XVMC_REQUIRED])
gallium_st="$gallium_st xvmc" gallium_st="$gallium_st xvmc"
fi fi
AM_CONDITIONAL(HAVE_ST_XVMC, test "x$enable_xvmc" = xyes) AM_CONDITIONAL(HAVE_ST_XVMC, test "x$enable_xvmc" = xyes)
if test "x$enable_vdpau" = xyes; then if test "x$enable_vdpau" = xyes; then
if test "x$have_vdpau_platform" != xyes; then
AC_MSG_ERROR([VDPAU requires the x11 platforms])
fi
PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED]) PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED])
gallium_st="$gallium_st vdpau" gallium_st="$gallium_st vdpau"
DEFINES="$DEFINES -DHAVE_ST_VDPAU" DEFINES="$DEFINES -DHAVE_ST_VDPAU"
@@ -2019,12 +2153,18 @@ fi
AM_CONDITIONAL(HAVE_ST_VDPAU, test "x$enable_vdpau" = xyes) AM_CONDITIONAL(HAVE_ST_VDPAU, test "x$enable_vdpau" = xyes)
if test "x$enable_omx" = xyes; then if test "x$enable_omx" = xyes; then
if test "x$have_omx_platform" != xyes; then
AC_MSG_ERROR([OMX requires at least one of the x11 or drm platforms])
fi
PKG_CHECK_MODULES([OMX], [libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED]) PKG_CHECK_MODULES([OMX], [libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED])
gallium_st="$gallium_st omx" gallium_st="$gallium_st omx"
fi fi
AM_CONDITIONAL(HAVE_ST_OMX, test "x$enable_omx" = xyes) AM_CONDITIONAL(HAVE_ST_OMX, test "x$enable_omx" = xyes)
if test "x$enable_va" = xyes; then if test "x$enable_va" = xyes; then
if test "x$have_va_platform" != xyes; then
AC_MSG_ERROR([VA requires at least one of the x11 drm or wayland platforms])
fi
PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED]) PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED])
gallium_st="$gallium_st va" gallium_st="$gallium_st va"
fi fi
@@ -2141,112 +2281,21 @@ dnl Gallium configuration
dnl dnl
AM_CONDITIONAL(HAVE_GALLIUM, test -n "$with_gallium_drivers") AM_CONDITIONAL(HAVE_GALLIUM, test -n "$with_gallium_drivers")
dnl
dnl DEPRECATED: EGL Platforms configuration
dnl
AC_ARG_WITH([egl-platforms],
[AS_HELP_STRING([--with-egl-platforms@<:@=DIRS...@:>@],
[DEPRECATED: use --with-plaforms instead@<:@default=auto@:>@])],
[with_egl_platforms="$withval"],
[with_egl_platforms=auto])
if test "x$with_egl_platforms" = xauto; then
AC_MSG_WARN([--with-egl-platforms is deprecated. Use --with-plaforms instead.])
if test "x$enable_egl" = xyes; then
if test "x$enable_gbm" = xyes; then
with_egl_platforms="x11,drm"
else
with_egl_platforms="x11"
fi
else
with_egl_platforms=""
fi
fi
dnl
dnl Platforms configuration
dnl
AC_ARG_WITH([platforms],
[AS_HELP_STRING([--with-platforms@<:@=DIRS...@:>@],
[comma delimited native platforms libEGL/Vulkan/other supports, e.g.
"x11,drm,wayland,surfaceless..." @<:@default=auto@:>@])],
[with_platforms="$withval"],
[with_platforms=auto])
# For the time being, we still reuse the EGL named variables/defines.
if test "x$with_platforms" != xauto; then
with_egl_platforms=$with_platforms
fi
PKG_CHECK_MODULES([WAYLAND_SCANNER], [wayland-scanner],
WAYLAND_SCANNER=`$PKG_CONFIG --variable=wayland_scanner wayland-scanner`,
WAYLAND_SCANNER='')
if test "x$WAYLAND_SCANNER" = x; then
AC_PATH_PROG([WAYLAND_SCANNER], [wayland-scanner], [:])
fi
# Do per-EGL platform setups and checks
egl_platforms=`IFS=', '; echo $with_egl_platforms`
for plat in $egl_platforms; do
case "$plat" in
wayland)
PKG_CHECK_MODULES([WAYLAND], [wayland-client >= $WAYLAND_REQUIRED wayland-server >= $WAYLAND_REQUIRED])
if test "x$WAYLAND_SCANNER" = "x:"; then
AC_MSG_ERROR([wayland-scanner is needed to compile the wayland egl platform])
fi
;;
x11)
PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED xcb-xfixes])
;;
drm)
test "x$enable_gbm" = "xno" &&
AC_MSG_ERROR([EGL platform drm needs gbm])
;;
surfaceless)
;;
android)
PKG_CHECK_MODULES([ANDROID], [cutils hardware sync])
;;
*)
AC_MSG_ERROR([EGL platform '$plat' does not exist])
;;
esac
case "$plat" in
wayland|drm|surfaceless)
require_libdrm "Platform $plat"
;;
esac
done
# libEGL wants to default to the first platform specified in # libEGL wants to default to the first platform specified in
# ./configure. parse that here. # ./configure. parse that here.
if test "x$egl_platforms" != "x"; then if test "x$platforms" != "x"; then
FIRST_PLATFORM_CAPS=`echo $egl_platforms | sed 's| .*||' | tr '[[a-z]]' '[[A-Z]]'` FIRST_PLATFORM_CAPS=`echo $platforms | sed 's| .*||' | tr '[[a-z]]' '[[A-Z]]'`
EGL_NATIVE_PLATFORM="_EGL_PLATFORM_$FIRST_PLATFORM_CAPS" EGL_NATIVE_PLATFORM="_EGL_PLATFORM_$FIRST_PLATFORM_CAPS"
else else
EGL_NATIVE_PLATFORM="_EGL_INVALID_PLATFORM" EGL_NATIVE_PLATFORM="_EGL_INVALID_PLATFORM"
fi fi
AM_CONDITIONAL(HAVE_PLATFORM_X11, echo "$egl_platforms" | grep -q 'x11')
AM_CONDITIONAL(HAVE_PLATFORM_WAYLAND, echo "$egl_platforms" | grep -q 'wayland')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_DRM, echo "$egl_platforms" | grep -q 'drm')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_SURFACELESS, echo "$egl_platforms" | grep -q 'surfaceless')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_ANDROID, echo "$egl_platforms" | grep -q 'android')
AC_SUBST([EGL_NATIVE_PLATFORM]) AC_SUBST([EGL_NATIVE_PLATFORM])
AC_SUBST([EGL_CFLAGS]) AC_SUBST([EGL_CFLAGS])
# If we don't have the X11 platform, set this define so we don't try to include # If we don't have the X11 platform, set this define so we don't try to include
# the X11 headers. # the X11 headers.
if ! echo "$egl_platforms" | grep -q 'x11'; then if ! echo "$platforms" | grep -q 'x11'; then
DEFINES="$DEFINES -DMESA_EGL_NO_X11_HEADERS" DEFINES="$DEFINES -DMESA_EGL_NO_X11_HEADERS"
GL_PC_CFLAGS="$GL_PC_CFLAGS -DMESA_EGL_NO_X11_HEADERS" GL_PC_CFLAGS="$GL_PC_CFLAGS -DMESA_EGL_NO_X11_HEADERS"
fi fi
@@ -2316,7 +2365,7 @@ dnl DRM is needed by X, Wayland, and offscreen rendering.
dnl Surfaceless is an alternative for the last one. dnl Surfaceless is an alternative for the last one.
dnl dnl
require_basic_egl() { require_basic_egl() {
case "$with_egl_platforms" in case "$with_platforms" in
*drm*|*surfaceless*) *drm*|*surfaceless*)
;; ;;
*) *)
@@ -2378,7 +2427,7 @@ if test -n "$with_gallium_drivers"; then
;; ;;
xi915) xi915)
HAVE_GALLIUM_I915=yes HAVE_GALLIUM_I915=yes
PKG_CHECK_MODULES([INTEL], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED]) PKG_CHECK_MODULES([I915], [libdrm >= $LIBDRM_INTEL_REQUIRED libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
require_libdrm "Gallium i915" require_libdrm "Gallium i915"
;; ;;
xr300) xr300)
@@ -2435,10 +2484,10 @@ if test -n "$with_gallium_drivers"; then
xswr) xswr)
llvm_require_version $LLVM_REQUIRED_SWR "swr" llvm_require_version $LLVM_REQUIRED_SWR "swr"
swr_require_cxx_feature_flags "C++14" "__cplusplus >= 201402L" \ swr_require_cxx_feature_flags "C++11" "__cplusplus >= 201103L" \
"-std=c++14" \ ",-std=c++11" \
SWR_CXX14_CXXFLAGS SWR_CXX11_CXXFLAGS
AC_SUBST([SWR_CXX14_CXXFLAGS]) AC_SUBST([SWR_CXX11_CXXFLAGS])
swr_require_cxx_feature_flags "AVX" "defined(__AVX__)" \ swr_require_cxx_feature_flags "AVX" "defined(__AVX__)" \
",-mavx,-march=core-avx" \ ",-mavx,-march=core-avx" \
@@ -2462,10 +2511,15 @@ if test -n "$with_gallium_drivers"; then
DEFINES="$DEFINES -DUSE_VC4_SIMULATOR"], DEFINES="$DEFINES -DUSE_VC4_SIMULATOR"],
[USE_VC4_SIMULATOR=no]) [USE_VC4_SIMULATOR=no])
;; ;;
xpl111)
HAVE_GALLIUM_PL111=yes
;;
xvirgl) xvirgl)
HAVE_GALLIUM_VIRGL=yes HAVE_GALLIUM_VIRGL=yes
require_libdrm "virgl" require_libdrm "virgl"
require_basic_egl "virgl" if test "x$enable_egl" = xyes; then
require_basic_egl "virgl"
fi
;; ;;
*) *)
AC_MSG_ERROR([Unknown Gallium driver: $driver]) AC_MSG_ERROR([Unknown Gallium driver: $driver])
@@ -2474,6 +2528,10 @@ if test -n "$with_gallium_drivers"; then
done done
fi fi
# XXX: Keep in sync with LLVM_REQUIRED_SWR
AM_CONDITIONAL(SWR_INVALID_LLVM_VERSION, test "x$LLVM_VERSION" != x3.9.0 -a \
"x$LLVM_VERSION" != x3.9.1)
if test "x$enable_llvm" = "xyes" -a "$with_gallium_drivers"; then if test "x$enable_llvm" = "xyes" -a "$with_gallium_drivers"; then
llvm_require_version $LLVM_REQUIRED_GALLIUM "gallium" llvm_require_version $LLVM_REQUIRED_GALLIUM "gallium"
llvm_add_default_components "gallium" llvm_add_default_components "gallium"
@@ -2485,6 +2543,10 @@ if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_IMX" = xyes ; then
AC_MSG_ERROR([Building with imx requires etnaviv]) AC_MSG_ERROR([Building with imx requires etnaviv])
fi fi
if test "x$HAVE_GALLIUM_VC4" != xyes -a "x$HAVE_GALLIUM_PL111" = xyes ; then
AC_MSG_ERROR([Building with pl111 requires vc4])
fi
dnl dnl
dnl Set defines and buildtime variables only when using LLVM. dnl Set defines and buildtime variables only when using LLVM.
dnl dnl
@@ -2549,6 +2611,7 @@ fi
AM_CONDITIONAL(HAVE_GALLIUM_SVGA, test "x$HAVE_GALLIUM_SVGA" = xyes) AM_CONDITIONAL(HAVE_GALLIUM_SVGA, test "x$HAVE_GALLIUM_SVGA" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_I915, test "x$HAVE_GALLIUM_I915" = xyes) AM_CONDITIONAL(HAVE_GALLIUM_I915, test "x$HAVE_GALLIUM_I915" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_PL111, test "x$HAVE_GALLIUM_PL111" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_R300, test "x$HAVE_GALLIUM_R300" = xyes) AM_CONDITIONAL(HAVE_GALLIUM_R300, test "x$HAVE_GALLIUM_R300" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_R600, test "x$HAVE_GALLIUM_R600" = xyes) AM_CONDITIONAL(HAVE_GALLIUM_R600, test "x$HAVE_GALLIUM_R600" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_RADEONSI, test "x$HAVE_GALLIUM_RADEONSI" = xyes) AM_CONDITIONAL(HAVE_GALLIUM_RADEONSI, test "x$HAVE_GALLIUM_RADEONSI" = xyes)
@@ -2588,8 +2651,7 @@ AM_CONDITIONAL(HAVE_SWRAST_DRI, test x$HAVE_SWRAST_DRI = xyes)
AM_CONDITIONAL(HAVE_RADEON_VULKAN, test "x$HAVE_RADEON_VULKAN" = xyes) AM_CONDITIONAL(HAVE_RADEON_VULKAN, test "x$HAVE_RADEON_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_INTEL_VULKAN, test "x$HAVE_INTEL_VULKAN" = xyes) AM_CONDITIONAL(HAVE_INTEL_VULKAN, test "x$HAVE_INTEL_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_AMD_DRIVERS, test "x$HAVE_GALLIUM_R600" = xyes -o \ AM_CONDITIONAL(HAVE_AMD_DRIVERS, test "x$HAVE_GALLIUM_RADEONSI" = xyes -o \
"x$HAVE_GALLIUM_RADEONSI" = xyes -o \
"x$HAVE_RADEON_VULKAN" = xyes) "x$HAVE_RADEON_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_INTEL_DRIVERS, test "x$HAVE_INTEL_VULKAN" = xyes -o \ AM_CONDITIONAL(HAVE_INTEL_DRIVERS, test "x$HAVE_INTEL_VULKAN" = xyes -o \
@@ -2612,6 +2674,7 @@ AM_CONDITIONAL(HAVE_COMMON_OSMESA, test "x$enable_osmesa" = xyes -o \
AM_CONDITIONAL(HAVE_X86_ASM, test "x$asm_arch" = xx86 -o "x$asm_arch" = xx86_64) AM_CONDITIONAL(HAVE_X86_ASM, test "x$asm_arch" = xx86 -o "x$asm_arch" = xx86_64)
AM_CONDITIONAL(HAVE_X86_64_ASM, test "x$asm_arch" = xx86_64) AM_CONDITIONAL(HAVE_X86_64_ASM, test "x$asm_arch" = xx86_64)
AM_CONDITIONAL(HAVE_SPARC_ASM, test "x$asm_arch" = xsparc) AM_CONDITIONAL(HAVE_SPARC_ASM, test "x$asm_arch" = xsparc)
AM_CONDITIONAL(HAVE_PPC64LE_ASM, test "x$asm_arch" = xppc64le)
AC_SUBST([NINE_MAJOR], 1) AC_SUBST([NINE_MAJOR], 1)
AC_SUBST([NINE_MINOR], 0) AC_SUBST([NINE_MINOR], 0)
@@ -2698,6 +2761,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/drivers/llvmpipe/Makefile src/gallium/drivers/llvmpipe/Makefile
src/gallium/drivers/noop/Makefile src/gallium/drivers/noop/Makefile
src/gallium/drivers/nouveau/Makefile src/gallium/drivers/nouveau/Makefile
src/gallium/drivers/pl111/Makefile
src/gallium/drivers/r300/Makefile src/gallium/drivers/r300/Makefile
src/gallium/drivers/r600/Makefile src/gallium/drivers/r600/Makefile
src/gallium/drivers/radeon/Makefile src/gallium/drivers/radeon/Makefile
@@ -2743,6 +2807,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/winsys/freedreno/drm/Makefile src/gallium/winsys/freedreno/drm/Makefile
src/gallium/winsys/i915/drm/Makefile src/gallium/winsys/i915/drm/Makefile
src/gallium/winsys/nouveau/drm/Makefile src/gallium/winsys/nouveau/drm/Makefile
src/gallium/winsys/pl111/drm/Makefile
src/gallium/winsys/radeon/drm/Makefile src/gallium/winsys/radeon/drm/Makefile
src/gallium/winsys/amdgpu/drm/Makefile src/gallium/winsys/amdgpu/drm/Makefile
src/gallium/winsys/svga/drm/Makefile src/gallium/winsys/svga/drm/Makefile
@@ -2869,7 +2934,7 @@ else
echo " GBM: no" echo " GBM: no"
fi fi
echo " EGL/Vulkan/VL platforms: $egl_platforms" echo " EGL/Vulkan/VL platforms: $platforms"
# Vulkan # Vulkan
echo "" echo ""
@@ -2917,15 +2982,17 @@ echo " Static libs: $enable_static"
echo " Shared-glapi: $enable_shared_glapi" echo " Shared-glapi: $enable_shared_glapi"
dnl Compiler options dnl Compiler options
# cleanup the CFLAGS/CXXFLAGS/DEFINES vars # cleanup the CFLAGS/CXXFLAGS/LDFLAGS/DEFINES vars
cflags=`echo $CFLAGS | \ cflags=`echo $CFLAGS | \
$SED 's/^ *//;s/ */ /;s/ *$//'` $SED 's/^ *//;s/ */ /;s/ *$//'`
cxxflags=`echo $CXXFLAGS | \ cxxflags=`echo $CXXFLAGS | \
$SED 's/^ *//;s/ */ /;s/ *$//'` $SED 's/^ *//;s/ */ /;s/ *$//'`
ldflags=`echo $LDFLAGS | $SED 's/^ *//;s/ */ /;s/ *$//'`
defines=`echo $DEFINES | $SED 's/^ *//;s/ */ /;s/ *$//'` defines=`echo $DEFINES | $SED 's/^ *//;s/ */ /;s/ *$//'`
echo "" echo ""
echo " CFLAGS: $cflags" echo " CFLAGS: $cflags"
echo " CXXFLAGS: $cxxflags" echo " CXXFLAGS: $cxxflags"
echo " LDFLAGS: $ldflags"
echo " Macros: $defines" echo " Macros: $defines"
echo "" echo ""
if test "x$enable_llvm" = xyes; then if test "x$enable_llvm" = xyes; then

View File

@@ -84,6 +84,7 @@
<li><a href="codingstyle.html" target="_parent">Coding Style</a> <li><a href="codingstyle.html" target="_parent">Coding Style</a>
<li><a href="submittingpatches.html" target="_parent">Submitting patches</a> <li><a href="submittingpatches.html" target="_parent">Submitting patches</a>
<li><a href="releasing.html" target="_parent">Releasing process</a> <li><a href="releasing.html" target="_parent">Releasing process</a>
<li><a href="release-calendar.html" target="_parent">Release calendar</a>
<li><a href="sourcedocs.html" target="_parent">Source Documentation</a> <li><a href="sourcedocs.html" target="_parent">Source Documentation</a>
<li><a href="dispatch.html" target="_parent">GL Dispatch</a> <li><a href="dispatch.html" target="_parent">GL Dispatch</a>
</ul> </ul>

View File

@@ -77,15 +77,13 @@ drivers will be installed to <code>${libdir}/egl</code>.</p>
</dd> </dd>
<dt><code>--with-egl-platforms</code></dt> <dt><code>--with-platforms</code></dt>
<dd> <dd>
<p>List the platforms (window systems) to support. Its argument is a comma <p>List the platforms (window systems) to support. Its argument is a comma
separated string such as <code>--with-egl-platforms=x11,drm</code>. It decides separated string such as <code>--with-platforms=x11,drm</code>. It decides
the platforms a driver may support. The first listed platform is also used by the platforms a driver may support. The first listed platform is also used by
the main library to decide the native platform: this defines EGL native the main library to decide the native platform.</p>
types such as <code>EGLNativeDisplayType</code> or
<code>EGLNativeWindowType</code>.</p>
<p>The available platforms are <code>x11</code>, <code>drm</code>, <p>The available platforms are <code>x11</code>, <code>drm</code>,
<code>wayland</code>, <code>surfaceless</code>, <code>android</code>, <code>wayland</code>, <code>surfaceless</code>, <code>android</code>,
@@ -167,9 +165,9 @@ binaries.</p>
<dd> <dd>
<p>This variable specifies the native platform. The valid values are the same <p>This variable specifies the native platform. The valid values are the same
as those for <code>--with-egl-platforms</code>. When the variable is not set, as those for <code>--with-platforms</code>. When the variable is not set,
the main library uses the first platform listed in the main library uses the first platform listed in
<code>--with-egl-platforms</code> as the native platform.</p> <code>--with-platforms</code> as the native platform.</p>
<p>Extensions like <code>EGL_MESA_drm_display</code> define new functions to <p>Extensions like <code>EGL_MESA_drm_display</code> define new functions to
create displays for non-native platforms. These extensions are usually used by create displays for non-native platforms. These extensions are usually used by

View File

@@ -46,6 +46,9 @@ sometimes be useful for debugging end-user issues.
<li>MESA_NO_MMX - if set, disables Intel MMX optimizations <li>MESA_NO_MMX - if set, disables Intel MMX optimizations
<li>MESA_NO_3DNOW - if set, disables AMD 3DNow! optimizations <li>MESA_NO_3DNOW - if set, disables AMD 3DNow! optimizations
<li>MESA_NO_SSE - if set, disables Intel SSE optimizations <li>MESA_NO_SSE - if set, disables Intel SSE optimizations
<li>MESA_NO_ERROR - if set error checking is disabled as per KHR_no_error.
This will result in undefined behaviour for invalid use of the api, but
can reduce CPU use for apps that are known to be error free.</li>
<li>MESA_DEBUG - if set, error messages are printed to stderr. For example, <li>MESA_DEBUG - if set, error messages are printed to stderr. For example,
if the application generates a GL_INVALID_ENUM error, a corresponding error if the application generates a GL_INVALID_ENUM error, a corresponding error
message indicating where the error occurred, and possibly why, will be message indicating where the error occurred, and possibly why, will be
@@ -160,48 +163,47 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
This is useful for debugging hangs, etc.</li> This is useful for debugging hangs, etc.</li>
<li>INTEL_DEBUG - a comma-separated list of named flags, which do various things: <li>INTEL_DEBUG - a comma-separated list of named flags, which do various things:
<ul> <ul>
<li>color - use color in output</li> <li>ann - annotate IR in assembly dumps</li>
<li>tex - emit messages about textures.</li> <li>aub - dump batches into an AUB trace for use with simulation tools</li>
<li>state - emit messages about state flag tracking</li>
<li>blit - emit messages about blit operations</li>
<li>miptree - emit messages about miptrees</li>
<li>perf - emit messages about performance issues</li>
<li>perfmon - emit messages about AMD_performance_monitor</li>
<li>bat - emit batch information</li> <li>bat - emit batch information</li>
<li>pix - emit messages about pixel operations</li> <li>blit - emit messages about blit operations</li>
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>buf - emit messages about buffer objects</li> <li>buf - emit messages about buffer objects</li>
<li>clip - emit messages about the clip unit (for old gens, includes the CLIP program)</li>
<li>color - use color in output</li>
<li>cs - dump shader assembly for compute shaders</li>
<li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li>
<li>dri - emit messages about the DRI interface</li>
<li>fbo - emit messages about framebuffers</li> <li>fbo - emit messages about framebuffers</li>
<li>fs - dump shader assembly for fragment shaders</li> <li>fs - dump shader assembly for fragment shaders</li>
<li>gs - dump shader assembly for geometry shaders</li> <li>gs - dump shader assembly for geometry shaders</li>
<li>sync - after sending each batch, emit a message and wait for that batch to finish rendering</li> <li>hex - print instruction hex dump with the disassembly</li>
<li>prim - emit messages about drawing primitives</li> <li>l3 - emit messages about the new L3 state during transitions</li>
<li>vert - emit messages about vertex assembly</li> <li>miptree - emit messages about miptrees</li>
<li>dri - emit messages about the DRI interface</li>
<li>sf - emit messages about the strips &amp; fans unit (for old gens, includes the SF program)</li>
<li>stats - enable statistics counters. you probably actually want perfmon or intel_gpu_top instead.</li>
<li>urb - emit messages about URB setup</li>
<li>vs - dump shader assembly for vertex shaders</li>
<li>clip - emit messages about the clip unit (for old gens, includes the CLIP program)</li>
<li>aub - dump batches into an AUB trace for use with simulation tools</li>
<li>shader_time - record how much GPU time is spent in each shader</li>
<li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
<li>ann - annotate IR in assembly dumps</li>
<li>no8 - don't generate SIMD8 fragment shader</li> <li>no8 - don't generate SIMD8 fragment shader</li>
<li>vec4 - force vec4 mode in vertex shader</li> <li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>
<li>nocompact - disable instruction compaction</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>norbc - disable single sampled render buffer compression</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
<li>perf - emit messages about performance issues</li>
<li>perfmon - emit messages about AMD_performance_monitor</li>
<li>pix - emit messages about pixel operations</li>
<li>prim - emit messages about drawing primitives</li>
<li>sf - emit messages about the strips &amp; fans unit (for old gens, includes the SF program)</li>
<li>shader_time - record how much GPU time is spent in each shader</li>
<li>spill_fs - force spilling of all registers in the scalar backend (useful to debug spilling code)</li> <li>spill_fs - force spilling of all registers in the scalar backend (useful to debug spilling code)</li>
<li>spill_vec4 - force spilling of all registers in the vec4 backend (useful to debug spilling code)</li> <li>spill_vec4 - force spilling of all registers in the vec4 backend (useful to debug spilling code)</li>
<li>cs - dump shader assembly for compute shaders</li> <li>state - emit messages about state flag tracking</li>
<li>hex - print instruction hex dump with the disassembly</li> <li>sync - after sending each batch, emit a message and wait for that batch to finish rendering</li>
<li>nocompact - disable instruction compaction</li>
<li>tcs - dump shader assembly for tessellation control shaders</li> <li>tcs - dump shader assembly for tessellation control shaders</li>
<li>tes - dump shader assembly for tessellation evaluation shaders</li> <li>tes - dump shader assembly for tessellation evaluation shaders</li>
<li>l3 - emit messages about the new L3 state during transitions</li> <li>tex - emit messages about textures.</li>
<li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li> <li>urb - emit messages about URB setup</li>
<li>norbc - disable single sampled render buffer compression</li> <li>vert - emit messages about vertex assembly</li>
<li>vs - dump shader assembly for vertex shaders</li>
</ul> </ul>
<li>INTEL_SCALAR_VS (or TCS, TES, GS) - force scalar/vec4 mode for a shader stage (Gen8-9 only)</li>
<li>INTEL_PRECISE_TRIG - if set to 1, true or yes, then the driver prefers <li>INTEL_PRECISE_TRIG - if set to 1, true or yes, then the driver prefers
accuracy over performance in trig functions.</li> accuracy over performance in trig functions.</li>
</ul> </ul>
@@ -302,6 +304,8 @@ See src/mesa/state_tracker/st_debug.c for other options.
(will often result in incorrect rendering). (will often result in incorrect rendering).
<li>SVGA_DEBUG - for dumping shaders, constant buffers, etc. See the code <li>SVGA_DEBUG - for dumping shaders, constant buffers, etc. See the code
for details. for details.
<li>SVGA_EXTRA_LOGGING - if set, enables extra logging to the vmware.log file,
such as the OpenGL program's name and command line arguments.
<li>See the driver code for other, lesser-used variables. <li>See the driver code for other, lesser-used variables.
</ul> </ul>

View File

@@ -277,7 +277,7 @@ GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+
Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES version: Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES version:
GL_ARB_bindless_texture started (airlied) GL_ARB_bindless_texture DONE (radeonsi)
GL_ARB_cl_event not started GL_ARB_cl_event not started
GL_ARB_compute_variable_group_size DONE (nvc0, radeonsi) GL_ARB_compute_variable_group_size DONE (nvc0, radeonsi)
GL_ARB_ES3_2_compatibility DONE (i965/gen8+) GL_ARB_ES3_2_compatibility DONE (i965/gen8+)
@@ -297,7 +297,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi) GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (nvc0, radeonsi) GL_ARB_shader_group_vote DONE (nvc0, radeonsi)
GL_ARB_shader_stencil_export DONE (i965/gen9+, radeonsi, softpipe, llvmpipe, swr) GL_ARB_shader_stencil_export DONE (i965/gen9+, radeonsi, softpipe, llvmpipe, swr)
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+, radeonsi) GL_ARB_shader_viewport_layer_array DONE (i965/gen6+, nvc0, radeonsi)
GL_ARB_sparse_buffer DONE (radeonsi/CIK+) GL_ARB_sparse_buffer DONE (radeonsi/CIK+)
GL_ARB_sparse_texture not started GL_ARB_sparse_texture not started
GL_ARB_sparse_texture2 not started GL_ARB_sparse_texture2 not started
@@ -305,9 +305,9 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_texture_filter_minmax not started GL_ARB_texture_filter_minmax not started
GL_ARB_transform_feedback_overflow_query DONE (i965/gen6+) GL_ARB_transform_feedback_overflow_query DONE (i965/gen6+)
GL_KHR_blend_equation_advanced_coherent DONE (i965/gen9+) GL_KHR_blend_equation_advanced_coherent DONE (i965/gen9+)
GL_KHR_no_error not started GL_KHR_no_error started (Timothy Arceri)
GL_KHR_texture_compression_astc_hdr DONE (core only) GL_KHR_texture_compression_astc_hdr DONE (i965/bxt)
GL_KHR_texture_compression_astc_sliced_3d not started GL_KHR_texture_compression_astc_sliced_3d DONE (i965/gen9+)
GL_OES_depth_texture_cube_map DONE (all drivers that support GLSL 1.30+) GL_OES_depth_texture_cube_map DONE (all drivers that support GLSL 1.30+)
GL_OES_EGL_image DONE (all drivers) GL_OES_EGL_image DONE (all drivers)
GL_OES_EGL_image_external_essl3 not started GL_OES_EGL_image_external_essl3 not started

View File

@@ -16,6 +16,59 @@
<h1>News</h1> <h1>News</h1>
<h2>June 19, 2017</h2>
<p>
<a href="relnotes/17.1.3.html">Mesa 17.1.3</a> is released.
This is a bug-fix release.
</p>
<h2>June 5, 2017</h2>
<p>
<a href="relnotes/17.1.2.html">Mesa 17.1.2</a> is released.
This is a bug-fix release.
</p>
<h2>June 1, 2017</h2>
<p>
<a href="relnotes/17.0.7.html">Mesa 17.0.7</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 17.0.7 will be the final release in the 17.0
series. Users of 17.0 are encouraged to migrate to the 17.1 series in order
to obtain future fixes.
</p>
<h2>May 25, 2017</h2>
<p>
<a href="relnotes/17.1.1.html">Mesa 17.1.1</a> is released.
This is a bug-fix release.
</p>
<h2>May 12, 2017</h2>
<p>
<a href="relnotes/17.0.6.html">Mesa 17.0.6</a> is released.
This is a bug-fix release.
</p>
<h2>May 10, 2017</h2>
<p>
<a href="relnotes/17.1.0.html">Mesa 17.1.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>April 28, 2017</h2>
<p>
<a href="relnotes/17.0.5.html">Mesa 17.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>April 17, 2017</h2>
<p>
<a href="relnotes/17.0.4.html">Mesa 17.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>April 1, 2017</h2> <h2>April 1, 2017</h2>
<p> <p>
<a href="relnotes/17.0.3.html">Mesa 17.0.3</a> is released. <a href="relnotes/17.0.3.html">Mesa 17.0.3</a> is released.

106
docs/release-calendar.html Normal file
View File

@@ -0,0 +1,106 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Release calendar</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Overview</h1>
<p>
Mesa provides feature/development and stable releases.
</p>
<p>
The table below lists the date and release manager that is expected to do the
specific release.
<br>
Take a look <a href="submittingpatches.html#criteria" target="_parent">here</a>
if you'd like to nominate a patch in the next stable release.
</p>
<h1 id="calendar">Calendar</h1>
<table border="1">
<tr>
<th>Branch</th>
<th>Expected date</th>
<th>Release</th>
<th>Release manager</th>
<th>Notes</th>
</tr>
<tr>
<td rowspan="5">17.1</td>
<td>2017-06-30</td>
<td>17.1.4</td>
<td>Andres Gomez</td>
<td></td>
</tr>
<tr>
<td>2017-07-14</td>
<td>17.1.5</td>
<td>Andres Gomez</td>
<td></td>
</tr>
<tr>
<td>2017-07-28</td>
<td>17.1.6</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-08-11</td>
<td>17.1.7</td>
<td>Juan A. Suarez Romero</td>
<td></td>
</tr>
<tr>
<td>2017-08-25</td>
<td>17.1.8</td>
<td>Andres Gomez</td>
<td>Final planned release for the 17.1 series</td>
</tr>
<tr>
<td rowspan="5">17.2</td>
<td>2017-07-21</td>
<td>17.2.0-rc1</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-07-28</td>
<td>17.2.0-rc2</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-08-04</td>
<td>17.2.0-rc3</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>2017-08-11</td>
<td>17.2.0-rc4</td>
<td>Emil Velikov</td>
<td>May be promoted to 17.2.0 final</td>
</tr>
<tr>
<td>2017-08-25</td>
<td>17.2.1</td>
<td>Emil Velikov</td>
<td></td>
</table>
</div>
</body>
</html>

View File

@@ -14,6 +14,7 @@
<iframe src="contents.html"></iframe> <iframe src="contents.html"></iframe>
<div class="content"> <div class="content">
<h1>Releasing process</h1> <h1>Releasing process</h1>
<ul> <ul>
@@ -23,11 +24,13 @@
<li><a href="#branch">Making a branchpoint</a> <li><a href="#branch">Making a branchpoint</a>
<li><a href="#prerelease">Pre-release announcement</a> <li><a href="#prerelease">Pre-release announcement</a>
<li><a href="#release">Making a new release</a> <li><a href="#release">Making a new release</a>
<li><a href="#calendar">Update the calendar</a>
<li><a href="#announce">Announce the release</a> <li><a href="#announce">Announce the release</a>
<li><a href="#website">Update the mesa3d.org website</a> <li><a href="#website">Update the mesa3d.org website</a>
<li><a href="#bugzilla">Update Bugzilla</a> <li><a href="#bugzilla">Update Bugzilla</a>
</ul> </ul>
<h1 id="overview">Overview</h1> <h1 id="overview">Overview</h1>
<p> <p>
@@ -48,11 +51,15 @@ For example:
Mesa 12.0.2 - 12.0 branch, bugfix Mesa 12.0.2 - 12.0 branch, bugfix
</pre> </pre>
<h1 id="schedule">Release schedule</h1> <h1 id="schedule">Release schedule</h1>
<p> <p>
Releases should happen on Fridays. Delays can occur although those should be keep Releases should happen on Fridays. Delays can occur although those should be keep
to a minimum. to a minimum.
<br>
See our <a href="release-calendar.html" target="_parent">calendar</a> for the
date and other details for individual releases.
</p> </p>
<h2>Feature releases</h2> <h2>Feature releases</h2>
@@ -79,15 +86,24 @@ The final release from the 12.0 series Mesa 12.0.5 will be out around the same
time (or shortly after) 13.0.1 is out. time (or shortly after) 13.0.1 is out.
</p> </p>
<h1 id="pickntest">Cherry-picking and testing</h1> <h1 id="pickntest">Cherry-picking and testing</h1>
<p> <p>
Commits nominated for the active branch are picked as based on the Commits nominated for the active branch are picked as based on the
<a href="submittingpatches.html#criteria" target="_parent">criteria</a> as <a href="submittingpatches.html#criteria" target="_parent">criteria</a> as
described in the same section. described in the same section.
</p>
<p> <p>
Maintainer is responsible for testing in various possible permutations of Nomination happens in the mesa-stable@ mailing list. However,
maintainer is resposible of checking for forgotten candidates in the
master branch. This is achieved by a combination of ad-hoc scripts and
a casual search for terms such as regression, fix, broken and similar.
</p>
<p>
Maintainer is also responsible for testing in various possible permutations of
the autoconf and scons build. the autoconf and scons build.
</p> </p>
@@ -101,33 +117,57 @@ release. This is made <strong>only</strong> with explicit permission/request,
and the patch <strong>must</strong> be very well contained. Thus it cannot and the patch <strong>must</strong> be very well contained. Thus it cannot
affect more than one driver/subsystem. affect more than one driver/subsystem.
</p> </p>
<p> <p>
Currently Ilia Mirkin and AMD devs have requested "permanent" exception. Currently Ilia Mirkin and AMD devs have requested "permanent" exception.
</p> </p>
<ul> <ul>
<li>make distcheck, scons and scons check must pass <li>make distcheck, scons and scons check must pass
<li>Testing with different version of system components - LLVM and others is also <li>Testing with different version of system components - LLVM and others is also
performed where possible. performed where possible.
<li>As a general rule, testing with various combinations of configure
switches, depending on the specific patchset.
</ul> </ul>
<p> <p>
Achieved by combination of local ad-hoc scripts and AppVeyor plus Travis-CI, Achieved by combination of local ad-hoc scripts, mingw-w64 cross
the latter as part of their Github integration. compilation and AppVeyor plus Travis-CI, the latter as part of their
Github integration.
</p> </p>
<p>
For Windows related changes, the main contact point is Brian
Paul. Jose Fonseca can also help as a fallback contact.
</p>
<p>
For Android related changes, the main contact is Tapani
P&auml;lli. Mauro Rossi is collaborating with android-x86 and may
provide feedback about the build status in that project.
</p>
<p>
For MacOSX related changes, Jeremy Huddleston Sequoia is currently a
good contact point.
</p>
<p> <p>
<strong>Note:</strong> If a patch in the current queue needs any additional <strong>Note:</strong> If a patch in the current queue needs any additional
fix(es), then they should be squashed together. fix(es), then they should be squashed together.
<br> <br>
The commit messages and the <code>cherry picked from</code> tags must be preserved. The commit messages and the <code>cherry picked from</code> tags must be preserved.
</p> </p>
<p> <p>
This should be noted in the <a href="#prerelease">pre-announce</a> email. This should be noted in the <a href="#prerelease">pre-announce</a> email.
</p>
<pre> <pre>
git show b10859ec41d09c57663a258f43fe57c12332698e git show b10859ec41d09c57663a258f43fe57c12332698e
commit b10859ec41d09c57663a258f43fe57c12332698e commit b10859ec41d09c57663a258f43fe57c12332698e
Author: Jonas Pfeil &ltpfeiljonas@gmx.de&gt Author: Jonas Pfeil &lt;pfeiljonas@gmx.de&gt;
Date: Wed Mar 1 18:11:10 2017 +0100 Date: Wed Mar 1 18:11:10 2017 +0100
ralloc: Make sure ralloc() allocations match malloc()'s alignment. ralloc: Make sure ralloc() allocations match malloc()'s alignment.
@@ -146,7 +186,6 @@ This should be noted in the <a href="#prerelease">pre-announce</a> email.
(cherry picked from commit ff494fe999510ea40e3ed5827e7818550b6de126) (cherry picked from commit ff494fe999510ea40e3ed5827e7818550b6de126)
</pre> </pre>
</p>
<h2>Regression/functionality testing</h2> <h2>Regression/functionality testing</h2>
@@ -154,15 +193,23 @@ This should be noted in the <a href="#prerelease">pre-announce</a> email.
Less often (once or twice), shortly before the pre-release announcement. Less often (once or twice), shortly before the pre-release announcement.
Ensure that testing is redone if Intel devs have requested an exception, as per above. Ensure that testing is redone if Intel devs have requested an exception, as per above.
</p> </p>
<ul> <ul>
<li><em>no regressions should be observed for Piglit/dEQP/CTS/Vulkan on Intel platforms</em> <li><em>no regressions should be observed for Piglit/dEQP/CTS/Vulkan on Intel platforms</em>
<li><em>no regressions should be observed for Piglit using the swrast, softpipe <li><em>no regressions should be observed for Piglit using the swrast, softpipe
and llvmpipe drivers</em> and llvmpipe drivers</em>
</ul> </ul>
<p> <p>
Currently testing is performed courtesy of the Intel OTC team and their Jenkins CI setup. Check with the Intel team over IRC how to get things setup. Currently testing is performed courtesy of the Intel OTC team and their Jenkins CI setup. Check with the Intel team over IRC how to get things setup.
</p> </p>
<p>
Installing the built driver from the pre-announced RC branch in the
system and making some every day's use until the release may be a good
idea too.
</p>
<h1 id="branch">Making a branchpoint</h1> <h1 id="branch">Making a branchpoint</h1>
@@ -202,15 +249,18 @@ To setup the branchpoint:
Now go to Now go to
<a href="https://bugs.freedesktop.org/editversions.cgi?action=add&amp;product=Mesa" target="_parent">Bugzilla</a> and add the new Mesa version X.Y. <a href="https://bugs.freedesktop.org/editversions.cgi?action=add&amp;product=Mesa" target="_parent">Bugzilla</a> and add the new Mesa version X.Y.
</p> </p>
<p> <p>
Check that there are no distribution breaking changes and revert them if needed. Check that there are no distribution breaking changes and revert them if needed.
For example: files being overwritten on install, etc. Happens extremely rarely - For example: files being overwritten on install, etc. Happens extremely rarely -
we had only one case so far (see commit 2ced8eb136528914e1bf4e000dea06a9d53c7e04). we had only one case so far (see commit 2ced8eb136528914e1bf4e000dea06a9d53c7e04).
</p> </p>
<p> <p>
Proceed to <a href="#release">release</a> -rc1. Proceed to <a href="#release">release</a> -rc1.
</p> </p>
<h1 id="prerelease">Pre-release announcement</h1> <h1 id="prerelease">Pre-release announcement</h1>
<p> <p>
@@ -224,18 +274,22 @@ release is made.
</p> </p>
<h2>Terminology used</h2> <h2>Terminology used</h2>
<ul><li>Nominated</ul> <ul><li>Nominated</ul>
<p> <p>
Patch that is nominated but yet to to merged in the patch queue/branch. Patch that is nominated but yet to to merged in the patch queue/branch.
</p> </p>
<ul><li>Queued</ul> <ul><li>Queued</ul>
<p> <p>
Patch is in the queue/branch and will feature in the next release. Patch is in the queue/branch and will feature in the next release.
Barring reported regressions or objections from developers. Barring reported regressions or objections from developers.
</p> </p>
<ul><li>Rejected</ul> <ul><li>Rejected</ul>
<p> <p>
Patch does not fit the Patch does not fit the
<a href="submittingpatches.html#criteria" target="_parent">criteria</a> and <a href="submittingpatches.html#criteria" target="_parent">criteria</a> and
@@ -341,6 +395,7 @@ AUTHOR (NUMBER):
Reason: ... Reason: ...
</pre> </pre>
<h1 id="release">Making a new release</h1> <h1 id="release">Making a new release</h1>
<p> <p>
@@ -348,18 +403,21 @@ These are the instructions for making a new Mesa release.
</p> </p>
<h3>Get latest source files</h3> <h3>Get latest source files</h3>
<p> <p>
Ensure the latest code is available - both in your local master and the Ensure the latest code is available - both in your local master and the
relevant branch. relevant branch.
</p> </p>
<h3>Perform basic testing</h3> <h3>Perform basic testing</h3>
<p> <p>
Most of the testing should already be done during the Most of the testing should already be done during the
<a href="#pickntest">cherry-pick</a> and <a href="#pickntest">cherry-pick</a> and
<a href="#prerelease">pre-announce</a> stages. <a href="#prerelease">pre-announce</a> stages.
So we do a quick 'touch test' So we do a quick 'touch test'
</p>
<ul> <ul>
<li>make distcheck (you can omit this if you're not using --dist below) <li>make distcheck (you can omit this if you're not using --dist below)
<li>scons (from release tarball) <li>scons (from release tarball)
@@ -402,7 +460,7 @@ Here is one solution that I've been using.
--enable-glx-tls \ --enable-glx-tls \
--enable-gbm \ --enable-gbm \
--enable-egl \ --enable-egl \
--with-egl-platforms=x11,drm,wayland --with-platforms=x11,drm,wayland,surfaceless
make -j2 &amp;&amp; DESTDIR=`pwd`/test make -j6 install make -j2 &amp;&amp; DESTDIR=`pwd`/test make -j6 install
__glxinfo_cmd='glxinfo 2>&amp;1 | egrep -o "Mesa.*|Gallium.*|.*dri\.so"' __glxinfo_cmd='glxinfo 2>&amp;1 | egrep -o "Mesa.*|Gallium.*|.*dri\.so"'
__glxgears_cmd='glxgears 2>&amp;1 | grep -v "configuration file"' __glxgears_cmd='glxgears 2>&amp;1 | grep -v "configuration file"'
@@ -452,6 +510,7 @@ be empty (TBD) at this point.
<p> <p>
Two scripts are available to help generate portions of the release notes: Two scripts are available to help generate portions of the release notes:
</p>
<pre> <pre>
./bin/bugzilla_mesa.sh ./bin/bugzilla_mesa.sh
@@ -468,6 +527,7 @@ to be included in the release notes.
<p> <p>
Commit these changes and push the branch. Commit these changes and push the branch.
</p> </p>
<pre> <pre>
git push origin HEAD git push origin HEAD
</pre> </pre>
@@ -478,6 +538,7 @@ Commit these changes and push the branch.
<p> <p>
Start the release process. Start the release process.
</p> </p>
<pre> <pre>
../relative/path/to/release.sh . # append --dist if you've already done distcheck above ../relative/path/to/release.sh . # append --dist if you've already done distcheck above
</pre> </pre>
@@ -515,7 +576,15 @@ docs/index.html to add a news entry. Then commit and push:
</pre> </pre>
<h1 id="calendar">Update the calendar</h1>
<p>
Remove the version from the <a href="release-calendar.html" target="_parent">calendar</a>.
</p>
<h1 id="announce">Announce the release</h1> <h1 id="announce">Announce the release</h1>
<p> <p>
Use the generated template during the releasing process. Use the generated template during the releasing process.
</p> </p>
@@ -528,6 +597,7 @@ As the hosting was moved to freedesktop, git hooks are deployed to update the
website. Manually check that it is updated 5-10 minutes after the final <code>git push</code> website. Manually check that it is updated 5-10 minutes after the final <code>git push</code>
</p> </p>
<h1 id="bugzilla">Update Bugzilla</h1> <h1 id="bugzilla">Update Bugzilla</h1>
<p> <p>

View File

@@ -21,6 +21,14 @@ The release notes summarize what's new or changed in each Mesa release.
</p> </p>
<ul> <ul>
<li><a href="relnotes/17.1.3.html">17.1.3 release notes</a>
<li><a href="relnotes/17.1.2.html">17.1.2 release notes</a>
<li><a href="relnotes/17.0.7.html">17.0.7 release notes</a>
<li><a href="relnotes/17.1.1.html">17.1.1 release notes</a>
<li><a href="relnotes/17.0.6.html">17.0.6 release notes</a>
<li><a href="relnotes/17.1.0.html">17.1.0 release notes</a>
<li><a href="relnotes/17.0.5.html">17.0.5 release notes</a>
<li><a href="relnotes/17.0.4.html">17.0.4 release notes</a>
<li><a href="relnotes/17.0.3.html">17.0.3 release notes</a> <li><a href="relnotes/17.0.3.html">17.0.3 release notes</a>
<li><a href="relnotes/17.0.2.html">17.0.2 release notes</a> <li><a href="relnotes/17.0.2.html">17.0.2 release notes</a>
<li><a href="relnotes/13.0.6.html">13.0.6 release notes</a> <li><a href="relnotes/13.0.6.html">13.0.6 release notes</a>

156
docs/relnotes/17.0.4.html Normal file
View File

@@ -0,0 +1,156 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.4 Release Notes / April 17, 2017</h1>
<p>
Mesa 17.0.4 is a bug fix release which fixes bugs found since the 17.0.3 release.
</p>
<p>
Mesa 17.0.4 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
c4c34ba05d48f76b45bc05bc4b6e9242077f403d63c4f0c355c7b07786de233e mesa-17.0.4.tar.gz
1269dc8545a193932a0779b2db5bce9be4a5f6813b98c38b93b372be8362a346 mesa-17.0.4.tar.xz
</pre>
<h2>Next release</h2>
<p>
Mesa 17.0.5 is expected in approximatelly two weeks. See the release
<a href="../release-calendar.html#calendar" target="_parent">calendar</a>
for details.
</p>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99515">Bug 99515</a> - SIGSEGV MAPERR on Android nougat-x86 with mesa 17.0.0rc</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100391">Bug 100391</a> - SachaWillems deferredmultisampling asserts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100452">Bug 100452</a> - push_constants host memory leak when resetting command buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100582">Bug 100582</a> - [GEN8+] piglit.spec.arb_stencil_texturing.glblitframebuffer corrupts state.gl_texture* assertions</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: add new polaris10 pci id</li>
</ul>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Invalidate L2 for TRANSFER_WRITE barriers</li>
</ul>
<p>Andres Gomez (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.3</li>
</ul>
<p>Craig Stout (1):</p>
<ul>
<li>anv/cmd_buffer: fix host memory leak</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>Revert "cherry-ignore: add the Flush after unmap in gbm/dri fix"</li>
<li>Revert "freedreno: fix memory leak"</li>
<li>Update version to 17.0.4</li>
</ul>
<p>Fabio Estevam (1):</p>
<ul>
<li>loader: Move non-error message to debug level</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>nvc0/ir: fix LSB/BFE/BFI implementations</li>
<li>nvc0/ir: fix overwriting of offset register with interpolateAtOffset</li>
<li>nvc0: increase texture buffer object alignment to 256 for pre-GM107</li>
<li>nouveau: when mapping a persistent buffer, synchronize on former xfers</li>
</ul>
<p>Jason Ekstrand (5):</p>
<ul>
<li>i965/fs: Always provide a default LOD of 0 for TXS and TXL</li>
<li>anv/pipeline: Properly handle unset gl_Layer and gl_ViewportIndex</li>
<li>anv/blorp: Align vertex buffers to 64B</li>
<li>i965/blorp: Align vertex buffers to 64B</li>
<li>i965/blorp: Bump the batch space estimate</li>
</ul>
<p>Jerome Duval (2):</p>
<ul>
<li>haiku: build fixes around debug defines</li>
<li>haiku/winsys: fix dt prototype args</li>
</ul>
<p>Julien Isorce (4):</p>
<ul>
<li>winsys/radeon: check null in radeon_cs_create_fence</li>
<li>winsys/radeon: check null return from radeon_cs_create_fence in cs_flush</li>
<li>radeon: initialize hole variable before calling container_of</li>
<li>radeon_drm_bo: explicitly check return value of drmCommandWriteRead</li>
</ul>
<p>Kenneth Graunke (4):</p>
<ul>
<li>i965: Document the sad story of the kernel command parser.</li>
<li>i965: Set screen-&gt;cmd_parser_version to 0 if we can't write registers.</li>
<li>i965: Skip register write detection when possible.</li>
<li>i965: Set kernel features before computing max GL version.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>targets: export radeon winsys_create functions to silence LLVM warning</li>
</ul>
<p>Michal Srb (1):</p>
<ul>
<li>st: Add cubeMapFace parameter to st_finalize_texture.</li>
</ul>
<p>Thomas Hellstrom (1):</p>
<ul>
<li>gbm/dri: Flush after unmap</li>
</ul>
</div>
</body>
</html>

144
docs/relnotes/17.0.5.html Normal file
View File

@@ -0,0 +1,144 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.5 Release Notes / April 28, 2017</h1>
<p>
Mesa 17.0.5 is a bug fix release which fixes bugs found since the 17.0.4 release.
</p>
<p>
Mesa 17.0.5 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
7510eee0d0077860b250d30d73305048c2df4ba09ea8fc04e4f3eec7beece301 mesa-17.0.5.tar.gz
668efa445d2f57a26e5c096b1965a685733a3b57d9c736f9d6460263847f9bfe mesa-17.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (16):</p>
<ul>
<li>cherry-ignore: Add the pci_id into the shader cache UUID</li>
<li>cherry-ignore: fix crash if ctx torn down with no rendering</li>
<li>cherry-ignore: Fix typos.</li>
<li>cherry-ignore: Revert "etnaviv: Cannot render to rb-swapped formats"</li>
<li>cherry-ignore: Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."</li>
<li>cherry-ignore: fix typo in a2b10g10r10 fast clear calculation</li>
<li>cherry-ignore: remove unused anv_dispatch_table dtable</li>
<li>cherry-ignore: remove unused radv_dispatch_table dtable</li>
<li>cherry-ignore: make radv_resolve_entrypoint static</li>
<li>cherry-ignore: vulkan: add support for libmesa_vulkan_util</li>
<li>cherry-ignore: r600: fix libmesa_amd_common dependency</li>
<li>cherry-ignore: remove dead brw_new_shader() declaration</li>
<li>cherry-ignore: remove i965_symbols_test reference from .gitignore</li>
<li>cherry-ignore: automake: ensure that the destination directory is created</li>
<li>cherry-ignore: provide required gem stubs for the tests</li>
<li>Update version to 17.0.5</li>
</ul>
<p>Boyan Ding (2):</p>
<ul>
<li>nvc0/ir: Properly handle a "split form" of predicate destination</li>
<li>nir: Destination component count of shader_clock intrinsic is 2</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.4</li>
<li>winsys/sw/dri: don't use GNU void pointer arithmetic</li>
<li>st/clover: add space between &lt; and ::</li>
<li>configure.ac: check require_basic_egl only if egl enabled</li>
<li>st/mesa: automake: honour the vdpau header install location</li>
</ul>
<p>Francisco Jerez (2):</p>
<ul>
<li>intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.</li>
<li>intel/fs: Take into account amount of data read in spilling cost heuristic.</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>radv: report timestampPeriod correctly</li>
</ul>
<p>Jason Ekstrand (5):</p>
<ul>
<li>anv/blorp: Flush the texture cache in UpdateBuffer</li>
<li>anv/cmd_buffer: Flush the VF cache at the top of all primaries</li>
<li>anv/cmd_buffer: Always set up a null surface state</li>
<li>anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED</li>
<li>anv/blorp: Properly handle VK_ATTACHMENT_UNUSED</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: invalidate the readpix cache in st_indirect_draw_vbo</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>anv/cmd_buffer: Disable CCS on BDW input attachments</li>
</ul>
<p>Nicolai Hähnle (4):</p>
<ul>
<li>mesa: fix remaining xfb prims check for GLES with multiple instances</li>
<li>mesa: extract need_xfb_remaining_prims_check</li>
<li>mesa: move glMultiDrawArrays to vbo and fix error handling</li>
<li>vbo: fix gl_DrawID handling in glMultiDrawArrays</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>util/queue: don't hang at exit</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>mesa: validate sampler type across the whole program</li>
</ul>
</div>
</body>
</html>

186
docs/relnotes/17.0.6.html Normal file
View File

@@ -0,0 +1,186 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.6 Release Notes / May 12, 2017</h1>
<p>
Mesa 17.0.6 is a bug fix release which fixes bugs found since the 17.0.5 release.
</p>
<p>
Mesa 17.0.6 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
f1b2497d553e9a584f0caa3a2d9d310e27ead15fb0af170da69f6e70fb5031cd mesa-17.0.6.tar.gz
89ecf3bcd0f18dcca5aaa42bf36bb52a2df33be89889f94aaaad91f7a504a69d mesa-17.0.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98428">Bug 98428</a> - Undefined non-weak-symbol in dri-drivers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100854">Bug 100854</a> - YUV to RGB Color Space Conversion result is not precise</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>egl/platform/drm: Don't take display ownership until gbm is initialized</li>
</ul>
<p>Andres Gomez (7):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.5</li>
<li>travis: replace Trusty-based LLVM toolchain apt-get with apt addon</li>
<li>travis: add the possibility of using the txc-dxtn library</li>
<li>cherry-ignore: 17.1 nominations only</li>
<li>cherry-ignore: fix regression in descriptor set freeing.</li>
<li>cherry-ignore: rejected commits</li>
<li>Update version to 17.0.6</li>
</ul>
<p>Ben Boeckel (1):</p>
<ul>
<li>scons: update for LLVM 4.0</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: move duplicated st_ws_framebuffer() function into header file</li>
</ul>
<p>Chad Versace (3):</p>
<ul>
<li>egl: Emit error when EGLSurface is lost</li>
<li>egl/android: Cancel any outstanding ANativeBuffer in surface destructor</li>
<li>egl/android: Mark surface as lost when dequeueBuffer fails</li>
</ul>
<p>Christian Gmeiner (1):</p>
<ul>
<li>etnaviv: add L8A8_UNORM texture format</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>radv/wsi: report presentation error per image request</li>
<li>radv: enable POLARIS12 support.</li>
</ul>
<p>Emil Velikov (21):</p>
<ul>
<li>travis: correct libdrm required regex to also track libdrm itself</li>
<li>travis: add nearly all gallium drivers to the list</li>
<li>travis: use both cores for make/make check</li>
<li>travis: bring the scons build on par with AppVeyor</li>
<li>travis: explicitly LD_LIBRARY_PATH the local libraries</li>
<li>travis: enable apt cache</li>
<li>travis: automatically manage ccache caching</li>
<li>travis: remove unused -dev packages</li>
<li>travis: rework "if test" blocks in the script section</li>
<li>travis: split out matrix from env</li>
<li>travis: add separate "scons" and "scons llvm" targets</li>
<li>travis: add "scons swr" to the build matrix</li>
<li>travis: add "make swr" to the build matrix</li>
<li>travis: split the make target to three separate ones</li>
<li>travis: model scons check target like the make one</li>
<li>travis: add Gallium state-tracker targets</li>
<li>travis: enable wayland support</li>
<li>travis: bump MAKEFLAGS to -j4</li>
<li>gallium/dri: always link against shared glapi</li>
<li>mesa/dri: always link against shared glapi</li>
<li>glx: glX_proto_send.py: use correct compile guard GLX_INDIRECT_RENDERING</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>nir: Pick just the channels we want for bitmap and drawpixels lowering.</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>gallium/targets: fix bool setting on BE architectures</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>anv/cmd_buffer: Use the device allocator for QueueSubmit</li>
</ul>
<p>Johnson Lin (1):</p>
<ul>
<li>nir/lower_tex: Fix minor error in YUV color conversion matrix</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>radeonsi: adjust ESGS ring buffer size computation on VI</li>
<li>radeonsi: apply the tess+GS hang workaround to Polaris12 as well</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>radeonsi: fix gl_PrimitiveID in tessellation with instanced draws on SI</li>
</ul>
<p>Philipp Zabel (3):</p>
<ul>
<li>renderonly: close transfer prime_fd</li>
<li>renderonly: drop resources on destroy</li>
<li>renderonly: use drmIoctl</li>
</ul>
<p>Rhys Kidd (3):</p>
<ul>
<li>travis: Support LLVM 3.8+ on Trusty-based Travis-CI via apt-get not apt addon</li>
<li>travis: Add radv vulkan driver to continuous integration</li>
<li>travis: Add radeonsi to continuous integration</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno/a3xx: fix hang w/ large render targets and small gmem</li>
</ul>
<p>Samuel Iglesias Gonsálvez (5):</p>
<ul>
<li>i965/vec4: fix vertical stride to avoid breaking region parameter rule</li>
<li>i965/vec4: fix register width for DF VGRF and UNIFORM</li>
<li>i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions</li>
<li>anv: anv_gem_mmap() returns MAP_FAILED as mapping error</li>
<li>anv: vkBindImageMemory() should return VK_ERROR_OUT_OF_{HOST,DEVICE}_MEMORY on failure</li>
</ul>
</div>
</body>
</html>

145
docs/relnotes/17.0.7.html Normal file
View File

@@ -0,0 +1,145 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.7 Release Notes / June 1, 2017</h1>
<p>
Mesa 17.0.7 is a bug fix release which fixes bugs found since the 17.0.6 release.
</p>
<p>
Mesa 17.0.7 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
bc68d13c6b1a053b855ac453ebf7e62bd89511adf44bad6c613e09f7fa13390a mesa-17.0.7.tar.gz
f6d75304a229c8d10443e219d6b6c0c342567dbab5a879ebe7cfa3c9139c4492 mesa-17.0.7.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98833">Bug 98833</a> - [REGRESSION, bisected] Wayland revert commit breaks non-Vsync fullscreen frame updates</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100741">Bug 100741</a> - Chromium - Memory leak</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.6</li>
</ul>
<p>Bartosz Tomczyk (1):</p>
<ul>
<li>mesa: Avoid leaking surface in st_renderbuffer_delete</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>egl: Partially revert 23c86c74, fix eglMakeCurrent</li>
</ul>
<p>Daniel Stone (7):</p>
<ul>
<li>vulkan: Fix Wayland uninitialised registry</li>
<li>vulkan/wsi/wayland: Remove roundtrip when creating image</li>
<li>vulkan/wsi/wayland: Use per-display event queue</li>
<li>vulkan/wsi/wayland: Use proxy wrappers for swapchain</li>
<li>egl/wayland: Don't open-code roundtrip</li>
<li>egl/wayland: Use per-surface event queues</li>
<li>egl/wayland: Ensure we get a back buffer</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>st/va: fix misplaced closing bracket</li>
<li>anv: automake: list shared libraries after the static ones</li>
<li>radv: automake: list shared libraries after the static ones</li>
<li>egl/wayland: select the format based on the interface used</li>
<li>Update version to 17.0.7</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>renderonly: Initialize fields of struct winsys_handle.</li>
<li>vc4: Don't allocate new BOs to avoid synchronization when they're shared.</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>glxglvnddispatch: Add missing dispatch for GetDriverConfig</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nvc0/ir: SHLADD's middle source must be an immediate</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>i965/blorp: Do and end-of-pipe sync on both sides of fast-clear ops</li>
<li>i965: Round copy size to the nearest block in intel_miptree_copy</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>etnaviv: stop oversizing buffer resources</li>
</ul>
<p>Nanley Chery (2):</p>
<ul>
<li>anv/formats: Update the three-channel BC1 mappings</li>
<li>i965/formats: Update the three-channel DXT1 mappings</li>
</ul>
<p>Pohjolainen, Topi (1):</p>
<ul>
<li>intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4</li>
</ul>
<p>Samuel Iglesias Gonsálvez (3):</p>
<ul>
<li>i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution</li>
<li>i965/vec4: fix swizzle and writemask when loading an uniform with constant offset</li>
<li>i965/vec4: load dvec3/4 uniforms first in the push constant buffer</li>
</ul>
<p>Tom Stellard (1):</p>
<ul>
<li>gallivm: Make sure module has the correct data layout when pass manager runs</li>
</ul>
</div>
</body>
</html>

View File

@@ -14,12 +14,13 @@
<iframe src="../contents.html"></iframe> <iframe src="../contents.html"></iframe>
<div class="content"> <div class="content">
<h1>Mesa 17.1.0 Release Notes / TBD</h1> <h1>Mesa 17.1.0 Release Notes / May 10, 2017</h1>
<p> <p>
Mesa 17.1.0 is a new development release. Mesa 17.1.0 is a new development release.
People who are concerned with stability and reliability should stick People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 17.1.1. with a previous release or wait for
<a href="../release-calendar.html#calendar" target="_parent">Mesa 17.1.1</a>.
</p> </p>
<p> <p>
Mesa 17.1.0 implements the OpenGL 4.5 API, but the version reported by Mesa 17.1.0 implements the OpenGL 4.5 API, but the version reported by
@@ -33,7 +34,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2> <h2>SHA256 checksums</h2>
<pre> <pre>
TBD. c388069581a72853161657ac365f2c083afabd7cffd53f80513dacfa1cfa58a8 mesa-17.1.0.tar.gz
cf234a6ed4764673886b6661553b54675776ef0898f774716173cec890ac3b17 mesa-17.1.0.tar.xz
</pre> </pre>
@@ -63,6 +65,147 @@ Note: some of the new features are only available with certain drivers.
<h2>Bug fixes</h2> <h2>Bug fixes</h2>
<ul> <ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84325">Bug 84325</a> - X.Org segfaults when starting DE on an Intel+Radeon laptop, caused by libpciaccess cleanup, patch attached</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93089">Bug 93089</a> - mesa fails to check for gcc atomic primitives before using them</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95460">Bug 95460</a> - Please add more drivers (freedreno, virgl) to features.txt status document</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96743">Bug 96743</a> - [BYT, HSW, SKL, BXT, KBL] GPU hangs with GfxBench 4.0 CarChase</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97102">Bug 97102</a> - [dri][swr] stack overflow / infinite loop with GALLIUM_DRIVER=swr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97338">Bug 97338</a> - Black squares in the Spec Ops: The Line chapter select screen</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97967">Bug 97967</a> - glsl/tests/cache-test regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97988">Bug 97988</a> - [radeonsi] playing back videos with VDPAU exhibits deinterlacing/anti-aliasing issues not visible with VA-API</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with &quot;Fatal error: Cannot set display mode.&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98428">Bug 98428</a> - Undefined non-weak-symbol in dri-drivers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98502">Bug 98502</a> - Delay when starting firefox, thunderbird or chromium and dmesg spam</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98869">Bug 98869</a> - Electronic Super Joy graphic artefacts (regression,bisected)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98975">Bug 98975</a> - Wasteland 2 Directors Cut: Hangs. GPU fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99010">Bug 99010</a> - --disable-gallium-llvm no longer recognized</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99246">Bug 99246</a> - [d3dadapter+radeonsi &amp; bisect] EVE-Online : hang on wormhole sight</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99265">Bug 99265</a> - i965: Piglit egl_khr_gl_renderbuffer_image-clear-shared-image fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99339">Bug 99339</a> - Blender line rendering broken after removing XY clipping of lines</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99401">Bug 99401</a> - [g33] regression: piglit.spec.!opengl 1_0.gl-1_0-beginend-coverage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99450">Bug 99450</a> - [amdgpu] Payday 2 visual glitches on some models</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99451">Bug 99451</a> - polygon offset use after free</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99465">Bug 99465</a> - vtn_vector_construct writing out of bounds when given multiple non-zero length sources</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99484">Bug 99484</a> - Crusader Kings 2 - Loading bars, siege bars, morale bars, etc. do not render correctly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99532">Bug 99532</a> - Compute shader doesn't give right result under some circumstances</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99542">Bug 99542</a> - vdpau logging errors since gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: const struct API_STATE has no member named linkageCount</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99660">Bug 99660</a> - Not all of the int64 conversion opcodes got implemented</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99677">Bug 99677</a> - heap-use-after-free in glsl</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99692">Bug 99692</a> - [radv] Mostly broken on Hawaii PRO/CIK ASICs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99701">Bug 99701</a> - loader.c:353:8: error: implicit declaration of function 'geteuid' is invalid in C99 [-Werror,-Wimplicit-function-declaration]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: &quot;Note: Buggy applications may crash, if they do please report to vendor&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99789">Bug 99789</a> - Memory leak on failure to create an ir_constant in calculate_iterations in loop_controls.cpp</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99817">Bug 99817</a> - [softpipe] piglit glsl-fs-tan-1 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99842">Bug 99842</a> - GL_ARB_transform_feedback2 on i965 gen6</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99850">Bug 99850</a> - Tessellation bug on Carrizo</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99918">Bug 99918</a> - disk_cache.h:57:20: error: no member named 'st_mtim' in 'struct stat'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99953">Bug 99953</a> - device9.c:122:49: error: PIPE_CAP_USER_INDEX_BUFFERS undeclared (first use in this function)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99955">Bug 99955</a> - [r600g] GPU load always displayed at 100% with GALLIUM_HUD=GPU-load</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100026">Bug 100026</a> - piglit.spec.arb_shader_subroutine.compiler.direct-call_vert regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - &quot;ralloc: Make sure ralloc() allocations match malloc()'s alignment.&quot; causes seg fault in 32bit build</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100060">Bug 100060</a> - wsi/wsi_common_wayland.c:25:41: fatal error: wayland-drm-client-protocol.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100061">Bug 100061</a> - LODQ instruction generated with invalid dst mask</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100068">Bug 100068</a> - LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.buffer.load.format</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100088">Bug 100088</a> - piglit.spec.arb_get_texture_sub_image.arb_get_texture_sub_image regressions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100091">Bug 100091</a> - Failure to create folder for on-disk shader cache</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100133">Bug 100133</a> - swr_context.cpp:336:44: error: invalid conversion from uint {aka unsigned int} to pipe_render_cond_flag [-fpermissive]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100154">Bug 100154</a> - test_eu_compact regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100180">Bug 100180</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100182">Bug 100182</a> - Flickering in The Talos Principle on Sky Lake GT4.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100201">Bug 100201</a> - Windows scons build with MSVC toolchain and LLVM 4.0 fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100223">Bug 100223</a> - marshal_generated.c:38:10: fatal error: 'X11/Xlib-xcb.h' file not found</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100236">Bug 100236</a> - Undefined symbols for architecture x86_64: &quot;typeinfo for llvm::RTDyldMemoryManager&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100259">Bug 100259</a> - [EGL] [GBM] undefined reference to `gbm_bo_create_with_modifiers'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100288">Bug 100288</a> - clover unable to run OpenCL kernels since 03127bb radeonsi: compile all TGSI compute shaders asynchronously</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100303">Bug 100303</a> - Adding a single, meaningless if-else to a shader source leads to different image</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100391">Bug 100391</a> - SachaWillems deferredmultisampling asserts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100452">Bug 100452</a> - push_constants host memory leak when resetting command buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100531">Bug 100531</a> - [regression] Broken graphics in several games</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100562">Bug 100562</a> - u_debug_stack.c:59: undefined reference to `_Ux86_64_getcontext'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100569">Bug 100569</a> - core/resource.cpp:36:33: error: non-constant-expression cannot be narrowed from type 'int' to 'int16_t' (aka 'short') in initializer list [-Wc++11-narrowing]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100574">Bug 100574</a> - anv_device.c:189: undefined reference to `anv_gem_supports_48b_addresses'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100582">Bug 100582</a> - [GEN8+] piglit.spec.arb_stencil_texturing.glblitframebuffer corrupts state.gl_texture* assertions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100600">Bug 100600</a> - anv_device.c:1337: undefined reference to `anv_gem_busy'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100620">Bug 100620</a> - [SKL] 48-bit addresses break DOOM</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100663">Bug 100663</a> - commit 61e47d92c5196 breaks RS780</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100690">Bug 100690</a> - [Regression, bisected] TotalWar: Warhammer corrupted graphics</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100892">Bug 100892</a> - Polaris 12: winsys init bad switch (missing break) initializing addrlib</li>
</ul> </ul>
<h2>Changes</h2> <h2>Changes</h2>

188
docs/relnotes/17.1.1.html Normal file
View File

@@ -0,0 +1,188 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.1.1 Release Notes / March 25, 2017</h1>
<p>
Mesa 17.1.1 is a bug fix release which fixes bugs found since the 17.1.0 release.
</p>
<p>
Mesa 17.1.1 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
652315af87f2bb015ce99ee3b90d9d115d53cbf9e052493bd13d521a753b1930 mesa-17.1.1.tar.gz
aed503f94c0c1630a162a3e276f4ee12a86764cee4cb92338ea2dea99a04e7ef mesa-17.1.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100854">Bug 100854</a> - YUV to RGB Color Space Conversion result is not precise</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: add new vega10 pci ids</li>
</ul>
<p>Andres Gomez (2):</p>
<ul>
<li>bin/get-fixes-pick-list.sh: don't warn if more than one, go over them</li>
<li>bin/get-fixes-pick-list.sh: bring back the warning</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>swr: move msaa resolve to generalized StoreTile</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>egl: Partially revert 23c86c74, fix eglMakeCurrent</li>
</ul>
<p>Chih-Wei Huang (1):</p>
<ul>
<li>Android: correct libz dependency</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>gbm/dri: Fix sign-extension in modifier query</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: add sha256 checksums for 17.1.0</li>
<li>radeon: automake: remove unneeded elf Cflags/Libs</li>
<li>configure: remove unneeded bits around libunwind handling</li>
<li>egl: add g_egldispatchstubs.h to the release tarball</li>
<li>automake: add SWR LLVM gen_builder.hpp workaround</li>
<li>Update version to 17.1.1</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>renderonly: Initialize fields of struct winsys_handle.</li>
<li>vc4: Don't allocate new BOs to avoid synchronization when they're shared.</li>
</ul>
<p>Grazvydas Ignotas (2):</p>
<ul>
<li>anv: fix possible stack corruption</li>
<li>anv: don't leak DRM devices</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>glxglvnddispatch: Add missing dispatch for GetDriverConfig</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nvc0/ir: SHLADD's middle source must be an immediate</li>
</ul>
<p>Johnson Lin (1):</p>
<ul>
<li>nir/lower_tex: Fix minor error in YUV color conversion matrix</li>
</ul>
<p>Juan A. Suarez Romero (2):</p>
<ul>
<li>bin/get-{extra,fixes}-pick-list.sh: add support for ignore list</li>
<li>bin/get-{extra,fixes}-pick-list.sh: improve output</li>
</ul>
<p>Lucas Stach (2):</p>
<ul>
<li>etnaviv: stop oversizing buffer resources</li>
<li>etnaviv: allow R/B swapped surfaces to be cleared</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>amd/addrlib: import Raven support</li>
<li>radeonsi/gfx9: add support for Raven</li>
</ul>
<p>Nanley Chery (2):</p>
<ul>
<li>anv/formats: Update the three-channel BC1 mappings</li>
<li>i965/formats: Update the three-channel DXT1 mappings</li>
</ul>
<p>Nicolai Hähnle (5):</p>
<ul>
<li>radeonsi: mark fast-cleared textures as compressed when dirtying</li>
<li>radeonsi: fix primitive ID in fragment shader when using tessellation</li>
<li>radeonsi: fix gl_PrimitiveID in tessellation with instanced draws on SI</li>
<li>radeonsi: fix gl_PrimitiveIDIn in geometry shader when using tessellation</li>
<li>st/mesa: remove an incorrect assertion</li>
</ul>
<p>Pohjolainen, Topi (1):</p>
<ul>
<li>intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4</li>
</ul>
<p>Rob Clark (2):</p>
<ul>
<li>mesa/st: fix yuv EGLImage's</li>
<li>freedreno: fix crash when flush() but no rendering</li>
</ul>
<p>Rob Herring (1):</p>
<ul>
<li>virgl: fix virgl_bo_transfer_{put, get} box struct copy</li>
</ul>
<p>Samuel Iglesias Gonsálvez (3):</p>
<ul>
<li>i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution</li>
<li>i965/vec4: fix swizzle and writemask when loading an uniform with constant offset</li>
<li>i965/vec4: load dvec3/4 uniforms first in the push constant buffer</li>
</ul>
<p>Tom Stellard (1):</p>
<ul>
<li>gallivm: Make sure module has the correct data layout when pass manager runs</li>
</ul>
</div>
</body>
</html>

187
docs/relnotes/17.1.2.html Normal file
View File

@@ -0,0 +1,187 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.1.2 Release Notes / June 5, 2017</h1>
<p>
Mesa 17.1.2 is a bug fix release which fixes bugs found since the 17.1.1 release.
</p>
<p>
Mesa 17.1.2 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
0d2020c2115db0d13a5be0075abf0da143290f69f5817a2f277861e89166a3e1 mesa-17.1.2.tar.gz
0937804f43746339b1f9540d8f9c8b4a1bb3d3eec0e4020eac283b8799798239 mesa-17.1.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98833">Bug 98833</a> - [REGRESSION, bisected] Wayland revert commit breaks non-Vsync fullscreen frame updates</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100741">Bug 100741</a> - Chromium - Memory leak</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100877">Bug 100877</a> - vulkan/tests/block_pool_no_free regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101110">Bug 101110</a> - Build failure in GNOME Continuous</li>
</ul>
<h2>Changes</h2>
<p>Bartosz Tomczyk (1):</p>
<ul>
<li>mesa: Avoid leaking surface in st_renderbuffer_delete</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: Reserve space for descriptor and push constant user SGPR setting.</li>
</ul>
<p>Daniel Stone (7):</p>
<ul>
<li>vulkan: Fix Wayland uninitialised registry</li>
<li>vulkan/wsi/wayland: Remove roundtrip when creating image</li>
<li>vulkan/wsi/wayland: Use per-display event queue</li>
<li>vulkan/wsi/wayland: Use proxy wrappers for swapchain</li>
<li>egl/wayland: Don't open-code roundtrip</li>
<li>egl/wayland: Use per-surface event queues</li>
<li>egl/wayland: Ensure we get a back buffer</li>
</ul>
<p>Emil Velikov (24):</p>
<ul>
<li>docs: add sha256 checksums for 17.1.1</li>
<li>configure: move platform handling further up</li>
<li>configure: rename remaining HAVE_EGL_PLATFORM_* guards</li>
<li>configure: update remaining --with-egl-platforms references</li>
<li>configure: loosen --with-platforms heuristics</li>
<li>configure: enable the surfaceless platform by default</li>
<li>configure: set HAVE_foo_PLATFORM as applicable</li>
<li>configure: error out when building GLX w/o the X11 platform</li>
<li>configure: check once for DRI3 dependencies</li>
<li>loader: build libloader_dri3_helper.la only with HAVE_PLATFORM_X11</li>
<li>configure: error out when building X11 Vulkan without DRI3</li>
<li>auxiliary/vl: use vl_*_screen_create stubs when building w/o platform</li>
<li>st/va: fix misplaced closing bracket</li>
<li>st/omx: remove unneeded X11 include</li>
<li>st/omx: fix building against X11-less setups</li>
<li>gallium/targets: link against XCB only as needed</li>
<li>configure: error out if building VA w/o supported platform</li>
<li>configure: error out if building OMX w/o supported platform</li>
<li>configure: error out if building VDPAU w/o supported platform</li>
<li>configure: error out if building XVMC w/o supported platform</li>
<li>travis: remove workarounds for the Vulkan target</li>
<li>anv: automake: list shared libraries after the static ones</li>
<li>radv: automake: list shared libraries after the static ones</li>
<li>egl/wayland: select the format based on the interface used</li>
</ul>
<p>Ian Romanick (3):</p>
<ul>
<li>r100: Don't assume that the base mipmap of a texture exists</li>
<li>r100,r200: Don't assume glVisual is non-NULL during context creation</li>
<li>r100: Use _mesa_get_format_base_format in radeon_update_wrapper</li>
</ul>
<p>Jason Ekstrand (17):</p>
<ul>
<li>anv: Handle color layout transitions from the UNINITIALIZED layout</li>
<li>anv: Handle transitioning depth from UNDEFINED to other layouts</li>
<li>anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hack</li>
<li>anv: Predicate 48bit support on gen &gt;= 8</li>
<li>anv: Set up memory types and heaps during physical device init</li>
<li>anv: Set image memory types based on the type count</li>
<li>i965/blorp: Do and end-of-pipe sync on both sides of fast-clear ops</li>
<li>i965: Round copy size to the nearest block in intel_miptree_copy</li>
<li>anv: Set EXEC_OBJECT_ASYNC when available</li>
<li>anv: Determine the type of mapping based on type metadata</li>
<li>anv: Add valid_bufer_usage to the memory type metadata</li>
<li>anv: Stop setting BO flags in bo_init_new</li>
<li>anv: Make supports_48bit_addresses a heap property</li>
<li>anv: Refactor memory type setup</li>
<li>anv: Advertise both 32-bit and 48-bit heaps when we have enough memory</li>
<li>i965: Rework Sandy Bridge HiZ and stencil layouts</li>
<li>anv: Require vertex buffers to come from a 32-bit heap</li>
</ul>
<p>Juan A. Suarez Romero (13):</p>
<ul>
<li>Revert "android: fix segfault within swap_buffers"</li>
<li>cherry-ignore: radeonsi: load patch_id for TES-as-ES when exporting for PS</li>
<li>cherry-ignore: anv: Determine the type of mapping based on type metadata</li>
<li>cherry-ignore: anv: Stop setting BO flags in bo_init_new</li>
<li>cherry-ignore: anv: Make supports_48bit_addresses a heap property</li>
<li>cherry-ignore: anv: Advertise both 32-bit and 48-bit heaps when we have enough memory</li>
<li>cherry-ignore: anv: Require vertex buffers to come from a 32-bit heap</li>
<li>cherry-ignore: radv: fix regression in descriptor set freeing</li>
<li>cherry-ignore: anv: Add valid_bufer_usage to the memory type metadata</li>
<li>cherry-ignore: anv: Refactor memory type setup</li>
<li>Revert "cherry-ignore: anv: [...]"</li>
<li>Revert "cherry-ignore: anv: Require vertex buffers to come from a 32-bit heap"</li>
<li>Update version to 17.1.2</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi/gfx9: compile shaders with +xnack</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>st/mesa: remove redundant stfb-&gt;iface checks</li>
</ul>
<p>Nicolas Boichat (1):</p>
<ul>
<li>configure.ac: Also match -androideabi tuple</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno: fix fence creation fail if no rendering</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>egl/android: fix segfault within swap_buffers</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>st/mesa: don't mark the program as in cache_fallback when there is cache miss</li>
</ul>
</div>
</body>
</html>

156
docs/relnotes/17.1.3.html Normal file
View File

@@ -0,0 +1,156 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.1.3 Release Notes / June 19, 2017</h1>
<p>
Mesa 17.1.3 is a bug fix release which fixes bugs found since the 17.1.2 release.
</p>
<p>
Mesa 17.1.3 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
81ae9127286ff8d631e466d258608d6dea9854fe7bee2e8521da44c7544f01e5 mesa-17.1.3.tar.gz
5f1ee9a8aea2880f887884df2dea0c16dd1b13eb42fd2e52265db0dc1b380e8c mesa-17.1.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100988">Bug 100988</a> - glXGetCurrentDisplay() no longer works for FakeGLX contexts?</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Set both compute and graphics SGPRS on descriptor set flush.</li>
<li>radv: Dirty all descriptors sets when changing the pipeline.</li>
<li>radv: Remove SI num RB override for occlusion queries.</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>xlib: fix glXGetCurrentDisplay() failure</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>i965/dri: Fix bad GL error in intel_create_winsys_renderbuffer()</li>
</ul>
<p>Chuck Atkins (1):</p>
<ul>
<li>configure.ac: Reduce zlib requirement from 1.2.8 to 1.2.3.</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv: expose integrated device type for APUs.</li>
<li>radv: set fmask state to all 0s when no fmask. (v2)</li>
<li>glsl/lower_distance: only set max_array_access for 1D clip dist arrays</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>Update version to 17.1.3</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>radv: fix trace dumping for !use_ib_bos</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>i965/blorp: Take a layer range in intel_hiz_exec</li>
<li>i965: Move the pre-depth-clear flush/stalls to intel_hiz_exec</li>
<li>i965: Perform HiZ flush/stall prior to HiZ resolves</li>
<li>i965: Mark depth surfaces as needing a HiZ resolve after blitting</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>automake: Link all libGL.so variants with -Bsymbolic.</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.1.2</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>etnaviv: always do cpu_fini in transfer_unmap</li>
</ul>
<p>Lyude (1):</p>
<ul>
<li>nvc0: disable BGRA8 images on Fermi</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>st/mesa: don't load cached TGSI shaders on demand</li>
<li>radeonsi: fix a GPU hang with tessellation on 2-CU configs</li>
<li>radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2)</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>radv: fewer than 8 RBs are possible</li>
</ul>
<p>Nicolas Dechesne (1):</p>
<ul>
<li>util/rand_xor: add missing include statements</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>egl: fix _eglQuerySurface in EGL_BUFFER_AGE_EXT case</li>
</ul>
<p>Thomas Hellstrom (1):</p>
<ul>
<li>dri3/GLX: Fix drawable invalidation v2</li>
</ul>
<p>Tim Rowley (1):</p>
<ul>
<li>swr: relax c++ requirement from c++14 to c++11</li>
</ul>
</div>
</body>
</html>

68
docs/relnotes/17.2.0.html Normal file
View File

@@ -0,0 +1,68 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.2.0 Release Notes / TBD</h1>
<p>
Mesa 17.2.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 17.2.1.
</p>
<p>
Mesa 17.2.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_bindless_texture on radeonsi</li>
<li>GL_ARB_post_depth_coverage on nvc0 (GM200+)</li>
<li>GL_ARB_shader_viewport_layer_array on nvc0 (GM200+)</li>
<li>GL_AMD_vertex_shader_layer on nvc0 (GM200+)</li>
<li>GL_AMD_vertex_shader_viewport_index on nvc0 (GM200+)</li>
</ul>
<h2>Bug fixes</h2>
<ul>
TBD
</ul>
<h2>Changes</h2>
<ul>
<li>GL_APPLE_vertex_array_object support removed.</li>
</ul>
</div>
</body>
</html>

View File

@@ -50,6 +50,8 @@ execution. These are generally used for debugging.
The filenames will be "shader_X.vert" or "shader_X.frag" where X The filenames will be "shader_X.vert" or "shader_X.frag" where X
the shader ID. the shader ID.
<li><b>cache_info</b> - print debug information about shader cache <li><b>cache_info</b> - print debug information about shader cache
<li><b>cache_fb</b> - force cached shaders to be ignored and do a full
recompile via the fallback path</li>
<li><b>uniform</b> - print message to stdout when glUniform is called <li><b>uniform</b> - print message to stdout when glUniform is called
<li><b>nopvert</b> - force vertex shaders to be a simple shader that just transforms <li><b>nopvert</b> - force vertex shaders to be a simple shader that just transforms
the vertex position with ftransform() and passes through the color and the vertex position with ftransform() and passes through the color and

12
git_sha1_gen.sh Executable file
View File

@@ -0,0 +1,12 @@
#!/bin/sh
# run git from the sources directory
cd "$(dirname "$0")"
# don't print anything if git fails
if ! git_sha1=$(git --git-dir=.git rev-parse --short=10 HEAD 2>/dev/null)
then
exit
fi
printf '#define MESA_GIT_SHA1 "git-%s"\n' "$git_sha1"

View File

@@ -702,6 +702,7 @@ struct __DRIuseInvalidateExtensionRec {
#define __DRI_ATTRIB_BIND_TO_TEXTURE_TARGETS 46 #define __DRI_ATTRIB_BIND_TO_TEXTURE_TARGETS 46
#define __DRI_ATTRIB_YINVERTED 47 #define __DRI_ATTRIB_YINVERTED 47
#define __DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE 48 #define __DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE 48
#define __DRI_ATTRIB_MAX (__DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE + 1)
/* __DRI_ATTRIB_RENDER_TYPE */ /* __DRI_ATTRIB_RENDER_TYPE */
#define __DRI_ATTRIB_RGBA_BIT 0x01 #define __DRI_ATTRIB_RGBA_BIT 0x01
@@ -1136,7 +1137,7 @@ struct __DRIdri2ExtensionRec {
* extensions. * extensions.
*/ */
#define __DRI_IMAGE "DRI_IMAGE" #define __DRI_IMAGE "DRI_IMAGE"
#define __DRI_IMAGE_VERSION 14 #define __DRI_IMAGE_VERSION 15
/** /**
* These formats correspond to the similarly named MESA_FORMAT_* * These formats correspond to the similarly named MESA_FORMAT_*
@@ -1493,6 +1494,67 @@ struct __DRIimageExtensionRec {
const uint64_t *modifiers, const uint64_t *modifiers,
const unsigned int modifier_count, const unsigned int modifier_count,
void *loaderPrivate); void *loaderPrivate);
/*
* Like createImageFromDmaBufs, but takes also format modifiers.
*
* For EGL_EXT_image_dma_buf_import_modifiers.
*
* \since 15
*/
__DRIimage *(*createImageFromDmaBufs2)(__DRIscreen *screen,
int width, int height, int fourcc,
uint64_t modifier,
int *fds, int num_fds,
int *strides, int *offsets,
enum __DRIYUVColorSpace color_space,
enum __DRISampleRange sample_range,
enum __DRIChromaSiting horiz_siting,
enum __DRIChromaSiting vert_siting,
unsigned *error,
void *loaderPrivate);
/*
* dmabuf format query to support EGL_EXT_image_dma_buf_import_modifiers.
*
* \param max Maximum number of formats that can be accomodated into
* \param formats. If zero, no formats are returned -
* instead, the driver returns the total number of
* supported dmabuf formats in \param count.
* \param formats Buffer to fill formats into.
* \param count Count of formats returned, or, total number of
* supported formats in case \param max is zero.
*
* Returns true on success.
*
* \since 15
*/
GLboolean (*queryDmaBufFormats)(__DRIscreen *screen, int max,
int *formats, int *count);
/*
* dmabuf format modifier query for a given format to support
* EGL_EXT_image_dma_buf_import_modifiers.
*
* \param fourcc The format to query modifiers for. If this format
* is not supported by the driver, return false.
* \param max Maximum number of modifiers that can be accomodated in
* \param modifiers. If zero, no modifiers are returned -
* instead, the driver returns the total number of
* modifiers for \param format in \param count.
* \param modifiers Buffer to fill modifiers into.
* \param count Count of the modifiers returned, or, total number of
* supported modifiers for \param fourcc in case
* \param max is zero.
*
* Returns true upon success.
*
* \since 15
*/
GLboolean (*queryDmaBufModifiers)(__DRIscreen *screen, int fourcc,
int max, uint64_t *modifiers,
unsigned int *external_only,
int *count);
}; };
@@ -1720,6 +1782,19 @@ struct __DRIbackgroundCallableExtensionRec {
* operations (e.g. it should just set a thread-local variable). * operations (e.g. it should just set a thread-local variable).
*/ */
void (*setBackgroundContext)(void *loaderPrivate); void (*setBackgroundContext)(void *loaderPrivate);
/**
* Indicate that it is multithread safe to use glthread. For GLX/EGL
* platforms using Xlib, that involves calling XInitThreads, before
* opening an X display.
*
* Note: only supported if extension version is at least 2.
*
* \param loaderPrivate is the value that was passed to to the driver when
* the context was created. This can be used by the loader to identify
* which context any callbacks are associated with.
*/
GLboolean (*isThreadSafe)(void *loaderPrivate);
}; };
#endif #endif

View File

@@ -502,9 +502,13 @@ thrd_current(void)
HANDLE hCurrentThread; HANDLE hCurrentThread;
BOOL bRet; BOOL bRet;
/* GetCurrentThread() returns a pseudo-handle, which is useless. We need /* GetCurrentThread() returns a pseudo-handle, which we need
* to call DuplicateHandle to get a real handle. However the handle value * to pass to DuplicateHandle(). Only the resulting handle can be used
* will not match the one returned by thread_create. * from other threads.
*
* Note that neither handle can be compared to the one by thread_create.
* Only the thread IDs - as returned by GetThreadId() and GetCurrentThreadId()
* can be compared directly.
* *
* Other potential solutions would be: * Other potential solutions would be:
* - define thrd_t as a thread Ids, but this would mean we'd need to OpenThread for many operations * - define thrd_t as a thread Ids, but this would mean we'd need to OpenThread for many operations

View File

@@ -165,3 +165,26 @@ CHIPSET(0x5927, kbl_gt3, "Intel(R) Iris Plus Graphics 650 (Kaby Lake GT3)")
CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4") CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)") CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)")
CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)") CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)")
CHIPSET(0x3E90, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
CHIPSET(0x3E93, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
CHIPSET(0x3E91, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E92, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E96, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E9B, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3E94, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
CHIPSET(0x3EA6, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x3EA7, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x3EA8, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x3EA5, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
CHIPSET(0x5A49, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
CHIPSET(0x5A4A, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
CHIPSET(0x5A41, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
CHIPSET(0x5A42, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
CHIPSET(0x5A44, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
CHIPSET(0x5A59, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
CHIPSET(0x5A5A, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
CHIPSET(0x5A5C, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
CHIPSET(0x5A50, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
CHIPSET(0x5A51, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
CHIPSET(0x5A52, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
CHIPSET(0x5A54, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")

View File

@@ -213,12 +213,17 @@ CHIPSET(0x6985, POLARIS12_, POLARIS12)
CHIPSET(0x6986, POLARIS12_, POLARIS12) CHIPSET(0x6986, POLARIS12_, POLARIS12)
CHIPSET(0x6987, POLARIS12_, POLARIS12) CHIPSET(0x6987, POLARIS12_, POLARIS12)
CHIPSET(0x6995, POLARIS12_, POLARIS12) CHIPSET(0x6995, POLARIS12_, POLARIS12)
CHIPSET(0x6997, POLARIS12_, POLARIS12)
CHIPSET(0x699F, POLARIS12_, POLARIS12) CHIPSET(0x699F, POLARIS12_, POLARIS12)
CHIPSET(0x6860, VEGA10_, VEGA10) CHIPSET(0x6860, VEGA10_, VEGA10)
CHIPSET(0x6861, VEGA10_, VEGA10) CHIPSET(0x6861, VEGA10_, VEGA10)
CHIPSET(0x6862, VEGA10_, VEGA10) CHIPSET(0x6862, VEGA10_, VEGA10)
CHIPSET(0x6863, VEGA10_, VEGA10) CHIPSET(0x6863, VEGA10_, VEGA10)
CHIPSET(0x6864, VEGA10_, VEGA10)
CHIPSET(0x6867, VEGA10_, VEGA10) CHIPSET(0x6867, VEGA10_, VEGA10)
CHIPSET(0x6868, VEGA10_, VEGA10)
CHIPSET(0x687F, VEGA10_, VEGA10) CHIPSET(0x687F, VEGA10_, VEGA10)
CHIPSET(0x686C, VEGA10_, VEGA10) CHIPSET(0x686C, VEGA10_, VEGA10)
CHIPSET(0x15DD, RAVEN_, RAVEN)

View File

@@ -43,7 +43,7 @@ extern "C" {
#define VK_VERSION_MINOR(version) (((uint32_t)(version) >> 12) & 0x3ff) #define VK_VERSION_MINOR(version) (((uint32_t)(version) >> 12) & 0x3ff)
#define VK_VERSION_PATCH(version) ((uint32_t)(version) & 0xfff) #define VK_VERSION_PATCH(version) ((uint32_t)(version) & 0xfff)
// Version of this file // Version of this file
#define VK_HEADER_VERSION 46 #define VK_HEADER_VERSION 49
#define VK_NULL_HANDLE 0 #define VK_NULL_HANDLE 0
@@ -261,9 +261,6 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_BUFFER_INFO_KHX = 1000071002, VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_BUFFER_INFO_KHX = 1000071002,
VK_STRUCTURE_TYPE_EXTERNAL_BUFFER_PROPERTIES_KHX = 1000071003, VK_STRUCTURE_TYPE_EXTERNAL_BUFFER_PROPERTIES_KHX = 1000071003,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHX = 1000071004, VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHX = 1000071004,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2_KHX = 1000071005,
VK_STRUCTURE_TYPE_IMAGE_FORMAT_PROPERTIES_2_KHX = 1000071006,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_FORMAT_INFO_2_KHX = 1000071007,
VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_BUFFER_CREATE_INFO_KHX = 1000072000, VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_BUFFER_CREATE_INFO_KHX = 1000072000,
VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO_KHX = 1000072001, VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO_KHX = 1000072001,
VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_KHX = 1000072002, VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_KHX = 1000072002,
@@ -301,6 +298,10 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DISCARD_RECTANGLE_PROPERTIES_EXT = 1000099000, VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DISCARD_RECTANGLE_PROPERTIES_EXT = 1000099000,
VK_STRUCTURE_TYPE_PIPELINE_DISCARD_RECTANGLE_STATE_CREATE_INFO_EXT = 1000099001, VK_STRUCTURE_TYPE_PIPELINE_DISCARD_RECTANGLE_STATE_CREATE_INFO_EXT = 1000099001,
VK_STRUCTURE_TYPE_HDR_METADATA_EXT = 1000105000, VK_STRUCTURE_TYPE_HDR_METADATA_EXT = 1000105000,
VK_STRUCTURE_TYPE_SHARED_PRESENT_SURFACE_CAPABILITIES_KHR = 1000111000,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SURFACE_INFO_2_KHR = 1000119000,
VK_STRUCTURE_TYPE_SURFACE_CAPABILITIES_2_KHR = 1000119001,
VK_STRUCTURE_TYPE_SURFACE_FORMAT_2_KHR = 1000119002,
VK_STRUCTURE_TYPE_IOS_SURFACE_CREATE_INFO_MVK = 1000122000, VK_STRUCTURE_TYPE_IOS_SURFACE_CREATE_INFO_MVK = 1000122000,
VK_STRUCTURE_TYPE_MACOS_SURFACE_CREATE_INFO_MVK = 1000123000, VK_STRUCTURE_TYPE_MACOS_SURFACE_CREATE_INFO_MVK = 1000123000,
VK_STRUCTURE_TYPE_BEGIN_RANGE = VK_STRUCTURE_TYPE_APPLICATION_INFO, VK_STRUCTURE_TYPE_BEGIN_RANGE = VK_STRUCTURE_TYPE_APPLICATION_INFO,
@@ -590,6 +591,7 @@ typedef enum VkImageLayout {
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL = 7, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL = 7,
VK_IMAGE_LAYOUT_PREINITIALIZED = 8, VK_IMAGE_LAYOUT_PREINITIALIZED = 8,
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR = 1000001002, VK_IMAGE_LAYOUT_PRESENT_SRC_KHR = 1000001002,
VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR = 1000111000,
VK_IMAGE_LAYOUT_BEGIN_RANGE = VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_BEGIN_RANGE = VK_IMAGE_LAYOUT_UNDEFINED,
VK_IMAGE_LAYOUT_END_RANGE = VK_IMAGE_LAYOUT_PREINITIALIZED, VK_IMAGE_LAYOUT_END_RANGE = VK_IMAGE_LAYOUT_PREINITIALIZED,
VK_IMAGE_LAYOUT_RANGE_SIZE = (VK_IMAGE_LAYOUT_PREINITIALIZED - VK_IMAGE_LAYOUT_UNDEFINED + 1), VK_IMAGE_LAYOUT_RANGE_SIZE = (VK_IMAGE_LAYOUT_PREINITIALIZED - VK_IMAGE_LAYOUT_UNDEFINED + 1),
@@ -896,6 +898,47 @@ typedef enum VkSubpassContents {
VK_SUBPASS_CONTENTS_MAX_ENUM = 0x7FFFFFFF VK_SUBPASS_CONTENTS_MAX_ENUM = 0x7FFFFFFF
} VkSubpassContents; } VkSubpassContents;
typedef enum VkObjectType {
VK_OBJECT_TYPE_UNKNOWN = 0,
VK_OBJECT_TYPE_INSTANCE = 1,
VK_OBJECT_TYPE_PHYSICAL_DEVICE = 2,
VK_OBJECT_TYPE_DEVICE = 3,
VK_OBJECT_TYPE_QUEUE = 4,
VK_OBJECT_TYPE_SEMAPHORE = 5,
VK_OBJECT_TYPE_COMMAND_BUFFER = 6,
VK_OBJECT_TYPE_FENCE = 7,
VK_OBJECT_TYPE_DEVICE_MEMORY = 8,
VK_OBJECT_TYPE_BUFFER = 9,
VK_OBJECT_TYPE_IMAGE = 10,
VK_OBJECT_TYPE_EVENT = 11,
VK_OBJECT_TYPE_QUERY_POOL = 12,
VK_OBJECT_TYPE_BUFFER_VIEW = 13,
VK_OBJECT_TYPE_IMAGE_VIEW = 14,
VK_OBJECT_TYPE_SHADER_MODULE = 15,
VK_OBJECT_TYPE_PIPELINE_CACHE = 16,
VK_OBJECT_TYPE_PIPELINE_LAYOUT = 17,
VK_OBJECT_TYPE_RENDER_PASS = 18,
VK_OBJECT_TYPE_PIPELINE = 19,
VK_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT = 20,
VK_OBJECT_TYPE_SAMPLER = 21,
VK_OBJECT_TYPE_DESCRIPTOR_POOL = 22,
VK_OBJECT_TYPE_DESCRIPTOR_SET = 23,
VK_OBJECT_TYPE_FRAMEBUFFER = 24,
VK_OBJECT_TYPE_COMMAND_POOL = 25,
VK_OBJECT_TYPE_SURFACE_KHR = 1000000000,
VK_OBJECT_TYPE_SWAPCHAIN_KHR = 1000001000,
VK_OBJECT_TYPE_DISPLAY_KHR = 1000002000,
VK_OBJECT_TYPE_DISPLAY_MODE_KHR = 1000002001,
VK_OBJECT_TYPE_DEBUG_REPORT_CALLBACK_EXT = 1000011000,
VK_OBJECT_TYPE_DESCRIPTOR_UPDATE_TEMPLATE_KHR = 1000085000,
VK_OBJECT_TYPE_OBJECT_TABLE_NVX = 1000086000,
VK_OBJECT_TYPE_INDIRECT_COMMANDS_LAYOUT_NVX = 1000086001,
VK_OBJECT_TYPE_BEGIN_RANGE = VK_OBJECT_TYPE_UNKNOWN,
VK_OBJECT_TYPE_END_RANGE = VK_OBJECT_TYPE_COMMAND_POOL,
VK_OBJECT_TYPE_RANGE_SIZE = (VK_OBJECT_TYPE_COMMAND_POOL - VK_OBJECT_TYPE_UNKNOWN + 1),
VK_OBJECT_TYPE_MAX_ENUM = 0x7FFFFFFF
} VkObjectType;
typedef VkFlags VkInstanceCreateFlags; typedef VkFlags VkInstanceCreateFlags;
typedef enum VkFormatFeatureFlagBits { typedef enum VkFormatFeatureFlagBits {
@@ -3323,6 +3366,8 @@ typedef enum VkPresentModeKHR {
VK_PRESENT_MODE_MAILBOX_KHR = 1, VK_PRESENT_MODE_MAILBOX_KHR = 1,
VK_PRESENT_MODE_FIFO_KHR = 2, VK_PRESENT_MODE_FIFO_KHR = 2,
VK_PRESENT_MODE_FIFO_RELAXED_KHR = 3, VK_PRESENT_MODE_FIFO_RELAXED_KHR = 3,
VK_PRESENT_MODE_SHARED_DEMAND_REFRESH_KHR = 1000111000,
VK_PRESENT_MODE_SHARED_CONTINUOUS_REFRESH_KHR = 1000111001,
VK_PRESENT_MODE_BEGIN_RANGE_KHR = VK_PRESENT_MODE_IMMEDIATE_KHR, VK_PRESENT_MODE_BEGIN_RANGE_KHR = VK_PRESENT_MODE_IMMEDIATE_KHR,
VK_PRESENT_MODE_END_RANGE_KHR = VK_PRESENT_MODE_FIFO_RELAXED_KHR, VK_PRESENT_MODE_END_RANGE_KHR = VK_PRESENT_MODE_FIFO_RELAXED_KHR,
VK_PRESENT_MODE_RANGE_SIZE_KHR = (VK_PRESENT_MODE_FIFO_RELAXED_KHR - VK_PRESENT_MODE_IMMEDIATE_KHR + 1), VK_PRESENT_MODE_RANGE_SIZE_KHR = (VK_PRESENT_MODE_FIFO_RELAXED_KHR - VK_PRESENT_MODE_IMMEDIATE_KHR + 1),
@@ -4101,6 +4146,64 @@ VKAPI_ATTR void VKAPI_CALL vkCmdPushDescriptorSetWithTemplateKHR(
const void* pData); const void* pData);
#endif #endif
#define VK_KHR_shared_presentable_image 1
#define VK_KHR_SHARED_PRESENTABLE_IMAGE_SPEC_VERSION 1
#define VK_KHR_SHARED_PRESENTABLE_IMAGE_EXTENSION_NAME "VK_KHR_shared_presentable_image"
typedef struct VkSharedPresentSurfaceCapabilitiesKHR {
VkStructureType sType;
void* pNext;
VkImageUsageFlags sharedPresentSupportedUsageFlags;
} VkSharedPresentSurfaceCapabilitiesKHR;
typedef VkResult (VKAPI_PTR *PFN_vkGetSwapchainStatusKHR)(VkDevice device, VkSwapchainKHR swapchain);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR VkResult VKAPI_CALL vkGetSwapchainStatusKHR(
VkDevice device,
VkSwapchainKHR swapchain);
#endif
#define VK_KHR_get_surface_capabilities2 1
#define VK_KHR_GET_SURFACE_CAPABILITIES_2_SPEC_VERSION 1
#define VK_KHR_GET_SURFACE_CAPABILITIES_2_EXTENSION_NAME "VK_KHR_get_surface_capabilities2"
typedef struct VkPhysicalDeviceSurfaceInfo2KHR {
VkStructureType sType;
const void* pNext;
VkSurfaceKHR surface;
} VkPhysicalDeviceSurfaceInfo2KHR;
typedef struct VkSurfaceCapabilities2KHR {
VkStructureType sType;
void* pNext;
VkSurfaceCapabilitiesKHR surfaceCapabilities;
} VkSurfaceCapabilities2KHR;
typedef struct VkSurfaceFormat2KHR {
VkStructureType sType;
void* pNext;
VkSurfaceFormatKHR surfaceFormat;
} VkSurfaceFormat2KHR;
typedef VkResult (VKAPI_PTR *PFN_vkGetPhysicalDeviceSurfaceCapabilities2KHR)(VkPhysicalDevice physicalDevice, const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo, VkSurfaceCapabilities2KHR* pSurfaceCapabilities);
typedef VkResult (VKAPI_PTR *PFN_vkGetPhysicalDeviceSurfaceFormats2KHR)(VkPhysicalDevice physicalDevice, const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo, uint32_t* pSurfaceFormatCount, VkSurfaceFormat2KHR* pSurfaceFormats);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceCapabilities2KHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo,
VkSurfaceCapabilities2KHR* pSurfaceCapabilities);
VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceFormats2KHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceSurfaceInfo2KHR* pSurfaceInfo,
uint32_t* pSurfaceFormatCount,
VkSurfaceFormat2KHR* pSurfaceFormats);
#endif
#define VK_EXT_debug_report 1 #define VK_EXT_debug_report 1
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkDebugReportCallbackEXT) VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkDebugReportCallbackEXT)

View File

@@ -103,8 +103,26 @@ def generate(env):
'HAVE_STDINT_H', 'HAVE_STDINT_H',
]) ])
env.Prepend(LIBPATH = [os.path.join(llvm_dir, 'lib')]) env.Prepend(LIBPATH = [os.path.join(llvm_dir, 'lib')])
# LIBS should match the output of `llvm-config --libs engine mcjit bitwriter x86asmprinter` # LIBS should match the output of `llvm-config --libs engine mcjit bitwriter x86asmprinter irreader`
if llvm_version >= distutils.version.LooseVersion('3.9'): if llvm_version >= distutils.version.LooseVersion('4.0'):
env.Prepend(LIBS = [
'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',
'LLVMDebugInfoCodeView', 'LLVMCodeGen',
'LLVMScalarOpts', 'LLVMInstCombine',
'LLVMTransformUtils',
'LLVMBitWriter', 'LLVMX86Desc',
'LLVMMCDisassembler', 'LLVMX86Info',
'LLVMX86AsmPrinter', 'LLVMX86Utils',
'LLVMMCJIT', 'LLVMExecutionEngine', 'LLVMTarget',
'LLVMAnalysis', 'LLVMProfileData',
'LLVMRuntimeDyld', 'LLVMObject', 'LLVMMCParser',
'LLVMBitReader', 'LLVMMC', 'LLVMCore',
'LLVMSupport',
'LLVMIRReader', 'LLVMAsmParser',
'LLVMDemangle', 'LLVMGlobalISel', 'LLVMDebugInfoMSF',
])
elif llvm_version >= distutils.version.LooseVersion('3.9'):
env.Prepend(LIBS = [ env.Prepend(LIBS = [
'LLVMX86Disassembler', 'LLVMX86AsmParser', 'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter', 'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',

View File

@@ -21,18 +21,7 @@
.PHONY: git_sha1.h.tmp .PHONY: git_sha1.h.tmp
git_sha1.h.tmp: git_sha1.h.tmp:
@# Don't assume that $(top_srcdir)/.git is a directory. It may be @sh $(top_srcdir)/git_sha1_gen.sh > $@
@# a gitlink file if $(top_srcdir) is a submodule checkout or a linked
@# worktree.
@# If we are building from a release tarball copy the bundled header.
@touch git_sha1.h.tmp
@if test -e $(top_srcdir)/.git; then \
if which git > /dev/null; then \
git --git-dir=$(top_srcdir)/.git log -n 1 --oneline | \
sed 's/^\([^ ]*\) .*/#define MESA_GIT_SHA1 "git-\1"/' \
> git_sha1.h.tmp ; \
fi \
fi
git_sha1.h: git_sha1.h.tmp git_sha1.h: git_sha1.h.tmp
@echo "updating git_sha1.h" @echo "updating git_sha1.h"

View File

@@ -22,27 +22,15 @@ def write_git_sha1_h_file(filename):
to retrieve the git hashid and write the header file. An empty file to retrieve the git hashid and write the header file. An empty file
will be created if anything goes wrong.""" will be created if anything goes wrong."""
args = [ 'git', 'rev-parse', '--short=10', 'HEAD' ]
try:
(commit, foo) = subprocess.Popen(args, stdout=subprocess.PIPE).communicate()
except:
print "Warning: exception in write_git_sha1_h_file()"
# git log command didn't work
if not os.path.exists(filename):
dirname = os.path.dirname(filename)
if dirname and not os.path.exists(dirname):
os.makedirs(dirname)
# create an empty file if none already exists
f = open(filename, "w")
f.close()
return
# note that commit[:-1] removes the trailing newline character
commit = '#define MESA_GIT_SHA1 "git-%s"\n' % commit[:-1]
tempfile = "git_sha1.h.tmp" tempfile = "git_sha1.h.tmp"
f = open(tempfile, "w") with open(tempfile, "w") as f:
f.write(commit) args = [ 'sh', Dir('#').abspath + '/git_sha1_gen.sh' ]
f.close() try:
subprocess.Popen(args, stdout=f).wait()
except:
print "Warning: exception in write_git_sha1_h_file()"
return
if not os.path.exists(filename) or not filecmp.cmp(tempfile, filename): if not os.path.exists(filename) or not filecmp.cmp(tempfile, filename):
# The filename does not exist or it's different from the new file, # The filename does not exist or it's different from the new file,
# so replace old file with new. # so replace old file with new.

View File

@@ -42,5 +42,11 @@ LOCAL_C_INCLUDES := \
$(MESA_TOP)/src/amd/addrlib/gfx9/chip \ $(MESA_TOP)/src/amd/addrlib/gfx9/chip \
$(MESA_TOP)/src/amd/addrlib/r800/chip $(MESA_TOP)/src/amd/addrlib/r800/chip
LOCAL_EXPORT_C_INCLUDE_DIRS := \
$(LOCAL_PATH) \
$(LOCAL_PATH)/addrlib/core \
$(LOCAL_PATH)/addrlib/inc/chip/r800 \
$(LOCAL_PATH)/addrlib/r800/chip
include $(MESA_COMMON_MK) include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY) include $(BUILD_STATIC_LIBRARY)

View File

@@ -29,6 +29,7 @@ include $(CLEAR_VARS)
LOCAL_MODULE := libmesa_amd_common LOCAL_MODULE := libmesa_amd_common
LOCAL_SRC_FILES := \ LOCAL_SRC_FILES := \
$(AMD_COMMON_FILES) \
$(AMD_COMPILER_FILES) \ $(AMD_COMPILER_FILES) \
$(AMD_DEBUG_FILES) $(AMD_DEBUG_FILES)
@@ -49,15 +50,27 @@ LOCAL_C_INCLUDES := \
$(MESA_TOP)/include \ $(MESA_TOP)/include \
$(MESA_TOP)/src \ $(MESA_TOP)/src \
$(MESA_TOP)/src/amd/common \ $(MESA_TOP)/src/amd/common \
$(MESA_TOP)/src/compiler \
$(call generated-sources-dir-for,STATIC_LIBRARIES,libmesa_nir,,)/nir \
$(MESA_TOP)/src/gallium/include \ $(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary \ $(MESA_TOP)/src/gallium/auxiliary \
$(intermediates)/common \ $(intermediates)/common \
external/llvm/include \ external/llvm/include \
external/llvm/device/include \ external/llvm/device/include
external/libcxx/include \
$(ELF_INCLUDES)
LOCAL_STATIC_LIBRARIES := libLLVMCore LOCAL_EXPORT_C_INCLUDE_DIRS := \
$(LOCAL_PATH)/common
LOCAL_SHARED_LIBRARIES := \
libdrm_amdgpu
LOCAL_STATIC_LIBRARIES := \
libmesa_nir
LOCAL_WHOLE_STATIC_LIBRARIES := \
libelf
$(call mesa-build-with-llvm)
include $(MESA_COMMON_MK) include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY) include $(BUILD_STATIC_LIBRARY)

View File

@@ -25,6 +25,7 @@ COMMON_LIBS = common/libamd_common.la
# TODO cleanup these # TODO cleanup these
common_libamd_common_la_CPPFLAGS = \ common_libamd_common_la_CPPFLAGS = \
$(AMDGPU_CFLAGS) \
$(VALGRIND_CFLAGS) \ $(VALGRIND_CFLAGS) \
$(DEFINES) \ $(DEFINES) \
-I$(top_srcdir)/include \ -I$(top_srcdir)/include \
@@ -54,6 +55,7 @@ common_libamd_common_la_CXXFLAGS = \
noinst_LTLIBRARIES += $(COMMON_LIBS) noinst_LTLIBRARIES += $(COMMON_LIBS)
common_libamd_common_la_SOURCES = \ common_libamd_common_la_SOURCES = \
$(AMD_COMMON_FILES) \
$(AMD_COMPILER_FILES) \ $(AMD_COMPILER_FILES) \
$(AMD_DEBUG_FILES) \ $(AMD_DEBUG_FILES) \
$(AMD_GENERATED_FILES) $(AMD_GENERATED_FILES)
@@ -65,6 +67,8 @@ common_libamd_common_la_SOURCES += $(AMD_NIR_FILES)
endif endif
endif endif
common_libamd_common_la_LIBADD = $(LIBELF_LIBS)
common/sid_tables.h: $(srcdir)/common/sid_tables.py $(srcdir)/common/sid.h $(srcdir)/common/gfx9d.h common/sid_tables.h: $(srcdir)/common/sid_tables.py $(srcdir)/common/sid.h $(srcdir)/common/gfx9d.h
$(AM_V_at)$(MKDIR_P) $(@D) $(AM_V_at)$(MKDIR_P) $(@D)
$(AM_V_GEN) $(PYTHON2) $(srcdir)/common/sid_tables.py $(srcdir)/common/sid.h $(srcdir)/common/gfx9d.h > $@ $(AM_V_GEN) $(PYTHON2) $(srcdir)/common/sid_tables.py $(srcdir)/common/sid.h $(srcdir)/common/gfx9d.h > $@

View File

@@ -42,16 +42,25 @@ ADDRLIB_FILES = \
AMD_COMPILER_FILES = \ AMD_COMPILER_FILES = \
common/ac_binary.c \ common/ac_binary.c \
common/ac_binary.h \ common/ac_binary.h \
common/ac_exp_param.h \
common/ac_llvm_build.c \ common/ac_llvm_build.c \
common/ac_llvm_build.h \ common/ac_llvm_build.h \
common/ac_llvm_helper.cpp \ common/ac_llvm_helper.cpp \
common/ac_llvm_util.c \ common/ac_llvm_util.c \
common/ac_llvm_util.h common/ac_llvm_util.h \
common/ac_shader_info.c \
common/ac_shader_info.h
AMD_NIR_FILES = \ AMD_NIR_FILES = \
common/ac_nir_to_llvm.c \ common/ac_nir_to_llvm.c \
common/ac_nir_to_llvm.h common/ac_nir_to_llvm.h
AMD_COMMON_FILES = \
common/ac_gpu_info.c \
common/ac_gpu_info.h \
common/ac_surface.c \
common/ac_surface.h
AMD_DEBUG_FILES = \ AMD_DEBUG_FILES = \
common/ac_debug.c \ common/ac_debug.c \
common/ac_debug.h common/ac_debug.h

View File

@@ -1193,6 +1193,20 @@ ChipFamily Gfx9Lib::HwlConvertChipFamily(
m_settings.depthPipeXorDisable = 1; m_settings.depthPipeXorDisable = 1;
break; break;
case FAMILY_RV:
m_settings.isArcticIsland = 1;
m_settings.isRaven = ASICREV_IS_RAVEN(uChipRevision);
if (m_settings.isRaven)
{
m_settings.isDcn1 = 1;
}
m_settings.metaBaseAlignFix = 1;
m_settings.depthPipeXorDisable = 1;
break;
default: default:
ADDR_ASSERT(!"This should be a Fusion"); ADDR_ASSERT(!"This should be a Fusion");
break; break;
@@ -2734,6 +2748,35 @@ BOOL_32 Gfx9Lib::IsValidDisplaySwizzleMode(
break; break;
} }
} }
else if (m_settings.isDcn1)
{
switch (swizzleMode)
{
case ADDR_SW_4KB_D:
case ADDR_SW_64KB_D:
case ADDR_SW_VAR_D:
case ADDR_SW_64KB_D_T:
case ADDR_SW_4KB_D_X:
case ADDR_SW_64KB_D_X:
case ADDR_SW_VAR_D_X:
support = (pIn->bpp == 64);
break;
case ADDR_SW_LINEAR:
case ADDR_SW_4KB_S:
case ADDR_SW_64KB_S:
case ADDR_SW_VAR_S:
case ADDR_SW_64KB_S_T:
case ADDR_SW_4KB_S_X:
case ADDR_SW_64KB_S_X:
case ADDR_SW_VAR_S_X:
support = (pIn->bpp <= 64);
break;
default:
break;
}
}
else else
{ {
ADDR_NOT_IMPLEMENTED(); ADDR_NOT_IMPLEMENTED();
@@ -3195,6 +3238,20 @@ ADDR_E_RETURNCODE Gfx9Lib::HwlGetPreferredSurfaceSetting(
// DCE12 does not support display surface to be _T swizzle mode // DCE12 does not support display surface to be _T swizzle mode
prtXor = FALSE; prtXor = FALSE;
} }
else if (m_settings.isDcn1)
{
// _R is not supported by Dcn1
if (pIn->bpp == 64)
{
swType = ADDR_SW_D;
}
else
{
swType = ADDR_SW_S;
}
blockSet.micro = FALSE;
}
else else
{ {
ADDR_NOT_IMPLEMENTED(); ADDR_NOT_IMPLEMENTED();

View File

@@ -54,11 +54,13 @@ struct Gfx9ChipSettings
// Asic/Generation name // Asic/Generation name
UINT_32 isArcticIsland : 1; UINT_32 isArcticIsland : 1;
UINT_32 isVega10 : 1; UINT_32 isVega10 : 1;
UINT_32 reserved0 : 30; UINT_32 isRaven : 1;
UINT_32 reserved0 : 29;
// Display engine IP version name // Display engine IP version name
UINT_32 isDce12 : 1; UINT_32 isDce12 : 1;
UINT_32 reserved1 : 31; UINT_32 isDcn1 : 1;
UINT_32 reserved1 : 29;
// Misc configuration bits // Misc configuration bits
UINT_32 metaBaseAlignFix : 1; UINT_32 metaBaseAlignFix : 1;
@@ -201,7 +203,7 @@ protected:
if (IsXor(swizzleMode)) if (IsXor(swizzleMode))
{ {
if (m_settings.isVega10) if (m_settings.isVega10 || m_settings.isRaven)
{ {
baseAlign = GetBlockSize(swizzleMode); baseAlign = GetBlockSize(swizzleMode);
} }

View File

@@ -132,9 +132,15 @@ void ac_dump_reg(FILE *file, unsigned offset, uint32_t value,
static void ac_parse_set_reg_packet(FILE *f, uint32_t *ib, unsigned count, static void ac_parse_set_reg_packet(FILE *f, uint32_t *ib, unsigned count,
unsigned reg_offset) unsigned reg_offset)
{ {
unsigned reg = (ib[1] << 2) + reg_offset; unsigned reg = ((ib[1] & 0xFFFF) << 2) + reg_offset;
unsigned index = ib[1] >> 28;
int i; int i;
if (index != 0) {
print_spaces(f, INDENT_PKT);
fprintf(f, "INDEX = %u\n", index);
}
for (i = 0; i < count; i++) for (i = 0; i < count; i++)
ac_dump_reg(f, reg + i*4, ib[2+i], ~0); ac_dump_reg(f, reg + i*4, ib[2+i], ~0);
} }
@@ -214,6 +220,52 @@ static uint32_t *ac_parse_packet3(FILE *f, uint32_t *ib, int *num_dw,
print_named_value(f, "ADDRESS_HI", ib[3], 16); print_named_value(f, "ADDRESS_HI", ib[3], 16);
} }
break; break;
case PKT3_EVENT_WRITE_EOP:
ac_dump_reg(f, R_028A90_VGT_EVENT_INITIATOR, ib[1],
S_028A90_EVENT_TYPE(~0));
print_named_value(f, "EVENT_INDEX", (ib[1] >> 8) & 0xf, 4);
print_named_value(f, "TCL1_VOL_ACTION_ENA", (ib[1] >> 12) & 0x1, 1);
print_named_value(f, "TC_VOL_ACTION_ENA", (ib[1] >> 13) & 0x1, 1);
print_named_value(f, "TC_WB_ACTION_ENA", (ib[1] >> 15) & 0x1, 1);
print_named_value(f, "TCL1_ACTION_ENA", (ib[1] >> 16) & 0x1, 1);
print_named_value(f, "TC_ACTION_ENA", (ib[1] >> 17) & 0x1, 1);
print_named_value(f, "ADDRESS_LO", ib[2], 32);
print_named_value(f, "ADDRESS_HI", ib[3], 16);
print_named_value(f, "DST_SEL", (ib[3] >> 16) & 0x3, 2);
print_named_value(f, "INT_SEL", (ib[3] >> 24) & 0x7, 3);
print_named_value(f, "DATA_SEL", ib[3] >> 29, 3);
print_named_value(f, "DATA_LO", ib[4], 32);
print_named_value(f, "DATA_HI", ib[5], 32);
break;
case PKT3_RELEASE_MEM:
ac_dump_reg(f, R_028A90_VGT_EVENT_INITIATOR, ib[1],
S_028A90_EVENT_TYPE(~0));
print_named_value(f, "EVENT_INDEX", (ib[1] >> 8) & 0xf, 4);
print_named_value(f, "TCL1_VOL_ACTION_ENA", (ib[1] >> 12) & 0x1, 1);
print_named_value(f, "TC_VOL_ACTION_ENA", (ib[1] >> 13) & 0x1, 1);
print_named_value(f, "TC_WB_ACTION_ENA", (ib[1] >> 15) & 0x1, 1);
print_named_value(f, "TCL1_ACTION_ENA", (ib[1] >> 16) & 0x1, 1);
print_named_value(f, "TC_ACTION_ENA", (ib[1] >> 17) & 0x1, 1);
print_named_value(f, "TC_NC_ACTION_ENA", (ib[1] >> 19) & 0x1, 1);
print_named_value(f, "TC_WC_ACTION_ENA", (ib[1] >> 20) & 0x1, 1);
print_named_value(f, "TC_MD_ACTION_ENA", (ib[1] >> 21) & 0x1, 1);
print_named_value(f, "DST_SEL", (ib[2] >> 16) & 0x3, 2);
print_named_value(f, "INT_SEL", (ib[2] >> 24) & 0x7, 3);
print_named_value(f, "DATA_SEL", ib[2] >> 29, 3);
print_named_value(f, "ADDRESS_LO", ib[3], 32);
print_named_value(f, "ADDRESS_HI", ib[4], 32);
print_named_value(f, "DATA_LO", ib[5], 32);
print_named_value(f, "DATA_HI", ib[6], 32);
print_named_value(f, "CTXID", ib[7], 32);
break;
case PKT3_WAIT_REG_MEM:
print_named_value(f, "OP", ib[1], 32);
print_named_value(f, "ADDRESS_LO", ib[2], 32);
print_named_value(f, "ADDRESS_HI", ib[3], 32);
print_named_value(f, "REF", ib[4], 32);
print_named_value(f, "MASK", ib[5], 32);
print_named_value(f, "POLL_INTERVAL", ib[6], 16);
break;
case PKT3_DRAW_INDEX_AUTO: case PKT3_DRAW_INDEX_AUTO:
ac_dump_reg(f, R_030930_VGT_NUM_INDICES, ib[1], ~0); ac_dump_reg(f, R_030930_VGT_NUM_INDICES, ib[1], ~0);
ac_dump_reg(f, R_0287F0_VGT_DRAW_INITIATOR, ib[2], ~0); ac_dump_reg(f, R_0287F0_VGT_DRAW_INITIATOR, ib[2], ~0);

View File

@@ -0,0 +1,40 @@
/*
* Copyright 2014 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
*/
#ifndef AC_EXP_PARAM_H
#define AC_EXP_PARAM_H
enum {
/* SPI_PS_INPUT_CNTL_i.OFFSET[0:4] */
AC_EXP_PARAM_OFFSET_0 = 0,
AC_EXP_PARAM_OFFSET_31 = 31,
/* SPI_PS_INPUT_CNTL_i.DEFAULT_VAL[0:1] */
AC_EXP_PARAM_DEFAULT_VAL_0000 = 64,
AC_EXP_PARAM_DEFAULT_VAL_0001,
AC_EXP_PARAM_DEFAULT_VAL_1110,
AC_EXP_PARAM_DEFAULT_VAL_1111,
AC_EXP_PARAM_UNDEFINED = 255,
};
#endif

View File

@@ -0,0 +1,303 @@
/*
* Copyright © 2017 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
* AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*/
#include "ac_gpu_info.h"
#include "sid.h"
#include "gfx9d.h"
#include "util/u_math.h"
#include <stdio.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>
#include <amdgpu.h>
#define CIK_TILE_MODE_COLOR_2D 14
#define CIK__GB_TILE_MODE__PIPE_CONFIG(x) (((x) >> 6) & 0x1f)
#define CIK__PIPE_CONFIG__ADDR_SURF_P2 0
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_8x16 4
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_16x16 5
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_16x32 6
#define CIK__PIPE_CONFIG__ADDR_SURF_P4_32x32 7
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x16_8x16 8
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_8x16 9
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_8x16 10
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_16x16 11
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x16 12
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x32 13
#define CIK__PIPE_CONFIG__ADDR_SURF_P8_32x64_32x32 14
#define CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_8X16 16
#define CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_16X16 17
static unsigned cik_get_num_tile_pipes(struct amdgpu_gpu_info *info)
{
unsigned mode2d = info->gb_tile_mode[CIK_TILE_MODE_COLOR_2D];
switch (CIK__GB_TILE_MODE__PIPE_CONFIG(mode2d)) {
case CIK__PIPE_CONFIG__ADDR_SURF_P2:
return 2;
case CIK__PIPE_CONFIG__ADDR_SURF_P4_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_16x32:
case CIK__PIPE_CONFIG__ADDR_SURF_P4_32x32:
return 4;
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x16_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_8x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_16x32_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x16:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x32_16x32:
case CIK__PIPE_CONFIG__ADDR_SURF_P8_32x64_32x32:
return 8;
case CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_8X16:
case CIK__PIPE_CONFIG__ADDR_SURF_P16_32X32_16X16:
return 16;
default:
fprintf(stderr, "Invalid CIK pipe configuration, assuming P2\n");
assert(!"this should never occur");
return 2;
}
}
bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
struct radeon_info *info,
struct amdgpu_gpu_info *amdinfo)
{
struct amdgpu_buffer_size_alignments alignment_info = {};
struct amdgpu_heap_info vram, vram_vis, gtt;
struct drm_amdgpu_info_hw_ip dma = {}, compute = {}, uvd = {}, vce = {}, vcn_dec = {};
uint32_t vce_version = 0, vce_feature = 0, uvd_version = 0, uvd_feature = 0;
uint32_t unused_feature;
int r, i, j;
drmDevicePtr devinfo;
/* Get PCI info. */
r = drmGetDevice2(fd, 0, &devinfo);
if (r) {
fprintf(stderr, "amdgpu: drmGetDevice2 failed.\n");
return false;
}
info->pci_domain = devinfo->businfo.pci->domain;
info->pci_bus = devinfo->businfo.pci->bus;
info->pci_dev = devinfo->businfo.pci->dev;
info->pci_func = devinfo->businfo.pci->func;
drmFreeDevice(&devinfo);
/* Query hardware and driver information. */
r = amdgpu_query_gpu_info(dev, amdinfo);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_gpu_info failed.\n");
return false;
}
r = amdgpu_query_buffer_size_alignment(dev, &alignment_info);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_buffer_size_alignment failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM, 0, &vram);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM,
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED,
&vram_vis);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram_vis) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_GTT, 0, &gtt);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(gtt) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_DMA, 0, &dma);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(dma) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_COMPUTE, 0, &compute);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(compute) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_UVD, 0, &uvd);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(uvd) failed.\n");
return false;
}
if (info->drm_major == 3 && info->drm_minor >= 17) {
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_VCN_DEC, 0, &vcn_dec);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(vcn_dec) failed.\n");
return false;
}
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_ME, 0, 0,
&info->me_fw_version, &unused_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(me) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_PFP, 0, 0,
&info->pfp_fw_version, &unused_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(pfp) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_CE, 0, 0,
&info->ce_fw_version, &unused_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(ce) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_UVD, 0, 0,
&uvd_version, &uvd_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(uvd) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_VCE, 0, &vce);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(vce) failed.\n");
return false;
}
r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_VCE, 0, 0,
&vce_version, &vce_feature);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_firmware_version(vce) failed.\n");
return false;
}
/* Set chip identification. */
info->pci_id = amdinfo->asic_id; /* TODO: is this correct? */
info->vce_harvest_config = amdinfo->vce_harvest_config;
switch (info->pci_id) {
#define CHIPSET(pci_id, name, cfamily) case pci_id: info->family = CHIP_##cfamily; break;
#include "pci_ids/radeonsi_pci_ids.h"
#undef CHIPSET
default:
fprintf(stderr, "amdgpu: Invalid PCI ID.\n");
return false;
}
if (info->family >= CHIP_VEGA10)
info->chip_class = GFX9;
else if (info->family >= CHIP_TONGA)
info->chip_class = VI;
else if (info->family >= CHIP_BONAIRE)
info->chip_class = CIK;
else if (info->family >= CHIP_TAHITI)
info->chip_class = SI;
else {
fprintf(stderr, "amdgpu: Unknown family.\n");
return false;
}
/* Set which chips have dedicated VRAM. */
info->has_dedicated_vram =
!(amdinfo->ids_flags & AMDGPU_IDS_FLAGS_FUSION);
/* Set hardware information. */
info->gart_size = gtt.heap_size;
info->vram_size = vram.heap_size;
info->vram_vis_size = vram_vis.heap_size;
/* The kernel can split large buffers in VRAM but not in GTT, so large
* allocations can fail or cause buffer movement failures in the kernel.
*/
info->max_alloc_size = MIN2(info->vram_size * 0.9, info->gart_size * 0.7);
/* convert the shader clock from KHz to MHz */
info->max_shader_clock = amdinfo->max_engine_clk / 1000;
info->max_se = amdinfo->num_shader_engines;
info->max_sh_per_se = amdinfo->num_shader_arrays_per_engine;
info->has_hw_decode =
(uvd.available_rings != 0) || (vcn_dec.available_rings != 0);
info->uvd_fw_version =
uvd.available_rings ? uvd_version : 0;
info->vce_fw_version =
vce.available_rings ? vce_version : 0;
info->has_userptr = true;
info->num_render_backends = amdinfo->rb_pipes;
info->clock_crystal_freq = amdinfo->gpu_counter_freq;
info->tcc_cache_line_size = 64; /* TC L2 line size on GCN */
if (info->chip_class == GFX9) {
info->num_tile_pipes = 1 << G_0098F8_NUM_PIPES(amdinfo->gb_addr_cfg);
info->pipe_interleave_bytes =
256 << G_0098F8_PIPE_INTERLEAVE_SIZE_GFX9(amdinfo->gb_addr_cfg);
} else {
info->num_tile_pipes = cik_get_num_tile_pipes(amdinfo);
info->pipe_interleave_bytes =
256 << G_0098F8_PIPE_INTERLEAVE_SIZE_GFX6(amdinfo->gb_addr_cfg);
}
info->has_virtual_memory = true;
assert(util_is_power_of_two(dma.available_rings + 1));
assert(util_is_power_of_two(compute.available_rings + 1));
info->num_sdma_rings = util_bitcount(dma.available_rings);
info->num_compute_rings = util_bitcount(compute.available_rings);
/* Get the number of good compute units. */
info->num_good_compute_units = 0;
for (i = 0; i < info->max_se; i++)
for (j = 0; j < info->max_sh_per_se; j++)
info->num_good_compute_units +=
util_bitcount(amdinfo->cu_bitmap[i][j]);
memcpy(info->si_tile_mode_array, amdinfo->gb_tile_mode,
sizeof(amdinfo->gb_tile_mode));
info->enabled_rb_mask = amdinfo->enabled_rb_pipes_mask;
memcpy(info->cik_macrotile_mode_array, amdinfo->gb_macro_tile_mode,
sizeof(amdinfo->gb_macro_tile_mode));
info->pte_fragment_size = alignment_info.size_local;
info->gart_page_size = alignment_info.size_remote;
if (info->chip_class == SI)
info->gfx_ib_pad_with_type2 = TRUE;
return true;
}

View File

@@ -0,0 +1,111 @@
/*
* Copyright © 2017 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
* AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*/
#ifndef AC_GPU_INFO_H
#define AC_GPU_INFO_H
#include <stdint.h>
#include <stdbool.h>
#include "amd_family.h"
#ifdef __cplusplus
extern "C" {
#endif
/* Prior to C11 the following may trigger a typedef redeclaration warning */
typedef struct amdgpu_device *amdgpu_device_handle;
struct amdgpu_gpu_info;
struct radeon_info {
/* PCI info: domain:bus:dev:func */
uint32_t pci_domain;
uint32_t pci_bus;
uint32_t pci_dev;
uint32_t pci_func;
/* Device info. */
uint32_t pci_id;
enum radeon_family family;
enum chip_class chip_class;
uint32_t pte_fragment_size;
uint32_t gart_page_size;
uint64_t gart_size;
uint64_t vram_size;
uint64_t vram_vis_size;
uint64_t max_alloc_size;
uint32_t min_alloc_size;
bool has_dedicated_vram;
bool has_virtual_memory;
bool gfx_ib_pad_with_type2;
bool has_hw_decode;
uint32_t num_sdma_rings;
uint32_t num_compute_rings;
uint32_t uvd_fw_version;
uint32_t vce_fw_version;
uint32_t me_fw_version;
uint32_t pfp_fw_version;
uint32_t ce_fw_version;
uint32_t vce_harvest_config;
uint32_t clock_crystal_freq;
uint32_t tcc_cache_line_size;
/* Kernel info. */
uint32_t drm_major; /* version */
uint32_t drm_minor;
uint32_t drm_patchlevel;
bool has_userptr;
/* Shader cores. */
uint32_t r600_max_quad_pipes; /* wave size / 16 */
uint32_t max_shader_clock;
uint32_t num_good_compute_units;
uint32_t max_se; /* shader engines */
uint32_t max_sh_per_se; /* shader arrays per shader engine */
/* Render backends (color + depth blocks). */
uint32_t r300_num_gb_pipes;
uint32_t r300_num_z_pipes;
uint32_t r600_gb_backend_map; /* R600 harvest config */
bool r600_gb_backend_map_valid;
uint32_t r600_num_banks;
uint32_t num_render_backends;
uint32_t num_tile_pipes; /* pipe count from PIPE_CONFIG */
uint32_t pipe_interleave_bytes;
uint32_t enabled_rb_mask; /* GCN harvest config */
/* Tile modes. */
uint32_t si_tile_mode_array[32];
uint32_t cik_macrotile_mode_array[16];
};
bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
struct radeon_info *info,
struct amdgpu_gpu_info *amdinfo);
#ifdef __cplusplus
}
#endif
#endif /* AC_GPU_INFO_H */

View File

@@ -33,11 +33,13 @@
#include <stdio.h> #include <stdio.h>
#include "ac_llvm_util.h" #include "ac_llvm_util.h"
#include "ac_exp_param.h"
#include "util/bitscan.h" #include "util/bitscan.h"
#include "util/macros.h" #include "util/macros.h"
#include "sid.h" #include "sid.h"
#include "shader_enums.h"
/* Initialize module-independent parts of the context. /* Initialize module-independent parts of the context.
* *
* The caller is responsible for initializing ctx::module and ctx::builder. * The caller is responsible for initializing ctx::module and ctx::builder.
@@ -54,11 +56,20 @@ ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context)
ctx->voidt = LLVMVoidTypeInContext(ctx->context); ctx->voidt = LLVMVoidTypeInContext(ctx->context);
ctx->i1 = LLVMInt1TypeInContext(ctx->context); ctx->i1 = LLVMInt1TypeInContext(ctx->context);
ctx->i8 = LLVMInt8TypeInContext(ctx->context); ctx->i8 = LLVMInt8TypeInContext(ctx->context);
ctx->i16 = LLVMIntTypeInContext(ctx->context, 16);
ctx->i32 = LLVMIntTypeInContext(ctx->context, 32); ctx->i32 = LLVMIntTypeInContext(ctx->context, 32);
ctx->i64 = LLVMIntTypeInContext(ctx->context, 64);
ctx->f16 = LLVMHalfTypeInContext(ctx->context);
ctx->f32 = LLVMFloatTypeInContext(ctx->context); ctx->f32 = LLVMFloatTypeInContext(ctx->context);
ctx->f64 = LLVMDoubleTypeInContext(ctx->context);
ctx->v4i32 = LLVMVectorType(ctx->i32, 4); ctx->v4i32 = LLVMVectorType(ctx->i32, 4);
ctx->v4f32 = LLVMVectorType(ctx->f32, 4); ctx->v4f32 = LLVMVectorType(ctx->f32, 4);
ctx->v16i8 = LLVMVectorType(ctx->i8, 16); ctx->v8i32 = LLVMVectorType(ctx->i32, 8);
ctx->i32_0 = LLVMConstInt(ctx->i32, 0, false);
ctx->i32_1 = LLVMConstInt(ctx->i32, 1, false);
ctx->f32_0 = LLVMConstReal(ctx->f32, 0.0);
ctx->f32_1 = LLVMConstReal(ctx->f32, 1.0);
ctx->range_md_kind = LLVMGetMDKindIDInContext(ctx->context, ctx->range_md_kind = LLVMGetMDKindIDInContext(ctx->context,
"range", 5); "range", 5);
@@ -231,42 +242,16 @@ build_cube_intrinsic(struct ac_llvm_context *ctx,
LLVMValueRef in[3], LLVMValueRef in[3],
struct cube_selection_coords *out) struct cube_selection_coords *out)
{ {
LLVMBuilderRef builder = ctx->builder; LLVMTypeRef f32 = ctx->f32;
if (HAVE_LLVM >= 0x0309) { out->stc[1] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubetc",
LLVMTypeRef f32 = ctx->f32; f32, in, 3, AC_FUNC_ATTR_READNONE);
out->stc[0] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubesc",
out->stc[1] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubetc", f32, in, 3, AC_FUNC_ATTR_READNONE);
f32, in, 3, AC_FUNC_ATTR_READNONE); out->ma = ac_build_intrinsic(ctx, "llvm.amdgcn.cubema",
out->stc[0] = ac_build_intrinsic(ctx, "llvm.amdgcn.cubesc", f32, in, 3, AC_FUNC_ATTR_READNONE);
f32, in, 3, AC_FUNC_ATTR_READNONE); out->id = ac_build_intrinsic(ctx, "llvm.amdgcn.cubeid",
out->ma = ac_build_intrinsic(ctx, "llvm.amdgcn.cubema", f32, in, 3, AC_FUNC_ATTR_READNONE);
f32, in, 3, AC_FUNC_ATTR_READNONE);
out->id = ac_build_intrinsic(ctx, "llvm.amdgcn.cubeid",
f32, in, 3, AC_FUNC_ATTR_READNONE);
} else {
LLVMValueRef c[4] = {
in[0],
in[1],
in[2],
LLVMGetUndef(LLVMTypeOf(in[0]))
};
LLVMValueRef vec = ac_build_gather_values(ctx, c, 4);
LLVMValueRef tmp =
ac_build_intrinsic(ctx, "llvm.AMDGPU.cube",
LLVMTypeOf(vec), &vec, 1,
AC_FUNC_ATTR_READNONE);
out->stc[1] = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 0, 0), "");
out->stc[0] = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 1, 0), "");
out->ma = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 2, 0), "");
out->id = LLVMBuildExtractElement(builder, tmp,
LLVMConstInt(ctx->i32, 3, 0), "");
}
} }
/** /**
@@ -556,7 +541,7 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
bool has_add_tid) bool has_add_tid)
{ {
/* TODO: Fix stores with ADD_TID and remove the "has_add_tid" flag. */ /* TODO: Fix stores with ADD_TID and remove the "has_add_tid" flag. */
if (HAVE_LLVM >= 0x0309 && !has_add_tid) { if (!has_add_tid) {
/* Split 3 channel stores, becase LLVM doesn't support 3-channel /* Split 3 channel stores, becase LLVM doesn't support 3-channel
* intrinsics. */ * intrinsics. */
if (num_channels == 3) { if (num_channels == 3) {
@@ -657,114 +642,89 @@ ac_build_buffer_load(struct ac_llvm_context *ctx,
unsigned inst_offset, unsigned inst_offset,
unsigned glc, unsigned glc,
unsigned slc, unsigned slc,
bool readonly_memory) bool can_speculate,
bool allow_smem)
{ {
LLVMValueRef offset = LLVMConstInt(ctx->i32, inst_offset, 0);
if (voffset)
offset = LLVMBuildAdd(ctx->builder, offset, voffset, "");
if (soffset)
offset = LLVMBuildAdd(ctx->builder, offset, soffset, "");
/* TODO: VI and later generations can use SMEM with GLC=1.*/
if (allow_smem && !glc && !slc) {
assert(vindex == NULL);
LLVMValueRef result[4];
for (int i = 0; i < num_channels; i++) {
if (i) {
offset = LLVMBuildAdd(ctx->builder, offset,
LLVMConstInt(ctx->i32, 4, 0), "");
}
LLVMValueRef args[2] = {rsrc, offset};
result[i] = ac_build_intrinsic(ctx, "llvm.SI.load.const.v4i32",
ctx->f32, args, 2,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_LEGACY);
}
if (num_channels == 1)
return result[0];
if (num_channels == 3)
result[num_channels++] = LLVMGetUndef(ctx->f32);
return ac_build_gather_values(ctx, result, num_channels);
}
unsigned func = CLAMP(num_channels, 1, 3) - 1; unsigned func = CLAMP(num_channels, 1, 3) - 1;
if (HAVE_LLVM >= 0x309) { LLVMValueRef args[] = {
LLVMValueRef args[] = { LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0), offset,
LLVMConstInt(ctx->i32, inst_offset, 0), LLVMConstInt(ctx->i1, glc, 0),
LLVMConstInt(ctx->i1, glc, 0), LLVMConstInt(ctx->i1, slc, 0)
LLVMConstInt(ctx->i1, slc, 0) };
};
LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2), LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2),
ctx->v4f32}; ctx->v4f32};
const char *type_names[] = {"f32", "v2f32", "v4f32"}; const char *type_names[] = {"f32", "v2f32", "v4f32"};
char name[256]; char name[256];
if (voffset) { snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
args[2] = LLVMBuildAdd(ctx->builder, args[2], voffset, type_names[func]);
"");
}
if (soffset) { return ac_build_intrinsic(ctx, name, types[func], args,
args[2] = LLVMBuildAdd(ctx->builder, args[2], soffset, ARRAY_SIZE(args),
""); /* READNONE means writes can't affect it, while
} * READONLY means that writes can affect it. */
can_speculate && HAVE_LLVM >= 0x0400 ?
snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s", AC_FUNC_ATTR_READNONE :
type_names[func]); AC_FUNC_ATTR_READONLY);
return ac_build_intrinsic(ctx, name, types[func], args,
ARRAY_SIZE(args),
/* READNONE means writes can't
* affect it, while READONLY means
* that writes can affect it. */
readonly_memory && HAVE_LLVM >= 0x0400 ?
AC_FUNC_ATTR_READNONE :
AC_FUNC_ATTR_READONLY);
} else {
LLVMValueRef args[] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v16i8, ""),
voffset ? voffset : vindex,
soffset,
LLVMConstInt(ctx->i32, inst_offset, 0),
LLVMConstInt(ctx->i32, voffset ? 1 : 0, 0), // offen
LLVMConstInt(ctx->i32, vindex ? 1 : 0, 0), //idxen
LLVMConstInt(ctx->i32, glc, 0),
LLVMConstInt(ctx->i32, slc, 0),
LLVMConstInt(ctx->i32, 0, 0), // TFE
};
LLVMTypeRef types[] = {ctx->i32, LLVMVectorType(ctx->i32, 2),
ctx->v4i32};
const char *type_names[] = {"i32", "v2i32", "v4i32"};
const char *arg_type = "i32";
char name[256];
if (voffset && vindex) {
LLVMValueRef vaddr[] = {vindex, voffset};
arg_type = "v2i32";
args[1] = ac_build_gather_values(ctx, vaddr, 2);
}
snprintf(name, sizeof(name), "llvm.SI.buffer.load.dword.%s.%s",
type_names[func], arg_type);
return ac_build_intrinsic(ctx, name, types[func], args,
ARRAY_SIZE(args), AC_FUNC_ATTR_READONLY);
}
} }
LLVMValueRef ac_build_buffer_load_format(struct ac_llvm_context *ctx, LLVMValueRef ac_build_buffer_load_format(struct ac_llvm_context *ctx,
LLVMValueRef rsrc, LLVMValueRef rsrc,
LLVMValueRef vindex, LLVMValueRef vindex,
LLVMValueRef voffset, LLVMValueRef voffset,
bool readonly_memory) bool can_speculate)
{ {
if (HAVE_LLVM >= 0x0309) { LLVMValueRef args [] = {
LLVMValueRef args [] = { LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
vindex,
voffset,
LLVMConstInt(ctx->i1, 0, 0), /* glc */
LLVMConstInt(ctx->i1, 0, 0), /* slc */
};
return ac_build_intrinsic(ctx,
"llvm.amdgcn.buffer.load.format.v4f32",
ctx->v4f32, args, ARRAY_SIZE(args),
/* READNONE means writes can't
* affect it, while READONLY means
* that writes can affect it. */
readonly_memory && HAVE_LLVM >= 0x0400 ?
AC_FUNC_ATTR_READNONE :
AC_FUNC_ATTR_READONLY);
}
LLVMValueRef args[] = {
rsrc,
voffset,
vindex, vindex,
voffset,
LLVMConstInt(ctx->i1, 0, 0), /* glc */
LLVMConstInt(ctx->i1, 0, 0), /* slc */
}; };
return ac_build_intrinsic(ctx, "llvm.SI.vs.load.input",
ctx->v4f32, args, 3, return ac_build_intrinsic(ctx,
AC_FUNC_ATTR_READNONE | "llvm.amdgcn.buffer.load.format.v4f32",
AC_FUNC_ATTR_LEGACY); ctx->v4f32, args, ARRAY_SIZE(args),
/* READNONE means writes can't affect it, while
* READONLY means that writes can affect it. */
can_speculate && HAVE_LLVM >= 0x0400 ?
AC_FUNC_ATTR_READNONE :
AC_FUNC_ATTR_READONLY);
} }
/** /**
@@ -1244,3 +1204,265 @@ void ac_get_image_intr_name(const char *base_name,
data_type_name, coords_type_name, rsrc_type_name); data_type_name, coords_type_name, rsrc_type_name);
} }
} }
#define AC_EXP_TARGET (HAVE_LLVM >= 0x0500 ? 0 : 3)
#define AC_EXP_OUT0 (HAVE_LLVM >= 0x0500 ? 2 : 5)
enum ac_ir_type {
AC_IR_UNDEF,
AC_IR_CONST,
AC_IR_VALUE,
};
struct ac_vs_exp_chan
{
LLVMValueRef value;
float const_float;
enum ac_ir_type type;
};
struct ac_vs_exp_inst {
unsigned offset;
LLVMValueRef inst;
struct ac_vs_exp_chan chan[4];
};
struct ac_vs_exports {
unsigned num;
struct ac_vs_exp_inst exp[VARYING_SLOT_MAX];
};
/* Return true if the PARAM export has been eliminated. */
static bool ac_eliminate_const_output(uint8_t *vs_output_param_offset,
uint32_t num_outputs,
struct ac_vs_exp_inst *exp)
{
unsigned i, default_val; /* SPI_PS_INPUT_CNTL_i.DEFAULT_VAL */
bool is_zero[4] = {}, is_one[4] = {};
for (i = 0; i < 4; i++) {
/* It's a constant expression. Undef outputs are eliminated too. */
if (exp->chan[i].type == AC_IR_UNDEF) {
is_zero[i] = true;
is_one[i] = true;
} else if (exp->chan[i].type == AC_IR_CONST) {
if (exp->chan[i].const_float == 0)
is_zero[i] = true;
else if (exp->chan[i].const_float == 1)
is_one[i] = true;
else
return false; /* other constant */
} else
return false;
}
/* Only certain combinations of 0 and 1 can be eliminated. */
if (is_zero[0] && is_zero[1] && is_zero[2])
default_val = is_zero[3] ? 0 : 1;
else if (is_one[0] && is_one[1] && is_one[2])
default_val = is_zero[3] ? 2 : 3;
else
return false;
/* The PARAM export can be represented as DEFAULT_VAL. Kill it. */
LLVMInstructionEraseFromParent(exp->inst);
/* Change OFFSET to DEFAULT_VAL. */
for (i = 0; i < num_outputs; i++) {
if (vs_output_param_offset[i] == exp->offset) {
vs_output_param_offset[i] =
AC_EXP_PARAM_DEFAULT_VAL_0000 + default_val;
break;
}
}
return true;
}
static bool ac_eliminate_duplicated_output(uint8_t *vs_output_param_offset,
uint32_t num_outputs,
struct ac_vs_exports *processed,
struct ac_vs_exp_inst *exp)
{
unsigned p, copy_back_channels = 0;
/* See if the output is already in the list of processed outputs.
* The LLVMValueRef comparison relies on SSA.
*/
for (p = 0; p < processed->num; p++) {
bool different = false;
for (unsigned j = 0; j < 4; j++) {
struct ac_vs_exp_chan *c1 = &processed->exp[p].chan[j];
struct ac_vs_exp_chan *c2 = &exp->chan[j];
/* Treat undef as a match. */
if (c2->type == AC_IR_UNDEF)
continue;
/* If c1 is undef but c2 isn't, we can copy c2 to c1
* and consider the instruction duplicated.
*/
if (c1->type == AC_IR_UNDEF) {
copy_back_channels |= 1 << j;
continue;
}
/* Test whether the channels are not equal. */
if (c1->type != c2->type ||
(c1->type == AC_IR_CONST &&
c1->const_float != c2->const_float) ||
(c1->type == AC_IR_VALUE &&
c1->value != c2->value)) {
different = true;
break;
}
}
if (!different)
break;
copy_back_channels = 0;
}
if (p == processed->num)
return false;
/* If a match was found, but the matching export has undef where the new
* one has a normal value, copy the normal value to the undef channel.
*/
struct ac_vs_exp_inst *match = &processed->exp[p];
while (copy_back_channels) {
unsigned chan = u_bit_scan(&copy_back_channels);
assert(match->chan[chan].type == AC_IR_UNDEF);
LLVMSetOperand(match->inst, AC_EXP_OUT0 + chan,
exp->chan[chan].value);
match->chan[chan] = exp->chan[chan];
}
/* The PARAM export is duplicated. Kill it. */
LLVMInstructionEraseFromParent(exp->inst);
/* Change OFFSET to the matching export. */
for (unsigned i = 0; i < num_outputs; i++) {
if (vs_output_param_offset[i] == exp->offset) {
vs_output_param_offset[i] = match->offset;
break;
}
}
return true;
}
void ac_optimize_vs_outputs(struct ac_llvm_context *ctx,
LLVMValueRef main_fn,
uint8_t *vs_output_param_offset,
uint32_t num_outputs,
uint8_t *num_param_exports)
{
LLVMBasicBlockRef bb;
bool removed_any = false;
struct ac_vs_exports exports;
exports.num = 0;
/* Process all LLVM instructions. */
bb = LLVMGetFirstBasicBlock(main_fn);
while (bb) {
LLVMValueRef inst = LLVMGetFirstInstruction(bb);
while (inst) {
LLVMValueRef cur = inst;
inst = LLVMGetNextInstruction(inst);
struct ac_vs_exp_inst exp;
if (LLVMGetInstructionOpcode(cur) != LLVMCall)
continue;
LLVMValueRef callee = ac_llvm_get_called_value(cur);
if (!ac_llvm_is_function(callee))
continue;
const char *name = LLVMGetValueName(callee);
unsigned num_args = LLVMCountParams(callee);
/* Check if this is an export instruction. */
if ((num_args != 9 && num_args != 8) ||
(strcmp(name, "llvm.SI.export") &&
strcmp(name, "llvm.amdgcn.exp.f32")))
continue;
LLVMValueRef arg = LLVMGetOperand(cur, AC_EXP_TARGET);
unsigned target = LLVMConstIntGetZExtValue(arg);
if (target < V_008DFC_SQ_EXP_PARAM)
continue;
target -= V_008DFC_SQ_EXP_PARAM;
/* Parse the instruction. */
memset(&exp, 0, sizeof(exp));
exp.offset = target;
exp.inst = cur;
for (unsigned i = 0; i < 4; i++) {
LLVMValueRef v = LLVMGetOperand(cur, AC_EXP_OUT0 + i);
exp.chan[i].value = v;
if (LLVMIsUndef(v)) {
exp.chan[i].type = AC_IR_UNDEF;
} else if (LLVMIsAConstantFP(v)) {
LLVMBool loses_info;
exp.chan[i].type = AC_IR_CONST;
exp.chan[i].const_float =
LLVMConstRealGetDouble(v, &loses_info);
} else {
exp.chan[i].type = AC_IR_VALUE;
}
}
/* Eliminate constant and duplicated PARAM exports. */
if (ac_eliminate_const_output(vs_output_param_offset,
num_outputs, &exp) ||
ac_eliminate_duplicated_output(vs_output_param_offset,
num_outputs, &exports,
&exp)) {
removed_any = true;
} else {
exports.exp[exports.num++] = exp;
}
}
bb = LLVMGetNextBasicBlock(bb);
}
/* Remove holes in export memory due to removed PARAM exports.
* This is done by renumbering all PARAM exports.
*/
if (removed_any) {
uint8_t old_offset[VARYING_SLOT_MAX];
unsigned out, i;
/* Make a copy of the offsets. We need the old version while
* we are modifying some of them. */
memcpy(old_offset, vs_output_param_offset,
sizeof(old_offset));
for (i = 0; i < exports.num; i++) {
unsigned offset = exports.exp[i].offset;
/* Update vs_output_param_offset. Multiple outputs can
* have the same offset.
*/
for (out = 0; out < num_outputs; out++) {
if (old_offset[out] == offset)
vs_output_param_offset[out] = i;
}
/* Change the PARAM offset in the instruction. */
LLVMSetOperand(exports.exp[i].inst, AC_EXP_TARGET,
LLVMConstInt(ctx->i32,
V_008DFC_SQ_EXP_PARAM + i, 0));
}
*num_param_exports = exports.num;
}
}

View File

@@ -40,11 +40,20 @@ struct ac_llvm_context {
LLVMTypeRef voidt; LLVMTypeRef voidt;
LLVMTypeRef i1; LLVMTypeRef i1;
LLVMTypeRef i8; LLVMTypeRef i8;
LLVMTypeRef i16;
LLVMTypeRef i32; LLVMTypeRef i32;
LLVMTypeRef i64;
LLVMTypeRef f16;
LLVMTypeRef f32; LLVMTypeRef f32;
LLVMTypeRef f64;
LLVMTypeRef v4i32; LLVMTypeRef v4i32;
LLVMTypeRef v4f32; LLVMTypeRef v4f32;
LLVMTypeRef v16i8; LLVMTypeRef v8i32;
LLVMValueRef i32_0;
LLVMValueRef i32_1;
LLVMValueRef f32_0;
LLVMValueRef f32_1;
unsigned range_md_kind; unsigned range_md_kind;
unsigned invariant_load_md_kind; unsigned invariant_load_md_kind;
@@ -143,13 +152,14 @@ ac_build_buffer_load(struct ac_llvm_context *ctx,
unsigned inst_offset, unsigned inst_offset,
unsigned glc, unsigned glc,
unsigned slc, unsigned slc,
bool readonly_memory); bool can_speculate,
bool allow_smem);
LLVMValueRef ac_build_buffer_load_format(struct ac_llvm_context *ctx, LLVMValueRef ac_build_buffer_load_format(struct ac_llvm_context *ctx,
LLVMValueRef rsrc, LLVMValueRef rsrc,
LLVMValueRef vindex, LLVMValueRef vindex,
LLVMValueRef voffset, LLVMValueRef voffset,
bool readonly_memory); bool can_speculate);
LLVMValueRef LLVMValueRef
ac_get_thread_id(struct ac_llvm_context *ctx); ac_get_thread_id(struct ac_llvm_context *ctx);
@@ -239,6 +249,12 @@ void ac_get_image_intr_name(const char *base_name,
LLVMTypeRef coords_type, LLVMTypeRef coords_type,
LLVMTypeRef rsrc_type, LLVMTypeRef rsrc_type,
char *out_name, unsigned out_len); char *out_name, unsigned out_len);
void ac_optimize_vs_outputs(struct ac_llvm_context *ac,
LLVMValueRef main_fn,
uint8_t *vs_output_param_offset,
uint32_t num_outputs,
uint8_t *num_param_exports);
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif

View File

@@ -34,6 +34,7 @@
#include <llvm/Target/TargetOptions.h> #include <llvm/Target/TargetOptions.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h> #include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/IR/Attributes.h> #include <llvm/IR/Attributes.h>
#include <llvm/IR/CallSite.h>
#if HAVE_LLVM < 0x0500 #if HAVE_LLVM < 0x0500
namespace llvm { namespace llvm {
@@ -44,9 +45,13 @@ typedef AttributeSet AttributeList;
void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes) void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes)
{ {
llvm::Argument *A = llvm::unwrap<llvm::Argument>(val); llvm::Argument *A = llvm::unwrap<llvm::Argument>(val);
#if HAVE_LLVM < 0x0500
llvm::AttrBuilder B; llvm::AttrBuilder B;
B.addDereferenceableAttr(bytes); B.addDereferenceableAttr(bytes);
A->addAttr(llvm::AttributeList::get(A->getContext(), A->getArgNo() + 1, B)); A->addAttr(llvm::AttributeList::get(A->getContext(), A->getArgNo() + 1, B));
#else
A->addAttr(llvm::Attribute::getWithDereferenceableBytes(A->getContext(), bytes));
#endif
} }
bool ac_is_sgpr_param(LLVMValueRef arg) bool ac_is_sgpr_param(LLVMValueRef arg)
@@ -57,3 +62,21 @@ bool ac_is_sgpr_param(LLVMValueRef arg)
return AS.hasAttribute(ArgNo + 1, llvm::Attribute::ByVal) || return AS.hasAttribute(ArgNo + 1, llvm::Attribute::ByVal) ||
AS.hasAttribute(ArgNo + 1, llvm::Attribute::InReg); AS.hasAttribute(ArgNo + 1, llvm::Attribute::InReg);
} }
LLVMValueRef ac_llvm_get_called_value(LLVMValueRef call)
{
#if HAVE_LLVM >= 0x0309
return LLVMGetCalledValue(call);
#else
return llvm::wrap(llvm::CallSite(llvm::unwrap<llvm::Instruction>(call)).getCalledValue());
#endif
}
bool ac_llvm_is_function(LLVMValueRef v)
{
#if HAVE_LLVM >= 0x0309
return LLVMGetValueKind(v) == LLVMFunctionValueKind;
#else
return llvm::isa<llvm::Function>(llvm::unwrap(v));
#endif
}

View File

@@ -105,17 +105,14 @@ static const char *ac_get_llvm_processor_name(enum radeon_family family)
return "fiji"; return "fiji";
case CHIP_STONEY: case CHIP_STONEY:
return "stoney"; return "stoney";
#if HAVE_LLVM == 0x0308
case CHIP_POLARIS10:
return "tonga";
case CHIP_POLARIS11:
return "tonga";
#else
case CHIP_POLARIS10: case CHIP_POLARIS10:
return "polaris10"; return "polaris10";
case CHIP_POLARIS11: case CHIP_POLARIS11:
case CHIP_POLARIS12:
return "polaris11"; return "polaris11";
#endif case CHIP_VEGA10:
case CHIP_RAVEN:
return "gfx900";
default: default:
return ""; return "";
} }
@@ -131,7 +128,7 @@ LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, bool su
target, target,
triple, triple,
ac_get_llvm_processor_name(family), ac_get_llvm_processor_name(family),
"+DumpCode,+vgpr-spilling", "+DumpCode,+vgpr-spilling,-fp32-denormals,-xnack",
LLVMCodeGenLevelDefault, LLVMCodeGenLevelDefault,
LLVMRelocDefault, LLVMRelocDefault,
LLVMCodeModelDefault); LLVMCodeModelDefault);
@@ -223,3 +220,13 @@ ac_dump_module(LLVMModuleRef module)
fprintf(stderr, "%s", str); fprintf(stderr, "%s", str);
LLVMDisposeMessage(str); LLVMDisposeMessage(str);
} }
void
ac_llvm_add_target_dep_function_attr(LLVMValueRef F,
const char *name, int value)
{
char str[16];
snprintf(str, sizeof(str), "%i", value);
LLVMAddTargetDependentFunctionAttr(F, name, str);
}

View File

@@ -64,6 +64,13 @@ void ac_add_func_attributes(LLVMContextRef ctx, LLVMValueRef function,
unsigned attrib_mask); unsigned attrib_mask);
void ac_dump_module(LLVMModuleRef module); void ac_dump_module(LLVMModuleRef module);
LLVMValueRef ac_llvm_get_called_value(LLVMValueRef call);
bool ac_llvm_is_function(LLVMValueRef v);
void
ac_llvm_add_target_dep_function_attr(LLVMValueRef F,
const char *name, int value);
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif

File diff suppressed because it is too large Load Diff

View File

@@ -29,7 +29,7 @@
#include "llvm-c/TargetMachine.h" #include "llvm-c/TargetMachine.h"
#include "amd_family.h" #include "amd_family.h"
#include "../vulkan/radv_descriptor_set.h" #include "../vulkan/radv_descriptor_set.h"
#include "ac_shader_info.h"
#include "shader_enums.h" #include "shader_enums.h"
struct ac_shader_binary; struct ac_shader_binary;
struct ac_shader_config; struct ac_shader_config;
@@ -41,10 +41,12 @@ struct ac_vs_variant_key {
uint32_t instance_rate_inputs; uint32_t instance_rate_inputs;
uint32_t as_es:1; uint32_t as_es:1;
uint32_t as_ls:1; uint32_t as_ls:1;
uint32_t export_prim_id:1;
}; };
struct ac_tes_variant_key { struct ac_tes_variant_key {
uint32_t as_es:1; uint32_t as_es:1;
uint32_t export_prim_id:1;
}; };
struct ac_tcs_variant_key { struct ac_tcs_variant_key {
@@ -83,7 +85,8 @@ struct ac_userdata_info {
enum ac_ud_index { enum ac_ud_index {
AC_UD_SCRATCH_RING_OFFSETS = 0, AC_UD_SCRATCH_RING_OFFSETS = 0,
AC_UD_PUSH_CONSTANTS = 1, AC_UD_PUSH_CONSTANTS = 1,
AC_UD_SHADER_START = 2, AC_UD_INDIRECT_DESCRIPTOR_SETS = 2,
AC_UD_SHADER_START = 3,
AC_UD_VS_VERTEX_BUFFERS = AC_UD_SHADER_START, AC_UD_VS_VERTEX_BUFFERS = AC_UD_SHADER_START,
AC_UD_VS_BASE_VERTEX_START_INSTANCE, AC_UD_VS_BASE_VERTEX_START_INSTANCE,
AC_UD_VS_LS_TCS_IN_LAYOUT, AC_UD_VS_LS_TCS_IN_LAYOUT,
@@ -120,15 +123,15 @@ struct ac_userdata_locations {
}; };
struct ac_vs_output_info { struct ac_vs_output_info {
uint8_t vs_output_param_offset[VARYING_SLOT_MAX];
uint8_t clip_dist_mask; uint8_t clip_dist_mask;
uint8_t cull_dist_mask; uint8_t cull_dist_mask;
uint8_t param_exports;
bool writes_pointsize; bool writes_pointsize;
bool writes_layer; bool writes_layer;
bool writes_viewport_index; bool writes_viewport_index;
uint32_t prim_id_output; bool export_prim_id;
uint32_t layer_output;
uint32_t export_mask; uint32_t export_mask;
unsigned param_exports;
unsigned pos_exports; unsigned pos_exports;
}; };
@@ -138,10 +141,11 @@ struct ac_es_output_info {
struct ac_shader_variant_info { struct ac_shader_variant_info {
struct ac_userdata_locations user_sgprs_locs; struct ac_userdata_locations user_sgprs_locs;
struct ac_shader_info info;
unsigned num_user_sgprs; unsigned num_user_sgprs;
unsigned num_input_sgprs; unsigned num_input_sgprs;
unsigned num_input_vgprs; unsigned num_input_vgprs;
bool need_indirect_descriptor_sets;
union { union {
struct { struct {
struct ac_vs_output_info outinfo; struct ac_vs_output_info outinfo;
@@ -166,7 +170,6 @@ struct ac_shader_variant_info {
bool force_persample; bool force_persample;
bool prim_id_input; bool prim_id_input;
bool layer_input; bool layer_input;
bool uses_sample_positions;
} fs; } fs;
struct { struct {
unsigned block_size[3]; unsigned block_size[3];
@@ -178,6 +181,7 @@ struct ac_shader_variant_info {
unsigned invocations; unsigned invocations;
unsigned gsvs_vertex_size; unsigned gsvs_vertex_size;
unsigned max_gsvs_emit_size; unsigned max_gsvs_emit_size;
bool uses_prim_id;
} gs; } gs;
struct { struct {
bool uses_prim_id; bool uses_prim_id;

View File

@@ -0,0 +1,127 @@
/*
* Copyright © 2017 Red Hat
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "nir/nir.h"
#include "ac_shader_info.h"
#include "ac_nir_to_llvm.h"
static void mark_sampler_desc(nir_variable *var, struct ac_shader_info *info)
{
info->desc_set_used_mask = (1 << var->data.descriptor_set);
}
static void
gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
{
switch (instr->intrinsic) {
case nir_intrinsic_interp_var_at_sample:
info->ps.needs_sample_positions = true;
break;
case nir_intrinsic_load_draw_id:
info->vs.needs_draw_id = true;
break;
case nir_intrinsic_load_num_work_groups:
info->cs.grid_components_used = instr->num_components;
break;
case nir_intrinsic_vulkan_resource_index:
info->desc_set_used_mask |= (1 << nir_intrinsic_desc_set(instr));
break;
case nir_intrinsic_image_load:
case nir_intrinsic_image_store:
case nir_intrinsic_image_atomic_add:
case nir_intrinsic_image_atomic_min:
case nir_intrinsic_image_atomic_max:
case nir_intrinsic_image_atomic_and:
case nir_intrinsic_image_atomic_or:
case nir_intrinsic_image_atomic_xor:
case nir_intrinsic_image_atomic_exchange:
case nir_intrinsic_image_atomic_comp_swap:
case nir_intrinsic_image_size:
mark_sampler_desc(instr->variables[0]->var, info);
break;
default:
break;
}
}
static void
gather_tex_info(nir_tex_instr *instr, struct ac_shader_info *info)
{
if (instr->sampler)
mark_sampler_desc(instr->sampler->var, info);
if (instr->texture)
mark_sampler_desc(instr->texture->var, info);
}
static void
gather_info_block(nir_block *block, struct ac_shader_info *info)
{
nir_foreach_instr(instr, block) {
switch (instr->type) {
case nir_instr_type_intrinsic:
gather_intrinsic_info(nir_instr_as_intrinsic(instr), info);
break;
case nir_instr_type_tex:
gather_tex_info(nir_instr_as_tex(instr), info);
break;
default:
break;
}
}
}
static void
gather_info_input_decl(nir_shader *nir,
const struct ac_nir_compiler_options *options,
nir_variable *var,
struct ac_shader_info *info)
{
switch (nir->stage) {
case MESA_SHADER_VERTEX:
info->vs.has_vertex_buffers = true;
break;
default:
break;
}
}
void
ac_nir_shader_info_pass(struct nir_shader *nir,
const struct ac_nir_compiler_options *options,
struct ac_shader_info *info)
{
struct nir_function *func = (struct nir_function *)exec_list_get_head(&nir->functions);
info->needs_push_constants = true;
if (!options->layout)
info->needs_push_constants = false;
else if (!options->layout->push_constant_size &&
!options->layout->dynamic_offset_count)
info->needs_push_constants = false;
nir_foreach_variable(variable, &nir->inputs)
gather_info_input_decl(nir, options, variable, info);
nir_foreach_block(block, func->impl) {
gather_info_block(block, info);
}
}

View File

@@ -0,0 +1,53 @@
/*
* Copyright © 2017 Red Hat
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#ifndef AC_SHADER_INFO_H
#define AC_SHADER_INFO_H
struct nir_shader;
struct ac_nir_compiler_options;
struct ac_shader_info {
bool needs_push_constants;
uint32_t desc_set_used_mask;
struct {
bool has_vertex_buffers; /* needs vertex buffers and base/start */
bool needs_draw_id;
} vs;
struct {
bool needs_sample_positions;
} ps;
struct {
uint8_t grid_components_used;
} cs;
};
/* A NIR pass to gather all the info needed to optimise the allocation patterns
* for the RADV user sgprs
*/
void
ac_nir_shader_info_pass(struct nir_shader *nir,
const struct ac_nir_compiler_options *options,
struct ac_shader_info *info);
#endif

1087
src/amd/common/ac_surface.c Normal file

File diff suppressed because it is too large Load Diff

220
src/amd/common/ac_surface.h Normal file
View File

@@ -0,0 +1,220 @@
/*
* Copyright © 2017 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
* AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*/
#ifndef AC_SURFACE_H
#define AC_SURFACE_H
#include <stdint.h>
#include "amd_family.h"
#ifdef __cplusplus
extern "C" {
#endif
/* Forward declarations. */
typedef void* ADDR_HANDLE;
struct amdgpu_gpu_info;
struct radeon_info;
#define RADEON_SURF_MAX_LEVELS 15
enum radeon_surf_mode {
RADEON_SURF_MODE_LINEAR_ALIGNED = 1,
RADEON_SURF_MODE_1D = 2,
RADEON_SURF_MODE_2D = 3,
};
/* These are defined exactly like GB_TILE_MODEn.MICRO_TILE_MODE_NEW. */
enum radeon_micro_mode {
RADEON_MICRO_MODE_DISPLAY = 0,
RADEON_MICRO_MODE_THIN = 1,
RADEON_MICRO_MODE_DEPTH = 2,
RADEON_MICRO_MODE_ROTATED = 3,
};
/* the first 16 bits are reserved for libdrm_radeon, don't use them */
#define RADEON_SURF_SCANOUT (1 << 16)
#define RADEON_SURF_ZBUFFER (1 << 17)
#define RADEON_SURF_SBUFFER (1 << 18)
#define RADEON_SURF_Z_OR_SBUFFER (RADEON_SURF_ZBUFFER | RADEON_SURF_SBUFFER)
/* bits 19 and 20 are reserved for libdrm_radeon, don't use them */
#define RADEON_SURF_HAS_TILE_MODE_INDEX (1 << 20)
#define RADEON_SURF_FMASK (1 << 21)
#define RADEON_SURF_DISABLE_DCC (1 << 22)
#define RADEON_SURF_TC_COMPATIBLE_HTILE (1 << 23)
#define RADEON_SURF_IMPORTED (1 << 24)
#define RADEON_SURF_OPTIMIZE_FOR_SPACE (1 << 25)
struct legacy_surf_level {
uint64_t offset;
uint64_t slice_size;
uint64_t dcc_offset;
uint64_t dcc_fast_clear_size;
uint16_t nblk_x;
uint16_t nblk_y;
enum radeon_surf_mode mode;
};
struct legacy_surf_layout {
unsigned bankw:4; /* max 8 */
unsigned bankh:4; /* max 8 */
unsigned mtilea:4; /* max 8 */
unsigned tile_split:13; /* max 4K */
unsigned stencil_tile_split:13; /* max 4K */
unsigned pipe_config:5; /* max 17 */
unsigned num_banks:5; /* max 16 */
unsigned macro_tile_index:4; /* max 15 */
/* Whether the depth miptree or stencil miptree as used by the DB are
* adjusted from their TC compatible form to ensure depth/stencil
* compatibility. If either is true, the corresponding plane cannot be
* sampled from.
*/
unsigned depth_adjusted:1;
unsigned stencil_adjusted:1;
struct legacy_surf_level level[RADEON_SURF_MAX_LEVELS];
struct legacy_surf_level stencil_level[RADEON_SURF_MAX_LEVELS];
uint8_t tiling_index[RADEON_SURF_MAX_LEVELS];
uint8_t stencil_tiling_index[RADEON_SURF_MAX_LEVELS];
};
/* Same as addrlib - AddrResourceType. */
enum gfx9_resource_type {
RADEON_RESOURCE_1D = 0,
RADEON_RESOURCE_2D,
RADEON_RESOURCE_3D,
};
struct gfx9_surf_flags {
uint16_t swizzle_mode; /* tile mode */
uint16_t epitch; /* (pitch - 1) or (height - 1) */
};
struct gfx9_surf_meta_flags {
unsigned rb_aligned:1; /* optimal for RBs */
unsigned pipe_aligned:1; /* optimal for TC */
};
struct gfx9_surf_layout {
struct gfx9_surf_flags surf; /* color or depth surface */
struct gfx9_surf_flags fmask; /* not added to surf_size */
struct gfx9_surf_flags stencil; /* added to surf_size, use stencil_offset */
struct gfx9_surf_meta_flags dcc; /* metadata of color */
struct gfx9_surf_meta_flags htile; /* metadata of depth and stencil */
struct gfx9_surf_meta_flags cmask; /* metadata of fmask */
enum gfx9_resource_type resource_type; /* 1D, 2D or 3D */
uint64_t surf_offset; /* 0 unless imported with an offset */
/* The size of the 2D plane containing all mipmap levels. */
uint64_t surf_slice_size;
uint16_t surf_pitch; /* in blocks */
uint16_t surf_height;
/* Mipmap level offset within the slice in bytes. Only valid for LINEAR. */
uint32_t offset[RADEON_SURF_MAX_LEVELS];
uint16_t dcc_pitch_max; /* (mip chain pitch - 1) */
uint64_t stencil_offset; /* separate stencil */
uint64_t fmask_size;
uint64_t cmask_size;
uint32_t fmask_alignment;
uint32_t cmask_alignment;
};
struct radeon_surf {
/* Format properties. */
unsigned blk_w:4;
unsigned blk_h:4;
unsigned bpe:5;
/* Number of mipmap levels where DCC is enabled starting from level 0.
* Non-zero levels may be disabled due to alignment constraints, but not
* the first level.
*/
unsigned num_dcc_levels:4;
unsigned is_linear:1;
/* Displayable, thin, depth, rotated. AKA D,S,Z,R swizzle modes. */
unsigned micro_tile_mode:3;
uint32_t flags;
/* These are return values. Some of them can be set by the caller, but
* they will be treated as hints (e.g. bankw, bankh) and might be
* changed by the calculator.
*/
uint64_t surf_size;
uint64_t dcc_size;
uint64_t htile_size;
uint32_t htile_slice_size;
uint32_t surf_alignment;
uint32_t dcc_alignment;
uint32_t htile_alignment;
union {
/* R600-VI return values.
*
* Some of them can be set by the caller if certain parameters are
* desirable. The allocator will try to obey them.
*/
struct legacy_surf_layout legacy;
/* GFX9+ return values. */
struct gfx9_surf_layout gfx9;
} u;
};
struct ac_surf_info {
uint32_t width;
uint32_t height;
uint32_t depth;
uint8_t samples;
uint8_t levels;
uint16_t array_size;
};
struct ac_surf_config {
struct ac_surf_info info;
unsigned is_3d : 1;
unsigned is_cube : 1;
};
ADDR_HANDLE amdgpu_addr_create(const struct radeon_info *info,
const struct amdgpu_gpu_info *amdinfo);
int ac_compute_surface(ADDR_HANDLE addrlib, const struct radeon_info *info,
const struct ac_surf_config * config,
enum radeon_surf_mode mode,
struct radeon_surf *surf);
#ifdef __cplusplus
}
#endif
#endif /* AC_SURFACE_H */

View File

@@ -93,6 +93,7 @@ enum radeon_family {
CHIP_POLARIS11, CHIP_POLARIS11,
CHIP_POLARIS12, CHIP_POLARIS12,
CHIP_VEGA10, CHIP_VEGA10,
CHIP_RAVEN,
CHIP_LAST, CHIP_LAST,
}; };

View File

@@ -36,7 +36,7 @@
// Gets bits for specified mask from specified src packed instance. // Gets bits for specified mask from specified src packed instance.
#define AMD_HSA_BITS_GET(src, mask) \ #define AMD_HSA_BITS_GET(src, mask) \
((src & mask) >> mask ## _SHIFT) \ ((src & mask) >> mask ## _SHIFT)
/* Every amd_*_code_t has the following properties, which are composed of /* Every amd_*_code_t has the following properties, which are composed of
* a number of bit fields. Every bit field has a mask (AMD_CODE_PROPERTY_*), * a number of bit fields. Every bit field has a mask (AMD_CODE_PROPERTY_*),

View File

@@ -49,6 +49,7 @@ enum {
FAMILY_CZ, FAMILY_CZ,
FAMILY_PI, FAMILY_PI,
FAMILY_AI, FAMILY_AI,
FAMILY_RV,
FAMILY_LAST, FAMILY_LAST,
}; };
@@ -185,4 +186,13 @@ enum {
#define ASICREV_IS_VEGA10_P(eChipRev) \ #define ASICREV_IS_VEGA10_P(eChipRev) \
((eChipRev) >= AI_VEGA10_P_A0 && (eChipRev) < AI_UNKNOWN) ((eChipRev) >= AI_VEGA10_P_A0 && (eChipRev) < AI_UNKNOWN)
/* RV specific rev IDs */
enum {
RAVEN_A0 = 0x01,
RAVEN_UNKNOWN = 0xFF
};
#define ASICREV_IS_RAVEN(eChipRev) \
((eChipRev) >= RAVEN_A0 && (eChipRev) < RAVEN_UNKNOWN)
#endif /* AMDGPU_ID_H */ #endif /* AMDGPU_ID_H */

View File

@@ -1345,8 +1345,8 @@
#define V_008F14_IMG_DATA_FORMAT_RESERVED_56 0x38 #define V_008F14_IMG_DATA_FORMAT_RESERVED_56 0x38
#define V_008F14_IMG_DATA_FORMAT_4_4 0x39 #define V_008F14_IMG_DATA_FORMAT_4_4 0x39
#define V_008F14_IMG_DATA_FORMAT_6_5_5 0x3A #define V_008F14_IMG_DATA_FORMAT_6_5_5 0x3A
#define V_008F14_IMG_DATA_S8_16 0x3B #define V_008F14_IMG_DATA_FORMAT_S8_16 0x3B
#define V_008F14_IMG_DATA_S8_32 0x3C #define V_008F14_IMG_DATA_FORMAT_S8_32 0x3C
#define V_008F14_IMG_DATA_FORMAT_8_AS_32 0x3D #define V_008F14_IMG_DATA_FORMAT_8_AS_32 0x3D
#define V_008F14_IMG_DATA_FORMAT_8_AS_32_32 0x3E #define V_008F14_IMG_DATA_FORMAT_8_AS_32_32 0x3E
#define V_008F14_IMG_DATA_FORMAT_32_AS_32_32_32_32 0x3F #define V_008F14_IMG_DATA_FORMAT_32_AS_32_32_32_32 0x3F
@@ -4074,6 +4074,10 @@
#define S_028060_PUNCHOUT_MODE(x) (((unsigned)(x) & 0x03) << 0) #define S_028060_PUNCHOUT_MODE(x) (((unsigned)(x) & 0x03) << 0)
#define G_028060_PUNCHOUT_MODE(x) (((x) >> 0) & 0x03) #define G_028060_PUNCHOUT_MODE(x) (((x) >> 0) & 0x03)
#define C_028060_PUNCHOUT_MODE 0xFFFFFFFC #define C_028060_PUNCHOUT_MODE 0xFFFFFFFC
#define V_028060_AUTO 0
#define V_028060_FORCE_ON 1
#define V_028060_FORCE_OFF 2
#define V_028060_RESERVED 3
#define S_028060_POPS_DRAIN_PS_ON_OVERLAP(x) (((unsigned)(x) & 0x1) << 2) #define S_028060_POPS_DRAIN_PS_ON_OVERLAP(x) (((unsigned)(x) & 0x1) << 2)
#define G_028060_POPS_DRAIN_PS_ON_OVERLAP(x) (((x) >> 2) & 0x1) #define G_028060_POPS_DRAIN_PS_ON_OVERLAP(x) (((x) >> 2) & 0x1)
#define C_028060_POPS_DRAIN_PS_ON_OVERLAP 0xFFFFFFFB #define C_028060_POPS_DRAIN_PS_ON_OVERLAP 0xFFFFFFFB

View File

@@ -54,6 +54,17 @@
#define PKT3_WAIT_REG_MEM 0x3C #define PKT3_WAIT_REG_MEM 0x3C
#define WAIT_REG_MEM_EQUAL 3 #define WAIT_REG_MEM_EQUAL 3
#define WAIT_REG_MEM_MEM_SPACE(x) (((unsigned)(x) & 0x3) << 4) #define WAIT_REG_MEM_MEM_SPACE(x) (((unsigned)(x) & 0x3) << 4)
#define PKT3_COPY_DATA 0x40
#define COPY_DATA_SRC_SEL(x) ((x) & 0xf)
#define COPY_DATA_REG 0
#define COPY_DATA_MEM 1
#define COPY_DATA_PERF 4
#define COPY_DATA_IMM 5
#define COPY_DATA_TIMESTAMP 9
#define COPY_DATA_DST_SEL(x) (((unsigned)(x) & 0xf) << 8)
#define COPY_DATA_MEM_ASYNC 5
#define COPY_DATA_COUNT_SEL (1 << 16)
#define COPY_DATA_WR_CONFIRM (1 << 20)
#define PKT3_EVENT_WRITE 0x46 #define PKT3_EVENT_WRITE 0x46
#define PKT3_EVENT_WRITE_EOP 0x47 #define PKT3_EVENT_WRITE_EOP 0x47
#define EOP_DATA_SEL(x) ((x) << 29) #define EOP_DATA_SEL(x) ((x) << 29)

View File

@@ -154,6 +154,7 @@
#define COPY_DATA_MEM 1 #define COPY_DATA_MEM 1
#define COPY_DATA_PERF 4 #define COPY_DATA_PERF 4
#define COPY_DATA_IMM 5 #define COPY_DATA_IMM 5
#define COPY_DATA_TIMESTAMP 9
#define COPY_DATA_DST_SEL(x) (((unsigned)(x) & 0xf) << 8) #define COPY_DATA_DST_SEL(x) (((unsigned)(x) & 0xf) << 8)
#define COPY_DATA_COUNT_SEL (1 << 16) #define COPY_DATA_COUNT_SEL (1 << 16)
#define COPY_DATA_WR_CONFIRM (1 << 20) #define COPY_DATA_WR_CONFIRM (1 << 20)
@@ -169,7 +170,7 @@
*/ */
/* fix CP DMA before uncommenting: */ /* fix CP DMA before uncommenting: */
/*#define PKT3_EVENT_WRITE_EOS 0x48*/ /* not on GFX9 */ /*#define PKT3_EVENT_WRITE_EOS 0x48*/ /* not on GFX9 */
#define PKT3_RELEASE_MEM 0x49 /* GFX9+ (any ring) or GFX8 (compute ring only) */ #define PKT3_RELEASE_MEM 0x49 /* GFX9+ [any ring] or GFX8 [compute ring only] */
#define PKT3_ONE_REG_WRITE 0x57 /* not on CIK */ #define PKT3_ONE_REG_WRITE 0x57 /* not on CIK */
#define PKT3_ACQUIRE_MEM 0x58 /* new for CIK */ #define PKT3_ACQUIRE_MEM 0x58 /* new for CIK */
#define PKT3_SET_CONFIG_REG 0x68 #define PKT3_SET_CONFIG_REG 0x68
@@ -279,6 +280,7 @@
#define S_500_DSL_SEL(x) (((unsigned)(x) & 0x3) << 20) #define S_500_DSL_SEL(x) (((unsigned)(x) & 0x3) << 20)
#define V_500_DST_ADDR 0 #define V_500_DST_ADDR 0
#define V_500_GDS 1 /* program DAS to 1 as well */ #define V_500_GDS 1 /* program DAS to 1 as well */
#define V_500_NOWHERE 2 /* new for GFX9 */
#define V_500_DST_ADDR_TC_L2 3 /* new for CIK */ #define V_500_DST_ADDR_TC_L2 3 /* new for CIK */
#define S_500_ENGINE(x) ((x) & 0x1) #define S_500_ENGINE(x) ((x) & 0x1)
#define V_500_ME 0 #define V_500_ME 0
@@ -9094,5 +9096,18 @@
#define CIK_SDMA_PACKET_SRBM_WRITE 0xe #define CIK_SDMA_PACKET_SRBM_WRITE 0xe
#define CIK_SDMA_COPY_MAX_SIZE 0x3fffe0 #define CIK_SDMA_COPY_MAX_SIZE 0x3fffe0
enum amd_cmp_class_flags {
S_NAN = 1 << 0, // Signaling NaN
Q_NAN = 1 << 1, // Quiet NaN
N_INFINITY = 1 << 2, // Negative infinity
N_NORMAL = 1 << 3, // Negative normal
N_SUBNORMAL = 1 << 4, // Negative subnormal
N_ZERO = 1 << 5, // Negative zero
P_ZERO = 1 << 6, // Positive zero
P_SUBNORMAL = 1 << 7, // Positive subnormal
P_NORMAL = 1 << 8, // Positive normal
P_INFINITY = 1 << 9 // Positive infinity
};
#endif /* _SID_H */ #endif /* _SID_H */

View File

@@ -110,7 +110,7 @@ class IntTable:
[static] const typename name[] = { ... }; [static] const typename name[] = { ... };
to filp. to filp.
""" """
idxs = sorted(self.idxs) + [-1] idxs = sorted(self.idxs) + [len(self.table)]
fragments = [ fragments = [
('\t/* %s */ %s' % ( ('\t/* %s */ %s' % (

View File

@@ -59,8 +59,22 @@ VULKAN_SOURCES = \
$(VULKAN_GENERATED_FILES) \ $(VULKAN_GENERATED_FILES) \
$(VULKAN_FILES) $(VULKAN_FILES)
VULKAN_LIB_DEPS = VULKAN_LIB_DEPS = \
libvulkan_common.la \
$(top_builddir)/src/vulkan/libvulkan_util.la \
$(top_builddir)/src/vulkan/libvulkan_wsi.la \
$(top_builddir)/src/amd/common/libamd_common.la \
$(top_builddir)/src/amd/addrlib/libamdgpu_addrlib.la \
$(top_builddir)/src/compiler/nir/libnir.la \
$(top_builddir)/src/util/libmesautil.la \
$(LLVM_LIBS) \
$(LIBELF_LIBS) \
$(PTHREAD_LIBS) \
$(AMDGPU_LIBS) \
$(LIBDRM_LIBS) \
$(PTHREAD_LIBS) \
$(DLOPEN_LIBS) \
-lm
if HAVE_PLATFORM_X11 if HAVE_PLATFORM_X11
AM_CPPFLAGS += \ AM_CPPFLAGS += \
@@ -70,8 +84,7 @@ AM_CPPFLAGS += \
VULKAN_SOURCES += $(VULKAN_WSI_X11_FILES) VULKAN_SOURCES += $(VULKAN_WSI_X11_FILES)
# FIXME: Use pkg-config for X11-xcb ldflags. VULKAN_LIB_DEPS += $(XCB_DRI3_LIBS)
VULKAN_LIB_DEPS += $(XCB_DRI3_LIBS) -lX11-xcb
endif endif
@@ -89,23 +102,6 @@ endif
noinst_LTLIBRARIES = libvulkan_common.la noinst_LTLIBRARIES = libvulkan_common.la
libvulkan_common_la_SOURCES = $(VULKAN_SOURCES) libvulkan_common_la_SOURCES = $(VULKAN_SOURCES)
VULKAN_LIB_DEPS += \
libvulkan_common.la \
$(top_builddir)/src/vulkan/libvulkan_util.la \
$(top_builddir)/src/vulkan/libvulkan_wsi.la \
$(top_builddir)/src/amd/common/libamd_common.la \
$(top_builddir)/src/amd/addrlib/libamdgpu_addrlib.la \
$(top_builddir)/src/compiler/nir/libnir.la \
$(top_builddir)/src/util/libmesautil.la \
$(LLVM_LIBS) \
$(LIBELF_LIBS) \
$(PTHREAD_LIBS) \
$(AMDGPU_LIBS) \
$(LIBDRM_LIBS) \
$(PTHREAD_LIBS) \
$(DLOPEN_LIBS) \
-lm
nodist_EXTRA_libvulkan_radeon_la_SOURCES = dummy.cpp nodist_EXTRA_libvulkan_radeon_la_SOURCES = dummy.cpp
libvulkan_radeon_la_SOURCES = $(VULKAN_GEM_FILES) libvulkan_radeon_la_SOURCES = $(VULKAN_GEM_FILES)

View File

@@ -51,6 +51,7 @@ VULKAN_FILES := \
radv_meta_fast_clear.c \ radv_meta_fast_clear.c \
radv_meta_resolve.c \ radv_meta_resolve.c \
radv_meta_resolve_cs.c \ radv_meta_resolve_cs.c \
radv_meta_resolve_fs.c \
radv_pass.c \ radv_pass.c \
radv_pipeline.c \ radv_pipeline.c \
radv_pipeline_cache.c \ radv_pipeline_cache.c \

File diff suppressed because it is too large Load Diff

View File

@@ -37,4 +37,7 @@ enum {
RADV_DEBUG_NO_IBS = 0x200, RADV_DEBUG_NO_IBS = 0x200,
}; };
enum {
RADV_PERFTEST_BATCHCHAIN = 0x1,
};
#endif #endif

View File

@@ -77,6 +77,7 @@ VkResult radv_CreateDescriptorSetLayout(
const VkDescriptorSetLayoutBinding *binding = &pCreateInfo->pBindings[j]; const VkDescriptorSetLayoutBinding *binding = &pCreateInfo->pBindings[j];
uint32_t b = binding->binding; uint32_t b = binding->binding;
uint32_t alignment; uint32_t alignment;
unsigned binding_buffer_count = 0;
switch (binding->descriptorType) { switch (binding->descriptorType) {
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC: case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
@@ -85,7 +86,7 @@ VkResult radv_CreateDescriptorSetLayout(
set_layout->binding[b].dynamic_offset_count = 1; set_layout->binding[b].dynamic_offset_count = 1;
set_layout->dynamic_shader_stages |= binding->stageFlags; set_layout->dynamic_shader_stages |= binding->stageFlags;
set_layout->binding[b].size = 0; set_layout->binding[b].size = 0;
set_layout->binding[b].buffer_count = 1; binding_buffer_count = 1;
alignment = 1; alignment = 1;
break; break;
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER: case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
@@ -93,7 +94,7 @@ VkResult radv_CreateDescriptorSetLayout(
case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER: case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:
case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER: case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:
set_layout->binding[b].size = 16; set_layout->binding[b].size = 16;
set_layout->binding[b].buffer_count = 1; binding_buffer_count = 1;
alignment = 16; alignment = 16;
break; break;
case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE: case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
@@ -101,13 +102,13 @@ VkResult radv_CreateDescriptorSetLayout(
case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT: case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:
/* main descriptor + fmask descriptor */ /* main descriptor + fmask descriptor */
set_layout->binding[b].size = 64; set_layout->binding[b].size = 64;
set_layout->binding[b].buffer_count = 1; binding_buffer_count = 1;
alignment = 32; alignment = 32;
break; break;
case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER: case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
/* main descriptor + fmask descriptor + sampler */ /* main descriptor + fmask descriptor + sampler */
set_layout->binding[b].size = 96; set_layout->binding[b].size = 96;
set_layout->binding[b].buffer_count = 1; binding_buffer_count = 1;
alignment = 32; alignment = 32;
break; break;
case VK_DESCRIPTOR_TYPE_SAMPLER: case VK_DESCRIPTOR_TYPE_SAMPLER:
@@ -150,7 +151,7 @@ VkResult radv_CreateDescriptorSetLayout(
} }
set_layout->size += binding->descriptorCount * set_layout->binding[b].size; set_layout->size += binding->descriptorCount * set_layout->binding[b].size;
buffer_count += binding->descriptorCount * set_layout->binding[b].buffer_count; buffer_count += binding->descriptorCount * binding_buffer_count;
dynamic_offset_count += binding->descriptorCount * dynamic_offset_count += binding->descriptorCount *
set_layout->binding[b].dynamic_offset_count; set_layout->binding[b].dynamic_offset_count;
set_layout->shader_stages |= binding->stageFlags; set_layout->shader_stages |= binding->stageFlags;
@@ -261,26 +262,29 @@ radv_descriptor_set_create(struct radv_device *device,
struct radv_descriptor_set **out_set) struct radv_descriptor_set **out_set)
{ {
struct radv_descriptor_set *set; struct radv_descriptor_set *set;
unsigned mem_size = sizeof(struct radv_descriptor_set) + unsigned range_offset = sizeof(struct radv_descriptor_set) +
sizeof(struct radeon_winsys_bo *) * layout->buffer_count; sizeof(struct radeon_winsys_bo *) * layout->buffer_count;
set = vk_alloc2(&device->alloc, NULL, mem_size, 8, unsigned mem_size = range_offset +
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT); sizeof(struct radv_descriptor_range) * layout->dynamic_offset_count;
if (!set) if (pool->host_memory_base) {
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY); if (pool->host_memory_end - pool->host_memory_ptr < mem_size)
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
set = (struct radv_descriptor_set*)pool->host_memory_ptr;
pool->host_memory_ptr += mem_size;
} else {
set = vk_alloc2(&device->alloc, NULL, mem_size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!set)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
}
memset(set, 0, mem_size); memset(set, 0, mem_size);
if (layout->dynamic_offset_count) { if (layout->dynamic_offset_count) {
unsigned size = sizeof(struct radv_descriptor_range) * set->dynamic_descriptors = (struct radv_descriptor_range*)((uint8_t*)set + range_offset);
layout->dynamic_offset_count;
set->dynamic_descriptors = vk_alloc2(&device->alloc, NULL, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!set->dynamic_descriptors) {
vk_free2(&device->alloc, NULL, set);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
}
} }
set->layout = layout; set->layout = layout;
@@ -297,10 +301,12 @@ radv_descriptor_set_create(struct radv_device *device,
set->va = device->ws->buffer_get_va(set->bo) + pool->current_offset; set->va = device->ws->buffer_get_va(set->bo) + pool->current_offset;
pool->current_offset += layout_size; pool->current_offset += layout_size;
list_addtail(&set->vram_list, &pool->vram_list); list_addtail(&set->vram_list, &pool->vram_list);
} else { } else if (!pool->host_memory_base) {
uint64_t offset = 0; uint64_t offset = 0;
struct list_head *prev = &pool->vram_list; struct list_head *prev = &pool->vram_list;
struct radv_descriptor_set *cur; struct radv_descriptor_set *cur;
assert(!pool->host_memory_base);
LIST_FOR_EACH_ENTRY(cur, &pool->vram_list, vram_list) { LIST_FOR_EACH_ENTRY(cur, &pool->vram_list, vram_list) {
uint64_t start = (uint8_t*)cur->mapped_ptr - pool->mapped_ptr; uint64_t start = (uint8_t*)cur->mapped_ptr - pool->mapped_ptr;
if (start - offset >= layout_size) if (start - offset >= layout_size)
@@ -319,7 +325,8 @@ radv_descriptor_set_create(struct radv_device *device,
set->mapped_ptr = (uint32_t*)(pool->mapped_ptr + offset); set->mapped_ptr = (uint32_t*)(pool->mapped_ptr + offset);
set->va = device->ws->buffer_get_va(set->bo) + offset; set->va = device->ws->buffer_get_va(set->bo) + offset;
list_add(&set->vram_list, prev); list_add(&set->vram_list, prev);
} } else
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
} }
for (unsigned i = 0; i < layout->binding_count; ++i) { for (unsigned i = 0; i < layout->binding_count; ++i) {
@@ -348,10 +355,10 @@ radv_descriptor_set_destroy(struct radv_device *device,
struct radv_descriptor_set *set, struct radv_descriptor_set *set,
bool free_bo) bool free_bo)
{ {
assert(!pool->host_memory_base);
if (free_bo && set->size) if (free_bo && set->size)
list_del(&set->vram_list); list_del(&set->vram_list);
if (set->dynamic_descriptors)
vk_free2(&device->alloc, NULL, set->dynamic_descriptors);
vk_free2(&device->alloc, NULL, set); vk_free2(&device->alloc, NULL, set);
} }
@@ -364,18 +371,17 @@ VkResult radv_CreateDescriptorPool(
RADV_FROM_HANDLE(radv_device, device, _device); RADV_FROM_HANDLE(radv_device, device, _device);
struct radv_descriptor_pool *pool; struct radv_descriptor_pool *pool;
int size = sizeof(struct radv_descriptor_pool); int size = sizeof(struct radv_descriptor_pool);
uint64_t bo_size = 0; uint64_t bo_size = 0, bo_count = 0, range_count = 0;
pool = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!pool)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
memset(pool, 0, sizeof(*pool));
for (unsigned i = 0; i < pCreateInfo->poolSizeCount; ++i) { for (unsigned i = 0; i < pCreateInfo->poolSizeCount; ++i) {
if (pCreateInfo->pPoolSizes[i].type != VK_DESCRIPTOR_TYPE_SAMPLER)
bo_count += pCreateInfo->pPoolSizes[i].descriptorCount;
switch(pCreateInfo->pPoolSizes[i].type) { switch(pCreateInfo->pPoolSizes[i].type) {
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC: case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC: case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
range_count += pCreateInfo->pPoolSizes[i].descriptorCount;
break; break;
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER: case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER: case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
@@ -399,6 +405,26 @@ VkResult radv_CreateDescriptorPool(
} }
} }
if (!(pCreateInfo->flags & VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT)) {
uint64_t host_size = pCreateInfo->maxSets * sizeof(struct radv_descriptor_set);
host_size += sizeof(struct radeon_winsys_bo*) * bo_count;
host_size += sizeof(struct radv_descriptor_range) * range_count;
size += host_size;
}
pool = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!pool)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
memset(pool, 0, sizeof(*pool));
if (!(pCreateInfo->flags & VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT)) {
pool->host_memory_base = (uint8_t*)pool + sizeof(struct radv_descriptor_pool);
pool->host_memory_ptr = pool->host_memory_base;
pool->host_memory_end = (uint8_t*)pool + size;
}
if (bo_size) { if (bo_size) {
pool->bo = device->ws->buffer_create(device->ws, bo_size, pool->bo = device->ws->buffer_create(device->ws, bo_size,
32, RADEON_DOMAIN_VRAM, 0); 32, RADEON_DOMAIN_VRAM, 0);
@@ -422,9 +448,11 @@ void radv_DestroyDescriptorPool(
if (!pool) if (!pool)
return; return;
list_for_each_entry_safe(struct radv_descriptor_set, set, if (!pool->host_memory_base) {
&pool->vram_list, vram_list) { list_for_each_entry_safe(struct radv_descriptor_set, set,
radv_descriptor_set_destroy(device, pool, set, false); &pool->vram_list, vram_list) {
radv_descriptor_set_destroy(device, pool, set, false);
}
} }
if (pool->bo) if (pool->bo)
@@ -440,14 +468,17 @@ VkResult radv_ResetDescriptorPool(
RADV_FROM_HANDLE(radv_device, device, _device); RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_descriptor_pool, pool, descriptorPool); RADV_FROM_HANDLE(radv_descriptor_pool, pool, descriptorPool);
list_for_each_entry_safe(struct radv_descriptor_set, set, if (!pool->host_memory_base) {
&pool->vram_list, vram_list) { list_for_each_entry_safe(struct radv_descriptor_set, set,
radv_descriptor_set_destroy(device, pool, set, false); &pool->vram_list, vram_list) {
radv_descriptor_set_destroy(device, pool, set, false);
}
} }
list_inithead(&pool->vram_list); list_inithead(&pool->vram_list);
pool->current_offset = 0; pool->current_offset = 0;
pool->host_memory_ptr = pool->host_memory_base;
return VK_SUCCESS; return VK_SUCCESS;
} }
@@ -496,7 +527,7 @@ VkResult radv_FreeDescriptorSets(
for (uint32_t i = 0; i < count; i++) { for (uint32_t i = 0; i < count; i++) {
RADV_FROM_HANDLE(radv_descriptor_set, set, pDescriptorSets[i]); RADV_FROM_HANDLE(radv_descriptor_set, set, pDescriptorSets[i]);
if (set) if (set && !pool->host_memory_base)
radv_descriptor_set_destroy(device, pool, set, true); radv_descriptor_set_destroy(device, pool, set, true);
} }
return VK_SUCCESS; return VK_SUCCESS;
@@ -639,7 +670,7 @@ void radv_update_descriptor_sets(
ptr += binding_layout->offset / 4; ptr += binding_layout->offset / 4;
ptr += binding_layout->size * writeset->dstArrayElement / 4; ptr += binding_layout->size * writeset->dstArrayElement / 4;
buffer_list += binding_layout->buffer_offset; buffer_list += binding_layout->buffer_offset;
buffer_list += binding_layout->buffer_count * writeset->dstArrayElement; buffer_list += writeset->dstArrayElement;
for (j = 0; j < writeset->descriptorCount; ++j) { for (j = 0; j < writeset->descriptorCount; ++j) {
switch(writeset->descriptorType) { switch(writeset->descriptorType) {
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC: case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
@@ -690,7 +721,7 @@ void radv_update_descriptor_sets(
break; break;
} }
ptr += binding_layout->size / 4; ptr += binding_layout->size / 4;
buffer_list += binding_layout->buffer_count; ++buffer_list;
} }
} }
@@ -734,8 +765,7 @@ VkResult radv_CreateDescriptorUpdateTemplateKHR(VkDevice _device,
const VkDescriptorUpdateTemplateEntryKHR *entry = &pCreateInfo->pDescriptorUpdateEntries[i]; const VkDescriptorUpdateTemplateEntryKHR *entry = &pCreateInfo->pDescriptorUpdateEntries[i];
const struct radv_descriptor_set_binding_layout *binding_layout = const struct radv_descriptor_set_binding_layout *binding_layout =
set_layout->binding + entry->dstBinding; set_layout->binding + entry->dstBinding;
const uint32_t buffer_offset = binding_layout->buffer_offset + const uint32_t buffer_offset = binding_layout->buffer_offset + entry->dstArrayElement;
binding_layout->buffer_count * entry->dstArrayElement;
const uint32_t *immutable_samplers = NULL; const uint32_t *immutable_samplers = NULL;
uint32_t dst_offset; uint32_t dst_offset;
uint32_t dst_stride; uint32_t dst_stride;
@@ -775,7 +805,6 @@ VkResult radv_CreateDescriptorUpdateTemplateKHR(VkDevice _device,
.dst_offset = dst_offset, .dst_offset = dst_offset,
.dst_stride = dst_stride, .dst_stride = dst_stride,
.buffer_offset = buffer_offset, .buffer_offset = buffer_offset,
.buffer_count = binding_layout->buffer_count,
.has_sampler = !binding_layout->immutable_samplers_offset, .has_sampler = !binding_layout->immutable_samplers_offset,
.immutable_samplers = immutable_samplers .immutable_samplers = immutable_samplers
}; };
@@ -859,7 +888,7 @@ void radv_update_descriptor_set_with_template(struct radv_device *device,
} }
pSrc += templ->entry[i].src_stride; pSrc += templ->entry[i].src_stride;
pDst += templ->entry[i].dst_stride; pDst += templ->entry[i].dst_stride;
buffer_list += templ->entry[i].buffer_count; ++buffer_list;
} }
} }
} }

View File

@@ -26,7 +26,7 @@
#include <vulkan/vulkan.h> #include <vulkan/vulkan.h>
#define MAX_SETS 8 #define MAX_SETS 32
struct radv_descriptor_set_binding_layout { struct radv_descriptor_set_binding_layout {
VkDescriptorType type; VkDescriptorType type;
@@ -38,10 +38,9 @@ struct radv_descriptor_set_binding_layout {
uint32_t buffer_offset; uint32_t buffer_offset;
uint16_t dynamic_offset_offset; uint16_t dynamic_offset_offset;
uint16_t dynamic_offset_count;
/* redundant with the type, each for a single array element */ /* redundant with the type, each for a single array element */
uint32_t size; uint32_t size;
uint32_t buffer_count;
uint16_t dynamic_offset_count;
/* Offset in the radv_descriptor_set_layout of the immutable samplers, or 0 /* Offset in the radv_descriptor_set_layout of the immutable samplers, or 0
* if there are no immutable samplers. */ * if there are no immutable samplers. */

View File

@@ -33,7 +33,7 @@
#include "radv_cs.h" #include "radv_cs.h"
#include "util/disk_cache.h" #include "util/disk_cache.h"
#include "util/strtod.h" #include "util/strtod.h"
#include "util/vk_util.h" #include "vk_util.h"
#include <xf86drm.h> #include <xf86drm.h>
#include <amdgpu.h> #include <amdgpu.h>
#include <amdgpu_drm.h> #include <amdgpu_drm.h>
@@ -42,6 +42,7 @@
#include "ac_llvm_util.h" #include "ac_llvm_util.h"
#include "vk_format.h" #include "vk_format.h"
#include "sid.h" #include "sid.h"
#include "gfx9d.h"
#include "util/debug.h" #include "util/debug.h"
static int static int
@@ -61,6 +62,15 @@ radv_device_get_cache_uuid(enum radeon_family family, void *uuid)
return 0; return 0;
} }
static void
radv_get_device_uuid(drmDevicePtr device, void *uuid) {
memset(uuid, 0, VK_UUID_SIZE);
memcpy((char*)uuid + 0, &device->businfo.pci->domain, 2);
memcpy((char*)uuid + 2, &device->businfo.pci->bus, 1);
memcpy((char*)uuid + 3, &device->businfo.pci->dev, 1);
memcpy((char*)uuid + 4, &device->businfo.pci->func, 1);
}
static const VkExtensionProperties instance_extensions[] = { static const VkExtensionProperties instance_extensions[] = {
{ {
.extensionName = VK_KHR_SURFACE_EXTENSION_NAME, .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
@@ -88,6 +98,10 @@ static const VkExtensionProperties instance_extensions[] = {
.extensionName = VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME, .extensionName = VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME,
.specVersion = 1, .specVersion = 1,
}, },
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
.specVersion = 1,
},
}; };
static const VkExtensionProperties common_device_extensions[] = { static const VkExtensionProperties common_device_extensions[] = {
@@ -127,6 +141,14 @@ static const VkExtensionProperties common_device_extensions[] = {
.extensionName = VK_NV_DEDICATED_ALLOCATION_EXTENSION_NAME, .extensionName = VK_NV_DEDICATED_ALLOCATION_EXTENSION_NAME,
.specVersion = 1, .specVersion = 1,
}, },
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
.specVersion = 1,
},
}; };
static VkResult static VkResult
@@ -187,11 +209,40 @@ is_extension_enabled(const VkExtensionProperties *extensions,
return false; return false;
} }
static const char *
get_chip_name(enum radeon_family family)
{
switch (family) {
case CHIP_TAHITI: return "AMD RADV TAHITI";
case CHIP_PITCAIRN: return "AMD RADV PITCAIRN";
case CHIP_VERDE: return "AMD RADV CAPE VERDE";
case CHIP_OLAND: return "AMD RADV OLAND";
case CHIP_HAINAN: return "AMD RADV HAINAN";
case CHIP_BONAIRE: return "AMD RADV BONAIRE";
case CHIP_KAVERI: return "AMD RADV KAVERI";
case CHIP_KABINI: return "AMD RADV KABINI";
case CHIP_HAWAII: return "AMD RADV HAWAII";
case CHIP_MULLINS: return "AMD RADV MULLINS";
case CHIP_TONGA: return "AMD RADV TONGA";
case CHIP_ICELAND: return "AMD RADV ICELAND";
case CHIP_CARRIZO: return "AMD RADV CARRIZO";
case CHIP_FIJI: return "AMD RADV FIJI";
case CHIP_POLARIS10: return "AMD RADV POLARIS10";
case CHIP_POLARIS11: return "AMD RADV POLARIS11";
case CHIP_POLARIS12: return "AMD RADV POLARIS12";
case CHIP_STONEY: return "AMD RADV STONEY";
case CHIP_VEGA10: return "AMD RADV VEGA";
case CHIP_RAVEN: return "AMD RADV RAVEN";
default: return "AMD RADV unknown";
}
}
static VkResult static VkResult
radv_physical_device_init(struct radv_physical_device *device, radv_physical_device_init(struct radv_physical_device *device,
struct radv_instance *instance, struct radv_instance *instance,
const char *path) drmDevicePtr drm_device)
{ {
const char *path = drm_device->nodes[DRM_NODE_RENDER];
VkResult result; VkResult result;
drmVersionPtr version; drmVersionPtr version;
int fd; int fd;
@@ -219,7 +270,8 @@ radv_physical_device_init(struct radv_physical_device *device,
assert(strlen(path) < ARRAY_SIZE(device->path)); assert(strlen(path) < ARRAY_SIZE(device->path));
strncpy(device->path, path, ARRAY_SIZE(device->path)); strncpy(device->path, path, ARRAY_SIZE(device->path));
device->ws = radv_amdgpu_winsys_create(fd, instance->debug_flags); device->ws = radv_amdgpu_winsys_create(fd, instance->debug_flags,
instance->perftest_flags);
if (!device->ws) { if (!device->ws) {
result = VK_ERROR_INCOMPATIBLE_DRIVER; result = VK_ERROR_INCOMPATIBLE_DRIVER;
goto fail; goto fail;
@@ -249,7 +301,15 @@ radv_physical_device_init(struct radv_physical_device *device,
goto fail; goto fail;
fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n"); fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n");
device->name = device->rad_info.name; device->name = get_chip_name(device->rad_info.family);
radv_get_device_uuid(drm_device, device->device_uuid);
if (device->rad_info.family == CHIP_STONEY ||
device->rad_info.chip_class >= GFX9) {
device->has_rbplus = true;
device->rbplus_allowed = device->rad_info.family == CHIP_STONEY;
}
return VK_SUCCESS; return VK_SUCCESS;
@@ -267,7 +327,6 @@ radv_physical_device_finish(struct radv_physical_device *device)
close(device->local_fd); close(device->local_fd);
} }
static void * static void *
default_alloc_func(void *pUserData, size_t size, size_t align, default_alloc_func(void *pUserData, size_t size, size_t align,
VkSystemAllocationScope allocationScope) VkSystemAllocationScope allocationScope)
@@ -309,6 +368,11 @@ static const struct debug_control radv_debug_options[] = {
{NULL, 0} {NULL, 0}
}; };
static const struct debug_control radv_perftest_options[] = {
{"batchchain", RADV_PERFTEST_BATCHCHAIN},
{NULL, 0}
};
VkResult radv_CreateInstance( VkResult radv_CreateInstance(
const VkInstanceCreateInfo* pCreateInfo, const VkInstanceCreateInfo* pCreateInfo,
const VkAllocationCallbacks* pAllocator, const VkAllocationCallbacks* pAllocator,
@@ -366,6 +430,9 @@ VkResult radv_CreateInstance(
instance->debug_flags = parse_debug_string(getenv("RADV_DEBUG"), instance->debug_flags = parse_debug_string(getenv("RADV_DEBUG"),
radv_debug_options); radv_debug_options);
instance->perftest_flags = parse_debug_string(getenv("RADV_PERFTEST"),
radv_perftest_options);
*pInstance = radv_instance_to_handle(instance); *pInstance = radv_instance_to_handle(instance);
return VK_SUCCESS; return VK_SUCCESS;
@@ -401,7 +468,7 @@ radv_enumerate_devices(struct radv_instance *instance)
instance->physicalDeviceCount = 0; instance->physicalDeviceCount = 0;
max_devices = drmGetDevices2(0, devices, sizeof(devices)); max_devices = drmGetDevices2(0, devices, ARRAY_SIZE(devices));
if (max_devices < 1) if (max_devices < 1)
return VK_ERROR_INCOMPATIBLE_DRIVER; return VK_ERROR_INCOMPATIBLE_DRIVER;
@@ -413,13 +480,15 @@ radv_enumerate_devices(struct radv_instance *instance)
result = radv_physical_device_init(instance->physicalDevices + result = radv_physical_device_init(instance->physicalDevices +
instance->physicalDeviceCount, instance->physicalDeviceCount,
instance, instance,
devices[i]->nodes[DRM_NODE_RENDER]); devices[i]);
if (result == VK_SUCCESS) if (result == VK_SUCCESS)
++instance->physicalDeviceCount; ++instance->physicalDeviceCount;
else if (result != VK_ERROR_INCOMPATIBLE_DRIVER) else if (result != VK_ERROR_INCOMPATIBLE_DRIVER)
return result; break;
} }
} }
drmFreeDevices(devices, max_devices);
return result; return result;
} }
@@ -454,8 +523,8 @@ void radv_GetPhysicalDeviceFeatures(
VkPhysicalDevice physicalDevice, VkPhysicalDevice physicalDevice,
VkPhysicalDeviceFeatures* pFeatures) VkPhysicalDeviceFeatures* pFeatures)
{ {
// RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice); RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice);
bool is_gfx9 = pdevice->rad_info.chip_class >= GFX9;
memset(pFeatures, 0, sizeof(*pFeatures)); memset(pFeatures, 0, sizeof(*pFeatures));
*pFeatures = (VkPhysicalDeviceFeatures) { *pFeatures = (VkPhysicalDeviceFeatures) {
@@ -463,8 +532,8 @@ void radv_GetPhysicalDeviceFeatures(
.fullDrawIndexUint32 = true, .fullDrawIndexUint32 = true,
.imageCubeArray = true, .imageCubeArray = true,
.independentBlend = true, .independentBlend = true,
.geometryShader = true, .geometryShader = !is_gfx9,
.tessellationShader = true, .tessellationShader = !is_gfx9,
.sampleRateShading = false, .sampleRateShading = false,
.dualSrcBlend = true, .dualSrcBlend = true,
.logicOp = true, .logicOp = true,
@@ -514,28 +583,6 @@ void radv_GetPhysicalDeviceFeatures2KHR(
return radv_GetPhysicalDeviceFeatures(physicalDevice, &pFeatures->features); return radv_GetPhysicalDeviceFeatures(physicalDevice, &pFeatures->features);
} }
static uint32_t radv_get_driver_version()
{
const char *minor_string = strchr(VERSION, '.');
const char *patch_string = minor_string ? strchr(minor_string + 1, ','): NULL;
int major = atoi(VERSION);
int minor = minor_string ? atoi(minor_string + 1) : 0;
int patch = patch_string ? atoi(patch_string + 1) : 0;
if (strstr(VERSION, "devel")) {
if (patch == 0) {
patch = 99;
if (minor == 0) {
minor = 99;
--major;
} else
--minor;
} else
--patch;
}
uint32_t version = VK_MAKE_VERSION(major, minor, patch);
return version;
}
void radv_GetPhysicalDeviceProperties( void radv_GetPhysicalDeviceProperties(
VkPhysicalDevice physicalDevice, VkPhysicalDevice physicalDevice,
VkPhysicalDeviceProperties* pProperties) VkPhysicalDeviceProperties* pProperties)
@@ -652,7 +699,7 @@ void radv_GetPhysicalDeviceProperties(
.sampledImageStencilSampleCounts = sample_counts, .sampledImageStencilSampleCounts = sample_counts,
.storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT, .storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT,
.maxSampleMaskWords = 1, .maxSampleMaskWords = 1,
.timestampComputeAndGraphics = false, .timestampComputeAndGraphics = true,
.timestampPeriod = 1000000.0 / pdevice->rad_info.clock_crystal_freq, .timestampPeriod = 1000000.0 / pdevice->rad_info.clock_crystal_freq,
.maxClipDistances = 8, .maxClipDistances = 8,
.maxCullDistances = 8, .maxCullDistances = 8,
@@ -671,10 +718,10 @@ void radv_GetPhysicalDeviceProperties(
*pProperties = (VkPhysicalDeviceProperties) { *pProperties = (VkPhysicalDeviceProperties) {
.apiVersion = VK_MAKE_VERSION(1, 0, 42), .apiVersion = VK_MAKE_VERSION(1, 0, 42),
.driverVersion = radv_get_driver_version(), .driverVersion = vk_get_driver_version(),
.vendorID = 0x1002, .vendorID = 0x1002,
.deviceID = pdevice->rad_info.pci_id, .deviceID = pdevice->rad_info.pci_id,
.deviceType = VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU, .deviceType = pdevice->rad_info.has_dedicated_vram ? VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU : VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU,
.limits = limits, .limits = limits,
.sparseProperties = {0}, .sparseProperties = {0},
}; };
@@ -687,6 +734,7 @@ void radv_GetPhysicalDeviceProperties2KHR(
VkPhysicalDevice physicalDevice, VkPhysicalDevice physicalDevice,
VkPhysicalDeviceProperties2KHR *pProperties) VkPhysicalDeviceProperties2KHR *pProperties)
{ {
RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice);
radv_GetPhysicalDeviceProperties(physicalDevice, &pProperties->properties); radv_GetPhysicalDeviceProperties(physicalDevice, &pProperties->properties);
vk_foreach_struct(ext, pProperties->pNext) { vk_foreach_struct(ext, pProperties->pNext) {
@@ -697,6 +745,13 @@ void radv_GetPhysicalDeviceProperties2KHR(
properties->maxPushDescriptors = MAX_PUSH_DESCRIPTORS; properties->maxPushDescriptors = MAX_PUSH_DESCRIPTORS;
break; break;
} }
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHX: {
VkPhysicalDeviceIDPropertiesKHX *properties = (VkPhysicalDeviceIDPropertiesKHX*)ext;
radv_device_get_cache_uuid(0, properties->driverUUID);
memcpy(properties->deviceUUID, pdevice->device_uuid, VK_UUID_SIZE);
properties->deviceLUIDValid = false;
break;
}
default: default:
break; break;
} }
@@ -710,7 +765,7 @@ static void radv_get_physical_device_queue_family_properties(
{ {
int num_queue_families = 1; int num_queue_families = 1;
int idx; int idx;
if (pdevice->rad_info.compute_rings > 0 && if (pdevice->rad_info.num_compute_rings > 0 &&
pdevice->rad_info.chip_class >= CIK && pdevice->rad_info.chip_class >= CIK &&
!(pdevice->instance->debug_flags & RADV_DEBUG_NO_COMPUTE_QUEUE)) !(pdevice->instance->debug_flags & RADV_DEBUG_NO_COMPUTE_QUEUE))
num_queue_families++; num_queue_families++;
@@ -737,7 +792,7 @@ static void radv_get_physical_device_queue_family_properties(
idx++; idx++;
} }
if (pdevice->rad_info.compute_rings > 0 && if (pdevice->rad_info.num_compute_rings > 0 &&
pdevice->rad_info.chip_class >= CIK && pdevice->rad_info.chip_class >= CIK &&
!(pdevice->instance->debug_flags & RADV_DEBUG_NO_COMPUTE_QUEUE)) { !(pdevice->instance->debug_flags & RADV_DEBUG_NO_COMPUTE_QUEUE)) {
if (*pCount > idx) { if (*pCount > idx) {
@@ -745,7 +800,7 @@ static void radv_get_physical_device_queue_family_properties(
.queueFlags = VK_QUEUE_COMPUTE_BIT | .queueFlags = VK_QUEUE_COMPUTE_BIT |
VK_QUEUE_TRANSFER_BIT | VK_QUEUE_TRANSFER_BIT |
VK_QUEUE_SPARSE_BINDING_BIT, VK_QUEUE_SPARSE_BINDING_BIT,
.queueCount = pdevice->rad_info.compute_rings, .queueCount = pdevice->rad_info.num_compute_rings,
.timestampValidBits = 64, .timestampValidBits = 64,
.minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 }, .minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 },
}; };
@@ -829,11 +884,11 @@ void radv_GetPhysicalDeviceMemoryProperties(
pMemoryProperties->memoryHeapCount = RADV_MEM_HEAP_COUNT; pMemoryProperties->memoryHeapCount = RADV_MEM_HEAP_COUNT;
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM] = (VkMemoryHeap) { pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM] = (VkMemoryHeap) {
.size = physical_device->rad_info.vram_size - .size = physical_device->rad_info.vram_size -
physical_device->rad_info.visible_vram_size, physical_device->rad_info.vram_vis_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT, .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
}; };
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM_CPU_ACCESS] = (VkMemoryHeap) { pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM_CPU_ACCESS] = (VkMemoryHeap) {
.size = physical_device->rad_info.visible_vram_size, .size = physical_device->rad_info.vram_vis_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT, .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
}; };
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_GTT] = (VkMemoryHeap) { pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_GTT] = (VkMemoryHeap) {
@@ -915,6 +970,9 @@ radv_device_init_gs_info(struct radv_device *device)
case CHIP_FIJI: case CHIP_FIJI:
case CHIP_POLARIS10: case CHIP_POLARIS10:
case CHIP_POLARIS11: case CHIP_POLARIS11:
case CHIP_POLARIS12:
case CHIP_VEGA10:
case CHIP_RAVEN:
device->gs_table_depth = 32; device->gs_table_depth = 32;
return; return;
default: default:
@@ -1038,6 +1096,7 @@ VkResult radv_CreateDevice(
case RADV_QUEUE_COMPUTE: case RADV_QUEUE_COMPUTE:
si_cs_emit_cache_flush(device->flush_cs[family], si_cs_emit_cache_flush(device->flush_cs[family],
device->physical_device->rad_info.chip_class, device->physical_device->rad_info.chip_class,
NULL, 0,
family == RADV_QUEUE_COMPUTE && device->physical_device->rad_info.chip_class >= CIK, family == RADV_QUEUE_COMPUTE && device->physical_device->rad_info.chip_class >= CIK,
RADV_CMD_FLAG_INV_ICACHE | RADV_CMD_FLAG_INV_ICACHE |
RADV_CMD_FLAG_INV_SMEM_L1 | RADV_CMD_FLAG_INV_SMEM_L1 |
@@ -1046,6 +1105,23 @@ VkResult radv_CreateDevice(
break; break;
} }
device->ws->cs_finalize(device->flush_cs[family]); device->ws->cs_finalize(device->flush_cs[family]);
device->flush_shader_cs[family] = device->ws->cs_create(device->ws, family);
switch (family) {
case RADV_QUEUE_GENERAL:
case RADV_QUEUE_COMPUTE:
si_cs_emit_cache_flush(device->flush_shader_cs[family],
device->physical_device->rad_info.chip_class,
NULL, 0,
family == RADV_QUEUE_COMPUTE && device->physical_device->rad_info.chip_class >= CIK,
family == RADV_QUEUE_COMPUTE ? RADV_CMD_FLAG_CS_PARTIAL_FLUSH : (RADV_CMD_FLAG_CS_PARTIAL_FLUSH | RADV_CMD_FLAG_PS_PARTIAL_FLUSH) |
RADV_CMD_FLAG_INV_ICACHE |
RADV_CMD_FLAG_INV_SMEM_L1 |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_INV_GLOBAL_L2);
break;
}
device->ws->cs_finalize(device->flush_shader_cs[family]);
} }
if (getenv("RADV_TRACE_FILE")) { if (getenv("RADV_TRACE_FILE")) {
@@ -1121,6 +1197,8 @@ void radv_DestroyDevice(
device->ws->cs_destroy(device->empty_cs[i]); device->ws->cs_destroy(device->empty_cs[i]);
if (device->flush_cs[i]) if (device->flush_cs[i])
device->ws->cs_destroy(device->flush_cs[i]); device->ws->cs_destroy(device->flush_cs[i]);
if (device->flush_shader_cs[i])
device->ws->cs_destroy(device->flush_shader_cs[i]);
} }
radv_device_finish_meta(device); radv_device_finish_meta(device);
@@ -1397,11 +1475,10 @@ radv_get_hs_offchip_param(struct radv_device *device, uint32_t *max_offchip_buff
max_offchip_buffers = MIN2(max_offchip_buffers, 126); max_offchip_buffers = MIN2(max_offchip_buffers, 126);
break; break;
case CIK: case CIK:
max_offchip_buffers = MIN2(max_offchip_buffers, 508);
break;
case VI: case VI:
case GFX9:
default: default:
max_offchip_buffers = MIN2(max_offchip_buffers, 512); max_offchip_buffers = MIN2(max_offchip_buffers, 508);
break; break;
} }
@@ -1638,6 +1715,10 @@ radv_get_preamble_cs(struct radv_queue *queue,
S_030938_SIZE(tess_factor_ring_size / 4)); S_030938_SIZE(tess_factor_ring_size / 4));
radeon_set_uconfig_reg(cs, R_030940_VGT_TF_MEMORY_BASE, radeon_set_uconfig_reg(cs, R_030940_VGT_TF_MEMORY_BASE,
tf_va >> 8); tf_va >> 8);
if (queue->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_set_uconfig_reg(cs, R_030944_VGT_TF_MEMORY_BASE_HI,
tf_va >> 40);
}
radeon_set_uconfig_reg(cs, R_03093C_VGT_HS_OFFCHIP_PARAM, hs_offchip_param); radeon_set_uconfig_reg(cs, R_03093C_VGT_HS_OFFCHIP_PARAM, hs_offchip_param);
} else { } else {
radeon_set_config_reg(cs, R_008988_VGT_TF_RING_SIZE, radeon_set_config_reg(cs, R_008988_VGT_TF_RING_SIZE,
@@ -1681,6 +1762,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
if (!i) { if (!i) {
si_cs_emit_cache_flush(cs, si_cs_emit_cache_flush(cs,
queue->device->physical_device->rad_info.chip_class, queue->device->physical_device->rad_info.chip_class,
NULL, 0,
queue->queue_family_index == RING_COMPUTE && queue->queue_family_index == RING_COMPUTE &&
queue->device->physical_device->rad_info.chip_class >= CIK, queue->device->physical_device->rad_info.chip_class >= CIK,
RADV_CMD_FLAG_INV_ICACHE | RADV_CMD_FLAG_INV_ICACHE |
@@ -1822,7 +1904,7 @@ VkResult radv_QueueSubmit(
for (uint32_t i = 0; i < submitCount; i++) { for (uint32_t i = 0; i < submitCount; i++) {
struct radeon_winsys_cs **cs_array; struct radeon_winsys_cs **cs_array;
bool do_flush = !i; bool do_flush = !i || pSubmits[i].pWaitDstStageMask;
bool can_patch = !do_flush; bool can_patch = !do_flush;
uint32_t advance; uint32_t advance;
@@ -1849,7 +1931,9 @@ VkResult radv_QueueSubmit(
(pSubmits[i].commandBufferCount + do_flush)); (pSubmits[i].commandBufferCount + do_flush));
if(do_flush) if(do_flush)
cs_array[0] = queue->device->flush_cs[queue->queue_family_index]; cs_array[0] = pSubmits[i].waitSemaphoreCount ?
queue->device->flush_shader_cs[queue->queue_family_index] :
queue->device->flush_cs[queue->queue_family_index];
for (uint32_t j = 0; j < pSubmits[i].commandBufferCount; j++) { for (uint32_t j = 0; j < pSubmits[i].commandBufferCount; j++) {
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer,
@@ -1992,7 +2076,7 @@ VkResult radv_AllocateMemory(
VkResult result; VkResult result;
enum radeon_bo_domain domain; enum radeon_bo_domain domain;
uint32_t flags = 0; uint32_t flags = 0;
const VkDedicatedAllocationMemoryAllocateInfoNV *dedicate_info = NULL;
assert(pAllocateInfo->sType == VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO); assert(pAllocateInfo->sType == VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO);
if (pAllocateInfo->allocationSize == 0) { if (pAllocateInfo->allocationSize == 0) {
@@ -2001,15 +2085,10 @@ VkResult radv_AllocateMemory(
return VK_SUCCESS; return VK_SUCCESS;
} }
vk_foreach_struct(ext, pAllocateInfo->pNext) { const VkImportMemoryFdInfoKHX *import_info =
switch (ext->sType) { vk_find_struct_const(pAllocateInfo->pNext, IMPORT_MEMORY_FD_INFO_KHX);
case VK_STRUCTURE_TYPE_DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV: const VkDedicatedAllocationMemoryAllocateInfoNV *dedicate_info =
dedicate_info = (const VkDedicatedAllocationMemoryAllocateInfoNV *)ext; vk_find_struct_const(pAllocateInfo->pNext, DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV);
break;
default:
break;
}
}
mem = vk_alloc2(&device->alloc, pAllocator, sizeof(*mem), 8, mem = vk_alloc2(&device->alloc, pAllocator, sizeof(*mem), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT); VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
@@ -2024,6 +2103,18 @@ VkResult radv_AllocateMemory(
mem->buffer = NULL; mem->buffer = NULL;
} }
if (import_info) {
assert(import_info->handleType ==
VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
mem->bo = device->ws->buffer_from_fd(device->ws, import_info->fd,
NULL, NULL);
if (!mem->bo) {
result = VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX;
goto fail;
} else
goto out_success;
}
uint64_t alloc_size = align_u64(pAllocateInfo->allocationSize, 4096); uint64_t alloc_size = align_u64(pAllocateInfo->allocationSize, 4096);
if (pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_WRITE_COMBINE || if (pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_WRITE_COMBINE ||
pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_CACHED) pAllocateInfo->memoryTypeIndex == RADV_MEM_TYPE_GTT_CACHED)
@@ -2047,7 +2138,7 @@ VkResult radv_AllocateMemory(
goto fail; goto fail;
} }
mem->type_index = pAllocateInfo->memoryTypeIndex; mem->type_index = pAllocateInfo->memoryTypeIndex;
out_success:
*pMem = radv_device_memory_to_handle(mem); *pMem = radv_device_memory_to_handle(mem);
return VK_SUCCESS; return VK_SUCCESS;
@@ -2583,9 +2674,9 @@ static inline unsigned
si_tile_mode_index(const struct radv_image *image, unsigned level, bool stencil) si_tile_mode_index(const struct radv_image *image, unsigned level, bool stencil)
{ {
if (stencil) if (stencil)
return image->surface.stencil_tiling_index[level]; return image->surface.u.legacy.stencil_tiling_index[level];
else else
return image->surface.tiling_index[level]; return image->surface.u.legacy.tiling_index[level];
} }
static uint32_t radv_surface_layer_count(struct radv_image_view *iview) static uint32_t radv_surface_layer_count(struct radv_image_view *iview)
@@ -2601,24 +2692,68 @@ radv_initialise_color_surface(struct radv_device *device,
const struct vk_format_description *desc; const struct vk_format_description *desc;
unsigned ntype, format, swap, endian; unsigned ntype, format, swap, endian;
unsigned blend_clamp = 0, blend_bypass = 0; unsigned blend_clamp = 0, blend_bypass = 0;
unsigned pitch_tile_max, slice_tile_max, tile_mode_index;
uint64_t va; uint64_t va;
const struct radeon_surf *surf = &iview->image->surface; const struct radeon_surf *surf = &iview->image->surface;
const struct radeon_surf_level *level_info = &surf->level[iview->base_mip];
desc = vk_format_description(iview->vk_format); desc = vk_format_description(iview->vk_format);
memset(cb, 0, sizeof(*cb)); memset(cb, 0, sizeof(*cb));
/* Intensity is implemented as Red, so treat it that way. */
cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1);
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += level_info->offset;
if (device->physical_device->rad_info.chip_class >= GFX9) {
struct gfx9_surf_meta_flags meta;
if (iview->image->dcc_offset)
meta = iview->image->surface.u.gfx9.dcc;
else
meta = iview->image->surface.u.gfx9.cmask;
cb->cb_color_attrib |= S_028C74_COLOR_SW_MODE(iview->image->surface.u.gfx9.surf.swizzle_mode) |
S_028C74_FMASK_SW_MODE(iview->image->surface.u.gfx9.fmask.swizzle_mode) |
S_028C74_RB_ALIGNED(meta.rb_aligned) |
S_028C74_PIPE_ALIGNED(meta.pipe_aligned);
va += iview->image->surface.u.gfx9.surf_offset >> 8;
} else {
const struct legacy_surf_level *level_info = &surf->u.legacy.level[iview->base_mip];
unsigned pitch_tile_max, slice_tile_max, tile_mode_index;
va += level_info->offset;
pitch_tile_max = level_info->nblk_x / 8 - 1;
slice_tile_max = (level_info->nblk_x * level_info->nblk_y) / 64 - 1;
tile_mode_index = si_tile_mode_index(iview->image, iview->base_mip, false);
cb->cb_color_pitch = S_028C64_TILE_MAX(pitch_tile_max);
cb->cb_color_slice = S_028C68_TILE_MAX(slice_tile_max);
cb->cb_color_cmask_slice = iview->image->cmask.slice_tile_max;
cb->cb_color_attrib |= S_028C74_TILE_MODE_INDEX(tile_mode_index);
cb->micro_tile_mode = iview->image->surface.micro_tile_mode;
if (iview->image->fmask.size) {
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(iview->image->fmask.pitch_in_pixels / 8 - 1);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(iview->image->fmask.tile_mode_index);
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(iview->image->fmask.slice_tile_max);
} else {
/* This must be set for fast clear to work without FMASK. */
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(pitch_tile_max);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(tile_mode_index);
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(slice_tile_max);
}
}
cb->cb_color_base = va >> 8; cb->cb_color_base = va >> 8;
/* CMASK variables */ /* CMASK variables */
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->cmask.offset; va += iview->image->cmask.offset;
cb->cb_color_cmask = va >> 8; cb->cb_color_cmask = va >> 8;
cb->cb_color_cmask_slice = iview->image->cmask.slice_tile_max;
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->dcc_offset; va += iview->image->dcc_offset;
@@ -2628,20 +2763,8 @@ radv_initialise_color_surface(struct radv_device *device,
cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) | cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) |
S_028C6C_SLICE_MAX(iview->base_layer + max_slice - 1); S_028C6C_SLICE_MAX(iview->base_layer + max_slice - 1);
cb->micro_tile_mode = iview->image->surface.micro_tile_mode; if (iview->image->info.samples > 1) {
pitch_tile_max = level_info->nblk_x / 8 - 1; unsigned log_samples = util_logbase2(iview->image->info.samples);
slice_tile_max = (level_info->nblk_x * level_info->nblk_y) / 64 - 1;
tile_mode_index = si_tile_mode_index(iview->image, iview->base_mip, false);
cb->cb_color_pitch = S_028C64_TILE_MAX(pitch_tile_max);
cb->cb_color_slice = S_028C68_TILE_MAX(slice_tile_max);
/* Intensity is implemented as Red, so treat it that way. */
cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1) |
S_028C74_TILE_MODE_INDEX(tile_mode_index);
if (iview->image->samples > 1) {
unsigned log_samples = util_logbase2(iview->image->samples);
cb->cb_color_attrib |= S_028C74_NUM_SAMPLES(log_samples) | cb->cb_color_attrib |= S_028C74_NUM_SAMPLES(log_samples) |
S_028C74_NUM_FRAGMENTS(log_samples); S_028C74_NUM_FRAGMENTS(log_samples);
@@ -2649,18 +2772,9 @@ radv_initialise_color_surface(struct radv_device *device,
if (iview->image->fmask.size) { if (iview->image->fmask.size) {
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset; va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(iview->image->fmask.pitch_in_pixels / 8 - 1);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(iview->image->fmask.tile_mode_index);
cb->cb_color_fmask = va >> 8; cb->cb_color_fmask = va >> 8;
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(iview->image->fmask.slice_tile_max);
} else { } else {
/* This must be set for fast clear to work without FMASK. */
if (device->physical_device->rad_info.chip_class >= CIK)
cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(pitch_tile_max);
cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(tile_mode_index);
cb->cb_color_fmask = cb->cb_color_base; cb->cb_color_fmask = cb->cb_color_base;
cb->cb_color_fmask_slice = S_028C88_TILE_MAX(slice_tile_max);
} }
ntype = radv_translate_color_numformat(iview->vk_format, ntype = radv_translate_color_numformat(iview->vk_format,
@@ -2705,7 +2819,7 @@ radv_initialise_color_surface(struct radv_device *device,
format != V_028C70_COLOR_24_8) | format != V_028C70_COLOR_24_8) |
S_028C70_NUMBER_TYPE(ntype) | S_028C70_NUMBER_TYPE(ntype) |
S_028C70_ENDIAN(endian); S_028C70_ENDIAN(endian);
if (iview->image->samples > 1) if (iview->image->info.samples > 1)
if (iview->image->fmask.size) if (iview->image->fmask.size)
cb->cb_color_info |= S_028C70_COMPRESSION(1); cb->cb_color_info |= S_028C70_COMPRESSION(1);
@@ -2713,12 +2827,12 @@ radv_initialise_color_surface(struct radv_device *device,
!(device->debug_flags & RADV_DEBUG_NO_FAST_CLEARS)) !(device->debug_flags & RADV_DEBUG_NO_FAST_CLEARS))
cb->cb_color_info |= S_028C70_FAST_CLEAR(1); cb->cb_color_info |= S_028C70_FAST_CLEAR(1);
if (iview->image->surface.dcc_size && level_info->dcc_enabled) if (iview->image->surface.dcc_size && iview->base_mip < surf->num_dcc_levels)
cb->cb_color_info |= S_028C70_DCC_ENABLE(1); cb->cb_color_info |= S_028C70_DCC_ENABLE(1);
if (device->physical_device->rad_info.chip_class >= VI) { if (device->physical_device->rad_info.chip_class >= VI) {
unsigned max_uncompressed_block_size = 2; unsigned max_uncompressed_block_size = 2;
if (iview->image->samples > 1) { if (iview->image->info.samples > 1) {
if (iview->image->surface.bpe == 1) if (iview->image->surface.bpe == 1)
max_uncompressed_block_size = 0; max_uncompressed_block_size = 0;
else if (iview->image->surface.bpe == 2) else if (iview->image->surface.bpe == 2)
@@ -2732,9 +2846,24 @@ radv_initialise_color_surface(struct radv_device *device,
/* This must be set for fast clear to work without FMASK. */ /* This must be set for fast clear to work without FMASK. */
if (!iview->image->fmask.size && if (!iview->image->fmask.size &&
device->physical_device->rad_info.chip_class == SI) { device->physical_device->rad_info.chip_class == SI) {
unsigned bankh = util_logbase2(iview->image->surface.bankh); unsigned bankh = util_logbase2(iview->image->surface.u.legacy.bankh);
cb->cb_color_attrib |= S_028C74_FMASK_BANK_HEIGHT(bankh); cb->cb_color_attrib |= S_028C74_FMASK_BANK_HEIGHT(bankh);
} }
if (device->physical_device->rad_info.chip_class >= GFX9) {
uint32_t max_slice = radv_surface_layer_count(iview);
unsigned mip0_depth = iview->base_layer + max_slice - 1;
cb->cb_color_view |= S_028C6C_MIP_LEVEL(iview->base_mip);
cb->cb_color_attrib |= S_028C74_MIP0_DEPTH(mip0_depth) |
S_028C74_RESOURCE_TYPE(iview->image->surface.u.gfx9.resource_type);
cb->cb_color_attrib2 = S_028C68_MIP0_WIDTH(iview->image->info.width - 1) |
S_028C68_MIP0_HEIGHT(iview->image->info.height - 1) |
S_028C68_MAX_MIP(iview->image->info.levels);
cb->gfx9_epitch = S_0287A0_EPITCH(iview->image->surface.u.gfx9.surf.epitch);
}
} }
static void static void
@@ -2743,9 +2872,8 @@ radv_initialise_ds_surface(struct radv_device *device,
struct radv_image_view *iview) struct radv_image_view *iview)
{ {
unsigned level = iview->base_mip; unsigned level = iview->base_mip;
unsigned format; unsigned format, stencil_format;
uint64_t va, s_offs, z_offs; uint64_t va, s_offs, z_offs;
const struct radeon_surf_level *level_info = &iview->image->surface.level[level];
bool stencil_only = false; bool stencil_only = false;
memset(ds, 0, sizeof(*ds)); memset(ds, 0, sizeof(*ds));
switch (iview->vk_format) { switch (iview->vk_format) {
@@ -2767,98 +2895,121 @@ radv_initialise_ds_surface(struct radv_device *device,
break; break;
case VK_FORMAT_S8_UINT: case VK_FORMAT_S8_UINT:
stencil_only = true; stencil_only = true;
level_info = &iview->image->surface.stencil_level[level];
break; break;
default: default:
break; break;
} }
format = radv_translate_dbformat(iview->vk_format); format = radv_translate_dbformat(iview->vk_format);
stencil_format = iview->image->surface.flags & RADEON_SURF_SBUFFER ?
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; V_028044_STENCIL_8 : V_028044_STENCIL_INVALID;
s_offs = z_offs = va;
z_offs += iview->image->surface.level[level].offset;
s_offs += iview->image->surface.stencil_level[level].offset;
uint32_t max_slice = radv_surface_layer_count(iview); uint32_t max_slice = radv_surface_layer_count(iview);
ds->db_depth_view = S_028008_SLICE_START(iview->base_layer) | ds->db_depth_view = S_028008_SLICE_START(iview->base_layer) |
S_028008_SLICE_MAX(iview->base_layer + max_slice - 1); S_028008_SLICE_MAX(iview->base_layer + max_slice - 1);
ds->db_depth_info = S_02803C_ADDR5_SWIZZLE_MASK(1);
ds->db_z_info = S_028040_FORMAT(format) | S_028040_ZRANGE_PRECISION(1);
if (iview->image->samples > 1) ds->db_htile_data_base = 0;
ds->db_z_info |= S_028040_NUM_SAMPLES(util_logbase2(iview->image->samples)); ds->db_htile_surface = 0;
if (iview->image->surface.flags & RADEON_SURF_SBUFFER) va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
ds->db_stencil_info = S_028044_FORMAT(V_028044_STENCIL_8); s_offs = z_offs = va;
else
ds->db_stencil_info = S_028044_FORMAT(V_028044_STENCIL_INVALID);
if (device->physical_device->rad_info.chip_class >= CIK) { if (device->physical_device->rad_info.chip_class >= GFX9) {
struct radeon_info *info = &device->physical_device->rad_info; assert(iview->image->surface.u.gfx9.surf_offset == 0);
unsigned tiling_index = iview->image->surface.tiling_index[level]; s_offs += iview->image->surface.u.gfx9.stencil_offset;
unsigned stencil_index = iview->image->surface.stencil_tiling_index[level];
unsigned macro_index = iview->image->surface.macro_tile_index; ds->db_z_info = S_028038_FORMAT(format) |
unsigned tile_mode = info->si_tile_mode_array[tiling_index]; S_028038_NUM_SAMPLES(util_logbase2(iview->image->info.samples)) |
unsigned stencil_tile_mode = info->si_tile_mode_array[stencil_index]; S_028038_SW_MODE(iview->image->surface.u.gfx9.surf.swizzle_mode) |
unsigned macro_mode = info->cik_macrotile_mode_array[macro_index]; S_028038_MAXMIP(iview->image->info.levels - 1);
ds->db_stencil_info = S_02803C_FORMAT(stencil_format) |
S_02803C_SW_MODE(iview->image->surface.u.gfx9.stencil.swizzle_mode);
ds->db_z_info2 = S_028068_EPITCH(iview->image->surface.u.gfx9.surf.epitch);
ds->db_stencil_info2 = S_02806C_EPITCH(iview->image->surface.u.gfx9.stencil.epitch);
ds->db_depth_view |= S_028008_MIPID(level);
ds->db_depth_size = S_02801C_X_MAX(iview->image->info.width - 1) |
S_02801C_Y_MAX(iview->image->info.height - 1);
/* Only use HTILE for the first level. */
if (iview->image->surface.htile_size && !level) {
ds->db_z_info |= S_028038_TILE_SURFACE_ENABLE(1);
if (!(iview->image->surface.flags & RADEON_SURF_SBUFFER))
/* Use all of the htile_buffer for depth if there's no stencil. */
ds->db_stencil_info |= S_02803C_TILE_STENCIL_DISABLE(1);
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset +
iview->image->htile_offset;
ds->db_htile_data_base = va >> 8;
ds->db_htile_surface = S_028ABC_FULL_CACHE(1) |
S_028ABC_PIPE_ALIGNED(iview->image->surface.u.gfx9.htile.pipe_aligned) |
S_028ABC_RB_ALIGNED(iview->image->surface.u.gfx9.htile.rb_aligned);
}
} else {
const struct legacy_surf_level *level_info = &iview->image->surface.u.legacy.level[level];
if (stencil_only) if (stencil_only)
tile_mode = stencil_tile_mode; level_info = &iview->image->surface.u.legacy.stencil_level[level];
ds->db_depth_info |= z_offs += iview->image->surface.u.legacy.level[level].offset;
S_02803C_ARRAY_MODE(G_009910_ARRAY_MODE(tile_mode)) | s_offs += iview->image->surface.u.legacy.stencil_level[level].offset;
S_02803C_PIPE_CONFIG(G_009910_PIPE_CONFIG(tile_mode)) |
S_02803C_BANK_WIDTH(G_009990_BANK_WIDTH(macro_mode)) |
S_02803C_BANK_HEIGHT(G_009990_BANK_HEIGHT(macro_mode)) |
S_02803C_MACRO_TILE_ASPECT(G_009990_MACRO_TILE_ASPECT(macro_mode)) |
S_02803C_NUM_BANKS(G_009990_NUM_BANKS(macro_mode));
ds->db_z_info |= S_028040_TILE_SPLIT(G_009910_TILE_SPLIT(tile_mode));
ds->db_stencil_info |= S_028044_TILE_SPLIT(G_009910_TILE_SPLIT(stencil_tile_mode));
} else {
unsigned tile_mode_index = si_tile_mode_index(iview->image, level, false);
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
tile_mode_index = si_tile_mode_index(iview->image, level, true);
ds->db_stencil_info |= S_028044_TILE_MODE_INDEX(tile_mode_index);
}
if (iview->image->surface.htile_size && !level) { ds->db_depth_info = S_02803C_ADDR5_SWIZZLE_MASK(1);
ds->db_z_info |= S_028040_TILE_SURFACE_ENABLE(1) | ds->db_z_info = S_028040_FORMAT(format) | S_028040_ZRANGE_PRECISION(1);
S_028040_ALLOW_EXPCLEAR(1); ds->db_stencil_info = S_028044_FORMAT(stencil_format);
if (iview->image->surface.flags & RADEON_SURF_SBUFFER) { if (iview->image->info.samples > 1)
/* Workaround: For a not yet understood reason, the ds->db_z_info |= S_028040_NUM_SAMPLES(util_logbase2(iview->image->info.samples));
* combination of MSAA, fast stencil clear and stencil
* decompress messes with subsequent stencil buffer
* uses. Problem was reproduced on Verde, Bonaire,
* Tonga, and Carrizo.
*
* Disabling EXPCLEAR works around the problem.
*
* Check piglit's arb_texture_multisample-stencil-clear
* test if you want to try changing this.
*/
if (iview->image->samples <= 1)
ds->db_stencil_info |= S_028044_ALLOW_EXPCLEAR(1);
} else
/* Use all of the htile_buffer for depth if there's no stencil. */
ds->db_stencil_info |= S_028044_TILE_STENCIL_DISABLE(1);
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + if (device->physical_device->rad_info.chip_class >= CIK) {
iview->image->htile_offset; struct radeon_info *info = &device->physical_device->rad_info;
ds->db_htile_data_base = va >> 8; unsigned tiling_index = iview->image->surface.u.legacy.tiling_index[level];
ds->db_htile_surface = S_028ABC_FULL_CACHE(1); unsigned stencil_index = iview->image->surface.u.legacy.stencil_tiling_index[level];
} else { unsigned macro_index = iview->image->surface.u.legacy.macro_tile_index;
ds->db_htile_data_base = 0; unsigned tile_mode = info->si_tile_mode_array[tiling_index];
ds->db_htile_surface = 0; unsigned stencil_tile_mode = info->si_tile_mode_array[stencil_index];
unsigned macro_mode = info->cik_macrotile_mode_array[macro_index];
if (stencil_only)
tile_mode = stencil_tile_mode;
ds->db_depth_info |=
S_02803C_ARRAY_MODE(G_009910_ARRAY_MODE(tile_mode)) |
S_02803C_PIPE_CONFIG(G_009910_PIPE_CONFIG(tile_mode)) |
S_02803C_BANK_WIDTH(G_009990_BANK_WIDTH(macro_mode)) |
S_02803C_BANK_HEIGHT(G_009990_BANK_HEIGHT(macro_mode)) |
S_02803C_MACRO_TILE_ASPECT(G_009990_MACRO_TILE_ASPECT(macro_mode)) |
S_02803C_NUM_BANKS(G_009990_NUM_BANKS(macro_mode));
ds->db_z_info |= S_028040_TILE_SPLIT(G_009910_TILE_SPLIT(tile_mode));
ds->db_stencil_info |= S_028044_TILE_SPLIT(G_009910_TILE_SPLIT(stencil_tile_mode));
} else {
unsigned tile_mode_index = si_tile_mode_index(iview->image, level, false);
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
tile_mode_index = si_tile_mode_index(iview->image, level, true);
ds->db_stencil_info |= S_028044_TILE_MODE_INDEX(tile_mode_index);
}
ds->db_depth_size = S_028058_PITCH_TILE_MAX((level_info->nblk_x / 8) - 1) |
S_028058_HEIGHT_TILE_MAX((level_info->nblk_y / 8) - 1);
ds->db_depth_slice = S_02805C_SLICE_TILE_MAX((level_info->nblk_x * level_info->nblk_y) / 64 - 1);
if (iview->image->surface.htile_size && !level) {
ds->db_z_info |= S_028040_TILE_SURFACE_ENABLE(1);
if (!(iview->image->surface.flags & RADEON_SURF_SBUFFER))
/* Use all of the htile_buffer for depth if there's no stencil. */
ds->db_stencil_info |= S_028044_TILE_STENCIL_DISABLE(1);
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset +
iview->image->htile_offset;
ds->db_htile_data_base = va >> 8;
ds->db_htile_surface = S_028ABC_FULL_CACHE(1);
}
} }
ds->db_z_read_base = ds->db_z_write_base = z_offs >> 8; ds->db_z_read_base = ds->db_z_write_base = z_offs >> 8;
ds->db_stencil_read_base = ds->db_stencil_write_base = s_offs >> 8; ds->db_stencil_read_base = ds->db_stencil_write_base = s_offs >> 8;
ds->db_depth_size = S_028058_PITCH_TILE_MAX((level_info->nblk_x / 8) - 1) |
S_028058_HEIGHT_TILE_MAX((level_info->nblk_y / 8) - 1);
ds->db_depth_slice = S_02805C_SLICE_TILE_MAX((level_info->nblk_x * level_info->nblk_y) / 64 - 1);
} }
VkResult radv_CreateFramebuffer( VkResult radv_CreateFramebuffer(
@@ -3092,7 +3243,6 @@ void radv_DestroySampler(
vk_free2(&device->alloc, pAllocator, sampler); vk_free2(&device->alloc, pAllocator, sampler);
} }
/* vk_icd.h does not declare this function, so we declare it here to /* vk_icd.h does not declare this function, so we declare it here to
* suppress Wmissing-prototypes. * suppress Wmissing-prototypes.
*/ */
@@ -3136,3 +3286,34 @@ vk_icdNegotiateLoaderICDInterfaceVersion(uint32_t *pSupportedVersion)
*pSupportedVersion = MIN2(*pSupportedVersion, 3u); *pSupportedVersion = MIN2(*pSupportedVersion, 3u);
return VK_SUCCESS; return VK_SUCCESS;
} }
VkResult radv_GetMemoryFdKHX(VkDevice _device,
VkDeviceMemory _memory,
VkExternalMemoryHandleTypeFlagsKHX handleType,
int *pFD)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_device_memory, memory, _memory);
/* We support only one handle type. */
assert(handleType == VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
bool ret = radv_get_memory_fd(device, memory, pFD);
if (ret == false)
return VK_ERROR_OUT_OF_DEVICE_MEMORY;
return VK_SUCCESS;
}
VkResult radv_GetMemoryFdPropertiesKHX(VkDevice _device,
VkExternalMemoryHandleTypeFlagBitsKHX handleType,
int fd,
VkMemoryFdPropertiesKHX *pMemoryFdProperties)
{
/* The valid usage section for this function says:
*
* "handleType must not be one of the handle types defined as opaque."
*
* Since we only handle opaque handles for now, there are no FD properties.
*/
return VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX;
}

View File

@@ -42,6 +42,9 @@ supported_extensions = [
'VK_KHR_wayland_surface', 'VK_KHR_wayland_surface',
'VK_KHR_xcb_surface', 'VK_KHR_xcb_surface',
'VK_KHR_xlib_surface', 'VK_KHR_xlib_surface',
'VK_KHX_external_memory_capabilities',
'VK_KHX_external_memory',
'VK_KHX_external_memory_fd',
] ]
# We generate a static hash table for entry point lookup # We generate a static hash table for entry point lookup

View File

@@ -28,6 +28,8 @@
#include "sid.h" #include "sid.h"
#include "r600d_common.h" #include "r600d_common.h"
#include "vk_util.h"
#include "util/u_half.h" #include "util/u_half.h"
#include "util/format_srgb.h" #include "util/format_srgb.h"
#include "util/format_r11g11b10f.h" #include "util/format_r11g11b10f.h"
@@ -597,13 +599,13 @@ radv_physical_device_get_format_properties(struct radv_physical_device *physical
tiled |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BLEND_BIT; tiled |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BLEND_BIT;
} }
} }
if (util_is_power_of_two(vk_format_get_blocksize(format)) && !scaled) { if (tiled && util_is_power_of_two(vk_format_get_blocksize(format)) && !scaled) {
tiled |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR | tiled |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR |
VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR; VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR;
} }
} }
if (util_is_power_of_two(vk_format_get_blocksize(format)) && !scaled) { if (linear && util_is_power_of_two(vk_format_get_blocksize(format)) && !scaled) {
linear |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR | linear |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR |
VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR; VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR;
} }
@@ -1006,16 +1008,11 @@ void radv_GetPhysicalDeviceFormatProperties2KHR(
&pFormatProperties->formatProperties); &pFormatProperties->formatProperties);
} }
VkResult radv_GetPhysicalDeviceImageFormatProperties( static VkResult radv_get_image_format_properties(struct radv_physical_device *physical_device,
VkPhysicalDevice physicalDevice, const VkPhysicalDeviceImageFormatInfo2KHR *info,
VkFormat format, VkImageFormatProperties *pImageFormatProperties)
VkImageType type,
VkImageTiling tiling,
VkImageUsageFlags usage,
VkImageCreateFlags createFlags,
VkImageFormatProperties* pImageFormatProperties)
{ {
RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
VkFormatProperties format_props; VkFormatProperties format_props;
VkFormatFeatureFlags format_feature_flags; VkFormatFeatureFlags format_feature_flags;
VkExtent3D maxExtent; VkExtent3D maxExtent;
@@ -1023,11 +1020,11 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties(
uint32_t maxArraySize; uint32_t maxArraySize;
VkSampleCountFlags sampleCounts = VK_SAMPLE_COUNT_1_BIT; VkSampleCountFlags sampleCounts = VK_SAMPLE_COUNT_1_BIT;
radv_physical_device_get_format_properties(physical_device, format, radv_physical_device_get_format_properties(physical_device, info->format,
&format_props); &format_props);
if (tiling == VK_IMAGE_TILING_LINEAR) { if (info->tiling == VK_IMAGE_TILING_LINEAR) {
format_feature_flags = format_props.linearTilingFeatures; format_feature_flags = format_props.linearTilingFeatures;
} else if (tiling == VK_IMAGE_TILING_OPTIMAL) { } else if (info->tiling == VK_IMAGE_TILING_OPTIMAL) {
format_feature_flags = format_props.optimalTilingFeatures; format_feature_flags = format_props.optimalTilingFeatures;
} else { } else {
unreachable("bad VkImageTiling"); unreachable("bad VkImageTiling");
@@ -1036,7 +1033,7 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties(
if (format_feature_flags == 0) if (format_feature_flags == 0)
goto unsupported; goto unsupported;
switch (type) { switch (info->type) {
default: default:
unreachable("bad vkimage type\n"); unreachable("bad vkimage type\n");
case VK_IMAGE_TYPE_1D: case VK_IMAGE_TYPE_1D:
@@ -1062,34 +1059,34 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties(
break; break;
} }
if (tiling == VK_IMAGE_TILING_OPTIMAL && if (info->tiling == VK_IMAGE_TILING_OPTIMAL &&
type == VK_IMAGE_TYPE_2D && info->type == VK_IMAGE_TYPE_2D &&
(format_feature_flags & (VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT | (format_feature_flags & (VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT |
VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT)) && VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT)) &&
!(createFlags & VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT) && !(info->flags & VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT) &&
!(usage & VK_IMAGE_USAGE_STORAGE_BIT)) { !(info->usage & VK_IMAGE_USAGE_STORAGE_BIT)) {
sampleCounts |= VK_SAMPLE_COUNT_2_BIT | VK_SAMPLE_COUNT_4_BIT | VK_SAMPLE_COUNT_8_BIT; sampleCounts |= VK_SAMPLE_COUNT_2_BIT | VK_SAMPLE_COUNT_4_BIT | VK_SAMPLE_COUNT_8_BIT;
} }
if (usage & VK_IMAGE_USAGE_SAMPLED_BIT) { if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)) { if (!(format_feature_flags & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)) {
goto unsupported; goto unsupported;
} }
} }
if (usage & VK_IMAGE_USAGE_STORAGE_BIT) { if (info->usage & VK_IMAGE_USAGE_STORAGE_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT)) { if (!(format_feature_flags & VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT)) {
goto unsupported; goto unsupported;
} }
} }
if (usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) { if (info->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT)) { if (!(format_feature_flags & VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT)) {
goto unsupported; goto unsupported;
} }
} }
if (usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) { if (info->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
if (!(format_feature_flags & VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT)) { if (!(format_feature_flags & VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT)) {
goto unsupported; goto unsupported;
} }
@@ -1120,18 +1117,132 @@ unsupported:
return VK_ERROR_FORMAT_NOT_SUPPORTED; return VK_ERROR_FORMAT_NOT_SUPPORTED;
} }
VkResult radv_GetPhysicalDeviceImageFormatProperties(
VkPhysicalDevice physicalDevice,
VkFormat format,
VkImageType type,
VkImageTiling tiling,
VkImageUsageFlags usage,
VkImageCreateFlags createFlags,
VkImageFormatProperties* pImageFormatProperties)
{
RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
const VkPhysicalDeviceImageFormatInfo2KHR info = {
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_FORMAT_INFO_2_KHR,
.pNext = NULL,
.format = format,
.type = type,
.tiling = tiling,
.usage = usage,
.flags = createFlags,
};
return radv_get_image_format_properties(physical_device, &info,
pImageFormatProperties);
}
static void
get_external_image_format_properties(const VkPhysicalDeviceImageFormatInfo2KHR *pImageFormatInfo,
VkExternalMemoryPropertiesKHX *external_properties)
{
VkExternalMemoryFeatureFlagBitsKHX flags = 0;
VkExternalMemoryHandleTypeFlagsKHX export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHX compat_flags = 0;
switch (pImageFormatInfo->type) {
case VK_IMAGE_TYPE_2D:
flags = VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT_KHX|VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHX|VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHX;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX;
break;
default:
break;
}
*external_properties = (VkExternalMemoryPropertiesKHX) {
.externalMemoryFeatures = flags,
.exportFromImportedHandleTypes = export_flags,
.compatibleHandleTypes = compat_flags,
};
}
VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR( VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR(
VkPhysicalDevice physicalDevice, VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceImageFormatInfo2KHR* pImageFormatInfo, const VkPhysicalDeviceImageFormatInfo2KHR *base_info,
VkImageFormatProperties2KHR *pImageFormatProperties) VkImageFormatProperties2KHR *base_props)
{ {
return radv_GetPhysicalDeviceImageFormatProperties(physicalDevice, RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
pImageFormatInfo->format, const VkPhysicalDeviceExternalImageFormatInfoKHX *external_info = NULL;
pImageFormatInfo->type, VkExternalImageFormatPropertiesKHX *external_props = NULL;
pImageFormatInfo->tiling, VkResult result;
pImageFormatInfo->usage,
pImageFormatInfo->flags, result = radv_get_image_format_properties(physical_device, base_info,
&pImageFormatProperties->imageFormatProperties); &base_props->imageFormatProperties);
if (result != VK_SUCCESS)
return result;
/* Extract input structs */
vk_foreach_struct_const(s, base_info->pNext) {
switch (s->sType) {
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_IMAGE_FORMAT_INFO_KHX:
external_info = (const void *) s;
break;
default:
break;
}
}
/* Extract output structs */
vk_foreach_struct(s, base_props->pNext) {
switch (s->sType) {
case VK_STRUCTURE_TYPE_EXTERNAL_IMAGE_FORMAT_PROPERTIES_KHX:
external_props = (void *) s;
break;
default:
break;
}
}
/* From the Vulkan 1.0.42 spec:
*
* If handleType is 0, vkGetPhysicalDeviceImageFormatProperties2KHR will
* behave as if VkPhysicalDeviceExternalImageFormatInfoKHX was not
* present and VkExternalImageFormatPropertiesKHX will be ignored.
*/
if (external_info && external_info->handleType != 0) {
switch (external_info->handleType) {
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX:
get_external_image_format_properties(base_info, &external_props->externalMemoryProperties);
break;
default:
/* From the Vulkan 1.0.42 spec:
*
* If handleType is not compatible with the [parameters] specified
* in VkPhysicalDeviceImageFormatInfo2KHR, then
* vkGetPhysicalDeviceImageFormatProperties2KHR returns
* VK_ERROR_FORMAT_NOT_SUPPORTED.
*/
result = vk_errorf(VK_ERROR_FORMAT_NOT_SUPPORTED,
"unsupported VkExternalMemoryTypeFlagBitsKHX 0x%x",
external_info->handleType);
goto fail;
}
}
return VK_SUCCESS;
fail:
if (result == VK_ERROR_FORMAT_NOT_SUPPORTED) {
/* From the Vulkan 1.0.42 spec:
*
* If the combination of parameters to
* vkGetPhysicalDeviceImageFormatProperties2KHR is not supported by
* the implementation for use in vkCreateImage, then all members of
* imageFormatProperties will be filled with zero.
*/
base_props->imageFormatProperties = (VkImageFormatProperties) {0};
}
return result;
} }
void radv_GetPhysicalDeviceSparseImageFormatProperties( void radv_GetPhysicalDeviceSparseImageFormatProperties(
@@ -1157,3 +1268,28 @@ void radv_GetPhysicalDeviceSparseImageFormatProperties2KHR(
/* Sparse images are not yet supported. */ /* Sparse images are not yet supported. */
*pPropertyCount = 0; *pPropertyCount = 0;
} }
void radv_GetPhysicalDeviceExternalBufferPropertiesKHX(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceExternalBufferInfoKHX *pExternalBufferInfo,
VkExternalBufferPropertiesKHX *pExternalBufferProperties)
{
VkExternalMemoryFeatureFlagBitsKHX flags = 0;
VkExternalMemoryHandleTypeFlagsKHX export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHX compat_flags = 0;
switch(pExternalBufferInfo->handleType) {
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX:
flags = VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT_KHX |
VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHX |
VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHX;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX;
break;
default:
break;
}
pExternalBufferProperties->externalMemoryProperties = (VkExternalMemoryPropertiesKHX) {
.externalMemoryFeatures = flags,
.exportFromImportedHandleTypes = export_flags,
.compatibleHandleTypes = compat_flags,
};
}

View File

@@ -29,6 +29,7 @@
#include "vk_format.h" #include "vk_format.h"
#include "radv_radeon_winsys.h" #include "radv_radeon_winsys.h"
#include "sid.h" #include "sid.h"
#include "gfx9d.h"
#include "util/debug.h" #include "util/debug.h"
static unsigned static unsigned
radv_choose_tiling(struct radv_device *Device, radv_choose_tiling(struct radv_device *Device,
@@ -67,22 +68,15 @@ radv_init_surface(struct radv_device *device,
is_depth = vk_format_has_depth(desc); is_depth = vk_format_has_depth(desc);
is_stencil = vk_format_has_stencil(desc); is_stencil = vk_format_has_stencil(desc);
surface->npix_x = pCreateInfo->extent.width;
surface->npix_y = pCreateInfo->extent.height;
surface->npix_z = pCreateInfo->extent.depth;
surface->blk_w = vk_format_get_blockwidth(pCreateInfo->format); surface->blk_w = vk_format_get_blockwidth(pCreateInfo->format);
surface->blk_h = vk_format_get_blockheight(pCreateInfo->format); surface->blk_h = vk_format_get_blockheight(pCreateInfo->format);
surface->blk_d = 1;
surface->array_size = pCreateInfo->arrayLayers;
surface->last_level = pCreateInfo->mipLevels - 1;
surface->bpe = vk_format_get_blocksize(pCreateInfo->format); surface->bpe = vk_format_get_blocksize(pCreateInfo->format);
/* align byte per element on dword */ /* align byte per element on dword */
if (surface->bpe == 3) { if (surface->bpe == 3) {
surface->bpe = 4; surface->bpe = 4;
} }
surface->nsamples = pCreateInfo->samples ? pCreateInfo->samples : 1;
surface->flags = RADEON_SURF_SET(array_mode, MODE); surface->flags = RADEON_SURF_SET(array_mode, MODE);
switch (pCreateInfo->imageType){ switch (pCreateInfo->imageType){
@@ -110,8 +104,7 @@ radv_init_surface(struct radv_device *device,
} }
if (is_stencil) if (is_stencil)
surface->flags |= RADEON_SURF_SBUFFER | surface->flags |= RADEON_SURF_SBUFFER;
RADEON_SURF_HAS_SBUFFER_MIPTREE;
surface->flags |= RADEON_SURF_HAS_TILE_MODE_INDEX; surface->flags |= RADEON_SURF_HAS_TILE_MODE_INDEX;
@@ -137,9 +130,9 @@ static inline unsigned
si_tile_mode_index(const struct radv_image *image, unsigned level, bool stencil) si_tile_mode_index(const struct radv_image *image, unsigned level, bool stencil)
{ {
if (stencil) if (stencil)
return image->surface.stencil_tiling_index[level]; return image->surface.u.legacy.stencil_tiling_index[level];
else else
return image->surface.tiling_index[level]; return image->surface.u.legacy.tiling_index[level];
} }
static unsigned radv_map_swizzle(unsigned swizzle) static unsigned radv_map_swizzle(unsigned swizzle)
@@ -197,33 +190,80 @@ radv_make_buffer_descriptor(struct radv_device *device,
static void static void
si_set_mutable_tex_desc_fields(struct radv_device *device, si_set_mutable_tex_desc_fields(struct radv_device *device,
struct radv_image *image, struct radv_image *image,
const struct radeon_surf_level *base_level_info, const struct legacy_surf_level *base_level_info,
unsigned base_level, unsigned first_level, unsigned base_level, unsigned first_level,
unsigned block_width, bool is_stencil, unsigned block_width, bool is_stencil,
uint32_t *state) uint32_t *state)
{ {
uint64_t gpu_address = device->ws->buffer_get_va(image->bo) + image->offset; uint64_t gpu_address = device->ws->buffer_get_va(image->bo) + image->offset;
uint64_t va = gpu_address + base_level_info->offset; uint64_t va = gpu_address;
unsigned pitch = base_level_info->nblk_x * block_width; unsigned pitch = base_level_info->nblk_x * block_width;
enum chip_class chip_class = device->physical_device->rad_info.chip_class;
state[1] &= C_008F14_BASE_ADDRESS_HI; uint64_t meta_va = 0;
state[3] &= C_008F1C_TILING_INDEX; if (chip_class >= GFX9) {
state[4] &= C_008F20_PITCH_GFX6; if (is_stencil)
state[6] &= C_008F28_COMPRESSION_EN; va += image->surface.u.gfx9.stencil_offset;
else
assert(!(va & 255)); va += image->surface.u.gfx9.surf_offset;
} else
va += base_level_info->offset;
state[0] = va >> 8; state[0] = va >> 8;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40); state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
state[3] |= S_008F1C_TILING_INDEX(si_tile_mode_index(image, base_level, state[3] |= S_008F1C_TILING_INDEX(si_tile_mode_index(image, base_level,
is_stencil)); is_stencil));
state[4] |= S_008F20_PITCH_GFX6(pitch - 1); state[4] |= S_008F20_PITCH_GFX6(pitch - 1);
if (image->surface.dcc_size && image->surface.level[first_level].dcc_enabled) { if (chip_class >= VI) {
state[6] |= S_008F28_COMPRESSION_EN(1); state[6] &= C_008F28_COMPRESSION_EN;
state[7] = (gpu_address + state[7] = 0;
image->dcc_offset + if (image->surface.dcc_size && first_level < image->surface.num_dcc_levels) {
base_level_info->dcc_offset) >> 8; uint64_t meta_va = gpu_address + image->dcc_offset;
if (chip_class <= VI)
meta_va += base_level_info->dcc_offset;
state[6] |= S_008F28_COMPRESSION_EN(1);
state[7] = meta_va >> 8;
}
}
if (chip_class >= GFX9) {
state[3] &= C_008F1C_SW_MODE;
state[4] &= C_008F20_PITCH_GFX9;
if (is_stencil) {
state[3] |= S_008F1C_SW_MODE(image->surface.u.gfx9.stencil.swizzle_mode);
state[4] |= S_008F20_PITCH_GFX9(image->surface.u.gfx9.stencil.epitch);
} else {
state[3] |= S_008F1C_SW_MODE(image->surface.u.gfx9.surf.swizzle_mode);
state[4] |= S_008F20_PITCH_GFX9(image->surface.u.gfx9.surf.epitch);
}
state[5] &= C_008F24_META_DATA_ADDRESS &
C_008F24_META_PIPE_ALIGNED &
C_008F24_META_RB_ALIGNED;
if (meta_va) {
struct gfx9_surf_meta_flags meta;
if (image->dcc_offset)
meta = image->surface.u.gfx9.dcc;
else
meta = image->surface.u.gfx9.htile;
state[5] |= S_008F24_META_DATA_ADDRESS(meta_va >> 40) |
S_008F24_META_PIPE_ALIGNED(meta.pipe_aligned) |
S_008F24_META_RB_ALIGNED(meta.rb_aligned);
}
} else {
/* SI-CI-VI */
unsigned pitch = base_level_info->nblk_x * block_width;
unsigned index = si_tile_mode_index(image, base_level, is_stencil);
state[3] &= C_008F1C_TILING_INDEX;
state[3] |= S_008F1C_TILING_INDEX(index);
state[4] &= C_008F20_PITCH_GFX6;
state[4] |= S_008F20_PITCH_GFX6(pitch - 1);
} }
} }
@@ -249,6 +289,36 @@ static unsigned radv_tex_dim(VkImageType image_type, VkImageViewType view_type,
unreachable("illegale image type"); unreachable("illegale image type");
} }
} }
static unsigned gfx9_border_color_swizzle(const unsigned char swizzle[4])
{
unsigned bc_swizzle = V_008F20_BC_SWIZZLE_XYZW;
if (swizzle[3] == VK_SWIZZLE_X) {
/* For the pre-defined border color values (white, opaque
* black, transparent black), the only thing that matters is
* that the alpha channel winds up in the correct place
* (because the RGB channels are all the same) so either of
* these enumerations will work.
*/
if (swizzle[2] == VK_SWIZZLE_Y)
bc_swizzle = V_008F20_BC_SWIZZLE_WZYX;
else
bc_swizzle = V_008F20_BC_SWIZZLE_WXYZ;
} else if (swizzle[0] == VK_SWIZZLE_X) {
if (swizzle[1] == VK_SWIZZLE_Y)
bc_swizzle = V_008F20_BC_SWIZZLE_XYZW;
else
bc_swizzle = V_008F20_BC_SWIZZLE_XWYZ;
} else if (swizzle[1] == VK_SWIZZLE_X) {
bc_swizzle = V_008F20_BC_SWIZZLE_YXWZ;
} else if (swizzle[2] == VK_SWIZZLE_X) {
bc_swizzle = V_008F20_BC_SWIZZLE_ZYXW;
}
return bc_swizzle;
}
/** /**
* Build the sampler view descriptor for a texture. * Build the sampler view descriptor for a texture.
*/ */
@@ -291,40 +361,59 @@ si_make_texture_descriptor(struct radv_device *device,
data_format = 0; data_format = 0;
} }
type = radv_tex_dim(image->type, view_type, image->array_size, image->samples, type = radv_tex_dim(image->type, view_type, image->info.array_size, image->info.samples,
(image->usage & VK_IMAGE_USAGE_STORAGE_BIT)); (image->usage & VK_IMAGE_USAGE_STORAGE_BIT));
if (type == V_008F1C_SQ_RSRC_IMG_1D_ARRAY) { if (type == V_008F1C_SQ_RSRC_IMG_1D_ARRAY) {
height = 1; height = 1;
depth = image->array_size; depth = image->info.array_size;
} else if (type == V_008F1C_SQ_RSRC_IMG_2D_ARRAY || } else if (type == V_008F1C_SQ_RSRC_IMG_2D_ARRAY ||
type == V_008F1C_SQ_RSRC_IMG_2D_MSAA_ARRAY) { type == V_008F1C_SQ_RSRC_IMG_2D_MSAA_ARRAY) {
if (view_type != VK_IMAGE_VIEW_TYPE_3D) if (view_type != VK_IMAGE_VIEW_TYPE_3D)
depth = image->array_size; depth = image->info.array_size;
} else if (type == V_008F1C_SQ_RSRC_IMG_CUBE) } else if (type == V_008F1C_SQ_RSRC_IMG_CUBE)
depth = image->array_size / 6; depth = image->info.array_size / 6;
state[0] = 0; state[0] = 0;
state[1] = (S_008F14_DATA_FORMAT_GFX6(data_format) | state[1] = (S_008F14_DATA_FORMAT_GFX6(data_format) |
S_008F14_NUM_FORMAT_GFX6(num_format)); S_008F14_NUM_FORMAT_GFX6(num_format));
state[2] = (S_008F18_WIDTH(width - 1) | state[2] = (S_008F18_WIDTH(width - 1) |
S_008F18_HEIGHT(height - 1)); S_008F18_HEIGHT(height - 1) |
S_008F18_PERF_MOD(4));
state[3] = (S_008F1C_DST_SEL_X(radv_map_swizzle(swizzle[0])) | state[3] = (S_008F1C_DST_SEL_X(radv_map_swizzle(swizzle[0])) |
S_008F1C_DST_SEL_Y(radv_map_swizzle(swizzle[1])) | S_008F1C_DST_SEL_Y(radv_map_swizzle(swizzle[1])) |
S_008F1C_DST_SEL_Z(radv_map_swizzle(swizzle[2])) | S_008F1C_DST_SEL_Z(radv_map_swizzle(swizzle[2])) |
S_008F1C_DST_SEL_W(radv_map_swizzle(swizzle[3])) | S_008F1C_DST_SEL_W(radv_map_swizzle(swizzle[3])) |
S_008F1C_BASE_LEVEL(image->samples > 1 ? S_008F1C_BASE_LEVEL(image->info.samples > 1 ?
0 : first_level) | 0 : first_level) |
S_008F1C_LAST_LEVEL(image->samples > 1 ? S_008F1C_LAST_LEVEL(image->info.samples > 1 ?
util_logbase2(image->samples) : util_logbase2(image->info.samples) :
last_level) | last_level) |
S_008F1C_POW2_PAD(image->levels > 1) |
S_008F1C_TYPE(type)); S_008F1C_TYPE(type));
state[4] = S_008F20_DEPTH(depth - 1); state[4] = 0;
state[5] = (S_008F24_BASE_ARRAY(first_layer) | state[5] = S_008F24_BASE_ARRAY(first_layer);
S_008F24_LAST_ARRAY(last_layer));
state[6] = 0; state[6] = 0;
state[7] = 0; state[7] = 0;
if (device->physical_device->rad_info.chip_class >= GFX9) {
unsigned bc_swizzle = gfx9_border_color_swizzle(desc->swizzle);
/* Depth is the the last accessible layer on Gfx9.
* The hw doesn't need to know the total number of layers.
*/
if (type == V_008F1C_SQ_RSRC_IMG_3D)
state[4] |= S_008F20_DEPTH(depth - 1);
else
state[4] |= S_008F20_DEPTH(last_layer);
state[4] |= S_008F20_BC_SWIZZLE(bc_swizzle);
state[5] |= S_008F24_MAX_MIP(image->info.samples > 1 ?
util_logbase2(image->info.samples) :
last_level);
} else {
state[3] |= S_008F1C_POW2_PAD(image->info.levels > 1);
state[4] |= S_008F20_DEPTH(depth - 1);
state[5] |= S_008F24_LAST_ARRAY(last_layer);
}
if (image->dcc_offset) { if (image->dcc_offset) {
unsigned swap = radv_translate_colorswap(vk_format, FALSE); unsigned swap = radv_translate_colorswap(vk_format, FALSE);
@@ -333,7 +422,7 @@ si_make_texture_descriptor(struct radv_device *device,
/* The last dword is unused by hw. The shader uses it to clear /* The last dword is unused by hw. The shader uses it to clear
* bits in the first dword of sampler state. * bits in the first dword of sampler state.
*/ */
if (device->physical_device->rad_info.chip_class <= CIK && image->samples <= 1) { if (device->physical_device->rad_info.chip_class <= CIK && image->info.samples <= 1) {
if (first_level == last_level) if (first_level == last_level)
state[7] = C_008F30_MAX_ANISO_RATIO; state[7] = C_008F30_MAX_ANISO_RATIO;
else else
@@ -343,46 +432,75 @@ si_make_texture_descriptor(struct radv_device *device,
/* Initialize the sampler view for FMASK. */ /* Initialize the sampler view for FMASK. */
if (image->fmask.size) { if (image->fmask.size) {
uint32_t fmask_format; uint32_t fmask_format, num_format;
uint64_t gpu_address = device->ws->buffer_get_va(image->bo); uint64_t gpu_address = device->ws->buffer_get_va(image->bo);
uint64_t va; uint64_t va;
va = gpu_address + image->offset + image->fmask.offset; va = gpu_address + image->offset + image->fmask.offset;
switch (image->samples) { if (device->physical_device->rad_info.chip_class >= GFX9) {
case 2: fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK;
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S2_F2; switch (image->info.samples) {
break; case 2:
case 4: num_format = V_008F14_IMG_FMASK_8_2_2;
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S4_F4; break;
break; case 4:
case 8: num_format = V_008F14_IMG_FMASK_8_4_4;
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK32_S8_F8; break;
break; case 8:
default: num_format = V_008F14_IMG_FMASK_32_8_8;
assert(0); break;
fmask_format = V_008F14_IMG_DATA_FORMAT_INVALID; default:
unreachable("invalid nr_samples");
}
} else {
switch (image->info.samples) {
case 2:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S2_F2;
break;
case 4:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S4_F4;
break;
case 8:
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK32_S8_F8;
break;
default:
assert(0);
fmask_format = V_008F14_IMG_DATA_FORMAT_INVALID;
}
num_format = V_008F14_IMG_NUM_FORMAT_UINT;
} }
fmask_state[0] = va >> 8; fmask_state[0] = va >> 8;
fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >> 40) | fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >> 40) |
S_008F14_DATA_FORMAT_GFX6(fmask_format) | S_008F14_DATA_FORMAT_GFX6(fmask_format) |
S_008F14_NUM_FORMAT_GFX6(V_008F14_IMG_NUM_FORMAT_UINT); S_008F14_NUM_FORMAT_GFX6(num_format);
fmask_state[2] = S_008F18_WIDTH(width - 1) | fmask_state[2] = S_008F18_WIDTH(width - 1) |
S_008F18_HEIGHT(height - 1); S_008F18_HEIGHT(height - 1);
fmask_state[3] = S_008F1C_DST_SEL_X(V_008F1C_SQ_SEL_X) | fmask_state[3] = S_008F1C_DST_SEL_X(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) | S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_Z(V_008F1C_SQ_SEL_X) | S_008F1C_DST_SEL_Z(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_W(V_008F1C_SQ_SEL_X) | S_008F1C_DST_SEL_W(V_008F1C_SQ_SEL_X) |
S_008F1C_TILING_INDEX(image->fmask.tile_mode_index) |
S_008F1C_TYPE(radv_tex_dim(image->type, view_type, 1, 0, false)); S_008F1C_TYPE(radv_tex_dim(image->type, view_type, 1, 0, false));
fmask_state[4] = S_008F20_DEPTH(depth - 1) | fmask_state[4] = 0;
S_008F20_PITCH_GFX6(image->fmask.pitch_in_pixels - 1); fmask_state[5] = S_008F24_BASE_ARRAY(first_layer);
fmask_state[5] = S_008F24_BASE_ARRAY(first_layer) |
S_008F24_LAST_ARRAY(last_layer);
fmask_state[6] = 0; fmask_state[6] = 0;
fmask_state[7] = 0; fmask_state[7] = 0;
}
if (device->physical_device->rad_info.chip_class >= GFX9) {
fmask_state[3] |= S_008F1C_SW_MODE(image->surface.u.gfx9.fmask.swizzle_mode);
fmask_state[4] |= S_008F20_DEPTH(last_layer) |
S_008F20_PITCH_GFX9(image->surface.u.gfx9.fmask.epitch);
fmask_state[5] |= S_008F24_META_PIPE_ALIGNED(image->surface.u.gfx9.cmask.pipe_aligned) |
S_008F24_META_RB_ALIGNED(image->surface.u.gfx9.cmask.rb_aligned);
} else {
fmask_state[3] |= S_008F1C_TILING_INDEX(image->fmask.tile_mode_index);
fmask_state[4] |= S_008F20_DEPTH(depth - 1) |
S_008F20_PITCH_GFX6(image->fmask.pitch_in_pixels - 1);
fmask_state[5] |= S_008F24_LAST_ARRAY(last_layer);
}
} else if (fmask_state)
memset(fmask_state, 0, 8 * 4);
} }
static void static void
@@ -410,13 +528,13 @@ radv_query_opaque_metadata(struct radv_device *device,
si_make_texture_descriptor(device, image, true, si_make_texture_descriptor(device, image, true,
(VkImageViewType)image->type, image->vk_format, (VkImageViewType)image->type, image->vk_format,
&fixedmapping, 0, image->levels - 1, 0, &fixedmapping, 0, image->info.levels - 1, 0,
image->array_size, image->info.array_size,
image->extent.width, image->extent.height, image->info.width, image->info.height,
image->extent.depth, image->info.depth,
desc, NULL); desc, NULL);
si_set_mutable_tex_desc_fields(device, image, &image->surface.level[0], 0, 0, si_set_mutable_tex_desc_fields(device, image, &image->surface.u.legacy.level[0], 0, 0,
image->surface.blk_w, false, desc); image->surface.blk_w, false, desc);
/* Clear the base address and set the relative DCC offset. */ /* Clear the base address and set the relative DCC offset. */
@@ -428,10 +546,10 @@ radv_query_opaque_metadata(struct radv_device *device,
memcpy(&md->metadata[2], desc, sizeof(desc)); memcpy(&md->metadata[2], desc, sizeof(desc));
/* Dwords [10:..] contain the mipmap level offsets. */ /* Dwords [10:..] contain the mipmap level offsets. */
for (i = 0; i <= image->levels - 1; i++) for (i = 0; i <= image->info.levels - 1; i++)
md->metadata[10+i] = image->surface.level[i].offset >> 8; md->metadata[10+i] = image->surface.u.legacy.level[i].offset >> 8;
md->size_metadata = (11 + image->levels - 1) * 4; md->size_metadata = (11 + image->info.levels - 1) * 4;
} }
void void
@@ -442,19 +560,23 @@ radv_init_metadata(struct radv_device *device,
struct radeon_surf *surface = &image->surface; struct radeon_surf *surface = &image->surface;
memset(metadata, 0, sizeof(*metadata)); memset(metadata, 0, sizeof(*metadata));
metadata->microtile = surface->level[0].mode >= RADEON_SURF_MODE_1D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->macrotile = surface->level[0].mode >= RADEON_SURF_MODE_2D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->pipe_config = surface->pipe_config;
metadata->bankw = surface->bankw;
metadata->bankh = surface->bankh;
metadata->tile_split = surface->tile_split;
metadata->mtilea = surface->mtilea;
metadata->num_banks = surface->num_banks;
metadata->stride = surface->level[0].pitch_bytes;
metadata->scanout = (surface->flags & RADEON_SURF_SCANOUT) != 0;
if (device->physical_device->rad_info.chip_class >= GFX9) {
metadata->u.gfx9.swizzle_mode = surface->u.gfx9.surf.swizzle_mode;
} else {
metadata->u.legacy.microtile = surface->u.legacy.level[0].mode >= RADEON_SURF_MODE_1D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->u.legacy.macrotile = surface->u.legacy.level[0].mode >= RADEON_SURF_MODE_2D ?
RADEON_LAYOUT_TILED : RADEON_LAYOUT_LINEAR;
metadata->u.legacy.pipe_config = surface->u.legacy.pipe_config;
metadata->u.legacy.bankw = surface->u.legacy.bankw;
metadata->u.legacy.bankh = surface->u.legacy.bankh;
metadata->u.legacy.tile_split = surface->u.legacy.tile_split;
metadata->u.legacy.mtilea = surface->u.legacy.mtilea;
metadata->u.legacy.num_banks = surface->u.legacy.num_banks;
metadata->u.legacy.stride = surface->u.legacy.level[0].nblk_x * surface->bpe;
metadata->u.legacy.scanout = (surface->flags & RADEON_SURF_SCANOUT) != 0;
}
radv_query_opaque_metadata(device, image, metadata); radv_query_opaque_metadata(device, image, metadata);
} }
@@ -466,14 +588,20 @@ radv_image_get_fmask_info(struct radv_device *device,
struct radv_fmask_info *out) struct radv_fmask_info *out)
{ {
/* FMASK is allocated like an ordinary texture. */ /* FMASK is allocated like an ordinary texture. */
struct radeon_surf fmask = image->surface; struct radeon_surf fmask = {};
struct ac_surf_info info = image->info;
memset(out, 0, sizeof(*out)); memset(out, 0, sizeof(*out));
fmask.bo_alignment = 0; if (device->physical_device->rad_info.chip_class >= GFX9) {
fmask.bo_size = 0; out->alignment = image->surface.u.gfx9.fmask_alignment;
fmask.nsamples = 1; out->size = image->surface.u.gfx9.fmask_size;
fmask.flags |= RADEON_SURF_FMASK; return;
}
fmask.blk_w = image->surface.blk_w;
fmask.blk_h = image->surface.blk_h;
info.samples = 1;
fmask.flags = image->surface.flags | RADEON_SURF_FMASK;
/* Force 2D tiling if it wasn't set. This may occur when creating /* Force 2D tiling if it wasn't set. This may occur when creating
* FMASK for MSAA resolve on R6xx. On R6xx, the single-sample * FMASK for MSAA resolve on R6xx. On R6xx, the single-sample
@@ -481,8 +609,6 @@ radv_image_get_fmask_info(struct radv_device *device,
fmask.flags = RADEON_SURF_CLR(fmask.flags, MODE); fmask.flags = RADEON_SURF_CLR(fmask.flags, MODE);
fmask.flags |= RADEON_SURF_SET(RADEON_SURF_MODE_2D, MODE); fmask.flags |= RADEON_SURF_SET(RADEON_SURF_MODE_2D, MODE);
fmask.flags |= RADEON_SURF_HAS_TILE_MODE_INDEX;
switch (nr_samples) { switch (nr_samples) {
case 2: case 2:
case 4: case 4:
@@ -495,25 +621,25 @@ radv_image_get_fmask_info(struct radv_device *device,
return; return;
} }
device->ws->surface_init(device->ws, &fmask); device->ws->surface_init(device->ws, &info, &fmask);
assert(fmask.level[0].mode == RADEON_SURF_MODE_2D); assert(fmask.u.legacy.level[0].mode == RADEON_SURF_MODE_2D);
out->slice_tile_max = (fmask.level[0].nblk_x * fmask.level[0].nblk_y) / 64; out->slice_tile_max = (fmask.u.legacy.level[0].nblk_x * fmask.u.legacy.level[0].nblk_y) / 64;
if (out->slice_tile_max) if (out->slice_tile_max)
out->slice_tile_max -= 1; out->slice_tile_max -= 1;
out->tile_mode_index = fmask.tiling_index[0]; out->tile_mode_index = fmask.u.legacy.tiling_index[0];
out->pitch_in_pixels = fmask.level[0].nblk_x; out->pitch_in_pixels = fmask.u.legacy.level[0].nblk_x;
out->bank_height = fmask.bankh; out->bank_height = fmask.u.legacy.bankh;
out->alignment = MAX2(256, fmask.bo_alignment); out->alignment = MAX2(256, fmask.surf_alignment);
out->size = fmask.bo_size; out->size = fmask.surf_size;
} }
static void static void
radv_image_alloc_fmask(struct radv_device *device, radv_image_alloc_fmask(struct radv_device *device,
struct radv_image *image) struct radv_image *image)
{ {
radv_image_get_fmask_info(device, image, image->samples, &image->fmask); radv_image_get_fmask_info(device, image, image->info.samples, &image->fmask);
image->fmask.offset = align64(image->size, image->fmask.alignment); image->fmask.offset = align64(image->size, image->fmask.alignment);
image->size = image->fmask.offset + image->fmask.size; image->size = image->fmask.offset + image->fmask.size;
@@ -529,6 +655,12 @@ radv_image_get_cmask_info(struct radv_device *device,
unsigned num_pipes = device->physical_device->rad_info.num_tile_pipes; unsigned num_pipes = device->physical_device->rad_info.num_tile_pipes;
unsigned cl_width, cl_height; unsigned cl_width, cl_height;
if (device->physical_device->rad_info.chip_class >= GFX9) {
out->alignment = image->surface.u.gfx9.cmask_alignment;
out->size = image->surface.u.gfx9.cmask_size;
return;
}
switch (num_pipes) { switch (num_pipes) {
case 2: case 2:
cl_width = 32; cl_width = 32;
@@ -553,8 +685,8 @@ radv_image_get_cmask_info(struct radv_device *device,
unsigned base_align = num_pipes * pipe_interleave_bytes; unsigned base_align = num_pipes * pipe_interleave_bytes;
unsigned width = align(image->surface.npix_x, cl_width*8); unsigned width = align(image->info.width, cl_width*8);
unsigned height = align(image->surface.npix_y, cl_height*8); unsigned height = align(image->info.height, cl_height*8);
unsigned slice_elements = (width * height) / (8*8); unsigned slice_elements = (width * height) / (8*8);
/* Each element of CMASK is a nibble. */ /* Each element of CMASK is a nibble. */
@@ -565,7 +697,7 @@ radv_image_get_cmask_info(struct radv_device *device,
out->slice_tile_max -= 1; out->slice_tile_max -= 1;
out->alignment = MAX2(256, base_align); out->alignment = MAX2(256, base_align);
out->size = (image->type == VK_IMAGE_TYPE_3D ? image->extent.depth : image->array_size) * out->size = (image->type == VK_IMAGE_TYPE_3D ? image->info.depth : image->info.array_size) *
align(slice_bytes, base_align); align(slice_bytes, base_align);
} }
@@ -597,7 +729,7 @@ static void
radv_image_alloc_htile(struct radv_device *device, radv_image_alloc_htile(struct radv_device *device,
struct radv_image *image) struct radv_image *image)
{ {
if ((device->debug_flags & RADV_DEBUG_NO_HIZ) || image->levels > 1) { if ((device->debug_flags & RADV_DEBUG_NO_HIZ) || image->info.levels > 1) {
image->surface.htile_size = 0; image->surface.htile_size = 0;
return; return;
} }
@@ -636,11 +768,14 @@ radv_image_create(VkDevice _device,
memset(image, 0, sizeof(*image)); memset(image, 0, sizeof(*image));
image->type = pCreateInfo->imageType; image->type = pCreateInfo->imageType;
image->extent = pCreateInfo->extent; image->info.width = pCreateInfo->extent.width;
image->info.height = pCreateInfo->extent.height;
image->info.depth = pCreateInfo->extent.depth;
image->info.samples = pCreateInfo->samples;
image->info.array_size = pCreateInfo->arrayLayers;
image->info.levels = pCreateInfo->mipLevels;
image->vk_format = pCreateInfo->format; image->vk_format = pCreateInfo->format;
image->levels = pCreateInfo->mipLevels;
image->array_size = pCreateInfo->arrayLayers;
image->samples = pCreateInfo->samples;
image->tiling = pCreateInfo->tiling; image->tiling = pCreateInfo->tiling;
image->usage = pCreateInfo->usage; image->usage = pCreateInfo->usage;
image->flags = pCreateInfo->flags; image->flags = pCreateInfo->flags;
@@ -648,15 +783,18 @@ radv_image_create(VkDevice _device,
image->exclusive = pCreateInfo->sharingMode == VK_SHARING_MODE_EXCLUSIVE; image->exclusive = pCreateInfo->sharingMode == VK_SHARING_MODE_EXCLUSIVE;
if (pCreateInfo->sharingMode == VK_SHARING_MODE_CONCURRENT) { if (pCreateInfo->sharingMode == VK_SHARING_MODE_CONCURRENT) {
for (uint32_t i = 0; i < pCreateInfo->queueFamilyIndexCount; ++i) for (uint32_t i = 0; i < pCreateInfo->queueFamilyIndexCount; ++i)
image->queue_family_mask |= 1u << pCreateInfo->pQueueFamilyIndices[i]; if (pCreateInfo->pQueueFamilyIndices[i] == VK_QUEUE_FAMILY_EXTERNAL_KHX)
image->queue_family_mask |= (1u << RADV_MAX_QUEUE_FAMILIES) - 1u;
else
image->queue_family_mask |= 1u << pCreateInfo->pQueueFamilyIndices[i];
} }
radv_init_surface(device, &image->surface, create_info); radv_init_surface(device, &image->surface, create_info);
device->ws->surface_init(device->ws, &image->surface); device->ws->surface_init(device->ws, &image->info, &image->surface);
image->size = image->surface.bo_size; image->size = image->surface.surf_size;
image->alignment = image->surface.bo_alignment; image->alignment = image->surface.surf_alignment;
if (image->exclusive || image->queue_family_mask == 1) if (image->exclusive || image->queue_family_mask == 1)
can_cmask_dcc = true; can_cmask_dcc = true;
@@ -669,22 +807,15 @@ radv_image_create(VkDevice _device,
if ((pCreateInfo->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) && if ((pCreateInfo->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) &&
pCreateInfo->mipLevels == 1 && pCreateInfo->mipLevels == 1 &&
!image->surface.dcc_size && image->extent.depth == 1 && can_cmask_dcc) !image->surface.dcc_size && image->info.depth == 1 && can_cmask_dcc)
radv_image_alloc_cmask(device, image); radv_image_alloc_cmask(device, image);
if (image->samples > 1 && vk_format_is_color(pCreateInfo->format)) { if (image->info.samples > 1 && vk_format_is_color(pCreateInfo->format)) {
radv_image_alloc_fmask(device, image); radv_image_alloc_fmask(device, image);
} else if (vk_format_is_depth(pCreateInfo->format)) { } else if (vk_format_is_depth(pCreateInfo->format)) {
radv_image_alloc_htile(device, image); radv_image_alloc_htile(device, image);
} }
if (create_info->stride && create_info->stride != image->surface.level[0].pitch_bytes) {
image->surface.level[0].nblk_x = create_info->stride / image->surface.bpe;
image->surface.level[0].pitch_bytes = create_info->stride;
image->surface.level[0].slice_size = create_info->stride * image->surface.level[0].nblk_y;
}
if (pCreateInfo->flags & VK_IMAGE_CREATE_SPARSE_BINDING_BIT) { if (pCreateInfo->flags & VK_IMAGE_CREATE_SPARSE_BINDING_BIT) {
image->alignment = MAX2(image->alignment, 4096); image->alignment = MAX2(image->alignment, 4096);
image->size = align64(image->size, image->alignment); image->size = align64(image->size, image->alignment);
@@ -706,9 +837,7 @@ radv_image_create(VkDevice _device,
void void
radv_image_view_init(struct radv_image_view *iview, radv_image_view_init(struct radv_image_view *iview,
struct radv_device *device, struct radv_device *device,
const VkImageViewCreateInfo* pCreateInfo, const VkImageViewCreateInfo* pCreateInfo)
struct radv_cmd_buffer *cmd_buffer,
VkImageUsageFlags usage_mask)
{ {
RADV_FROM_HANDLE(radv_image, image, pCreateInfo->image); RADV_FROM_HANDLE(radv_image, image, pCreateInfo->image);
const VkImageSubresourceRange *range = &pCreateInfo->subresourceRange; const VkImageSubresourceRange *range = &pCreateInfo->subresourceRange;
@@ -717,11 +846,11 @@ radv_image_view_init(struct radv_image_view *iview,
switch (image->type) { switch (image->type) {
case VK_IMAGE_TYPE_1D: case VK_IMAGE_TYPE_1D:
case VK_IMAGE_TYPE_2D: case VK_IMAGE_TYPE_2D:
assert(range->baseArrayLayer + radv_get_layerCount(image, range) - 1 <= image->array_size); assert(range->baseArrayLayer + radv_get_layerCount(image, range) - 1 <= image->info.array_size);
break; break;
case VK_IMAGE_TYPE_3D: case VK_IMAGE_TYPE_3D:
assert(range->baseArrayLayer + radv_get_layerCount(image, range) - 1 assert(range->baseArrayLayer + radv_get_layerCount(image, range) - 1
<= radv_minify(image->extent.depth, range->baseMipLevel)); <= radv_minify(image->info.depth, range->baseMipLevel));
break; break;
default: default:
unreachable("bad VkImageType"); unreachable("bad VkImageType");
@@ -740,9 +869,9 @@ radv_image_view_init(struct radv_image_view *iview,
} }
iview->extent = (VkExtent3D) { iview->extent = (VkExtent3D) {
.width = radv_minify(image->extent.width , range->baseMipLevel), .width = radv_minify(image->info.width , range->baseMipLevel),
.height = radv_minify(image->extent.height, range->baseMipLevel), .height = radv_minify(image->info.height, range->baseMipLevel),
.depth = radv_minify(image->extent.depth , range->baseMipLevel), .depth = radv_minify(image->info.depth , range->baseMipLevel),
}; };
iview->extent.width = round_up_u32(iview->extent.width * vk_format_get_blockwidth(iview->vk_format), iview->extent.width = round_up_u32(iview->extent.width * vk_format_get_blockwidth(iview->vk_format),
@@ -769,91 +898,31 @@ radv_image_view_init(struct radv_image_view *iview,
iview->descriptor, iview->descriptor,
iview->fmask_descriptor); iview->fmask_descriptor);
si_set_mutable_tex_desc_fields(device, image, si_set_mutable_tex_desc_fields(device, image,
is_stencil ? &image->surface.stencil_level[range->baseMipLevel] : &image->surface.level[range->baseMipLevel], range->baseMipLevel, is_stencil ? &image->surface.u.legacy.stencil_level[range->baseMipLevel]
: &image->surface.u.legacy.level[range->baseMipLevel],
range->baseMipLevel,
range->baseMipLevel, range->baseMipLevel,
blk_w, is_stencil, iview->descriptor); blk_w, is_stencil, iview->descriptor);
} }
void radv_image_set_optimal_micro_tile_mode(struct radv_device *device,
struct radv_image *image, uint32_t micro_tile_mode)
{
/* These magic numbers were copied from addrlib. It doesn't use any
* definitions for them either. They are all 2D_TILED_THIN1 modes with
* different bpp and micro tile mode.
*/
if (device->physical_device->rad_info.chip_class >= CIK) {
switch (micro_tile_mode) {
case 0: /* displayable */
image->surface.tiling_index[0] = 10;
break;
case 1: /* thin */
image->surface.tiling_index[0] = 14;
break;
case 3: /* rotated */
image->surface.tiling_index[0] = 28;
break;
default: /* depth, thick */
assert(!"unexpected micro mode");
return;
}
} else { /* SI */
switch (micro_tile_mode) {
case 0: /* displayable */
switch (image->surface.bpe) {
case 1:
image->surface.tiling_index[0] = 10;
break;
case 2:
image->surface.tiling_index[0] = 11;
break;
default: /* 4, 8 */
image->surface.tiling_index[0] = 12;
break;
}
break;
case 1: /* thin */
switch (image->surface.bpe) {
case 1:
image->surface.tiling_index[0] = 14;
break;
case 2:
image->surface.tiling_index[0] = 15;
break;
case 4:
image->surface.tiling_index[0] = 16;
break;
default: /* 8, 16 */
image->surface.tiling_index[0] = 17;
break;
}
break;
default: /* depth, thick */
assert(!"unexpected micro mode");
return;
}
}
image->surface.micro_tile_mode = micro_tile_mode;
}
bool radv_layout_has_htile(const struct radv_image *image, bool radv_layout_has_htile(const struct radv_image *image,
VkImageLayout layout) VkImageLayout layout,
unsigned queue_mask)
{ {
return (layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL || return image->surface.htile_size &&
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL); (layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) &&
queue_mask == (1u << RADV_QUEUE_GENERAL);
} }
bool radv_layout_is_htile_compressed(const struct radv_image *image, bool radv_layout_is_htile_compressed(const struct radv_image *image,
VkImageLayout layout) VkImageLayout layout,
unsigned queue_mask)
{ {
return layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; return image->surface.htile_size &&
} (layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) &&
bool radv_layout_can_expclear(const struct radv_image *image, queue_mask == (1u << RADV_QUEUE_GENERAL);
VkImageLayout layout)
{
return (layout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
layout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
} }
bool radv_layout_can_fast_clear(const struct radv_image *image, bool radv_layout_can_fast_clear(const struct radv_image *image,
@@ -869,6 +938,8 @@ unsigned radv_image_queue_family_mask(const struct radv_image *image, uint32_t f
{ {
if (!image->exclusive) if (!image->exclusive)
return image->queue_family_mask; return image->queue_family_mask;
if (family == VK_QUEUE_FAMILY_EXTERNAL_KHX)
return (1u << RADV_MAX_QUEUE_FAMILIES) - 1u;
if (family == VK_QUEUE_FAMILY_IGNORED) if (family == VK_QUEUE_FAMILY_IGNORED)
return 1u << queue_family; return 1u << queue_family;
return 1u << family; return 1u << family;
@@ -914,14 +985,15 @@ void radv_GetImageSubresourceLayout(
RADV_FROM_HANDLE(radv_image, image, _image); RADV_FROM_HANDLE(radv_image, image, _image);
int level = pSubresource->mipLevel; int level = pSubresource->mipLevel;
int layer = pSubresource->arrayLayer; int layer = pSubresource->arrayLayer;
struct radeon_surf *surface = &image->surface;
pLayout->offset = image->surface.level[level].offset + image->surface.level[level].slice_size * layer; pLayout->offset = surface->u.legacy.level[level].offset + surface->u.legacy.level[level].slice_size * layer;
pLayout->rowPitch = image->surface.level[level].pitch_bytes; pLayout->rowPitch = surface->u.legacy.level[level].nblk_x * surface->bpe;
pLayout->arrayPitch = image->surface.level[level].slice_size; pLayout->arrayPitch = surface->u.legacy.level[level].slice_size;
pLayout->depthPitch = image->surface.level[level].slice_size; pLayout->depthPitch = surface->u.legacy.level[level].slice_size;
pLayout->size = image->surface.level[level].slice_size; pLayout->size = surface->u.legacy.level[level].slice_size;
if (image->type == VK_IMAGE_TYPE_3D) if (image->type == VK_IMAGE_TYPE_3D)
pLayout->size *= image->surface.level[level].nblk_z; pLayout->size *= u_minify(image->info.depth, level);
} }
@@ -939,7 +1011,7 @@ radv_CreateImageView(VkDevice _device,
if (view == NULL) if (view == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY); return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
radv_image_view_init(view, device, pCreateInfo, NULL, ~0); radv_image_view_init(view, device, pCreateInfo);
*pView = radv_image_view_to_handle(view); *pView = radv_image_view_to_handle(view);

View File

@@ -30,33 +30,35 @@
#include <pwd.h> #include <pwd.h>
#include <sys/stat.h> #include <sys/stat.h>
void static void
radv_meta_save(struct radv_meta_saved_state *state, radv_meta_save_novertex(struct radv_meta_saved_state *state,
const struct radv_cmd_buffer *cmd_buffer, const struct radv_cmd_buffer *cmd_buffer,
uint32_t dynamic_mask) uint32_t dynamic_mask)
{ {
state->old_pipeline = cmd_buffer->state.pipeline; state->old_pipeline = cmd_buffer->state.pipeline;
state->old_descriptor_set0 = cmd_buffer->state.descriptors[0]; state->old_descriptor_set0 = cmd_buffer->state.descriptors[0];
memcpy(state->old_vertex_bindings, cmd_buffer->state.vertex_bindings,
sizeof(state->old_vertex_bindings));
state->dynamic_mask = dynamic_mask; state->dynamic_mask = dynamic_mask;
radv_dynamic_state_copy(&state->dynamic, &cmd_buffer->state.dynamic, radv_dynamic_state_copy(&state->dynamic, &cmd_buffer->state.dynamic,
dynamic_mask); dynamic_mask);
memcpy(state->push_constants, cmd_buffer->push_constants, MAX_PUSH_CONSTANTS_SIZE); memcpy(state->push_constants, cmd_buffer->push_constants, MAX_PUSH_CONSTANTS_SIZE);
state->vertex_saved = false;
} }
void void
radv_meta_restore(const struct radv_meta_saved_state *state, radv_meta_restore(const struct radv_meta_saved_state *state,
struct radv_cmd_buffer *cmd_buffer) struct radv_cmd_buffer *cmd_buffer)
{ {
cmd_buffer->state.pipeline = state->old_pipeline; radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer), VK_PIPELINE_BIND_POINT_GRAPHICS,
radv_bind_descriptor_set(cmd_buffer, state->old_descriptor_set0, 0); radv_pipeline_to_handle(state->old_pipeline));
memcpy(cmd_buffer->state.vertex_bindings, state->old_vertex_bindings, cmd_buffer->state.descriptors[0] = state->old_descriptor_set0;
sizeof(state->old_vertex_bindings)); if (state->vertex_saved) {
memcpy(cmd_buffer->state.vertex_bindings, state->old_vertex_bindings,
sizeof(state->old_vertex_bindings));
cmd_buffer->state.vb_dirty |= (1 << RADV_META_VERTEX_BINDING_COUNT) - 1;
}
cmd_buffer->state.vb_dirty |= (1 << RADV_META_VERTEX_BINDING_COUNT) - 1;
cmd_buffer->state.dirty |= RADV_CMD_DIRTY_PIPELINE; cmd_buffer->state.dirty |= RADV_CMD_DIRTY_PIPELINE;
radv_dynamic_state_copy(&cmd_buffer->state.dynamic, &state->dynamic, radv_dynamic_state_copy(&cmd_buffer->state.dynamic, &state->dynamic,
@@ -110,7 +112,8 @@ radv_meta_restore_compute(const struct radv_meta_saved_compute_state *state,
{ {
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer), VK_PIPELINE_BIND_POINT_COMPUTE, radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer), VK_PIPELINE_BIND_POINT_COMPUTE,
radv_pipeline_to_handle(state->old_pipeline)); radv_pipeline_to_handle(state->old_pipeline));
radv_bind_descriptor_set(cmd_buffer, state->old_descriptor_set0, 0);
cmd_buffer->state.descriptors[0] = state->old_descriptor_set0;
if (push_constant_size) { if (push_constant_size) {
memcpy(cmd_buffer->push_constants, state->push_constants, push_constant_size); memcpy(cmd_buffer->push_constants, state->push_constants, push_constant_size);
@@ -335,8 +338,14 @@ radv_device_init_meta(struct radv_device *device)
result = radv_device_init_meta_resolve_compute_state(device); result = radv_device_init_meta_resolve_compute_state(device);
if (result != VK_SUCCESS) if (result != VK_SUCCESS)
goto fail_resolve_compute; goto fail_resolve_compute;
result = radv_device_init_meta_resolve_fragment_state(device);
if (result != VK_SUCCESS)
goto fail_resolve_fragment;
return VK_SUCCESS; return VK_SUCCESS;
fail_resolve_fragment:
radv_device_finish_meta_resolve_compute_state(device);
fail_resolve_compute: fail_resolve_compute:
radv_device_finish_meta_fast_clear_flush_state(device); radv_device_finish_meta_fast_clear_flush_state(device);
fail_fast_clear: fail_fast_clear:
@@ -373,6 +382,7 @@ radv_device_finish_meta(struct radv_device *device)
radv_device_finish_meta_buffer_state(device); radv_device_finish_meta_buffer_state(device);
radv_device_finish_meta_fast_clear_flush_state(device); radv_device_finish_meta_fast_clear_flush_state(device);
radv_device_finish_meta_resolve_compute_state(device); radv_device_finish_meta_resolve_compute_state(device);
radv_device_finish_meta_resolve_fragment_state(device);
radv_store_meta_pipeline(device); radv_store_meta_pipeline(device);
radv_pipeline_cache_finish(&device->meta_state.cache); radv_pipeline_cache_finish(&device->meta_state.cache);
@@ -384,12 +394,212 @@ radv_device_finish_meta(struct radv_device *device)
* should have no effect. * should have no effect.
*/ */
void void
radv_meta_save_graphics_reset_vport_scissor(struct radv_meta_saved_state *saved_state, radv_meta_save_graphics_reset_vport_scissor_novertex(struct radv_meta_saved_state *saved_state,
struct radv_cmd_buffer *cmd_buffer) struct radv_cmd_buffer *cmd_buffer)
{ {
uint32_t dirty_state = (1 << VK_DYNAMIC_STATE_VIEWPORT) | (1 << VK_DYNAMIC_STATE_SCISSOR); uint32_t dirty_state = (1 << VK_DYNAMIC_STATE_VIEWPORT) | (1 << VK_DYNAMIC_STATE_SCISSOR);
radv_meta_save(saved_state, cmd_buffer, dirty_state); radv_meta_save_novertex(saved_state, cmd_buffer, dirty_state);
cmd_buffer->state.dynamic.viewport.count = 0; cmd_buffer->state.dynamic.viewport.count = 0;
cmd_buffer->state.dynamic.scissor.count = 0; cmd_buffer->state.dynamic.scissor.count = 0;
cmd_buffer->state.dirty |= dirty_state; cmd_buffer->state.dirty |= dirty_state;
} }
nir_ssa_def *radv_meta_gen_rect_vertices_comp2(nir_builder *vs_b, nir_ssa_def *comp2)
{
nir_intrinsic_instr *vertex_id = nir_intrinsic_instr_create(vs_b->shader, nir_intrinsic_load_vertex_id_zero_base);
nir_ssa_dest_init(&vertex_id->instr, &vertex_id->dest, 1, 32, "vertexid");
nir_builder_instr_insert(vs_b, &vertex_id->instr);
/* vertex 0 - -1.0, -1.0 */
/* vertex 1 - -1.0, 1.0 */
/* vertex 2 - 1.0, -1.0 */
/* so channel 0 is vertex_id != 2 ? -1.0 : 1.0
channel 1 is vertex id != 1 ? -1.0 : 1.0 */
nir_ssa_def *c0cmp = nir_ine(vs_b, &vertex_id->dest.ssa,
nir_imm_int(vs_b, 2));
nir_ssa_def *c1cmp = nir_ine(vs_b, &vertex_id->dest.ssa,
nir_imm_int(vs_b, 1));
nir_ssa_def *comp[4];
comp[0] = nir_bcsel(vs_b, c0cmp,
nir_imm_float(vs_b, -1.0),
nir_imm_float(vs_b, 1.0));
comp[1] = nir_bcsel(vs_b, c1cmp,
nir_imm_float(vs_b, -1.0),
nir_imm_float(vs_b, 1.0));
comp[2] = comp2;
comp[3] = nir_imm_float(vs_b, 1.0);
nir_ssa_def *outvec = nir_vec(vs_b, comp, 4);
return outvec;
}
nir_ssa_def *radv_meta_gen_rect_vertices(nir_builder *vs_b)
{
return radv_meta_gen_rect_vertices_comp2(vs_b, nir_imm_float(vs_b, 0.0));
}
/* vertex shader that generates vertices */
nir_shader *
radv_meta_build_nir_vs_generate_vertices(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_vs_gen_verts");
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_store_var(&b, v_position, outvec, 0xf);
return b.shader;
}
nir_shader *
radv_meta_build_nir_fs_noop(void)
{
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_asprintf(b.shader,
"meta_noop_fs");
return b.shader;
}
static nir_ssa_def *radv_meta_build_resolve_srgb_conversion(nir_builder *b,
nir_ssa_def *input)
{
nir_const_value v;
unsigned i;
v.u32[0] = 0x3b4d2e1c; // 0.00313080009
nir_ssa_def *cmp[3];
for (i = 0; i < 3; i++)
cmp[i] = nir_flt(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *ltvals[3];
v.f32[0] = 12.92;
for (i = 0; i < 3; i++)
ltvals[i] = nir_fmul(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
nir_ssa_def *gtvals[3];
for (i = 0; i < 3; i++) {
v.f32[0] = 1.0/2.4;
gtvals[i] = nir_fpow(b, nir_channel(b, input, i),
nir_build_imm(b, 1, 32, v));
v.f32[0] = 1.055;
gtvals[i] = nir_fmul(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
v.f32[0] = 0.055;
gtvals[i] = nir_fsub(b, gtvals[i],
nir_build_imm(b, 1, 32, v));
}
nir_ssa_def *comp[4];
for (i = 0; i < 3; i++)
comp[i] = nir_bcsel(b, cmp[i], ltvals[i], gtvals[i]);
comp[3] = nir_channels(b, input, 3);
return nir_vec(b, comp, 4);
}
void radv_meta_build_resolve_shader_core(nir_builder *b,
bool is_integer,
bool is_srgb,
int samples,
nir_variable *input_img,
nir_variable *color,
nir_ssa_def *img_coord)
{
/* do a txf_ms on each sample */
nir_ssa_def *tmp;
nir_if *outer_if = NULL;
nir_tex_instr *tex = nir_tex_instr_create(b->shader, 2);
tex->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex->op = nir_texop_txf_ms;
tex->src[0].src_type = nir_tex_src_coord;
tex->src[0].src = nir_src_for_ssa(img_coord);
tex->src[1].src_type = nir_tex_src_ms_index;
tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, 0));
tex->dest_type = nir_type_float;
tex->is_array = false;
tex->coord_components = 2;
tex->texture = nir_deref_var_create(tex, input_img);
tex->sampler = NULL;
nir_ssa_dest_init(&tex->instr, &tex->dest, 4, 32, "tex");
nir_builder_instr_insert(b, &tex->instr);
tmp = &tex->dest.ssa;
if (!is_integer && samples > 1) {
nir_tex_instr *tex_all_same = nir_tex_instr_create(b->shader, 1);
tex_all_same->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_all_same->op = nir_texop_samples_identical;
tex_all_same->src[0].src_type = nir_tex_src_coord;
tex_all_same->src[0].src = nir_src_for_ssa(img_coord);
tex_all_same->dest_type = nir_type_float;
tex_all_same->is_array = false;
tex_all_same->coord_components = 2;
tex_all_same->texture = nir_deref_var_create(tex_all_same, input_img);
tex_all_same->sampler = NULL;
nir_ssa_dest_init(&tex_all_same->instr, &tex_all_same->dest, 1, 32, "tex");
nir_builder_instr_insert(b, &tex_all_same->instr);
nir_ssa_def *all_same = nir_ine(b, &tex_all_same->dest.ssa, nir_imm_int(b, 0));
nir_if *if_stmt = nir_if_create(b->shader);
if_stmt->condition = nir_src_for_ssa(all_same);
nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
b->cursor = nir_after_cf_list(&if_stmt->then_list);
for (int i = 1; i < samples; i++) {
nir_tex_instr *tex_add = nir_tex_instr_create(b->shader, 2);
tex_add->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_add->op = nir_texop_txf_ms;
tex_add->src[0].src_type = nir_tex_src_coord;
tex_add->src[0].src = nir_src_for_ssa(img_coord);
tex_add->src[1].src_type = nir_tex_src_ms_index;
tex_add->src[1].src = nir_src_for_ssa(nir_imm_int(b, i));
tex_add->dest_type = nir_type_float;
tex_add->is_array = false;
tex_add->coord_components = 2;
tex_add->texture = nir_deref_var_create(tex_add, input_img);
tex_add->sampler = NULL;
nir_ssa_dest_init(&tex_add->instr, &tex_add->dest, 4, 32, "tex");
nir_builder_instr_insert(b, &tex_add->instr);
tmp = nir_fadd(b, tmp, &tex_add->dest.ssa);
}
tmp = nir_fdiv(b, tmp, nir_imm_float(b, samples));
nir_store_var(b, color, tmp, 0xf);
b->cursor = nir_after_cf_list(&if_stmt->else_list);
outer_if = if_stmt;
}
nir_store_var(b, color, &tex->dest.ssa, 0xf);
if (outer_if)
b->cursor = nir_after_cf_node(&outer_if->cf_node);
if (is_srgb) {
nir_ssa_def *newv = nir_load_var(b, color);
newv = radv_meta_build_resolve_srgb_conversion(b, newv);
nir_store_var(b, color, newv, 0xf);
}
}

View File

@@ -35,6 +35,7 @@ extern "C" {
#define RADV_META_VERTEX_BINDING_COUNT 2 #define RADV_META_VERTEX_BINDING_COUNT 2
struct radv_meta_saved_state { struct radv_meta_saved_state {
bool vertex_saved;
struct radv_vertex_binding old_vertex_bindings[RADV_META_VERTEX_BINDING_COUNT]; struct radv_vertex_binding old_vertex_bindings[RADV_META_VERTEX_BINDING_COUNT];
struct radv_descriptor_set *old_descriptor_set0; struct radv_descriptor_set *old_descriptor_set0;
struct radv_pipeline *old_pipeline; struct radv_pipeline *old_pipeline;
@@ -90,9 +91,9 @@ void radv_device_finish_meta_query_state(struct radv_device *device);
VkResult radv_device_init_meta_resolve_compute_state(struct radv_device *device); VkResult radv_device_init_meta_resolve_compute_state(struct radv_device *device);
void radv_device_finish_meta_resolve_compute_state(struct radv_device *device); void radv_device_finish_meta_resolve_compute_state(struct radv_device *device);
void radv_meta_save(struct radv_meta_saved_state *state,
const struct radv_cmd_buffer *cmd_buffer, VkResult radv_device_init_meta_resolve_fragment_state(struct radv_device *device);
uint32_t dynamic_mask); void radv_device_finish_meta_resolve_fragment_state(struct radv_device *device);
void radv_meta_restore(const struct radv_meta_saved_state *state, void radv_meta_restore(const struct radv_meta_saved_state *state,
struct radv_cmd_buffer *cmd_buffer); struct radv_cmd_buffer *cmd_buffer);
@@ -200,8 +201,8 @@ void radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image, struct radv_image *image,
const VkImageSubresourceRange *subresourceRange); const VkImageSubresourceRange *subresourceRange);
void radv_meta_save_graphics_reset_vport_scissor(struct radv_meta_saved_state *saved_state, void radv_meta_save_graphics_reset_vport_scissor_novertex(struct radv_meta_saved_state *saved_state,
struct radv_cmd_buffer *cmd_buffer); struct radv_cmd_buffer *cmd_buffer);
void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer, void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image, struct radv_image *src_image,
@@ -211,9 +212,33 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
uint32_t region_count, uint32_t region_count,
const VkImageResolve *regions); const VkImageResolve *regions);
void radv_meta_resolve_fragment_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image,
VkImageLayout src_image_layout,
struct radv_image *dest_image,
VkImageLayout dest_image_layout,
uint32_t region_count,
const VkImageResolve *regions);
void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer, void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image, struct radv_image *image,
struct radv_image *linear_image); struct radv_image *linear_image);
/* common nir builder helpers */
#include "nir/nir_builder.h"
nir_ssa_def *radv_meta_gen_rect_vertices(nir_builder *vs_b);
nir_ssa_def *radv_meta_gen_rect_vertices_comp2(nir_builder *vs_b, nir_ssa_def *comp2);
nir_shader *radv_meta_build_nir_vs_generate_vertices(void);
nir_shader *radv_meta_build_nir_fs_noop(void);
void radv_meta_build_resolve_shader_core(nir_builder *b,
bool is_integer,
bool is_srgb,
int samples,
nir_variable *input_img,
nir_variable *color,
nir_ssa_def *img_coord);
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif

View File

@@ -38,25 +38,64 @@ build_nir_vertex_shader(void)
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_blit_vs"); b.shader->info.name = ralloc_strdup(b.shader, "meta_blit_vs");
nir_variable *pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "a_pos");
pos_in->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out, nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "gl_Position"); vec4, "gl_Position");
pos_out->data.location = VARYING_SLOT_POS; pos_out->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, pos_out, pos_in);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "a_tex_pos");
tex_pos_in->data.location = VERT_ATTRIB_GENERIC1;
nir_variable *tex_pos_out = nir_variable_create(b.shader, nir_var_shader_out, nir_variable *tex_pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "v_tex_pos"); vec4, "v_tex_pos");
tex_pos_out->data.location = VARYING_SLOT_VAR0; tex_pos_out->data.location = VARYING_SLOT_VAR0;
tex_pos_out->data.interpolation = INTERP_MODE_SMOOTH; tex_pos_out->data.interpolation = INTERP_MODE_SMOOTH;
nir_copy_var(&b, tex_pos_out, tex_pos_in);
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
nir_store_var(&b, pos_out, outvec, 0xf);
nir_intrinsic_instr *src_box = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
src_box->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
nir_intrinsic_set_base(src_box, 0);
nir_intrinsic_set_range(src_box, 16);
src_box->num_components = 4;
nir_ssa_dest_init(&src_box->instr, &src_box->dest, 4, 32, "src_box");
nir_builder_instr_insert(&b, &src_box->instr);
nir_intrinsic_instr *src0_z = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
src0_z->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
nir_intrinsic_set_base(src0_z, 16);
nir_intrinsic_set_range(src0_z, 4);
src0_z->num_components = 1;
nir_ssa_dest_init(&src0_z->instr, &src0_z->dest, 1, 32, "src0_z");
nir_builder_instr_insert(&b, &src0_z->instr);
nir_intrinsic_instr *vertex_id = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_vertex_id_zero_base);
nir_ssa_dest_init(&vertex_id->instr, &vertex_id->dest, 1, 32, "vertexid");
nir_builder_instr_insert(&b, &vertex_id->instr);
/* vertex 0 - src0_x, src0_y, src0_z */
/* vertex 1 - src0_x, src1_y, src0_z*/
/* vertex 2 - src1_x, src0_y, src0_z */
/* so channel 0 is vertex_id != 2 ? src_x : src_x + w
channel 1 is vertex id != 1 ? src_y : src_y + w */
nir_ssa_def *c0cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 2));
nir_ssa_def *c1cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 1));
nir_ssa_def *comp[4];
comp[0] = nir_bcsel(&b, c0cmp,
nir_channel(&b, &src_box->dest.ssa, 0),
nir_channel(&b, &src_box->dest.ssa, 2));
comp[1] = nir_bcsel(&b, c1cmp,
nir_channel(&b, &src_box->dest.ssa, 1),
nir_channel(&b, &src_box->dest.ssa, 3));
comp[2] = &src0_z->dest.ssa;
comp[3] = nir_imm_float(&b, 1.0);
nir_ssa_def *out_tex_vec = nir_vec(&b, comp, 4);
nir_store_var(&b, tex_pos_out, out_tex_vec, 0xf);
return b.shader; return b.shader;
} }
@@ -70,7 +109,7 @@ build_nir_copy_fragment_shader(enum glsl_sampler_dim tex_dim)
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
sprintf(shader_name, "meta_blit_fs.%d", tex_dim); sprintf(shader_name, "meta_blit_fs.%d", tex_dim);
b.shader->info->name = ralloc_strdup(b.shader, shader_name); b.shader->info.name = ralloc_strdup(b.shader, shader_name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in, nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "v_tex_pos"); vec4, "v_tex_pos");
@@ -124,7 +163,7 @@ build_nir_copy_fragment_shader_depth(enum glsl_sampler_dim tex_dim)
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
sprintf(shader_name, "meta_blit_depth_fs.%d", tex_dim); sprintf(shader_name, "meta_blit_depth_fs.%d", tex_dim);
b.shader->info->name = ralloc_strdup(b.shader, shader_name); b.shader->info.name = ralloc_strdup(b.shader, shader_name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in, nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "v_tex_pos"); vec4, "v_tex_pos");
@@ -178,7 +217,7 @@ build_nir_copy_fragment_shader_stencil(enum glsl_sampler_dim tex_dim)
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
sprintf(shader_name, "meta_blit_stencil_fs.%d", tex_dim); sprintf(shader_name, "meta_blit_stencil_fs.%d", tex_dim);
b.shader->info->name = ralloc_strdup(b.shader, shader_name); b.shader->info.name = ralloc_strdup(b.shader, shader_name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in, nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "v_tex_pos"); vec4, "v_tex_pos");
@@ -236,65 +275,21 @@ meta_emit_blit(struct radv_cmd_buffer *cmd_buffer,
VkFilter blit_filter) VkFilter blit_filter)
{ {
struct radv_device *device = cmd_buffer->device; struct radv_device *device = cmd_buffer->device;
unsigned offset = 0;
struct blit_vb_data {
float pos[2];
float tex_coord[3];
} vb_data[3];
assert(src_image->samples == dest_image->samples); assert(src_image->info.samples == dest_image->info.samples);
unsigned vb_size = 3 * sizeof(*vb_data);
vb_data[0] = (struct blit_vb_data) { float vertex_push_constants[5] = {
.pos = { (float)src_offset_0.x / (float)src_iview->extent.width,
-1.0, (float)src_offset_0.y / (float)src_iview->extent.height,
-1.0, (float)src_offset_1.x / (float)src_iview->extent.width,
}, (float)src_offset_1.y / (float)src_iview->extent.height,
.tex_coord = { (float)src_offset_0.z / (float)src_iview->extent.depth,
(float)src_offset_0.x / (float)src_iview->extent.width,
(float)src_offset_0.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
},
}; };
vb_data[1] = (struct blit_vb_data) { radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
.pos = { device->meta_state.blit.pipeline_layout,
-1.0, VK_SHADER_STAGE_VERTEX_BIT, 0, 20,
1.0, vertex_push_constants);
},
.tex_coord = {
(float)src_offset_0.x / (float)src_iview->extent.width,
(float)src_offset_1.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
},
};
vb_data[2] = (struct blit_vb_data) {
.pos = {
1.0,
-1.0,
},
.tex_coord = {
(float)src_offset_1.x / (float)src_iview->extent.width,
(float)src_offset_0.y / (float)src_iview->extent.height,
(float)src_offset_0.z / (float)src_iview->extent.depth,
},
};
radv_cmd_buffer_upload_data(cmd_buffer, vb_size, 16, vb_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = vb_size,
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
radv_CmdBindVertexBuffers(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1,
(VkBuffer[]) {
radv_buffer_to_handle(&vertex_buffer)
},
(VkDeviceSize[]) {
0,
});
VkSampler sampler; VkSampler sampler;
radv_CreateSampler(radv_device_to_handle(device), radv_CreateSampler(radv_device_to_handle(device),
@@ -509,10 +504,10 @@ void radv_CmdBlitImage(
* vkCmdBlitImage must not be used for multisampled source or * vkCmdBlitImage must not be used for multisampled source or
* destination images. Use vkCmdResolveImage for this purpose. * destination images. Use vkCmdResolveImage for this purpose.
*/ */
assert(src_image->samples == 1); assert(src_image->info.samples == 1);
assert(dest_image->samples == 1); assert(dest_image->info.samples == 1);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (unsigned r = 0; r < regionCount; r++) { for (unsigned r = 0; r < regionCount; r++) {
const VkImageSubresourceLayers *src_res = &pRegions[r].srcSubresource; const VkImageSubresourceLayers *src_res = &pRegions[r].srcSubresource;
@@ -531,8 +526,7 @@ void radv_CmdBlitImage(
.baseArrayLayer = src_res->baseArrayLayer, .baseArrayLayer = src_res->baseArrayLayer,
.layerCount = 1 .layerCount = 1
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_SAMPLED_BIT);
unsigned dst_start, dst_end; unsigned dst_start, dst_end;
if (dest_image->type == VK_IMAGE_TYPE_3D) { if (dest_image->type == VK_IMAGE_TYPE_3D) {
@@ -580,12 +574,6 @@ void radv_CmdBlitImage(
dest_box.extent.height = abs(dst_y1 - dst_y0); dest_box.extent.height = abs(dst_y1 - dst_y0);
struct radv_image_view dest_iview; struct radv_image_view dest_iview;
unsigned usage;
if (dst_res->aspectMask == VK_IMAGE_ASPECT_COLOR_BIT)
usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
else
usage = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
const unsigned num_layers = dst_end - dst_start; const unsigned num_layers = dst_end - dst_start;
for (unsigned i = 0; i < num_layers; i++) { for (unsigned i = 0; i < num_layers; i++) {
const VkOffset3D dest_offset_0 = { const VkOffset3D dest_offset_0 = {
@@ -625,8 +613,7 @@ void radv_CmdBlitImage(
.baseArrayLayer = dest_array_slice, .baseArrayLayer = dest_array_slice,
.layerCount = 1 .layerCount = 1
}, },
}, });
cmd_buffer, usage);
meta_emit_blit(cmd_buffer, meta_emit_blit(cmd_buffer,
src_image, &src_iview, src_image, &src_iview,
src_offset_0, src_offset_1, src_offset_0, src_offset_1,
@@ -765,31 +752,8 @@ radv_device_init_meta_blit_color(struct radv_device *device,
VkPipelineVertexInputStateCreateInfo vi_create_info = { VkPipelineVertexInputStateCreateInfo vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = 5 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = 8
}
}
}; };
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = { VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
@@ -944,31 +908,8 @@ radv_device_init_meta_blit_depth(struct radv_device *device,
VkPipelineVertexInputStateCreateInfo vi_create_info = { VkPipelineVertexInputStateCreateInfo vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = 5 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = 8
}
}
}; };
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = { VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
@@ -1125,31 +1066,8 @@ radv_device_init_meta_blit_stencil(struct radv_device *device,
VkPipelineVertexInputStateCreateInfo vi_create_info = { VkPipelineVertexInputStateCreateInfo vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = 5 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = 8
}
}
}; };
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = { VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
@@ -1308,11 +1226,15 @@ radv_device_init_meta_blit_state(struct radv_device *device)
if (result != VK_SUCCESS) if (result != VK_SUCCESS)
goto fail; goto fail;
const VkPushConstantRange push_constant_range = {VK_SHADER_STAGE_VERTEX_BIT, 0, 20};
result = radv_CreatePipelineLayout(radv_device_to_handle(device), result = radv_CreatePipelineLayout(radv_device_to_handle(device),
&(VkPipelineLayoutCreateInfo) { &(VkPipelineLayoutCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1, .setLayoutCount = 1,
.pSetLayouts = &device->meta_state.blit.ds_layout, .pSetLayouts = &device->meta_state.blit.ds_layout,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &push_constant_range,
}, },
&device->meta_state.alloc, &device->meta_state.blit.pipeline_layout); &device->meta_state.alloc, &device->meta_state.blit.pipeline_layout);
if (result != VK_SUCCESS) if (result != VK_SUCCESS)
@@ -1329,12 +1251,10 @@ radv_device_init_meta_blit_state(struct radv_device *device)
goto fail; goto fail;
result = radv_device_init_meta_blit_stencil(device, &vs); result = radv_device_init_meta_blit_stencil(device, &vs);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
fail: fail:
ralloc_free(vs.nir); ralloc_free(vs.nir);
radv_device_finish_meta_blit_state(device); if (result != VK_SUCCESS)
radv_device_finish_meta_blit_state(device);
return result; return result;
} }

View File

@@ -53,7 +53,6 @@ enum blit2d_src_type {
static void static void
create_iview(struct radv_cmd_buffer *cmd_buffer, create_iview(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_surf *surf, struct radv_meta_blit2d_surf *surf,
VkImageUsageFlags usage,
struct radv_image_view *iview, VkFormat depth_format) struct radv_image_view *iview, VkFormat depth_format)
{ {
VkFormat format; VkFormat format;
@@ -76,7 +75,7 @@ create_iview(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = surf->layer, .baseArrayLayer = surf->layer,
.layerCount = 1 .layerCount = 1
}, },
}, cmd_buffer, usage); });
} }
static void static void
@@ -136,11 +135,10 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer), radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type], device->meta_state.blit2d.p_layouts[src_type],
VK_SHADER_STAGE_FRAGMENT_BIT, 0, 4, VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
&src_buf->pitch); &src_buf->pitch);
} else { } else {
create_iview(cmd_buffer, src_img, VK_IMAGE_USAGE_SAMPLED_BIT, &tmp->iview, create_iview(cmd_buffer, src_img, &tmp->iview, depth_format);
depth_format);
radv_meta_push_descriptor_set(cmd_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, radv_meta_push_descriptor_set(cmd_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
device->meta_state.blit2d.p_layouts[src_type], device->meta_state.blit2d.p_layouts[src_type],
@@ -179,15 +177,7 @@ blit2d_bind_dst(struct radv_cmd_buffer *cmd_buffer,
VkFormat depth_format, VkFormat depth_format,
struct blit2d_dst_temps *tmp) struct blit2d_dst_temps *tmp)
{ {
VkImageUsageFlagBits bits; create_iview(cmd_buffer, dst, &tmp->iview, depth_format);
if (dst->aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT)
bits = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
else
bits = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
create_iview(cmd_buffer, dst, bits,
&tmp->iview, depth_format);
radv_CreateFramebuffer(radv_device_to_handle(cmd_buffer->device), radv_CreateFramebuffer(radv_device_to_handle(cmd_buffer->device),
&(VkFramebufferCreateInfo) { &(VkFramebufferCreateInfo) {
@@ -268,69 +258,21 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
struct blit2d_src_temps src_temps; struct blit2d_src_temps src_temps;
blit2d_bind_src(cmd_buffer, src_img, src_buf, &src_temps, src_type, depth_format); blit2d_bind_src(cmd_buffer, src_img, src_buf, &src_temps, src_type, depth_format);
uint32_t offset = 0;
struct blit2d_dst_temps dst_temps; struct blit2d_dst_temps dst_temps;
blit2d_bind_dst(cmd_buffer, dst, rects[r].dst_x + rects[r].width, blit2d_bind_dst(cmd_buffer, dst, rects[r].dst_x + rects[r].width,
rects[r].dst_y + rects[r].height, depth_format, &dst_temps); rects[r].dst_y + rects[r].height, depth_format, &dst_temps);
struct blit_vb_data { float vertex_push_constants[4] = {
float pos[2]; rects[r].src_x,
float tex_coord[2]; rects[r].src_y,
} vb_data[3]; rects[r].src_x + rects[r].width,
rects[r].src_y + rects[r].height,
unsigned vb_size = 3 * sizeof(*vb_data);
vb_data[0] = (struct blit_vb_data) {
.pos = {
-1.0,
-1.0,
},
.tex_coord = {
rects[r].src_x,
rects[r].src_y,
},
}; };
vb_data[1] = (struct blit_vb_data) { radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
.pos = { device->meta_state.blit2d.p_layouts[src_type],
-1.0, VK_SHADER_STAGE_VERTEX_BIT, 0, 16,
1.0, vertex_push_constants);
},
.tex_coord = {
rects[r].src_x,
rects[r].src_y + rects[r].height,
},
};
vb_data[2] = (struct blit_vb_data) {
.pos = {
1.0,
-1.0,
},
.tex_coord = {
rects[r].src_x + rects[r].width,
rects[r].src_y,
},
};
radv_cmd_buffer_upload_data(cmd_buffer, vb_size, 16, vb_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = vb_size,
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
radv_CmdBindVertexBuffers(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1,
(VkBuffer[]) {
radv_buffer_to_handle(&vertex_buffer),
},
(VkDeviceSize[]) {
0,
});
if (dst->aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT) { if (dst->aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT) {
unsigned fs_key = radv_format_meta_fs_key(dst_temps.iview.vk_format); unsigned fs_key = radv_format_meta_fs_key(dst_temps.iview.vk_format);
@@ -433,25 +375,53 @@ build_nir_vertex_shader(void)
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_blit_vs"); b.shader->info.name = ralloc_strdup(b.shader, "meta_blit2d_vs");
nir_variable *pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec4, "a_pos");
pos_in->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out, nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "gl_Position"); vec4, "gl_Position");
pos_out->data.location = VARYING_SLOT_POS; pos_out->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, pos_out, pos_in);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "a_tex_pos");
tex_pos_in->data.location = VERT_ATTRIB_GENERIC1;
nir_variable *tex_pos_out = nir_variable_create(b.shader, nir_var_shader_out, nir_variable *tex_pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec2, "v_tex_pos"); vec2, "v_tex_pos");
tex_pos_out->data.location = VARYING_SLOT_VAR0; tex_pos_out->data.location = VARYING_SLOT_VAR0;
tex_pos_out->data.interpolation = INTERP_MODE_SMOOTH; tex_pos_out->data.interpolation = INTERP_MODE_SMOOTH;
nir_copy_var(&b, tex_pos_out, tex_pos_in);
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
nir_store_var(&b, pos_out, outvec, 0xf);
nir_intrinsic_instr *src_box = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
src_box->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
nir_intrinsic_set_base(src_box, 0);
nir_intrinsic_set_range(src_box, 16);
src_box->num_components = 4;
nir_ssa_dest_init(&src_box->instr, &src_box->dest, 4, 32, "src_box");
nir_builder_instr_insert(&b, &src_box->instr);
nir_intrinsic_instr *vertex_id = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_vertex_id_zero_base);
nir_ssa_dest_init(&vertex_id->instr, &vertex_id->dest, 1, 32, "vertexid");
nir_builder_instr_insert(&b, &vertex_id->instr);
/* vertex 0 - src_x, src_y */
/* vertex 1 - src_x, src_y+h */
/* vertex 2 - src_x+w, src_y */
/* so channel 0 is vertex_id != 2 ? src_x : src_x + w
channel 1 is vertex id != 1 ? src_y : src_y + w */
nir_ssa_def *c0cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 2));
nir_ssa_def *c1cmp = nir_ine(&b, &vertex_id->dest.ssa,
nir_imm_int(&b, 1));
nir_ssa_def *comp[2];
comp[0] = nir_bcsel(&b, c0cmp,
nir_channel(&b, &src_box->dest.ssa, 0),
nir_channel(&b, &src_box->dest.ssa, 2));
comp[1] = nir_bcsel(&b, c1cmp,
nir_channel(&b, &src_box->dest.ssa, 1),
nir_channel(&b, &src_box->dest.ssa, 3));
nir_ssa_def *out_tex_vec = nir_vec(&b, comp, 2);
nir_store_var(&b, tex_pos_out, out_tex_vec, 0x3);
return b.shader; return b.shader;
} }
@@ -502,6 +472,8 @@ build_nir_buffer_fetch(struct nir_builder *b, struct radv_device *device,
sampler->data.binding = 0; sampler->data.binding = 0;
nir_intrinsic_instr *width = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *width = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(width, 16);
nir_intrinsic_set_range(width, 4);
width->src[0] = nir_src_for_ssa(nir_imm_int(b, 0)); width->src[0] = nir_src_for_ssa(nir_imm_int(b, 0));
width->num_components = 1; width->num_components = 1;
nir_ssa_dest_init(&width->instr, &width->dest, 1, 32, "width"); nir_ssa_dest_init(&width->instr, &width->dest, 1, 32, "width");
@@ -532,31 +504,8 @@ build_nir_buffer_fetch(struct nir_builder *b, struct radv_device *device,
static const VkPipelineVertexInputStateCreateInfo normal_vi_create_info = { static const VkPipelineVertexInputStateCreateInfo normal_vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = 4 * sizeof(float),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 0
},
{
/* Texture Coordinate */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = 8
},
},
}; };
static nir_shader * static nir_shader *
@@ -568,7 +517,7 @@ build_nir_copy_fragment_shader(struct radv_device *device,
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_strdup(b.shader, name); b.shader->info.name = ralloc_strdup(b.shader, name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in, nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "v_tex_pos"); vec2, "v_tex_pos");
@@ -597,7 +546,7 @@ build_nir_copy_fragment_shader_depth(struct radv_device *device,
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_strdup(b.shader, name); b.shader->info.name = ralloc_strdup(b.shader, name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in, nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "v_tex_pos"); vec2, "v_tex_pos");
@@ -626,7 +575,7 @@ build_nir_copy_fragment_shader_stencil(struct radv_device *device,
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_strdup(b.shader, name); b.shader->info.name = ralloc_strdup(b.shader, name);
nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in, nir_variable *tex_pos_in = nir_variable_create(b.shader, nir_var_shader_in,
vec2, "v_tex_pos"); vec2, "v_tex_pos");
@@ -754,8 +703,8 @@ blit2d_init_color_pipeline(struct radv_device *device,
.format = format, .format = format,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD, .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE, .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL, .initialLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL, .finalLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
}, },
.subpassCount = 1, .subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) { .pSubpasses = &(VkSubpassDescription) {
@@ -764,12 +713,12 @@ blit2d_init_color_pipeline(struct radv_device *device,
.colorAttachmentCount = 1, .colorAttachmentCount = 1,
.pColorAttachments = &(VkAttachmentReference) { .pColorAttachments = &(VkAttachmentReference) {
.attachment = 0, .attachment = 0,
.layout = VK_IMAGE_LAYOUT_GENERAL, .layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
}, },
.pResolveAttachments = NULL, .pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) { .pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = VK_ATTACHMENT_UNUSED, .attachment = VK_ATTACHMENT_UNUSED,
.layout = VK_IMAGE_LAYOUT_GENERAL, .layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
}, },
.preserveAttachmentCount = 1, .preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 }, .pPreserveAttachments = (uint32_t[]) { 0 },
@@ -912,8 +861,8 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
.format = 0, .format = 0,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD, .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE, .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL, .initialLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL, .finalLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
}, },
.subpassCount = 1, .subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) { .pSubpasses = &(VkSubpassDescription) {
@@ -924,7 +873,7 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
.pResolveAttachments = NULL, .pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) { .pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = 0, .attachment = 0,
.layout = VK_IMAGE_LAYOUT_GENERAL, .layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
}, },
.preserveAttachmentCount = 1, .preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 }, .pPreserveAttachments = (uint32_t[]) { 0 },
@@ -1067,8 +1016,8 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
.format = 0, .format = 0,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD, .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE, .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL, .initialLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL, .finalLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
}, },
.subpassCount = 1, .subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) { .pSubpasses = &(VkSubpassDescription) {
@@ -1079,7 +1028,7 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
.pResolveAttachments = NULL, .pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) { .pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = 0, .attachment = 0,
.layout = VK_IMAGE_LAYOUT_GENERAL, .layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
}, },
.preserveAttachmentCount = 1, .preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 }, .pPreserveAttachments = (uint32_t[]) { 0 },
@@ -1201,6 +1150,10 @@ radv_device_init_meta_blit2d_state(struct radv_device *device)
zero(device->meta_state.blit2d); zero(device->meta_state.blit2d);
const VkPushConstantRange push_constant_ranges[] = {
{VK_SHADER_STAGE_VERTEX_BIT, 0, 16},
{VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4},
};
result = radv_CreateDescriptorSetLayout(radv_device_to_handle(device), result = radv_CreateDescriptorSetLayout(radv_device_to_handle(device),
&(VkDescriptorSetLayoutCreateInfo) { &(VkDescriptorSetLayoutCreateInfo) {
.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
@@ -1224,6 +1177,8 @@ radv_device_init_meta_blit2d_state(struct radv_device *device)
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1, .setLayoutCount = 1,
.pSetLayouts = &device->meta_state.blit2d.ds_layouts[BLIT2D_SRC_TYPE_IMAGE], .pSetLayouts = &device->meta_state.blit2d.ds_layouts[BLIT2D_SRC_TYPE_IMAGE],
.pushConstantRangeCount = 1,
.pPushConstantRanges = push_constant_ranges,
}, },
&device->meta_state.alloc, &device->meta_state.blit2d.p_layouts[BLIT2D_SRC_TYPE_IMAGE]); &device->meta_state.alloc, &device->meta_state.blit2d.p_layouts[BLIT2D_SRC_TYPE_IMAGE]);
if (result != VK_SUCCESS) if (result != VK_SUCCESS)
@@ -1247,14 +1202,14 @@ radv_device_init_meta_blit2d_state(struct radv_device *device)
if (result != VK_SUCCESS) if (result != VK_SUCCESS)
goto fail; goto fail;
const VkPushConstantRange push_constant_range = {VK_SHADER_STAGE_FRAGMENT_BIT, 0, 4};
result = radv_CreatePipelineLayout(radv_device_to_handle(device), result = radv_CreatePipelineLayout(radv_device_to_handle(device),
&(VkPipelineLayoutCreateInfo) { &(VkPipelineLayoutCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1, .setLayoutCount = 1,
.pSetLayouts = &device->meta_state.blit2d.ds_layouts[BLIT2D_SRC_TYPE_BUFFER], .pSetLayouts = &device->meta_state.blit2d.ds_layouts[BLIT2D_SRC_TYPE_BUFFER],
.pushConstantRangeCount = 1, .pushConstantRangeCount = 2,
.pPushConstantRanges = &push_constant_range, .pPushConstantRanges = push_constant_ranges,
}, },
&device->meta_state.alloc, &device->meta_state.blit2d.p_layouts[BLIT2D_SRC_TYPE_BUFFER]); &device->meta_state.alloc, &device->meta_state.blit2d.p_layouts[BLIT2D_SRC_TYPE_BUFFER]);
if (result != VK_SUCCESS) if (result != VK_SUCCESS)

View File

@@ -10,17 +10,17 @@ build_buffer_fill_shader(struct radv_device *dev)
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_buffer_fill"); b.shader->info.name = ralloc_strdup(b.shader, "meta_buffer_fill");
b.shader->info->cs.local_size[0] = 64; b.shader->info.cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1; b.shader->info.cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
@@ -36,6 +36,8 @@ build_buffer_fill_shader(struct radv_device *dev)
nir_builder_instr_insert(&b, &dst_buf->instr); nir_builder_instr_insert(&b, &dst_buf->instr);
nir_intrinsic_instr *load = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *load = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(load, 0);
nir_intrinsic_set_range(load, 4);
load->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0)); load->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
load->num_components = 1; load->num_components = 1;
nir_ssa_dest_init(&load->instr, &load->dest, 1, 32, "fill_value"); nir_ssa_dest_init(&load->instr, &load->dest, 1, 32, "fill_value");
@@ -60,17 +62,17 @@ build_buffer_copy_shader(struct radv_device *dev)
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_buffer_copy"); b.shader->info.name = ralloc_strdup(b.shader, "meta_buffer_copy");
b.shader->info->cs.local_size[0] = 64; b.shader->info.cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1; b.shader->info.cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);

View File

@@ -42,10 +42,10 @@ build_nir_itob_compute_shader(struct radv_device *dev)
false, false,
GLSL_TYPE_FLOAT); GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_itob_cs"); b.shader->info.name = ralloc_strdup(b.shader, "meta_itob_cs");
b.shader->info->cs.local_size[0] = 16; b.shader->info.cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16; b.shader->info.cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform, nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
sampler_type, "s_tex"); sampler_type, "s_tex");
input_img->data.descriptor_set = 0; input_img->data.descriptor_set = 0;
@@ -59,21 +59,25 @@ build_nir_itob_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(offset, 0);
nir_intrinsic_set_range(offset, 12);
offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0)); offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
offset->num_components = 2; offset->num_components = 2;
nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset"); nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset");
nir_builder_instr_insert(&b, &offset->instr); nir_builder_instr_insert(&b, &offset->instr);
nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(stride, 0);
nir_intrinsic_set_range(stride, 12);
stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8)); stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
stride->num_components = 1; stride->num_components = 1;
nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride"); nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
@@ -240,10 +244,10 @@ build_nir_btoi_compute_shader(struct radv_device *dev)
false, false,
GLSL_TYPE_FLOAT); GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_btoi_cs"); b.shader->info.name = ralloc_strdup(b.shader, "meta_btoi_cs");
b.shader->info->cs.local_size[0] = 16; b.shader->info.cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16; b.shader->info.cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform, nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
buf_type, "s_tex"); buf_type, "s_tex");
input_img->data.descriptor_set = 0; input_img->data.descriptor_set = 0;
@@ -257,19 +261,23 @@ build_nir_btoi_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(offset, 0);
nir_intrinsic_set_range(offset, 12);
offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0)); offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
offset->num_components = 2; offset->num_components = 2;
nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset"); nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset");
nir_builder_instr_insert(&b, &offset->instr); nir_builder_instr_insert(&b, &offset->instr);
nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(stride, 0);
nir_intrinsic_set_range(stride, 12);
stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8)); stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
stride->num_components = 1; stride->num_components = 1;
nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride"); nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
@@ -436,10 +444,10 @@ build_nir_itoi_compute_shader(struct radv_device *dev)
false, false,
GLSL_TYPE_FLOAT); GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_itoi_cs"); b.shader->info.name = ralloc_strdup(b.shader, "meta_itoi_cs");
b.shader->info->cs.local_size[0] = 16; b.shader->info.cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16; b.shader->info.cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform, nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
buf_type, "s_tex"); buf_type, "s_tex");
input_img->data.descriptor_set = 0; input_img->data.descriptor_set = 0;
@@ -453,19 +461,23 @@ build_nir_itoi_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(src_offset, 0);
nir_intrinsic_set_range(src_offset, 16);
src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0)); src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
src_offset->num_components = 2; src_offset->num_components = 2;
nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset"); nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset");
nir_builder_instr_insert(&b, &src_offset->instr); nir_builder_instr_insert(&b, &src_offset->instr);
nir_intrinsic_instr *dst_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *dst_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(dst_offset, 0);
nir_intrinsic_set_range(dst_offset, 16);
dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8)); dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
dst_offset->num_components = 2; dst_offset->num_components = 2;
nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, 2, 32, "dst_offset"); nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, 2, 32, "dst_offset");
@@ -622,10 +634,10 @@ build_nir_cleari_compute_shader(struct radv_device *dev)
false, false,
GLSL_TYPE_FLOAT); GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_cleari_cs"); b.shader->info.name = ralloc_strdup(b.shader, "meta_cleari_cs");
b.shader->info->cs.local_size[0] = 16; b.shader->info.cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16; b.shader->info.cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_variable *output_img = nir_variable_create(b.shader, nir_var_uniform, nir_variable *output_img = nir_variable_create(b.shader, nir_var_uniform,
img_type, "out_img"); img_type, "out_img");
@@ -635,13 +647,15 @@ build_nir_cleari_compute_shader(struct radv_device *dev)
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *clear_val = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *clear_val = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(clear_val, 0);
nir_intrinsic_set_range(clear_val, 16);
clear_val->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0)); clear_val->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
clear_val->num_components = 4; clear_val->num_components = 4;
nir_ssa_dest_init(&clear_val->instr, &clear_val->dest, 4, 32, "clear_value"); nir_ssa_dest_init(&clear_val->instr, &clear_val->dest, 4, 32, "clear_value");
@@ -845,7 +859,6 @@ radv_meta_end_cleari(struct radv_cmd_buffer *cmd_buffer,
static void static void
create_iview(struct radv_cmd_buffer *cmd_buffer, create_iview(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_surf *surf, struct radv_meta_blit2d_surf *surf,
VkImageUsageFlags usage,
struct radv_image_view *iview) struct radv_image_view *iview)
{ {
@@ -862,7 +875,7 @@ create_iview(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = surf->layer, .baseArrayLayer = surf->layer,
.layerCount = 1 .layerCount = 1
}, },
}, cmd_buffer, usage); });
} }
static void static void
@@ -948,7 +961,7 @@ radv_meta_image_to_buffer(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device; struct radv_device *device = cmd_buffer->device;
struct itob_temps temps; struct itob_temps temps;
create_iview(cmd_buffer, src, VK_IMAGE_USAGE_SAMPLED_BIT, &temps.src_iview); create_iview(cmd_buffer, src, &temps.src_iview);
create_bview(cmd_buffer, dst->buffer, dst->offset, dst->format, &temps.dst_bview); create_bview(cmd_buffer, dst->buffer, dst->offset, dst->format, &temps.dst_bview);
itob_bind_descriptors(cmd_buffer, &temps); itob_bind_descriptors(cmd_buffer, &temps);
@@ -1034,7 +1047,7 @@ radv_meta_buffer_to_image_cs(struct radv_cmd_buffer *cmd_buffer,
struct btoi_temps temps; struct btoi_temps temps;
create_bview(cmd_buffer, src->buffer, src->offset, src->format, &temps.src_bview); create_bview(cmd_buffer, src->buffer, src->offset, src->format, &temps.src_bview);
create_iview(cmd_buffer, dst, VK_IMAGE_USAGE_STORAGE_BIT, &temps.dst_iview); create_iview(cmd_buffer, dst, &temps.dst_iview);
btoi_bind_descriptors(cmd_buffer, &temps); btoi_bind_descriptors(cmd_buffer, &temps);
btoi_bind_pipeline(cmd_buffer); btoi_bind_pipeline(cmd_buffer);
@@ -1124,8 +1137,8 @@ radv_meta_image_to_image_cs(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device; struct radv_device *device = cmd_buffer->device;
struct itoi_temps temps; struct itoi_temps temps;
create_iview(cmd_buffer, src, VK_IMAGE_USAGE_SAMPLED_BIT, &temps.src_iview); create_iview(cmd_buffer, src, &temps.src_iview);
create_iview(cmd_buffer, dst, VK_IMAGE_USAGE_STORAGE_BIT, &temps.dst_iview); create_iview(cmd_buffer, dst, &temps.dst_iview);
itoi_bind_descriptors(cmd_buffer, &temps); itoi_bind_descriptors(cmd_buffer, &temps);
@@ -1196,7 +1209,7 @@ radv_meta_clear_image_cs(struct radv_cmd_buffer *cmd_buffer,
struct radv_device *device = cmd_buffer->device; struct radv_device *device = cmd_buffer->device;
struct radv_image_view dst_iview; struct radv_image_view dst_iview;
create_iview(cmd_buffer, dst, VK_IMAGE_USAGE_STORAGE_BIT, &dst_iview); create_iview(cmd_buffer, dst, &dst_iview);
cleari_bind_descriptors(cmd_buffer, &dst_iview); cleari_bind_descriptors(cmd_buffer, &dst_iview);
cleari_bind_pipeline(cmd_buffer); cleari_bind_pipeline(cmd_buffer);
@@ -1213,5 +1226,5 @@ radv_meta_clear_image_cs(struct radv_cmd_buffer *cmd_buffer,
VK_SHADER_STAGE_COMPUTE_BIT, 0, 16, VK_SHADER_STAGE_COMPUTE_BIT, 0, 16,
push_constants); push_constants);
radv_unaligned_dispatch(cmd_buffer, dst->image->extent.width, dst->image->extent.height, 1); radv_unaligned_dispatch(cmd_buffer, dst->image->info.width, dst->image->info.height, 1);
} }

View File

@@ -27,17 +27,6 @@
#include "util/format_rgb9e5.h" #include "util/format_rgb9e5.h"
#include "vk_format.h" #include "vk_format.h"
/** Vertex attributes for color clears. */
struct color_clear_vattrs {
float position[2];
VkClearColorValue color;
};
/** Vertex attributes for depthstencil clears. */
struct depthstencil_clear_vattrs {
float position[2];
float depth_clear;
};
enum { enum {
DEPTH_CLEAR_SLOW, DEPTH_CLEAR_SLOW,
@@ -56,47 +45,34 @@ build_color_shaders(struct nir_shader **out_vs,
nir_builder_init_simple_shader(&vs_b, NULL, MESA_SHADER_VERTEX, NULL); nir_builder_init_simple_shader(&vs_b, NULL, MESA_SHADER_VERTEX, NULL);
nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL);
vs_b.shader->info->name = ralloc_strdup(vs_b.shader, "meta_clear_color_vs"); vs_b.shader->info.name = ralloc_strdup(vs_b.shader, "meta_clear_color_vs");
fs_b.shader->info->name = ralloc_strdup(fs_b.shader, "meta_clear_color_fs"); fs_b.shader->info.name = ralloc_strdup(fs_b.shader, "meta_clear_color_fs");
const struct glsl_type *position_type = glsl_vec4_type(); const struct glsl_type *position_type = glsl_vec4_type();
const struct glsl_type *color_type = glsl_vec4_type(); const struct glsl_type *color_type = glsl_vec4_type();
nir_variable *vs_in_pos =
nir_variable_create(vs_b.shader, nir_var_shader_in, position_type,
"a_position");
vs_in_pos->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *vs_out_pos = nir_variable *vs_out_pos =
nir_variable_create(vs_b.shader, nir_var_shader_out, position_type, nir_variable_create(vs_b.shader, nir_var_shader_out, position_type,
"gl_Position"); "gl_Position");
vs_out_pos->data.location = VARYING_SLOT_POS; vs_out_pos->data.location = VARYING_SLOT_POS;
nir_variable *vs_in_color = nir_intrinsic_instr *in_color_load = nir_intrinsic_instr_create(fs_b.shader, nir_intrinsic_load_push_constant);
nir_variable_create(vs_b.shader, nir_var_shader_in, color_type, nir_intrinsic_set_base(in_color_load, 0);
"a_color"); nir_intrinsic_set_range(in_color_load, 16);
vs_in_color->data.location = VERT_ATTRIB_GENERIC1; in_color_load->src[0] = nir_src_for_ssa(nir_imm_int(&fs_b, 0));
in_color_load->num_components = 4;
nir_variable *vs_out_color = nir_ssa_dest_init(&in_color_load->instr, &in_color_load->dest, 4, 32, "clear color");
nir_variable_create(vs_b.shader, nir_var_shader_out, color_type, nir_builder_instr_insert(&fs_b, &in_color_load->instr);
"v_color");
vs_out_color->data.location = VARYING_SLOT_VAR0;
vs_out_color->data.interpolation = INTERP_MODE_FLAT;
nir_variable *fs_in_color =
nir_variable_create(fs_b.shader, nir_var_shader_in, color_type,
"v_color");
fs_in_color->data.location = vs_out_color->data.location;
fs_in_color->data.interpolation = vs_out_color->data.interpolation;
nir_variable *fs_out_color = nir_variable *fs_out_color =
nir_variable_create(fs_b.shader, nir_var_shader_out, color_type, nir_variable_create(fs_b.shader, nir_var_shader_out, color_type,
"f_color"); "f_color");
fs_out_color->data.location = FRAG_RESULT_DATA0 + frag_output; fs_out_color->data.location = FRAG_RESULT_DATA0 + frag_output;
nir_copy_var(&vs_b, vs_out_pos, vs_in_pos); nir_store_var(&fs_b, fs_out_color, &in_color_load->dest.ssa, 0xf);
nir_copy_var(&vs_b, vs_out_color, vs_in_color);
nir_copy_var(&fs_b, fs_out_color, fs_in_color); nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&vs_b);
nir_store_var(&vs_b, vs_out_pos, outvec, 0xf);
const struct glsl_type *layer_type = glsl_int_type(); const struct glsl_type *layer_type = glsl_int_type();
nir_variable *vs_out_layer = nir_variable *vs_out_layer =
@@ -121,6 +97,7 @@ create_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo *vi_state, const VkPipelineVertexInputStateCreateInfo *vi_state,
const VkPipelineDepthStencilStateCreateInfo *ds_state, const VkPipelineDepthStencilStateCreateInfo *ds_state,
const VkPipelineColorBlendStateCreateInfo *cb_state, const VkPipelineColorBlendStateCreateInfo *cb_state,
const VkPipelineLayout layout,
const struct radv_graphics_pipeline_create_info *extra, const struct radv_graphics_pipeline_create_info *extra,
const VkAllocationCallbacks *alloc, const VkAllocationCallbacks *alloc,
struct radv_pipeline **pipeline) struct radv_pipeline **pipeline)
@@ -200,10 +177,11 @@ create_pipeline(struct radv_device *device,
VK_DYNAMIC_STATE_STENCIL_REFERENCE, VK_DYNAMIC_STATE_STENCIL_REFERENCE,
}, },
}, },
.flags = 0, .layout = layout,
.renderPass = radv_render_pass_to_handle(render_pass), .flags = 0,
.subpass = 0, .renderPass = radv_render_pass_to_handle(render_pass),
}, .subpass = 0,
},
extra, extra,
alloc, alloc,
&pipeline_h); &pipeline_h);
@@ -269,31 +247,8 @@ create_color_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo vi_state = { const VkPipelineVertexInputStateCreateInfo vi_state = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = sizeof(struct color_clear_vattrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 2,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct color_clear_vattrs, position),
},
{
/* Color */
.location = 1,
.binding = 0,
.format = VK_FORMAT_R32G32B32A32_SFLOAT,
.offset = offsetof(struct color_clear_vattrs, color),
},
},
}; };
const VkPipelineDepthStencilStateCreateInfo ds_state = { const VkPipelineDepthStencilStateCreateInfo ds_state = {
@@ -326,6 +281,7 @@ create_color_pipeline(struct radv_device *device,
}; };
result = create_pipeline(device, radv_render_pass_from_handle(pass), result = create_pipeline(device, radv_render_pass_from_handle(pass),
samples, vs_nir, fs_nir, &vi_state, &ds_state, &cb_state, samples, vs_nir, fs_nir, &vi_state, &ds_state, &cb_state,
device->meta_state.clear_color_p_layout,
&extra, &device->meta_state.alloc, pipeline); &extra, &device->meta_state.alloc, pipeline);
return result; return result;
@@ -368,7 +324,12 @@ radv_device_finish_meta_clear_state(struct radv_device *device)
} }
destroy_render_pass(device, state->clear[i].depthstencil_rp); destroy_render_pass(device, state->clear[i].depthstencil_rp);
} }
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->clear_color_p_layout,
&state->alloc);
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->clear_depth_p_layout,
&state->alloc);
} }
static void static void
@@ -382,14 +343,13 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
const uint32_t subpass_att = clear_att->colorAttachment; const uint32_t subpass_att = clear_att->colorAttachment;
const uint32_t pass_att = subpass->color_attachments[subpass_att].attachment; const uint32_t pass_att = subpass->color_attachments[subpass_att].attachment;
const struct radv_image_view *iview = fb->attachments[pass_att].attachment; const struct radv_image_view *iview = fb->attachments[pass_att].attachment;
const uint32_t samples = iview->image->samples; const uint32_t samples = iview->image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1; const uint32_t samples_log2 = ffs(samples) - 1;
unsigned fs_key = radv_format_meta_fs_key(iview->vk_format); unsigned fs_key = radv_format_meta_fs_key(iview->vk_format);
struct radv_pipeline *pipeline; struct radv_pipeline *pipeline;
VkClearColorValue clear_value = clear_att->clearValue.color; VkClearColorValue clear_value = clear_att->clearValue.color;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer); VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
VkPipeline pipeline_h; VkPipeline pipeline_h;
uint32_t offset;
if (fs_key == -1) { if (fs_key == -1) {
radv_finishme("color clears incomplete"); radv_finishme("color clears incomplete");
@@ -407,29 +367,10 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
assert(clear_att->aspectMask == VK_IMAGE_ASPECT_COLOR_BIT); assert(clear_att->aspectMask == VK_IMAGE_ASPECT_COLOR_BIT);
assert(clear_att->colorAttachment < subpass->color_count); assert(clear_att->colorAttachment < subpass->color_count);
const struct color_clear_vattrs vertex_data[3] = { radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
{ device->meta_state.clear_color_p_layout,
.position = { VK_SHADER_STAGE_FRAGMENT_BIT, 0, 16,
-1.0, &clear_value);
-1.0,
},
.color = clear_value,
},
{
.position = {
-1.0,
1.0,
},
.color = clear_value,
},
{
.position = {
1.0,
-1.0,
},
.color = clear_value,
},
};
struct radv_subpass clear_subpass = { struct radv_subpass clear_subpass = {
.color_count = 1, .color_count = 1,
@@ -441,19 +382,6 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
radv_cmd_buffer_set_subpass(cmd_buffer, &clear_subpass, false); radv_cmd_buffer_set_subpass(cmd_buffer, &clear_subpass, false);
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
radv_CmdBindVertexBuffers(cmd_buffer_h, 0, 1,
(VkBuffer[]) { radv_buffer_to_handle(&vertex_buffer) },
(VkDeviceSize[]) { 0 });
if (cmd_buffer->state.pipeline != pipeline) { if (cmd_buffer->state.pipeline != pipeline) {
radv_CmdBindPipeline(cmd_buffer_h, VK_PIPELINE_BIND_POINT_GRAPHICS, radv_CmdBindPipeline(cmd_buffer_h, VK_PIPELINE_BIND_POINT_GRAPHICS,
pipeline_h); pipeline_h);
@@ -484,21 +412,25 @@ build_depthstencil_shader(struct nir_shader **out_vs, struct nir_shader **out_fs
nir_builder_init_simple_shader(&vs_b, NULL, MESA_SHADER_VERTEX, NULL); nir_builder_init_simple_shader(&vs_b, NULL, MESA_SHADER_VERTEX, NULL);
nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL);
vs_b.shader->info->name = ralloc_strdup(vs_b.shader, "meta_clear_depthstencil_vs"); vs_b.shader->info.name = ralloc_strdup(vs_b.shader, "meta_clear_depthstencil_vs");
fs_b.shader->info->name = ralloc_strdup(fs_b.shader, "meta_clear_depthstencil_fs"); fs_b.shader->info.name = ralloc_strdup(fs_b.shader, "meta_clear_depthstencil_fs");
const struct glsl_type *position_type = glsl_vec4_type(); const struct glsl_type *position_out_type = glsl_vec4_type();
nir_variable *vs_in_pos =
nir_variable_create(vs_b.shader, nir_var_shader_in, position_type,
"a_position");
vs_in_pos->data.location = VERT_ATTRIB_GENERIC0;
nir_variable *vs_out_pos = nir_variable *vs_out_pos =
nir_variable_create(vs_b.shader, nir_var_shader_out, position_type, nir_variable_create(vs_b.shader, nir_var_shader_out, position_out_type,
"gl_Position"); "gl_Position");
vs_out_pos->data.location = VARYING_SLOT_POS; vs_out_pos->data.location = VARYING_SLOT_POS;
nir_copy_var(&vs_b, vs_out_pos, vs_in_pos); nir_intrinsic_instr *in_color_load = nir_intrinsic_instr_create(vs_b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(in_color_load, 0);
nir_intrinsic_set_range(in_color_load, 4);
in_color_load->src[0] = nir_src_for_ssa(nir_imm_int(&vs_b, 0));
in_color_load->num_components = 1;
nir_ssa_dest_init(&in_color_load->instr, &in_color_load->dest, 1, 32, "depth value");
nir_builder_instr_insert(&vs_b, &in_color_load->instr);
nir_ssa_def *outvec = radv_meta_gen_rect_vertices_comp2(&vs_b, &in_color_load->dest.ssa);
nir_store_var(&vs_b, vs_out_pos, outvec, 0xf);
const struct glsl_type *layer_type = glsl_int_type(); const struct glsl_type *layer_type = glsl_int_type();
nir_variable *vs_out_layer = nir_variable *vs_out_layer =
@@ -562,24 +494,8 @@ create_depthstencil_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo vi_state = { const VkPipelineVertexInputStateCreateInfo vi_state = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = sizeof(struct depthstencil_clear_vattrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32B32_SFLOAT,
.offset = offsetof(struct depthstencil_clear_vattrs, position),
},
},
}; };
const VkPipelineDepthStencilStateCreateInfo ds_state = { const VkPipelineDepthStencilStateCreateInfo ds_state = {
@@ -619,14 +535,19 @@ create_depthstencil_pipeline(struct radv_device *device,
} }
result = create_pipeline(device, radv_render_pass_from_handle(render_pass), result = create_pipeline(device, radv_render_pass_from_handle(render_pass),
samples, vs_nir, fs_nir, &vi_state, &ds_state, &cb_state, samples, vs_nir, fs_nir, &vi_state, &ds_state, &cb_state,
device->meta_state.clear_depth_p_layout,
&extra, &device->meta_state.alloc, pipeline); &extra, &device->meta_state.alloc, pipeline);
return result; return result;
} }
static bool depth_view_can_fast_clear(const struct radv_image_view *iview, static bool depth_view_can_fast_clear(struct radv_cmd_buffer *cmd_buffer,
const struct radv_image_view *iview,
VkImageLayout layout, VkImageLayout layout,
const VkClearRect *clear_rect) const VkClearRect *clear_rect)
{ {
uint32_t queue_mask = radv_image_queue_family_mask(iview->image,
cmd_buffer->queue_family_index,
cmd_buffer->queue_family_index);
if (clear_rect->rect.offset.x || clear_rect->rect.offset.y || if (clear_rect->rect.offset.x || clear_rect->rect.offset.y ||
clear_rect->rect.extent.width != iview->extent.width || clear_rect->rect.extent.width != iview->extent.width ||
clear_rect->rect.extent.height != iview->extent.height) clear_rect->rect.extent.height != iview->extent.height)
@@ -634,14 +555,15 @@ static bool depth_view_can_fast_clear(const struct radv_image_view *iview,
if (iview->image->surface.htile_size && if (iview->image->surface.htile_size &&
iview->base_mip == 0 && iview->base_mip == 0 &&
iview->base_layer == 0 && iview->base_layer == 0 &&
radv_layout_can_expclear(iview->image, layout) && radv_layout_is_htile_compressed(iview->image, layout, queue_mask) &&
memcmp(&iview->extent, &iview->image->extent, sizeof(iview->extent)) == 0) !radv_image_extent_compare(iview->image, &iview->extent))
return true; return true;
return false; return false;
} }
static struct radv_pipeline * static struct radv_pipeline *
pick_depthstencil_pipeline(struct radv_meta_state *meta_state, pick_depthstencil_pipeline(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_state *meta_state,
const struct radv_image_view *iview, const struct radv_image_view *iview,
int samples_log2, int samples_log2,
VkImageAspectFlags aspects, VkImageAspectFlags aspects,
@@ -649,7 +571,7 @@ pick_depthstencil_pipeline(struct radv_meta_state *meta_state,
const VkClearRect *clear_rect, const VkClearRect *clear_rect,
VkClearDepthStencilValue clear_value) VkClearDepthStencilValue clear_value)
{ {
bool fast = depth_view_can_fast_clear(iview, layout, clear_rect); bool fast = depth_view_can_fast_clear(cmd_buffer, iview, layout, clear_rect);
int index = DEPTH_CLEAR_SLOW; int index = DEPTH_CLEAR_SLOW;
if (fast) { if (fast) {
@@ -682,10 +604,9 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
VkClearDepthStencilValue clear_value = clear_att->clearValue.depthStencil; VkClearDepthStencilValue clear_value = clear_att->clearValue.depthStencil;
VkImageAspectFlags aspects = clear_att->aspectMask; VkImageAspectFlags aspects = clear_att->aspectMask;
const struct radv_image_view *iview = fb->attachments[pass_att].attachment; const struct radv_image_view *iview = fb->attachments[pass_att].attachment;
const uint32_t samples = iview->image->samples; const uint32_t samples = iview->image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1; const uint32_t samples_log2 = ffs(samples) - 1;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer); VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
assert(aspects == VK_IMAGE_ASPECT_DEPTH_BIT || assert(aspects == VK_IMAGE_ASPECT_DEPTH_BIT ||
aspects == VK_IMAGE_ASPECT_STENCIL_BIT || aspects == VK_IMAGE_ASPECT_STENCIL_BIT ||
@@ -693,48 +614,21 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
VK_IMAGE_ASPECT_STENCIL_BIT)); VK_IMAGE_ASPECT_STENCIL_BIT));
assert(pass_att != VK_ATTACHMENT_UNUSED); assert(pass_att != VK_ATTACHMENT_UNUSED);
const struct depthstencil_clear_vattrs vertex_data[3] = { if (!(aspects & VK_IMAGE_ASPECT_DEPTH_BIT))
{ clear_value.depth = 1.0f;
.position = {
-1.0,
-1.0
},
.depth_clear = clear_value.depth,
},
{
.position = {
-1.0,
1.0,
},
.depth_clear = clear_value.depth,
},
{
.position = {
1.0,
-1.0,
},
.depth_clear = clear_value.depth,
},
};
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset); radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
struct radv_buffer vertex_buffer = { device->meta_state.clear_depth_p_layout,
.device = device, VK_SHADER_STAGE_VERTEX_BIT, 0, 4,
.size = sizeof(vertex_data), &clear_value.depth);
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT) { if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT) {
radv_CmdSetStencilReference(cmd_buffer_h, VK_STENCIL_FACE_FRONT_BIT, radv_CmdSetStencilReference(cmd_buffer_h, VK_STENCIL_FACE_FRONT_BIT,
clear_value.stencil); clear_value.stencil);
} }
radv_CmdBindVertexBuffers(cmd_buffer_h, 0, 1, struct radv_pipeline *pipeline = pick_depthstencil_pipeline(cmd_buffer,
(VkBuffer[]) { radv_buffer_to_handle(&vertex_buffer) }, meta_state,
(VkDeviceSize[]) { 0 });
struct radv_pipeline *pipeline = pick_depthstencil_pipeline(meta_state,
iview, iview,
samples_log2, samples_log2,
aspects, aspects,
@@ -746,7 +640,7 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
radv_pipeline_to_handle(pipeline)); radv_pipeline_to_handle(pipeline));
} }
if (depth_view_can_fast_clear(iview, subpass->depth_stencil_attachment.layout, clear_rect)) if (depth_view_can_fast_clear(cmd_buffer, iview, subpass->depth_stencil_attachment.layout, clear_rect))
radv_set_depth_clear_regs(cmd_buffer, iview->image, clear_value, aspects); radv_set_depth_clear_regs(cmd_buffer, iview->image, clear_value, aspects);
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) { radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) {
@@ -763,6 +657,95 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, 0); radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, 0);
} }
static bool
emit_fast_htile_clear(struct radv_cmd_buffer *cmd_buffer,
const VkClearAttachment *clear_att,
const VkClearRect *clear_rect,
enum radv_cmd_flush_bits *pre_flush,
enum radv_cmd_flush_bits *post_flush)
{
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
const uint32_t pass_att = subpass->depth_stencil_attachment.attachment;
VkImageLayout image_layout = subpass->depth_stencil_attachment.layout;
const struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_image_view *iview = fb->attachments[pass_att].attachment;
VkClearDepthStencilValue clear_value = clear_att->clearValue.depthStencil;
VkImageAspectFlags aspects = clear_att->aspectMask;
uint32_t clear_word;
if (!iview->image->surface.htile_size)
return false;
if (cmd_buffer->device->debug_flags & RADV_DEBUG_NO_FAST_CLEARS)
return false;
if (!radv_layout_is_htile_compressed(iview->image, image_layout, radv_image_queue_family_mask(iview->image, cmd_buffer->queue_family_index, cmd_buffer->queue_family_index)))
goto fail;
/* don't fast clear 3D */
if (iview->image->type == VK_IMAGE_TYPE_3D)
goto fail;
/* all layers are bound */
if (iview->base_layer > 0)
goto fail;
if (iview->image->info.array_size != iview->layer_count)
goto fail;
if (iview->image->info.levels > 1)
goto fail;
if (!radv_image_extent_compare(iview->image, &iview->extent))
goto fail;
if (clear_rect->rect.offset.x || clear_rect->rect.offset.y ||
clear_rect->rect.extent.width != iview->image->info.width ||
clear_rect->rect.extent.height != iview->image->info.height)
goto fail;
if (clear_rect->baseArrayLayer != 0)
goto fail;
if (clear_rect->layerCount != iview->image->info.array_size)
goto fail;
/* Don't do stencil clears till we have figured out if the clear words are
* correct. */
if (vk_format_aspects(iview->image->vk_format) & VK_IMAGE_ASPECT_STENCIL_BIT)
goto fail;
if (clear_value.depth == 1.0)
clear_word = 0xfffffff0;
else if (clear_value.depth == 0.0)
clear_word = 0;
else
goto fail;
if (pre_flush) {
cmd_buffer->state.flush_bits |= (RADV_CMD_FLAG_FLUSH_AND_INV_DB |
RADV_CMD_FLAG_FLUSH_AND_INV_DB_META) & ~ *pre_flush;
*pre_flush |= cmd_buffer->state.flush_bits;
} else
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_DB |
RADV_CMD_FLAG_FLUSH_AND_INV_DB_META;
radv_fill_buffer(cmd_buffer, iview->image->bo,
iview->image->offset + iview->image->htile_offset,
iview->image->surface.htile_size, clear_word);
radv_set_depth_clear_regs(cmd_buffer, iview->image, clear_value, aspects);
if (post_flush)
*post_flush |= RADV_CMD_FLAG_CS_PARTIAL_FLUSH |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2;
else
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_CS_PARTIAL_FLUSH |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2;
return true;
fail:
return false;
}
static VkFormat pipeline_formats[] = { static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM, VK_FORMAT_R8G8B8A8_UNORM,
@@ -785,6 +768,34 @@ radv_device_init_meta_clear_state(struct radv_device *device)
memset(&device->meta_state.clear, 0, sizeof(device->meta_state.clear)); memset(&device->meta_state.clear, 0, sizeof(device->meta_state.clear));
VkPipelineLayoutCreateInfo pl_color_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 0,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_FRAGMENT_BIT, 0, 16},
};
res = radv_CreatePipelineLayout(radv_device_to_handle(device),
&pl_color_create_info,
&device->meta_state.alloc,
&device->meta_state.clear_color_p_layout);
if (res != VK_SUCCESS)
goto fail;
VkPipelineLayoutCreateInfo pl_depth_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 0,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_VERTEX_BIT, 0, 4},
};
res = radv_CreatePipelineLayout(radv_device_to_handle(device),
&pl_depth_create_info,
&device->meta_state.alloc,
&device->meta_state.clear_depth_p_layout);
if (res != VK_SUCCESS)
goto fail;
for (uint32_t i = 0; i < ARRAY_SIZE(state->clear); ++i) { for (uint32_t i = 0; i < ARRAY_SIZE(state->clear); ++i) {
uint32_t samples = 1 << i; uint32_t samples = 1 << i;
for (uint32_t j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) { for (uint32_t j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) {
@@ -882,26 +893,30 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
/* all layers are bound */ /* all layers are bound */
if (iview->base_layer > 0) if (iview->base_layer > 0)
goto fail; goto fail;
if (iview->image->array_size != iview->layer_count) if (iview->image->info.array_size != iview->layer_count)
goto fail; goto fail;
if (iview->image->levels > 1) if (iview->image->info.levels > 1)
goto fail; goto fail;
if (iview->image->surface.level[0].mode < RADEON_SURF_MODE_1D) if (iview->image->surface.u.legacy.level[0].mode < RADEON_SURF_MODE_1D)
goto fail; goto fail;
if (!radv_image_extent_compare(iview->image, &iview->extent))
if (memcmp(&iview->extent, &iview->image->extent, sizeof(iview->extent)))
goto fail; goto fail;
if (clear_rect->rect.offset.x || clear_rect->rect.offset.y || if (clear_rect->rect.offset.x || clear_rect->rect.offset.y ||
clear_rect->rect.extent.width != iview->image->extent.width || clear_rect->rect.extent.width != iview->image->info.width ||
clear_rect->rect.extent.height != iview->image->extent.height) clear_rect->rect.extent.height != iview->image->info.height)
goto fail; goto fail;
if (clear_rect->baseArrayLayer != 0) if (clear_rect->baseArrayLayer != 0)
goto fail; goto fail;
if (clear_rect->layerCount != iview->image->array_size) if (clear_rect->layerCount != iview->image->info.array_size)
goto fail;
/* RB+ doesn't work with CMASK fast clear on Stoney. */
if (!iview->image->surface.dcc_size &&
cmd_buffer->device->physical_device->rad_info.family == CHIP_STONEY)
goto fail; goto fail;
/* DCC */ /* DCC */
@@ -962,7 +977,9 @@ emit_clear(struct radv_cmd_buffer *cmd_buffer,
} else { } else {
assert(clear_att->aspectMask & (VK_IMAGE_ASPECT_DEPTH_BIT | assert(clear_att->aspectMask & (VK_IMAGE_ASPECT_DEPTH_BIT |
VK_IMAGE_ASPECT_STENCIL_BIT)); VK_IMAGE_ASPECT_STENCIL_BIT));
emit_depthstencil_clear(cmd_buffer, clear_att, clear_rect); if (!emit_fast_htile_clear(cmd_buffer, clear_att, clear_rect,
pre_flush, post_flush))
emit_depthstencil_clear(cmd_buffer, clear_att, clear_rect);
} }
} }
@@ -1006,7 +1023,7 @@ radv_cmd_buffer_clear_subpass(struct radv_cmd_buffer *cmd_buffer)
if (!subpass_needs_clear(cmd_buffer)) if (!subpass_needs_clear(cmd_buffer))
return; return;
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
VkClearRect clear_rect = { VkClearRect clear_rect = {
.rect = cmd_state->render_area, .rect = cmd_state->render_area,
@@ -1077,8 +1094,7 @@ radv_clear_image_layer(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = range->baseArrayLayer + layer, .baseArrayLayer = range->baseArrayLayer + layer,
.layerCount = 1 .layerCount = 1
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT);
VkFramebuffer fb; VkFramebuffer fb;
radv_CreateFramebuffer(device_h, radv_CreateFramebuffer(device_h,
@@ -1211,7 +1227,7 @@ radv_cmd_clear_image(struct radv_cmd_buffer *cmd_buffer,
const VkImageSubresourceRange *range = &ranges[r]; const VkImageSubresourceRange *range = &ranges[r];
for (uint32_t l = 0; l < radv_get_levelCount(image, range); ++l) { for (uint32_t l = 0; l < radv_get_levelCount(image, range); ++l) {
const uint32_t layer_count = image->type == VK_IMAGE_TYPE_3D ? const uint32_t layer_count = image->type == VK_IMAGE_TYPE_3D ?
radv_minify(image->extent.depth, range->baseMipLevel + l) : radv_minify(image->info.depth, range->baseMipLevel + l) :
radv_get_layerCount(image, range); radv_get_layerCount(image, range);
for (uint32_t s = 0; s < layer_count; ++s) { for (uint32_t s = 0; s < layer_count; ++s) {
@@ -1254,7 +1270,7 @@ void radv_CmdClearColorImage(
if (cs) if (cs)
radv_meta_begin_cleari(cmd_buffer, &saved_state.compute); radv_meta_begin_cleari(cmd_buffer, &saved_state.compute);
else else
radv_meta_save_graphics_reset_vport_scissor(&saved_state.gfx, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state.gfx, cmd_buffer);
radv_cmd_clear_image(cmd_buffer, image, imageLayout, radv_cmd_clear_image(cmd_buffer, image, imageLayout,
(const VkClearValue *) pColor, (const VkClearValue *) pColor,
@@ -1278,7 +1294,7 @@ void radv_CmdClearDepthStencilImage(
RADV_FROM_HANDLE(radv_image, image, image_h); RADV_FROM_HANDLE(radv_image, image, image_h);
struct radv_meta_saved_state saved_state; struct radv_meta_saved_state saved_state;
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
radv_cmd_clear_image(cmd_buffer, image, imageLayout, radv_cmd_clear_image(cmd_buffer, image, imageLayout,
(const VkClearValue *) pDepthStencil, (const VkClearValue *) pDepthStencil,
@@ -1302,7 +1318,7 @@ void radv_CmdClearAttachments(
if (!cmd_buffer->state.subpass) if (!cmd_buffer->state.subpass)
return; return;
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
/* FINISHME: We can do better than this dumb loop. It thrashes too much /* FINISHME: We can do better than this dumb loop. It thrashes too much
* state. * state.

View File

@@ -118,12 +118,12 @@ meta_copy_buffer_to_image(struct radv_cmd_buffer *cmd_buffer,
/* The Vulkan 1.0 spec says "dstImage must have a sample count equal to /* The Vulkan 1.0 spec says "dstImage must have a sample count equal to
* VK_SAMPLE_COUNT_1_BIT." * VK_SAMPLE_COUNT_1_BIT."
*/ */
assert(image->samples == 1); assert(image->info.samples == 1);
if (cs) if (cs)
radv_meta_begin_bufimage(cmd_buffer, &saved_state.compute); radv_meta_begin_bufimage(cmd_buffer, &saved_state.compute);
else else
radv_meta_save_graphics_reset_vport_scissor(&saved_state.gfx, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state.gfx, cmd_buffer);
for (unsigned r = 0; r < regionCount; r++) { for (unsigned r = 0; r < regionCount; r++) {
@@ -337,11 +337,11 @@ meta_copy_image(struct radv_cmd_buffer *cmd_buffer,
* vkCmdCopyImage can be used to copy image data between multisample * vkCmdCopyImage can be used to copy image data between multisample
* images, but both images must have the same number of samples. * images, but both images must have the same number of samples.
*/ */
assert(src_image->samples == dest_image->samples); assert(src_image->info.samples == dest_image->info.samples);
if (cs) if (cs)
radv_meta_begin_itoi(cmd_buffer, &saved_state.compute); radv_meta_begin_itoi(cmd_buffer, &saved_state.compute);
else else
radv_meta_save_graphics_reset_vport_scissor(&saved_state.gfx, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state.gfx, cmd_buffer);
for (unsigned r = 0; r < regionCount; r++) { for (unsigned r = 0; r < regionCount; r++) {
assert(pRegions[r].srcSubresource.aspectMask == assert(pRegions[r].srcSubresource.aspectMask ==
@@ -447,8 +447,8 @@ void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer,
image_copy.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; image_copy.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
image_copy.dstSubresource.layerCount = 1; image_copy.dstSubresource.layerCount = 1;
image_copy.extent.width = image->extent.width; image_copy.extent.width = image->info.width;
image_copy.extent.height = image->extent.height; image_copy.extent.height = image->info.height;
image_copy.extent.depth = 1; image_copy.extent.depth = 1;
meta_copy_image(cmd_buffer, image, linear_image, meta_copy_image(cmd_buffer, image, linear_image,

View File

@@ -26,53 +26,7 @@
#include "radv_meta.h" #include "radv_meta.h"
#include "radv_private.h" #include "radv_private.h"
#include "nir/nir_builder.h"
#include "sid.h" #include "sid.h"
/**
* Vertex attributes used by all pipelines.
*/
struct vertex_attrs {
float position[2]; /**< 3DPRIM_RECTLIST */
};
/* passthrough vertex shader */
static nir_shader *
build_nir_vs(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *a_position;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_depth_decomp_vs");
a_position = nir_variable_create(b.shader, nir_var_shader_in, vec4,
"a_position");
a_position->data.location = VERT_ATTRIB_GENERIC0;
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, v_position, a_position);
return b.shader;
}
/* simple passthrough shader */
static nir_shader *
build_nir_fs(void)
{
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_asprintf(b.shader,
"meta_depth_decomp_noop_fs");
return b.shader;
}
static VkResult static VkResult
create_pass(struct radv_device *device) create_pass(struct radv_device *device)
@@ -124,7 +78,7 @@ create_pipeline(struct radv_device *device,
VkDevice device_h = radv_device_to_handle(device); VkDevice device_h = radv_device_to_handle(device);
struct radv_shader_module fs_module = { struct radv_shader_module fs_module = {
.nir = build_nir_fs(), .nir = radv_meta_build_nir_fs_noop(),
}; };
if (!fs_module.nir) { if (!fs_module.nir) {
@@ -152,24 +106,8 @@ create_pipeline(struct radv_device *device,
}, },
.pVertexInputState = &(VkPipelineVertexInputStateCreateInfo) { .pVertexInputState = &(VkPipelineVertexInputStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = sizeof(struct vertex_attrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct vertex_attrs, position),
},
},
}, },
.pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) { .pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
@@ -285,7 +223,7 @@ radv_device_init_meta_depth_decomp_state(struct radv_device *device)
zero(device->meta_state.depth_decomp); zero(device->meta_state.depth_decomp);
struct radv_shader_module vs_module = { .nir = build_nir_vs() }; struct radv_shader_module vs_module = { .nir = radv_meta_build_nir_vs_generate_vertices() };
if (!vs_module.nir) { if (!vs_module.nir) {
/* XXX: Need more accurate error */ /* XXX: Need more accurate error */
res = VK_ERROR_OUT_OF_HOST_MEMORY; res = VK_ERROR_OUT_OF_HOST_MEMORY;
@@ -318,45 +256,7 @@ emit_depth_decomp(struct radv_cmd_buffer *cmd_buffer,
const VkExtent2D *depth_decomp_extent, const VkExtent2D *depth_decomp_extent,
VkPipeline pipeline_h) VkPipeline pipeline_h)
{ {
struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer); VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
const struct vertex_attrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0,
},
},
{
.position = {
-1.0,
1.0,
},
},
{
.position = {
1.0,
-1.0,
},
},
};
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
VkBuffer vertex_buffer_h = radv_buffer_to_handle(&vertex_buffer);
radv_CmdBindVertexBuffers(cmd_buffer_h,
/*firstBinding*/ 0,
/*bindingCount*/ 1,
(VkBuffer[]) { vertex_buffer_h },
(VkDeviceSize[]) { 0 });
RADV_FROM_HANDLE(radv_pipeline, pipeline, pipeline_h); RADV_FROM_HANDLE(radv_pipeline, pipeline, pipeline_h);
@@ -392,16 +292,16 @@ static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_saved_pass_state saved_pass_state; struct radv_meta_saved_pass_state saved_pass_state;
VkDevice device_h = radv_device_to_handle(cmd_buffer->device); VkDevice device_h = radv_device_to_handle(cmd_buffer->device);
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer); VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t width = radv_minify(image->extent.width, uint32_t width = radv_minify(image->info.width,
subresourceRange->baseMipLevel); subresourceRange->baseMipLevel);
uint32_t height = radv_minify(image->extent.height, uint32_t height = radv_minify(image->info.height,
subresourceRange->baseMipLevel); subresourceRange->baseMipLevel);
if (!image->surface.htile_size) if (!image->surface.htile_size)
return; return;
radv_meta_save_pass(&saved_pass_state, cmd_buffer); radv_meta_save_pass(&saved_pass_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t layer = 0; layer < radv_get_layerCount(image, subresourceRange); layer++) { for (uint32_t layer = 0; layer < radv_get_layerCount(image, subresourceRange); layer++) {
struct radv_image_view iview; struct radv_image_view iview;
@@ -418,8 +318,7 @@ static void radv_process_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = subresourceRange->baseArrayLayer + layer, .baseArrayLayer = subresourceRange->baseArrayLayer + layer,
.layerCount = 1, .layerCount = 1,
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT);
VkFramebuffer fb_h; VkFramebuffer fb_h;

View File

@@ -26,53 +26,7 @@
#include "radv_meta.h" #include "radv_meta.h"
#include "radv_private.h" #include "radv_private.h"
#include "nir/nir_builder.h"
#include "sid.h" #include "sid.h"
/**
* Vertex attributes used by all pipelines.
*/
struct vertex_attrs {
float position[2]; /**< 3DPRIM_RECTLIST */
};
/* passthrough vertex shader */
static nir_shader *
build_nir_vs(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *a_position;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_fast_clear_vs");
a_position = nir_variable_create(b.shader, nir_var_shader_in, vec4,
"a_position");
a_position->data.location = VERT_ATTRIB_GENERIC0;
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, v_position, a_position);
return b.shader;
}
/* simple passthrough shader */
static nir_shader *
build_nir_fs(void)
{
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_asprintf(b.shader,
"meta_fast_clear_noop_fs");
return b.shader;
}
static VkResult static VkResult
create_pass(struct radv_device *device) create_pass(struct radv_device *device)
@@ -128,7 +82,7 @@ create_pipeline(struct radv_device *device,
VkDevice device_h = radv_device_to_handle(device); VkDevice device_h = radv_device_to_handle(device);
struct radv_shader_module fs_module = { struct radv_shader_module fs_module = {
.nir = build_nir_fs(), .nir = radv_meta_build_nir_fs_noop(),
}; };
if (!fs_module.nir) { if (!fs_module.nir) {
@@ -154,24 +108,8 @@ create_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo vi_state = { const VkPipelineVertexInputStateCreateInfo vi_state = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = sizeof(struct vertex_attrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct vertex_attrs, position),
},
}
}; };
const VkPipelineInputAssemblyStateCreateInfo ia_state = { const VkPipelineInputAssemblyStateCreateInfo ia_state = {
@@ -330,7 +268,7 @@ radv_device_init_meta_fast_clear_flush_state(struct radv_device *device)
zero(device->meta_state.fast_clear_flush); zero(device->meta_state.fast_clear_flush);
struct radv_shader_module vs_module = { .nir = build_nir_vs() }; struct radv_shader_module vs_module = { .nir = radv_meta_build_nir_vs_generate_vertices() };
if (!vs_module.nir) { if (!vs_module.nir) {
/* XXX: Need more accurate error */ /* XXX: Need more accurate error */
res = VK_ERROR_OUT_OF_HOST_MEMORY; res = VK_ERROR_OUT_OF_HOST_MEMORY;
@@ -364,43 +302,6 @@ emit_fast_clear_flush(struct radv_cmd_buffer *cmd_buffer,
{ {
struct radv_device *device = cmd_buffer->device; struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer); VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
const struct vertex_attrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0,
},
},
{
.position = {
-1.0,
1.0,
},
},
{
.position = {
1.0,
-1.0,
},
},
};
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
VkBuffer vertex_buffer_h = radv_buffer_to_handle(&vertex_buffer);
radv_CmdBindVertexBuffers(cmd_buffer_h,
/*firstBinding*/ 0,
/*bindingCount*/ 1,
(VkBuffer[]) { vertex_buffer_h },
(VkDeviceSize[]) { 0 });
VkPipeline pipeline_h; VkPipeline pipeline_h;
if (fmask_decompress) if (fmask_decompress)
@@ -448,7 +349,7 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
assert(cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL); assert(cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL);
radv_meta_save_pass(&saved_pass_state, cmd_buffer); radv_meta_save_pass(&saved_pass_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t layer = 0; layer < layer_count; ++layer) { for (uint32_t layer = 0; layer < layer_count; ++layer) {
struct radv_image_view iview; struct radv_image_view iview;
@@ -466,8 +367,7 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = subresourceRange->baseArrayLayer + layer, .baseArrayLayer = subresourceRange->baseArrayLayer + layer,
.layerCount = 1, .layerCount = 1,
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT);
VkFramebuffer fb_h; VkFramebuffer fb_h;
radv_CreateFramebuffer(device_h, radv_CreateFramebuffer(device_h,
@@ -477,8 +377,8 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
.pAttachments = (VkImageView[]) { .pAttachments = (VkImageView[]) {
radv_image_view_to_handle(&iview) radv_image_view_to_handle(&iview)
}, },
.width = image->extent.width, .width = image->info.width,
.height = image->extent.height, .height = image->info.height,
.layers = 1 .layers = 1
}, },
&cmd_buffer->pool->alloc, &cmd_buffer->pool->alloc,
@@ -495,8 +395,8 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
0, 0,
}, },
.extent = { .extent = {
image->extent.width, image->info.width,
image->extent.height, image->info.height,
} }
}, },
.clearValueCount = 0, .clearValueCount = 0,
@@ -505,7 +405,7 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
VK_SUBPASS_CONTENTS_INLINE); VK_SUBPASS_CONTENTS_INLINE);
emit_fast_clear_flush(cmd_buffer, emit_fast_clear_flush(cmd_buffer,
&(VkExtent2D) { image->extent.width, image->extent.height }, &(VkExtent2D) { image->info.width, image->info.height },
image->fmask.size > 0); image->fmask.size > 0);
radv_CmdEndRenderPass(cmd_buffer_h); radv_CmdEndRenderPass(cmd_buffer_h);

View File

@@ -28,40 +28,8 @@
#include "radv_private.h" #include "radv_private.h"
#include "nir/nir_builder.h" #include "nir/nir_builder.h"
#include "sid.h" #include "sid.h"
/**
* Vertex attributes used by all pipelines.
*/
struct vertex_attrs {
float position[2]; /**< 3DPRIM_RECTLIST */
};
/* passthrough vertex shader */ /* emit 0, 0, 0, 1 */
static nir_shader *
build_nir_vs(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_variable *a_position;
nir_variable *v_position;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "meta_resolve_vs");
a_position = nir_variable_create(b.shader, nir_var_shader_in, vec4,
"a_position");
a_position->data.location = VERT_ATTRIB_GENERIC0;
v_position = nir_variable_create(b.shader, nir_var_shader_out, vec4,
"gl_Position");
v_position->data.location = VARYING_SLOT_POS;
nir_copy_var(&b, v_position, a_position);
return b.shader;
}
/* simple passthrough shader */
static nir_shader * static nir_shader *
build_nir_fs(void) build_nir_fs(void)
{ {
@@ -70,7 +38,7 @@ build_nir_fs(void)
nir_variable *f_color; /* vec4, fragment output color */ nir_variable *f_color; /* vec4, fragment output color */
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info->name = ralloc_asprintf(b.shader, b.shader->info.name = ralloc_asprintf(b.shader,
"meta_resolve_fs"); "meta_resolve_fs");
f_color = nir_variable_create(b.shader, nir_var_shader_out, vec4, f_color = nir_variable_create(b.shader, nir_var_shader_out, vec4,
@@ -174,24 +142,8 @@ create_pipeline(struct radv_device *device,
}, },
.pVertexInputState = &(VkPipelineVertexInputStateCreateInfo) { .pVertexInputState = &(VkPipelineVertexInputStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 1, .vertexBindingDescriptionCount = 0,
.pVertexBindingDescriptions = (VkVertexInputBindingDescription[]) { .vertexAttributeDescriptionCount = 0,
{
.binding = 0,
.stride = sizeof(struct vertex_attrs),
.inputRate = VK_VERTEX_INPUT_RATE_VERTEX
},
},
.vertexAttributeDescriptionCount = 1,
.pVertexAttributeDescriptions = (VkVertexInputAttributeDescription[]) {
{
/* Position */
.location = 0,
.binding = 0,
.format = VK_FORMAT_R32G32_SFLOAT,
.offset = offsetof(struct vertex_attrs, position),
},
},
}, },
.pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) { .pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
@@ -288,7 +240,7 @@ radv_device_init_meta_resolve_state(struct radv_device *device)
zero(device->meta_state.resolve); zero(device->meta_state.resolve);
struct radv_shader_module vs_module = { .nir = build_nir_vs() }; struct radv_shader_module vs_module = { .nir = radv_meta_build_nir_vs_generate_vertices() };
if (!vs_module.nir) { if (!vs_module.nir) {
/* XXX: Need more accurate error */ /* XXX: Need more accurate error */
res = VK_ERROR_OUT_OF_HOST_MEMORY; res = VK_ERROR_OUT_OF_HOST_MEMORY;
@@ -322,44 +274,8 @@ emit_resolve(struct radv_cmd_buffer *cmd_buffer,
{ {
struct radv_device *device = cmd_buffer->device; struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer); VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
uint32_t offset;
const struct vertex_attrs vertex_data[3] = {
{
.position = {
-1.0,
-1.0,
},
},
{
.position = {
-1.0,
1.0,
},
},
{
.position = {
1.0,
-1.0,
},
},
};
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB; cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
radv_cmd_buffer_upload_data(cmd_buffer, sizeof(vertex_data), 16, vertex_data, &offset);
struct radv_buffer vertex_buffer = {
.device = device,
.size = sizeof(vertex_data),
.bo = cmd_buffer->upload.upload_bo,
.offset = offset,
};
VkBuffer vertex_buffer_h = radv_buffer_to_handle(&vertex_buffer);
radv_CmdBindVertexBuffers(cmd_buffer_h,
/*firstBinding*/ 0,
/*bindingCount*/ 1,
(VkBuffer[]) { vertex_buffer_h },
(VkDeviceSize[]) { 0 });
VkPipeline pipeline_h = device->meta_state.resolve.pipeline; VkPipeline pipeline_h = device->meta_state.resolve.pipeline;
RADV_FROM_HANDLE(radv_pipeline, pipeline, pipeline_h); RADV_FROM_HANDLE(radv_pipeline, pipeline, pipeline_h);
@@ -387,6 +303,25 @@ emit_resolve(struct radv_cmd_buffer *cmd_buffer,
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB; cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
} }
enum radv_resolve_method {
RESOLVE_HW,
RESOLVE_COMPUTE,
RESOLVE_FRAGMENT,
};
static void radv_pick_resolve_method_images(struct radv_image *src_image,
struct radv_image *dest_image,
enum radv_resolve_method *method)
{
if (dest_image->surface.micro_tile_mode != src_image->surface.micro_tile_mode) {
if (dest_image->surface.num_dcc_levels > 0)
*method = RESOLVE_FRAGMENT;
else
*method = RESOLVE_COMPUTE;
}
}
void radv_CmdResolveImage( void radv_CmdResolveImage(
VkCommandBuffer cmd_buffer_h, VkCommandBuffer cmd_buffer_h,
VkImage src_image_h, VkImage src_image_h,
@@ -402,28 +337,39 @@ void radv_CmdResolveImage(
struct radv_device *device = cmd_buffer->device; struct radv_device *device = cmd_buffer->device;
struct radv_meta_saved_state saved_state; struct radv_meta_saved_state saved_state;
VkDevice device_h = radv_device_to_handle(device); VkDevice device_h = radv_device_to_handle(device);
bool use_compute_resolve = false; enum radv_resolve_method resolve_method = RESOLVE_HW;
/* we can use the hw resolve only for single full resolves */ /* we can use the hw resolve only for single full resolves */
if (region_count == 1) { if (region_count == 1) {
if (regions[0].srcOffset.x || if (regions[0].srcOffset.x ||
regions[0].srcOffset.y || regions[0].srcOffset.y ||
regions[0].srcOffset.z) regions[0].srcOffset.z)
use_compute_resolve = true; resolve_method = RESOLVE_COMPUTE;
if (regions[0].dstOffset.x || if (regions[0].dstOffset.x ||
regions[0].dstOffset.y || regions[0].dstOffset.y ||
regions[0].dstOffset.z) regions[0].dstOffset.z)
use_compute_resolve = true; resolve_method = RESOLVE_COMPUTE;
if (regions[0].extent.width != src_image->extent.width || if (regions[0].extent.width != src_image->info.width ||
regions[0].extent.height != src_image->extent.height || regions[0].extent.height != src_image->info.height ||
regions[0].extent.depth != src_image->extent.depth) regions[0].extent.depth != src_image->info.depth)
use_compute_resolve = true; resolve_method = RESOLVE_COMPUTE;
} else } else
use_compute_resolve = true; resolve_method = RESOLVE_COMPUTE;
if (use_compute_resolve) { radv_pick_resolve_method_images(src_image, dest_image,
&resolve_method);
if (resolve_method == RESOLVE_FRAGMENT) {
radv_meta_resolve_fragment_image(cmd_buffer,
src_image,
src_image_layout,
dest_image,
dest_image_layout,
region_count, regions);
return;
}
if (resolve_method == RESOLVE_COMPUTE) {
radv_meta_resolve_compute_image(cmd_buffer, radv_meta_resolve_compute_image(cmd_buffer,
src_image, src_image,
src_image_layout, src_image_layout,
@@ -433,12 +379,12 @@ void radv_CmdResolveImage(
return; return;
} }
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
assert(src_image->samples > 1); assert(src_image->info.samples > 1);
assert(dest_image->samples == 1); assert(dest_image->info.samples == 1);
if (src_image->samples >= 16) { if (src_image->info.samples >= 16) {
/* See commit aa3f9aaf31e9056a255f9e0472ebdfdaa60abe54 for the /* See commit aa3f9aaf31e9056a255f9e0472ebdfdaa60abe54 for the
* glBlitFramebuffer workaround for samples >= 16. * glBlitFramebuffer workaround for samples >= 16.
*/ */
@@ -446,7 +392,7 @@ void radv_CmdResolveImage(
"samples >= 16"); "samples >= 16");
} }
if (src_image->array_size > 1) if (src_image->info.array_size > 1)
radv_finishme("vkCmdResolveImage: multisample array images"); radv_finishme("vkCmdResolveImage: multisample array images");
if (dest_image->surface.dcc_size) { if (dest_image->surface.dcc_size) {
@@ -512,8 +458,7 @@ void radv_CmdResolveImage(
.baseArrayLayer = src_base_layer + layer, .baseArrayLayer = src_base_layer + layer,
.layerCount = 1, .layerCount = 1,
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_SAMPLED_BIT);
struct radv_image_view dest_iview; struct radv_image_view dest_iview;
radv_image_view_init(&dest_iview, cmd_buffer->device, radv_image_view_init(&dest_iview, cmd_buffer->device,
@@ -529,8 +474,7 @@ void radv_CmdResolveImage(
.baseArrayLayer = dest_base_layer + layer, .baseArrayLayer = dest_base_layer + layer,
.layerCount = 1, .layerCount = 1,
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT);
VkFramebuffer fb_h; VkFramebuffer fb_h;
radv_CreateFramebuffer(device_h, radv_CreateFramebuffer(device_h,
@@ -541,9 +485,9 @@ void radv_CmdResolveImage(
radv_image_view_to_handle(&src_iview), radv_image_view_to_handle(&src_iview),
radv_image_view_to_handle(&dest_iview), radv_image_view_to_handle(&dest_iview),
}, },
.width = radv_minify(dest_image->extent.width, .width = radv_minify(dest_image->info.width,
region->dstSubresource.mipLevel), region->dstSubresource.mipLevel),
.height = radv_minify(dest_image->extent.height, .height = radv_minify(dest_image->info.height,
region->dstSubresource.mipLevel), region->dstSubresource.mipLevel),
.layers = 1 .layers = 1
}, },
@@ -599,6 +543,7 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
struct radv_framebuffer *fb = cmd_buffer->state.framebuffer; struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass; const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radv_meta_saved_state saved_state; struct radv_meta_saved_state saved_state;
enum radv_resolve_method resolve_method = RESOLVE_HW;
/* FINISHME(perf): Skip clears for resolve attachments. /* FINISHME(perf): Skip clears for resolve attachments.
* *
@@ -612,7 +557,27 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
if (!subpass->has_resolve) if (!subpass->has_resolve)
return; return;
radv_meta_save_graphics_reset_vport_scissor(&saved_state, cmd_buffer); for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
struct radv_image *src_img = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment->image;
radv_pick_resolve_method_images(dst_img, src_img, &resolve_method);
if (resolve_method == RESOLVE_FRAGMENT) {
break;
}
}
if (resolve_method == RESOLVE_COMPUTE) {
radv_cmd_buffer_resolve_subpass_cs(cmd_buffer);
return;
} else if (resolve_method == RESOLVE_FRAGMENT) {
radv_cmd_buffer_resolve_subpass_fs(cmd_buffer);
return;
}
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t i = 0; i < subpass->color_count; ++i) { for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i]; VkAttachmentReference src_att = subpass->color_attachments[i];

View File

@@ -32,11 +32,10 @@
#include "vk_format.h" #include "vk_format.h"
static nir_shader * static nir_shader *
build_resolve_compute_shader(struct radv_device *dev, bool is_integer, int samples) build_resolve_compute_shader(struct radv_device *dev, bool is_integer, bool is_srgb, int samples)
{ {
nir_builder b; nir_builder b;
char name[64]; char name[64];
nir_if *outer_if = NULL;
const struct glsl_type *sampler_type = glsl_sampler_type(GLSL_SAMPLER_DIM_MS, const struct glsl_type *sampler_type = glsl_sampler_type(GLSL_SAMPLER_DIM_MS,
false, false,
false, false,
@@ -45,12 +44,12 @@ build_resolve_compute_shader(struct radv_device *dev, bool is_integer, int sampl
false, false,
false, false,
GLSL_TYPE_FLOAT); GLSL_TYPE_FLOAT);
snprintf(name, 64, "meta_resolve_cs-%d-%s", samples, is_integer ? "int" : "float"); snprintf(name, 64, "meta_resolve_cs-%d-%s", samples, is_integer ? "int" : (is_srgb ? "srgb" : "float"));
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, name); b.shader->info.name = ralloc_strdup(b.shader, name);
b.shader->info->cs.local_size[0] = 16; b.shader->info.cs.local_size[0] = 16;
b.shader->info->cs.local_size[1] = 16; b.shader->info.cs.local_size[1] = 16;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform, nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
sampler_type, "s_tex"); sampler_type, "s_tex");
@@ -64,105 +63,40 @@ build_resolve_compute_shader(struct radv_device *dev, bool is_integer, int sampl
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(src_offset, 0);
nir_intrinsic_set_range(src_offset, 16);
src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0)); src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
src_offset->num_components = 2; src_offset->num_components = 2;
nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset"); nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset");
nir_builder_instr_insert(&b, &src_offset->instr); nir_builder_instr_insert(&b, &src_offset->instr);
nir_intrinsic_instr *dst_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *dst_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(dst_offset, 0);
nir_intrinsic_set_range(dst_offset, 16);
dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8)); dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
dst_offset->num_components = 2; dst_offset->num_components = 2;
nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, 2, 32, "dst_offset"); nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, 2, 32, "dst_offset");
nir_builder_instr_insert(&b, &dst_offset->instr); nir_builder_instr_insert(&b, &dst_offset->instr);
nir_ssa_def *img_coord = nir_channels(&b, nir_iadd(&b, global_id, &src_offset->dest.ssa), 0x3); nir_ssa_def *img_coord = nir_channels(&b, nir_iadd(&b, global_id, &src_offset->dest.ssa), 0x3);
/* do a txf_ms on each sample */ nir_variable *color = nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
nir_ssa_def *tmp;
nir_tex_instr *tex = nir_tex_instr_create(b.shader, 2); radv_meta_build_resolve_shader_core(&b, is_integer, is_srgb, samples,
tex->sampler_dim = GLSL_SAMPLER_DIM_MS; input_img, color, img_coord);
tex->op = nir_texop_txf_ms;
tex->src[0].src_type = nir_tex_src_coord;
tex->src[0].src = nir_src_for_ssa(img_coord);
tex->src[1].src_type = nir_tex_src_ms_index;
tex->src[1].src = nir_src_for_ssa(nir_imm_int(&b, 0));
tex->dest_type = nir_type_float;
tex->is_array = false;
tex->coord_components = 2;
tex->texture = nir_deref_var_create(tex, input_img);
tex->sampler = NULL;
nir_ssa_dest_init(&tex->instr, &tex->dest, 4, 32, "tex"); nir_ssa_def *outval = nir_load_var(&b, color);
nir_builder_instr_insert(&b, &tex->instr);
tmp = &tex->dest.ssa;
nir_variable *color =
nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
if (!is_integer && samples > 1) {
nir_tex_instr *tex_all_same = nir_tex_instr_create(b.shader, 1);
tex_all_same->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_all_same->op = nir_texop_samples_identical;
tex_all_same->src[0].src_type = nir_tex_src_coord;
tex_all_same->src[0].src = nir_src_for_ssa(img_coord);
tex_all_same->dest_type = nir_type_float;
tex_all_same->is_array = false;
tex_all_same->coord_components = 2;
tex_all_same->texture = nir_deref_var_create(tex_all_same, input_img);
tex_all_same->sampler = NULL;
nir_ssa_dest_init(&tex_all_same->instr, &tex_all_same->dest, 1, 32, "tex");
nir_builder_instr_insert(&b, &tex_all_same->instr);
nir_ssa_def *all_same = nir_ine(&b, &tex_all_same->dest.ssa, nir_imm_int(&b, 0));
nir_if *if_stmt = nir_if_create(b.shader);
if_stmt->condition = nir_src_for_ssa(all_same);
nir_cf_node_insert(b.cursor, &if_stmt->cf_node);
b.cursor = nir_after_cf_list(&if_stmt->then_list);
for (int i = 1; i < samples; i++) {
nir_tex_instr *tex_add = nir_tex_instr_create(b.shader, 2);
tex_add->sampler_dim = GLSL_SAMPLER_DIM_MS;
tex_add->op = nir_texop_txf_ms;
tex_add->src[0].src_type = nir_tex_src_coord;
tex_add->src[0].src = nir_src_for_ssa(img_coord);
tex_add->src[1].src_type = nir_tex_src_ms_index;
tex_add->src[1].src = nir_src_for_ssa(nir_imm_int(&b, i));
tex_add->dest_type = nir_type_float;
tex_add->is_array = false;
tex_add->coord_components = 2;
tex_add->texture = nir_deref_var_create(tex_add, input_img);
tex_add->sampler = NULL;
nir_ssa_dest_init(&tex_add->instr, &tex_add->dest, 4, 32, "tex");
nir_builder_instr_insert(&b, &tex_add->instr);
tmp = nir_fadd(&b, tmp, &tex_add->dest.ssa);
}
tmp = nir_fdiv(&b, tmp, nir_imm_float(&b, samples));
nir_store_var(&b, color, tmp, 0xf);
b.cursor = nir_after_cf_list(&if_stmt->else_list);
outer_if = if_stmt;
}
nir_store_var(&b, color, &tex->dest.ssa, 0xf);
if (outer_if)
b.cursor = nir_after_cf_node(&outer_if->cf_node);
nir_ssa_def *newv = nir_load_var(&b, color);
nir_ssa_def *coord = nir_iadd(&b, global_id, &dst_offset->dest.ssa); nir_ssa_def *coord = nir_iadd(&b, global_id, &dst_offset->dest.ssa);
nir_intrinsic_instr *store = nir_intrinsic_instr_create(b.shader, nir_intrinsic_image_store); nir_intrinsic_instr *store = nir_intrinsic_instr_create(b.shader, nir_intrinsic_image_store);
store->src[0] = nir_src_for_ssa(coord); store->src[0] = nir_src_for_ssa(coord);
store->src[1] = nir_src_for_ssa(nir_ssa_undef(&b, 1, 32)); store->src[1] = nir_src_for_ssa(nir_ssa_undef(&b, 1, 32));
store->src[2] = nir_src_for_ssa(newv); store->src[2] = nir_src_for_ssa(outval);
store->variables[0] = nir_deref_var_create(store, output_img); store->variables[0] = nir_deref_var_create(store, output_img);
nir_builder_instr_insert(&b, &store->instr); nir_builder_instr_insert(&b, &store->instr);
return b.shader; return b.shader;
@@ -230,12 +164,13 @@ static VkResult
create_resolve_pipeline(struct radv_device *device, create_resolve_pipeline(struct radv_device *device,
int samples, int samples,
bool is_integer, bool is_integer,
bool is_srgb,
VkPipeline *pipeline) VkPipeline *pipeline)
{ {
VkResult result; VkResult result;
struct radv_shader_module cs = { .nir = NULL }; struct radv_shader_module cs = { .nir = NULL };
cs.nir = build_resolve_compute_shader(device, is_integer, samples); cs.nir = build_resolve_compute_shader(device, is_integer, is_srgb, samples);
/* compute shader */ /* compute shader */
@@ -282,12 +217,15 @@ radv_device_init_meta_resolve_compute_state(struct radv_device *device)
for (uint32_t i = 0; i < MAX_SAMPLES_LOG2; ++i) { for (uint32_t i = 0; i < MAX_SAMPLES_LOG2; ++i) {
uint32_t samples = 1 << i; uint32_t samples = 1 << i;
res = create_resolve_pipeline(device, samples, false, res = create_resolve_pipeline(device, samples, false, false,
&state->resolve_compute.rc[i].pipeline); &state->resolve_compute.rc[i].pipeline);
res = create_resolve_pipeline(device, samples, true, res = create_resolve_pipeline(device, samples, true, false,
&state->resolve_compute.rc[i].i_pipeline); &state->resolve_compute.rc[i].i_pipeline);
res = create_resolve_pipeline(device, samples, false, true,
&state->resolve_compute.rc[i].srgb_pipeline);
} }
return res; return res;
@@ -305,6 +243,10 @@ radv_device_finish_meta_resolve_compute_state(struct radv_device *device)
radv_DestroyPipeline(radv_device_to_handle(device), radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_compute.rc[i].i_pipeline, state->resolve_compute.rc[i].i_pipeline,
&state->alloc); &state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_compute.rc[i].srgb_pipeline,
&state->alloc);
} }
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device), radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
@@ -315,6 +257,78 @@ radv_device_finish_meta_resolve_compute_state(struct radv_device *device)
&state->alloc); &state->alloc);
} }
static void
emit_resolve(struct radv_cmd_buffer *cmd_buffer,
struct radv_image_view *src_iview,
struct radv_image_view *dest_iview,
const VkOffset2D *src_offset,
const VkOffset2D *dest_offset,
const VkExtent2D *resolve_extent)
{
struct radv_device *device = cmd_buffer->device;
const uint32_t samples = src_iview->image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1;
radv_meta_push_descriptor_set(cmd_buffer,
VK_PIPELINE_BIND_POINT_COMPUTE,
device->meta_state.resolve_compute.p_layout,
0, /* set */
2, /* descriptorWriteCount */
(VkWriteDescriptorSet[]) {
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 0,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(src_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL },
}
},
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 1,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(dest_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
}
});
VkPipeline pipeline;
if (vk_format_is_int(src_iview->image->vk_format))
pipeline = device->meta_state.resolve_compute.rc[samples_log2].i_pipeline;
else if (vk_format_is_srgb(src_iview->image->vk_format))
pipeline = device->meta_state.resolve_compute.rc[samples_log2].srgb_pipeline;
else
pipeline = device->meta_state.resolve_compute.rc[samples_log2].pipeline;
if (cmd_buffer->state.compute_pipeline != radv_pipeline_from_handle(pipeline)) {
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
}
unsigned push_constants[4] = {
src_offset->x,
src_offset->y,
dest_offset->x,
dest_offset->y,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.resolve_compute.p_layout,
VK_SHADER_STAGE_COMPUTE_BIT, 0, 16,
push_constants);
radv_unaligned_dispatch(cmd_buffer, resolve_extent->width, resolve_extent->height, 1);
}
void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer, void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image, struct radv_image *src_image,
VkImageLayout src_image_layout, VkImageLayout src_image_layout,
@@ -323,10 +337,7 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
uint32_t region_count, uint32_t region_count,
const VkImageResolve *regions) const VkImageResolve *regions)
{ {
struct radv_device *device = cmd_buffer->device;
struct radv_meta_saved_compute_state saved_state; struct radv_meta_saved_compute_state saved_state;
const uint32_t samples = src_image->samples;
const uint32_t samples_log2 = ffs(samples) - 1;
for (uint32_t r = 0; r < region_count; ++r) { for (uint32_t r = 0; r < region_count; ++r) {
const VkImageResolve *region = &regions[r]; const VkImageResolve *region = &regions[r];
@@ -383,8 +394,7 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = src_base_layer + layer, .baseArrayLayer = src_base_layer + layer,
.layerCount = 1, .layerCount = 1,
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_SAMPLED_BIT);
struct radv_image_view dest_iview; struct radv_image_view dest_iview;
radv_image_view_init(&dest_iview, cmd_buffer->device, radv_image_view_init(&dest_iview, cmd_buffer->device,
@@ -400,68 +410,108 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
.baseArrayLayer = dest_base_layer + layer, .baseArrayLayer = dest_base_layer + layer,
.layerCount = 1, .layerCount = 1,
}, },
}, });
cmd_buffer, VK_IMAGE_USAGE_STORAGE_BIT);
emit_resolve(cmd_buffer,
radv_meta_push_descriptor_set(cmd_buffer, &src_iview,
VK_PIPELINE_BIND_POINT_COMPUTE, &dest_iview,
device->meta_state.resolve_compute.p_layout, &(VkOffset2D) {srcOffset.x, srcOffset.y },
0, /* set */ &(VkOffset2D) {dstOffset.x, dstOffset.y },
2, /* descriptorWriteCount */ &(VkExtent2D) {extent.width, extent.height });
(VkWriteDescriptorSet[]) {
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 0,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(&src_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
},
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 1,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(&dest_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
}
});
VkPipeline pipeline;
if (vk_format_is_int(src_image->vk_format))
pipeline = device->meta_state.resolve_compute.rc[samples_log2].i_pipeline;
else
pipeline = device->meta_state.resolve_compute.rc[samples_log2].pipeline;
if (cmd_buffer->state.compute_pipeline != radv_pipeline_from_handle(pipeline)) {
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
}
unsigned push_constants[4] = {
srcOffset.x,
srcOffset.y,
dstOffset.x,
dstOffset.y,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.resolve_compute.p_layout,
VK_SHADER_STAGE_COMPUTE_BIT, 0, 16,
push_constants);
radv_unaligned_dispatch(cmd_buffer, extent.width, extent.height, 1);
} }
} }
radv_meta_restore_compute(&saved_state, cmd_buffer, 16); radv_meta_restore_compute(&saved_state, cmd_buffer, 16);
} }
/**
* Emit any needed resolves for the current subpass.
*/
void
radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer *cmd_buffer)
{
struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radv_meta_saved_compute_state saved_state;
/* FINISHME(perf): Skip clears for resolve attachments.
*
* From the Vulkan 1.0 spec:
*
* If the first use of an attachment in a render pass is as a resolve
* attachment, then the loadOp is effectively ignored as the resolve is
* guaranteed to overwrite all pixels in the render area.
*/
if (!subpass->has_resolve)
return;
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
if (dst_img->surface.dcc_size) {
radv_initialize_dcc(cmd_buffer, dst_img, 0xffffffff);
cmd_buffer->state.attachments[dest_att.attachment].current_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
}
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = 0;
range.levelCount = 1;
range.baseArrayLayer = 0;
range.layerCount = 1;
radv_fast_clear_flush_image_inplace(cmd_buffer, src_iview->image, &range);
}
radv_meta_save_compute(&saved_state, cmd_buffer, 16);
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
struct radv_image_view *dst_iview = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_subpass resolve_subpass = {
.color_count = 1,
.color_attachments = (VkAttachmentReference[]) { dest_att },
.depth_stencil_attachment = { .attachment = VK_ATTACHMENT_UNUSED },
};
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
/* Subpass resolves must respect the render area. We can ignore the
* render area here because vkCmdBeginRenderPass set the render area
* with 3DSTATE_DRAWING_RECTANGLE.
*
* XXX(chadv): Does the hardware really respect
* 3DSTATE_DRAWING_RECTANGLE when draing a 3DPRIM_RECTLIST?
*/
emit_resolve(cmd_buffer,
src_iview,
dst_iview,
&(VkOffset2D) { 0, 0 },
&(VkOffset2D) { 0, 0 },
&(VkExtent2D) { fb->width, fb->height });
}
radv_meta_restore_compute(&saved_state, cmd_buffer, 16);
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = 0;
range.levelCount = 1;
range.baseArrayLayer = 0;
range.layerCount = 1;
radv_fast_clear_flush_image_inplace(cmd_buffer, dst_img, &range);
}
}

View File

@@ -0,0 +1,666 @@
/*
* Copyright © 2016 Dave Airlie
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include <assert.h>
#include <stdbool.h>
#include "radv_meta.h"
#include "radv_private.h"
#include "nir/nir_builder.h"
#include "sid.h"
#include "vk_format.h"
static nir_shader *
build_nir_vertex_shader(void)
{
const struct glsl_type *vec4 = glsl_vec4_type();
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_VERTEX, NULL);
b.shader->info.name = ralloc_strdup(b.shader, "meta_resolve_vs");
nir_variable *pos_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "gl_Position");
pos_out->data.location = VARYING_SLOT_POS;
nir_ssa_def *outvec = radv_meta_gen_rect_vertices(&b);
nir_store_var(&b, pos_out, outvec, 0xf);
return b.shader;
}
static nir_shader *
build_resolve_fragment_shader(struct radv_device *dev, bool is_integer, bool is_srgb, int samples)
{
nir_builder b;
char name[64];
const struct glsl_type *vec2 = glsl_vector_type(GLSL_TYPE_FLOAT, 2);
const struct glsl_type *vec4 = glsl_vec4_type();
const struct glsl_type *sampler_type = glsl_sampler_type(GLSL_SAMPLER_DIM_MS,
false,
false,
GLSL_TYPE_FLOAT);
snprintf(name, 64, "meta_resolve_fs-%d-%s", samples, is_integer ? "int" : (is_srgb ? "srgb" : "float"));
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
b.shader->info.name = ralloc_strdup(b.shader, name);
nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
sampler_type, "s_tex");
input_img->data.descriptor_set = 0;
input_img->data.binding = 0;
nir_variable *fs_pos_in = nir_variable_create(b.shader, nir_var_shader_in, vec2, "fs_pos_in");
fs_pos_in->data.location = VARYING_SLOT_POS;
nir_variable *color_out = nir_variable_create(b.shader, nir_var_shader_out,
vec4, "f_color");
color_out->data.location = FRAG_RESULT_DATA0;
nir_ssa_def *pos_in = nir_load_var(&b, fs_pos_in);
nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(src_offset, 0);
nir_intrinsic_set_range(src_offset, 8);
src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
src_offset->num_components = 2;
nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset");
nir_builder_instr_insert(&b, &src_offset->instr);
nir_ssa_def *pos_int = nir_f2i32(&b, pos_in);
nir_ssa_def *img_coord = nir_channels(&b, nir_iadd(&b, pos_int, &src_offset->dest.ssa), 0x3);
nir_variable *color = nir_local_variable_create(b.impl, glsl_vec4_type(), "color");
radv_meta_build_resolve_shader_core(&b, is_integer, is_srgb,samples,
input_img, color, img_coord);
nir_ssa_def *outval = nir_load_var(&b, color);
nir_store_var(&b, color_out, outval, 0xf);
return b.shader;
}
static VkResult
create_layout(struct radv_device *device)
{
VkResult result;
/*
* one descriptors for the image being sampled
*/
VkDescriptorSetLayoutCreateInfo ds_create_info = {
.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
.flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR,
.bindingCount = 1,
.pBindings = (VkDescriptorSetLayoutBinding[]) {
{
.binding = 0,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.descriptorCount = 1,
.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT,
.pImmutableSamplers = NULL
},
}
};
result = radv_CreateDescriptorSetLayout(radv_device_to_handle(device),
&ds_create_info,
&device->meta_state.alloc,
&device->meta_state.resolve_fragment.ds_layout);
if (result != VK_SUCCESS)
goto fail;
VkPipelineLayoutCreateInfo pl_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.resolve_fragment.ds_layout,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_FRAGMENT_BIT, 0, 8},
};
result = radv_CreatePipelineLayout(radv_device_to_handle(device),
&pl_create_info,
&device->meta_state.alloc,
&device->meta_state.resolve_fragment.p_layout);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
fail:
return result;
}
static const VkPipelineVertexInputStateCreateInfo normal_vi_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
.vertexBindingDescriptionCount = 0,
.vertexAttributeDescriptionCount = 0,
};
static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,
VK_FORMAT_R16G16B16A16_SINT,
VK_FORMAT_R32_SFLOAT,
VK_FORMAT_R32G32_SFLOAT,
VK_FORMAT_R32G32B32A32_SFLOAT
};
static VkResult
create_resolve_pipeline(struct radv_device *device,
int samples_log2,
VkFormat format)
{
VkResult result;
bool is_integer = false, is_srgb = false;
uint32_t samples = 1 << samples_log2;
unsigned fs_key = radv_format_meta_fs_key(format);
const VkPipelineVertexInputStateCreateInfo *vi_create_info;
vi_create_info = &normal_vi_create_info;
if (vk_format_is_int(format))
is_integer = true;
else if (vk_format_is_srgb(format))
is_srgb = true;
struct radv_shader_module fs = { .nir = NULL };
fs.nir = build_resolve_fragment_shader(device, is_integer, is_srgb, samples);
struct radv_shader_module vs = {
.nir = build_nir_vertex_shader(),
};
VkRenderPass *rp = is_srgb ?
&device->meta_state.resolve_fragment.rc[samples_log2].srgb_render_pass :
&device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
assert(!*rp);
VkPipeline *pipeline = is_srgb ?
&device->meta_state.resolve_fragment.rc[samples_log2].srgb_pipeline :
&device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
assert(!*pipeline);
VkPipelineShaderStageCreateInfo pipeline_shader_stages[] = {
{
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.stage = VK_SHADER_STAGE_VERTEX_BIT,
.module = radv_shader_module_to_handle(&vs),
.pName = "main",
.pSpecializationInfo = NULL
}, {
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.stage = VK_SHADER_STAGE_FRAGMENT_BIT,
.module = radv_shader_module_to_handle(&fs),
.pName = "main",
.pSpecializationInfo = NULL
},
};
result = radv_CreateRenderPass(radv_device_to_handle(device),
&(VkRenderPassCreateInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = &(VkAttachmentDescription) {
.format = format,
.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.initialLayout = VK_IMAGE_LAYOUT_GENERAL,
.finalLayout = VK_IMAGE_LAYOUT_GENERAL,
},
.subpassCount = 1,
.pSubpasses = &(VkSubpassDescription) {
.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
.inputAttachmentCount = 0,
.colorAttachmentCount = 1,
.pColorAttachments = &(VkAttachmentReference) {
.attachment = 0,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.pResolveAttachments = NULL,
.pDepthStencilAttachment = &(VkAttachmentReference) {
.attachment = VK_ATTACHMENT_UNUSED,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
},
.dependencyCount = 0,
}, &device->meta_state.alloc, rp);
const VkGraphicsPipelineCreateInfo vk_pipeline_info = {
.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
.stageCount = ARRAY_SIZE(pipeline_shader_stages),
.pStages = pipeline_shader_stages,
.pVertexInputState = vi_create_info,
.pInputAssemblyState = &(VkPipelineInputAssemblyStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP,
.primitiveRestartEnable = false,
},
.pViewportState = &(VkPipelineViewportStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,
.viewportCount = 1,
.scissorCount = 1,
},
.pRasterizationState = &(VkPipelineRasterizationStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,
.rasterizerDiscardEnable = false,
.polygonMode = VK_POLYGON_MODE_FILL,
.cullMode = VK_CULL_MODE_NONE,
.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE
},
.pMultisampleState = &(VkPipelineMultisampleStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
.rasterizationSamples = 1,
.sampleShadingEnable = false,
.pSampleMask = (VkSampleMask[]) { UINT32_MAX },
},
.pColorBlendState = &(VkPipelineColorBlendStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = (VkPipelineColorBlendAttachmentState []) {
{ .colorWriteMask =
VK_COLOR_COMPONENT_A_BIT |
VK_COLOR_COMPONENT_R_BIT |
VK_COLOR_COMPONENT_G_BIT |
VK_COLOR_COMPONENT_B_BIT },
}
},
.pDynamicState = &(VkPipelineDynamicStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
.dynamicStateCount = 9,
.pDynamicStates = (VkDynamicState[]) {
VK_DYNAMIC_STATE_VIEWPORT,
VK_DYNAMIC_STATE_SCISSOR,
VK_DYNAMIC_STATE_LINE_WIDTH,
VK_DYNAMIC_STATE_DEPTH_BIAS,
VK_DYNAMIC_STATE_BLEND_CONSTANTS,
VK_DYNAMIC_STATE_DEPTH_BOUNDS,
VK_DYNAMIC_STATE_STENCIL_COMPARE_MASK,
VK_DYNAMIC_STATE_STENCIL_WRITE_MASK,
VK_DYNAMIC_STATE_STENCIL_REFERENCE,
},
},
.flags = 0,
.layout = device->meta_state.resolve_fragment.p_layout,
.renderPass = *rp,
.subpass = 0,
};
const struct radv_graphics_pipeline_create_info radv_pipeline_info = {
.use_rectlist = true
};
result = radv_graphics_pipeline_create(radv_device_to_handle(device),
radv_pipeline_cache_to_handle(&device->meta_state.cache),
&vk_pipeline_info, &radv_pipeline_info,
&device->meta_state.alloc,
pipeline);
ralloc_free(vs.nir);
ralloc_free(fs.nir);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
fail:
ralloc_free(vs.nir);
ralloc_free(fs.nir);
return result;
}
VkResult
radv_device_init_meta_resolve_fragment_state(struct radv_device *device)
{
struct radv_meta_state *state = &device->meta_state;
VkResult res;
memset(&state->resolve_fragment, 0, sizeof(state->resolve_fragment));
res = create_layout(device);
if (res != VK_SUCCESS)
return res;
for (uint32_t i = 0; i < MAX_SAMPLES_LOG2; ++i) {
for (unsigned j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) {
res = create_resolve_pipeline(device, i, pipeline_formats[j]);
}
res = create_resolve_pipeline(device, i, VK_FORMAT_R8G8B8A8_SRGB);
}
return res;
}
void
radv_device_finish_meta_resolve_fragment_state(struct radv_device *device)
{
struct radv_meta_state *state = &device->meta_state;
for (uint32_t i = 0; i < MAX_SAMPLES_LOG2; ++i) {
for (unsigned j = 0; j < NUM_META_FS_KEYS; ++j) {
radv_DestroyRenderPass(radv_device_to_handle(device),
state->resolve_fragment.rc[i].render_pass[j],
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_fragment.rc[i].pipeline[j],
&state->alloc);
}
radv_DestroyRenderPass(radv_device_to_handle(device),
state->resolve_fragment.rc[i].srgb_render_pass,
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->resolve_fragment.rc[i].srgb_pipeline,
&state->alloc);
}
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
state->resolve_fragment.ds_layout,
&state->alloc);
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->resolve_fragment.p_layout,
&state->alloc);
}
static void
emit_resolve(struct radv_cmd_buffer *cmd_buffer,
struct radv_image_view *src_iview,
struct radv_image_view *dest_iview,
const VkOffset2D *src_offset,
const VkOffset2D *dest_offset,
const VkExtent2D *resolve_extent)
{
struct radv_device *device = cmd_buffer->device;
VkCommandBuffer cmd_buffer_h = radv_cmd_buffer_to_handle(cmd_buffer);
const uint32_t samples = src_iview->image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1;
radv_meta_push_descriptor_set(cmd_buffer,
VK_PIPELINE_BIND_POINT_GRAPHICS,
cmd_buffer->device->meta_state.resolve_fragment.p_layout,
0, /* set */
1, /* descriptorWriteCount */
(VkWriteDescriptorSet[]) {
{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.dstBinding = 0,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
.pImageInfo = (VkDescriptorImageInfo[]) {
{
.sampler = VK_NULL_HANDLE,
.imageView = radv_image_view_to_handle(src_iview),
.imageLayout = VK_IMAGE_LAYOUT_GENERAL,
},
}
},
});
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
unsigned push_constants[2] = {
src_offset->x,
src_offset->y,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.resolve_fragment.p_layout,
VK_SHADER_STAGE_FRAGMENT_BIT, 0, 8,
push_constants);
unsigned fs_key = radv_format_meta_fs_key(dest_iview->vk_format);
VkPipeline pipeline_h = vk_format_is_srgb(dest_iview->vk_format) ?
device->meta_state.resolve_fragment.rc[samples_log2].srgb_pipeline :
device->meta_state.resolve_fragment.rc[samples_log2].pipeline[fs_key];
radv_CmdBindPipeline(cmd_buffer_h, VK_PIPELINE_BIND_POINT_GRAPHICS,
pipeline_h);
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) {
.x = dest_offset->x,
.y = dest_offset->y,
.width = resolve_extent->width,
.height = resolve_extent->height,
.minDepth = 0.0f,
.maxDepth = 1.0f
});
radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkRect2D) {
.offset = *dest_offset,
.extent = *resolve_extent,
});
radv_CmdDraw(cmd_buffer_h, 3, 1, 0, 0);
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_CB;
}
void radv_meta_resolve_fragment_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *src_image,
VkImageLayout src_image_layout,
struct radv_image *dest_image,
VkImageLayout dest_image_layout,
uint32_t region_count,
const VkImageResolve *regions)
{
struct radv_device *device = cmd_buffer->device;
struct radv_meta_saved_state saved_state;
const uint32_t samples = src_image->info.samples;
const uint32_t samples_log2 = ffs(samples) - 1;
unsigned fs_key = radv_format_meta_fs_key(dest_image->vk_format);
VkRenderPass rp;
for (uint32_t r = 0; r < region_count; ++r) {
const VkImageResolve *region = &regions[r];
const uint32_t src_base_layer =
radv_meta_get_iview_layer(src_image, &region->srcSubresource,
&region->srcOffset);
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = region->srcSubresource.mipLevel;
range.levelCount = 1;
range.baseArrayLayer = src_base_layer;
range.layerCount = region->srcSubresource.layerCount;
radv_fast_clear_flush_image_inplace(cmd_buffer, src_image, &range);
}
rp = vk_format_is_srgb(dest_image->vk_format) ?
device->meta_state.resolve_fragment.rc[samples_log2].srgb_render_pass :
device->meta_state.resolve_fragment.rc[samples_log2].render_pass[fs_key];
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t r = 0; r < region_count; ++r) {
const VkImageResolve *region = &regions[r];
assert(region->srcSubresource.aspectMask == VK_IMAGE_ASPECT_COLOR_BIT);
assert(region->dstSubresource.aspectMask == VK_IMAGE_ASPECT_COLOR_BIT);
assert(region->srcSubresource.layerCount == region->dstSubresource.layerCount);
const uint32_t src_base_layer =
radv_meta_get_iview_layer(src_image, &region->srcSubresource,
&region->srcOffset);
const uint32_t dest_base_layer =
radv_meta_get_iview_layer(dest_image, &region->dstSubresource,
&region->dstOffset);
const struct VkExtent3D extent =
radv_sanitize_image_extent(src_image->type, region->extent);
const struct VkOffset3D srcOffset =
radv_sanitize_image_offset(src_image->type, region->srcOffset);
const struct VkOffset3D dstOffset =
radv_sanitize_image_offset(dest_image->type, region->dstOffset);
for (uint32_t layer = 0; layer < region->srcSubresource.layerCount;
++layer) {
struct radv_image_view src_iview;
radv_image_view_init(&src_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(src_image),
.viewType = radv_meta_get_view_type(src_image),
.format = src_image->vk_format,
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = region->srcSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = src_base_layer + layer,
.layerCount = 1,
},
});
struct radv_image_view dest_iview;
radv_image_view_init(&dest_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(dest_image),
.viewType = radv_meta_get_view_type(dest_image),
.format = dest_image->vk_format,
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = region->dstSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = dest_base_layer + layer,
.layerCount = 1,
},
});
VkFramebuffer fb;
radv_CreateFramebuffer(radv_device_to_handle(cmd_buffer->device),
&(VkFramebufferCreateInfo) {
.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,
.attachmentCount = 1,
.pAttachments = (VkImageView[]) {
radv_image_view_to_handle(&dest_iview),
},
.width = extent.width,
.height = extent.height,
.layers = 1
}, &cmd_buffer->pool->alloc, &fb);
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = rp,
.framebuffer = fb,
.renderArea = {
.offset = { dstOffset.x, dstOffset.y, },
.extent = { extent.width, extent.height },
},
.clearValueCount = 0,
.pClearValues = NULL,
}, VK_SUBPASS_CONTENTS_INLINE);
emit_resolve(cmd_buffer,
&src_iview,
&dest_iview,
&(VkOffset2D) { srcOffset.x, srcOffset.y },
&(VkOffset2D) { dstOffset.x, dstOffset.y },
&(VkExtent2D) { extent.width, extent.height });
radv_CmdEndRenderPass(radv_cmd_buffer_to_handle(cmd_buffer));
radv_DestroyFramebuffer(radv_device_to_handle(cmd_buffer->device), fb, &cmd_buffer->pool->alloc);
}
}
radv_meta_restore(&saved_state, cmd_buffer);
}
/**
* Emit any needed resolves for the current subpass.
*/
void
radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer)
{
struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radv_meta_saved_state saved_state;
/* FINISHME(perf): Skip clears for resolve attachments.
*
* From the Vulkan 1.0 spec:
*
* If the first use of an attachment in a render pass is as a resolve
* attachment, then the loadOp is effectively ignored as the resolve is
* guaranteed to overwrite all pixels in the render area.
*/
if (!subpass->has_resolve)
return;
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image_view *dest_iview = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment;
struct radv_image *dst_img = dest_iview->image;
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
if (dst_img->surface.dcc_size) {
radv_initialize_dcc(cmd_buffer, dst_img, 0xffffffff);
cmd_buffer->state.attachments[dest_att.attachment].current_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
}
{
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
range.baseMipLevel = 0;
range.levelCount = 1;
range.baseArrayLayer = 0;
range.layerCount = 1;
radv_fast_clear_flush_image_inplace(cmd_buffer, src_iview->image, &range);
}
struct radv_subpass resolve_subpass = {
.color_count = 1,
.color_attachments = (VkAttachmentReference[]) { dest_att },
.depth_stencil_attachment = { .attachment = VK_ATTACHMENT_UNUSED },
};
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
/* Subpass resolves must respect the render area. We can ignore the
* render area here because vkCmdBeginRenderPass set the render area
* with 3DSTATE_DRAWING_RECTANGLE.
*
* XXX(chadv): Does the hardware really respect
* 3DSTATE_DRAWING_RECTANGLE when draing a 3DPRIM_RECTLIST?
*/
emit_resolve(cmd_buffer,
src_iview,
dest_iview,
&(VkOffset2D) { 0, 0 },
&(VkOffset2D) { 0, 0 },
&(VkExtent2D) { fb->width, fb->height });
}
cmd_buffer->state.subpass = subpass;
radv_meta_restore(&saved_state, cmd_buffer);
}

View File

@@ -26,6 +26,7 @@
*/ */
#include "util/mesa-sha1.h" #include "util/mesa-sha1.h"
#include "util/u_atomic.h"
#include "radv_private.h" #include "radv_private.h"
#include "nir/nir.h" #include "nir/nir.h"
#include "nir/nir_builder.h" #include "nir/nir_builder.h"
@@ -35,12 +36,14 @@
#include <llvm-c/TargetMachine.h> #include <llvm-c/TargetMachine.h>
#include "sid.h" #include "sid.h"
#include "gfx9d.h"
#include "r600d_common.h" #include "r600d_common.h"
#include "ac_binary.h" #include "ac_binary.h"
#include "ac_llvm_util.h" #include "ac_llvm_util.h"
#include "ac_nir_to_llvm.h" #include "ac_nir_to_llvm.h"
#include "vk_format.h" #include "vk_format.h"
#include "util/debug.h" #include "util/debug.h"
#include "ac_exp_param.h"
void radv_shader_variant_destroy(struct radv_device *device, void radv_shader_variant_destroy(struct radv_device *device,
struct radv_shader_variant *variant); struct radv_shader_variant *variant);
@@ -50,6 +53,8 @@ static const struct nir_shader_compiler_options nir_options = {
.lower_scmp = true, .lower_scmp = true,
.lower_flrp32 = true, .lower_flrp32 = true,
.lower_fsat = true, .lower_fsat = true,
.lower_fdiv = true,
.lower_sub = true,
.lower_pack_snorm_2x16 = true, .lower_pack_snorm_2x16 = true,
.lower_pack_snorm_4x8 = true, .lower_pack_snorm_4x8 = true,
.lower_pack_unorm_2x16 = true, .lower_pack_unorm_2x16 = true,
@@ -60,6 +65,7 @@ static const struct nir_shader_compiler_options nir_options = {
.lower_unpack_unorm_4x8 = true, .lower_unpack_unorm_4x8 = true,
.lower_extract_byte = true, .lower_extract_byte = true,
.lower_extract_word = true, .lower_extract_word = true,
.max_unroll_iterations = 32
}; };
VkResult radv_CreateShaderModule( VkResult radv_CreateShaderModule(
@@ -151,6 +157,12 @@ radv_optimize_nir(struct nir_shader *shader)
NIR_PASS(progress, shader, nir_copy_prop); NIR_PASS(progress, shader, nir_copy_prop);
NIR_PASS(progress, shader, nir_opt_remove_phis); NIR_PASS(progress, shader, nir_opt_remove_phis);
NIR_PASS(progress, shader, nir_opt_dce); NIR_PASS(progress, shader, nir_opt_dce);
if (nir_opt_trivial_continues(shader)) {
progress = true;
NIR_PASS(progress, shader, nir_copy_prop);
NIR_PASS(progress, shader, nir_opt_dce);
}
NIR_PASS(progress, shader, nir_opt_if);
NIR_PASS(progress, shader, nir_opt_dead_cf); NIR_PASS(progress, shader, nir_opt_dead_cf);
NIR_PASS(progress, shader, nir_opt_cse); NIR_PASS(progress, shader, nir_opt_cse);
NIR_PASS(progress, shader, nir_opt_peephole_select, 8); NIR_PASS(progress, shader, nir_opt_peephole_select, 8);
@@ -158,6 +170,9 @@ radv_optimize_nir(struct nir_shader *shader)
NIR_PASS(progress, shader, nir_opt_constant_folding); NIR_PASS(progress, shader, nir_opt_constant_folding);
NIR_PASS(progress, shader, nir_opt_undef); NIR_PASS(progress, shader, nir_opt_undef);
NIR_PASS(progress, shader, nir_opt_conditional_discard); NIR_PASS(progress, shader, nir_opt_conditional_discard);
if (shader->options->max_unroll_iterations) {
NIR_PASS(progress, shader, nir_opt_loop_unroll, 0);
}
} while (progress); } while (progress);
} }
@@ -251,7 +266,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
} }
/* Vulkan uses the separate-shader linking model */ /* Vulkan uses the separate-shader linking model */
nir->info->separate_shader = true; nir->info.separate_shader = true;
nir_shader_gather_info(nir, entry_point->impl); nir_shader_gather_info(nir, entry_point->impl);
@@ -360,7 +375,7 @@ static void radv_dump_pipeline_stats(struct radv_device *device, struct radv_pip
void radv_shader_variant_destroy(struct radv_device *device, void radv_shader_variant_destroy(struct radv_device *device,
struct radv_shader_variant *variant) struct radv_shader_variant *variant)
{ {
if (__sync_fetch_and_sub(&variant->ref_count, 1) != 1) if (!p_atomic_dec_zero(&variant->ref_count))
return; return;
device->ws->buffer_destroy(variant->bo); device->ws->buffer_destroy(variant->bo);
@@ -526,8 +541,8 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS); bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS);
if (module->nir) if (module->nir)
_mesa_sha1_compute(module->nir->info->name, _mesa_sha1_compute(module->nir->info.name,
strlen(module->nir->info->name), strlen(module->nir->info.name),
module->sha1); module->sha1);
radv_hash_shader(sha1, module, entrypoint, spec_info, layout, key, 0); radv_hash_shader(sha1, module, entrypoint, spec_info, layout, key, 0);
@@ -591,11 +606,14 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
} }
static union ac_shader_variant_key static union ac_shader_variant_key
radv_compute_tes_key(bool as_es) radv_compute_tes_key(bool as_es, bool export_prim_id)
{ {
union ac_shader_variant_key key; union ac_shader_variant_key key;
memset(&key, 0, sizeof(key)); memset(&key, 0, sizeof(key));
key.tes.as_es = as_es; key.tes.as_es = as_es;
/* export prim id only happens when no geom shader */
if (!as_es)
key.tes.export_prim_id = export_prim_id;
return key; return key;
} }
@@ -626,13 +644,15 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
nir_shader *tes_nir, *tcs_nir; nir_shader *tes_nir, *tcs_nir;
void *tes_code = NULL, *tcs_code = NULL; void *tes_code = NULL, *tcs_code = NULL;
unsigned tes_code_size = 0, tcs_code_size = 0; unsigned tes_code_size = 0, tcs_code_size = 0;
union ac_shader_variant_key tes_key = radv_compute_tes_key(radv_pipeline_has_gs(pipeline)); union ac_shader_variant_key tes_key;
union ac_shader_variant_key tcs_key; union ac_shader_variant_key tcs_key;
bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS); bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS);
tes_key = radv_compute_tes_key(radv_pipeline_has_gs(pipeline),
pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.prim_id_input);
if (tes_module->nir) if (tes_module->nir)
_mesa_sha1_compute(tes_module->nir->info->name, _mesa_sha1_compute(tes_module->nir->info.name,
strlen(tes_module->nir->info->name), strlen(tes_module->nir->info.name),
tes_module->sha1); tes_module->sha1);
radv_hash_shader(tes_sha1, tes_module, tes_entrypoint, tes_spec_info, layout, &tes_key, 0); radv_hash_shader(tes_sha1, tes_module, tes_entrypoint, tes_spec_info, layout, &tes_key, 0);
@@ -644,8 +664,8 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
tcs_key = radv_compute_tcs_key(tes_variant->info.tes.primitive_mode, input_vertices); tcs_key = radv_compute_tcs_key(tes_variant->info.tes.primitive_mode, input_vertices);
if (tcs_module->nir) if (tcs_module->nir)
_mesa_sha1_compute(tcs_module->nir->info->name, _mesa_sha1_compute(tcs_module->nir->info.name,
strlen(tcs_module->nir->info->name), strlen(tcs_module->nir->info.name),
tcs_module->sha1); tcs_module->sha1);
radv_hash_shader(tcs_sha1, tcs_module, tcs_entrypoint, tcs_spec_info, layout, &tcs_key, 0); radv_hash_shader(tcs_sha1, tcs_module, tcs_entrypoint, tcs_spec_info, layout, &tcs_key, 0);
@@ -674,16 +694,16 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
return; return;
nir_lower_tes_patch_vertices(tes_nir, nir_lower_tes_patch_vertices(tes_nir,
tcs_nir->info->tess.tcs_vertices_out); tcs_nir->info.tess.tcs_vertices_out);
tes_variant = radv_shader_variant_create(pipeline->device, tes_nir, tes_variant = radv_shader_variant_create(pipeline->device, tes_nir,
layout, &tes_key, &tes_code, layout, &tes_key, &tes_code,
&tes_code_size, dump); &tes_code_size, dump);
tcs_key = radv_compute_tcs_key(tes_nir->info->tess.primitive_mode, input_vertices); tcs_key = radv_compute_tcs_key(tes_nir->info.tess.primitive_mode, input_vertices);
if (tcs_module->nir) if (tcs_module->nir)
_mesa_sha1_compute(tcs_module->nir->info->name, _mesa_sha1_compute(tcs_module->nir->info.name,
strlen(tcs_module->nir->info->name), strlen(tcs_module->nir->info.name),
tcs_module->sha1); tcs_module->sha1);
radv_hash_shader(tcs_sha1, tcs_module, tcs_entrypoint, tcs_spec_info, layout, &tcs_key, 0); radv_hash_shader(tcs_sha1, tcs_module, tcs_entrypoint, tcs_spec_info, layout, &tcs_key, 0);
@@ -1318,11 +1338,12 @@ radv_pipeline_init_multisample_state(struct radv_pipeline *pipeline,
S_028A4C_MULTI_SHADER_ENGINE_PRIM_DISCARD_ENABLE(1) | S_028A4C_MULTI_SHADER_ENGINE_PRIM_DISCARD_ENABLE(1) |
EG_S_028A4C_FORCE_EOV_CNTDWN_ENABLE(1) | EG_S_028A4C_FORCE_EOV_CNTDWN_ENABLE(1) |
EG_S_028A4C_FORCE_EOV_REZ_ENABLE(1); EG_S_028A4C_FORCE_EOV_REZ_ENABLE(1);
ms->pa_sc_mode_cntl_0 = S_028A48_ALTERNATE_RBS_PER_TILE(pipeline->device->physical_device->rad_info.chip_class >= GFX9);
if (ms->num_samples > 1) { if (ms->num_samples > 1) {
unsigned log_samples = util_logbase2(ms->num_samples); unsigned log_samples = util_logbase2(ms->num_samples);
unsigned log_ps_iter_samples = util_logbase2(util_next_power_of_two(ps_iter_samples)); unsigned log_ps_iter_samples = util_logbase2(util_next_power_of_two(ps_iter_samples));
ms->pa_sc_mode_cntl_0 = S_028A48_MSAA_ENABLE(1); ms->pa_sc_mode_cntl_0 |= S_028A48_MSAA_ENABLE(1);
ms->pa_sc_line_cntl |= S_028BDC_EXPAND_LINE_WIDTH(1); /* CM_R_028BDC_PA_SC_LINE_CNTL */ ms->pa_sc_line_cntl |= S_028BDC_EXPAND_LINE_WIDTH(1); /* CM_R_028BDC_PA_SC_LINE_CNTL */
ms->db_eqaa |= S_028804_MAX_ANCHOR_SAMPLES(log_samples) | ms->db_eqaa |= S_028804_MAX_ANCHOR_SAMPLES(log_samples) |
S_028804_PS_ITER_SAMPLES(log_ps_iter_samples) | S_028804_PS_ITER_SAMPLES(log_ps_iter_samples) |
@@ -1591,7 +1612,7 @@ radv_pipeline_init_dynamic_state(struct radv_pipeline *pipeline,
} }
static union ac_shader_variant_key static union ac_shader_variant_key
radv_compute_vs_key(const VkGraphicsPipelineCreateInfo *pCreateInfo, bool as_es, bool as_ls) radv_compute_vs_key(const VkGraphicsPipelineCreateInfo *pCreateInfo, bool as_es, bool as_ls, bool export_prim_id)
{ {
union ac_shader_variant_key key; union ac_shader_variant_key key;
const VkPipelineVertexInputStateCreateInfo *input_state = const VkPipelineVertexInputStateCreateInfo *input_state =
@@ -1601,6 +1622,7 @@ radv_compute_vs_key(const VkGraphicsPipelineCreateInfo *pCreateInfo, bool as_es,
key.vs.instance_rate_inputs = 0; key.vs.instance_rate_inputs = 0;
key.vs.as_es = as_es; key.vs.as_es = as_es;
key.vs.as_ls = as_ls; key.vs.as_ls = as_ls;
key.vs.export_prim_id = export_prim_id;
for (unsigned i = 0; i < input_state->vertexAttributeDescriptionCount; ++i) { for (unsigned i = 0; i < input_state->vertexAttributeDescriptionCount; ++i) {
unsigned binding; unsigned binding;
@@ -1842,6 +1864,24 @@ static uint32_t si_vgt_gs_mode(struct radv_shader_variant *gs)
S_028A40_GS_WRITE_OPTIMIZE(1); S_028A40_GS_WRITE_OPTIMIZE(1);
} }
static void calculate_vgt_gs_mode(struct radv_pipeline *pipeline)
{
struct radv_shader_variant *vs;
vs = radv_pipeline_has_gs(pipeline) ? pipeline->gs_copy_shader : (radv_pipeline_has_tess(pipeline) ? pipeline->shaders[MESA_SHADER_TESS_EVAL] : pipeline->shaders[MESA_SHADER_VERTEX]);
struct ac_vs_output_info *outinfo = &vs->info.vs.outinfo;
pipeline->graphics.vgt_primitiveid_en = false;
pipeline->graphics.vgt_gs_mode = 0;
if (radv_pipeline_has_gs(pipeline)) {
pipeline->graphics.vgt_gs_mode = si_vgt_gs_mode(pipeline->shaders[MESA_SHADER_GEOMETRY]);
} else if (outinfo->export_prim_id) {
pipeline->graphics.vgt_gs_mode = S_028A40_MODE(V_028A40_GS_SCENARIO_A);
pipeline->graphics.vgt_primitiveid_en = true;
}
}
static void calculate_pa_cl_vs_out_cntl(struct radv_pipeline *pipeline) static void calculate_pa_cl_vs_out_cntl(struct radv_pipeline *pipeline)
{ {
struct radv_shader_variant *vs; struct radv_shader_variant *vs;
@@ -1869,6 +1909,25 @@ static void calculate_pa_cl_vs_out_cntl(struct radv_pipeline *pipeline)
clip_dist_mask; clip_dist_mask;
} }
static uint32_t offset_to_ps_input(uint32_t offset, bool flat_shade)
{
uint32_t ps_input_cntl;
if (offset <= AC_EXP_PARAM_OFFSET_31) {
ps_input_cntl = S_028644_OFFSET(offset);
if (flat_shade)
ps_input_cntl |= S_028644_FLAT_SHADE(1);
} else {
/* The input is a DEFAULT_VAL constant. */
assert(offset >= AC_EXP_PARAM_DEFAULT_VAL_0000 &&
offset <= AC_EXP_PARAM_DEFAULT_VAL_1111);
offset -= AC_EXP_PARAM_DEFAULT_VAL_0000;
ps_input_cntl = S_028644_OFFSET(0x20) |
S_028644_DEFAULT_VAL(offset);
}
return ps_input_cntl;
}
static void calculate_ps_inputs(struct radv_pipeline *pipeline) static void calculate_ps_inputs(struct radv_pipeline *pipeline)
{ {
struct radv_shader_variant *ps, *vs; struct radv_shader_variant *ps, *vs;
@@ -1880,6 +1939,23 @@ static void calculate_ps_inputs(struct radv_pipeline *pipeline)
outinfo = &vs->info.vs.outinfo; outinfo = &vs->info.vs.outinfo;
unsigned ps_offset = 0; unsigned ps_offset = 0;
if (ps->info.fs.prim_id_input) {
unsigned vs_offset = outinfo->vs_output_param_offset[VARYING_SLOT_PRIMITIVE_ID];
if (vs_offset != AC_EXP_PARAM_UNDEFINED) {
pipeline->graphics.ps_input_cntl[ps_offset] = offset_to_ps_input(vs_offset, true);
++ps_offset;
}
}
if (ps->info.fs.layer_input) {
unsigned vs_offset = outinfo->vs_output_param_offset[VARYING_SLOT_LAYER];
if (vs_offset != AC_EXP_PARAM_UNDEFINED) {
pipeline->graphics.ps_input_cntl[ps_offset] = offset_to_ps_input(vs_offset, true);
++ps_offset;
}
}
if (ps->info.fs.has_pcoord) { if (ps->info.fs.has_pcoord) {
unsigned val; unsigned val;
val = S_028644_PT_SPRITE_TEX(1) | S_028644_OFFSET(0x20); val = S_028644_PT_SPRITE_TEX(1) | S_028644_OFFSET(0x20);
@@ -1887,52 +1963,22 @@ static void calculate_ps_inputs(struct radv_pipeline *pipeline)
ps_offset++; ps_offset++;
} }
if (ps->info.fs.prim_id_input && (outinfo->prim_id_output != 0xffffffff)) {
unsigned vs_offset, flat_shade;
unsigned val;
vs_offset = outinfo->prim_id_output;
flat_shade = true;
val = S_028644_OFFSET(vs_offset) | S_028644_FLAT_SHADE(flat_shade);
pipeline->graphics.ps_input_cntl[ps_offset] = val;
++ps_offset;
}
if (ps->info.fs.layer_input && (outinfo->layer_output != 0xffffffff)) {
unsigned vs_offset, flat_shade;
unsigned val;
vs_offset = outinfo->layer_output;
flat_shade = true;
val = S_028644_OFFSET(vs_offset) | S_028644_FLAT_SHADE(flat_shade);
pipeline->graphics.ps_input_cntl[ps_offset] = val;
++ps_offset;
}
for (unsigned i = 0; i < 32 && (1u << i) <= ps->info.fs.input_mask; ++i) { for (unsigned i = 0; i < 32 && (1u << i) <= ps->info.fs.input_mask; ++i) {
unsigned vs_offset, flat_shade; unsigned vs_offset;
unsigned val; bool flat_shade;
if (!(ps->info.fs.input_mask & (1u << i))) if (!(ps->info.fs.input_mask & (1u << i)))
continue; continue;
if (!(outinfo->export_mask & (1u << i))) { vs_offset = outinfo->vs_output_param_offset[VARYING_SLOT_VAR0 + i];
if (vs_offset == AC_EXP_PARAM_UNDEFINED) {
pipeline->graphics.ps_input_cntl[ps_offset] = S_028644_OFFSET(0x20); pipeline->graphics.ps_input_cntl[ps_offset] = S_028644_OFFSET(0x20);
++ps_offset; ++ps_offset;
continue; continue;
} }
vs_offset = util_bitcount(outinfo->export_mask & ((1u << i) - 1));
if (outinfo->prim_id_output != 0xffffffff) {
if (vs_offset >= outinfo->prim_id_output)
vs_offset++;
}
if (outinfo->layer_output != 0xffffffff) {
if (vs_offset >= outinfo->layer_output)
vs_offset++;
}
flat_shade = !!(ps->info.fs.flat_shaded_mask & (1u << ps_offset)); flat_shade = !!(ps->info.fs.flat_shaded_mask & (1u << ps_offset));
val = S_028644_OFFSET(vs_offset) | S_028644_FLAT_SHADE(flat_shade); pipeline->graphics.ps_input_cntl[ps_offset] = offset_to_ps_input(vs_offset, flat_shade);
pipeline->graphics.ps_input_cntl[ps_offset] = val;
++ps_offset; ++ps_offset;
} }
@@ -1967,62 +2013,10 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
radv_pipeline_init_blend_state(pipeline, pCreateInfo, extra); radv_pipeline_init_blend_state(pipeline, pCreateInfo, extra);
if (modules[MESA_SHADER_VERTEX]) {
bool as_es = false;
bool as_ls = false;
if (modules[MESA_SHADER_TESS_CTRL])
as_ls = true;
else if (modules[MESA_SHADER_GEOMETRY])
as_es = true;
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, as_es, as_ls);
pipeline->shaders[MESA_SHADER_VERTEX] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_VERTEX],
pStages[MESA_SHADER_VERTEX]->pName,
MESA_SHADER_VERTEX,
pStages[MESA_SHADER_VERTEX]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_VERTEX);
}
if (modules[MESA_SHADER_GEOMETRY]) {
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, false, false);
pipeline->shaders[MESA_SHADER_GEOMETRY] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_GEOMETRY],
pStages[MESA_SHADER_GEOMETRY]->pName,
MESA_SHADER_GEOMETRY,
pStages[MESA_SHADER_GEOMETRY]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_GEOMETRY);
pipeline->graphics.vgt_gs_mode = si_vgt_gs_mode(pipeline->shaders[MESA_SHADER_GEOMETRY]);
} else
pipeline->graphics.vgt_gs_mode = 0;
if (modules[MESA_SHADER_TESS_EVAL]) {
assert(modules[MESA_SHADER_TESS_CTRL]);
radv_tess_pipeline_compile(pipeline,
cache,
modules[MESA_SHADER_TESS_CTRL],
modules[MESA_SHADER_TESS_EVAL],
pStages[MESA_SHADER_TESS_CTRL]->pName,
pStages[MESA_SHADER_TESS_EVAL]->pName,
pStages[MESA_SHADER_TESS_CTRL]->pSpecializationInfo,
pStages[MESA_SHADER_TESS_EVAL]->pSpecializationInfo,
pipeline->layout,
pCreateInfo->pTessellationState->patchControlPoints);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_TESS_EVAL) |
mesa_to_vk_shader_stage(MESA_SHADER_TESS_CTRL);
}
if (!modules[MESA_SHADER_FRAGMENT]) { if (!modules[MESA_SHADER_FRAGMENT]) {
nir_builder fs_b; nir_builder fs_b;
nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL); nir_builder_init_simple_shader(&fs_b, NULL, MESA_SHADER_FRAGMENT, NULL);
fs_b.shader->info->name = ralloc_strdup(fs_b.shader, "noop_fs"); fs_b.shader->info.name = ralloc_strdup(fs_b.shader, "noop_fs");
fs_m.nir = fs_b.shader; fs_m.nir = fs_b.shader;
modules[MESA_SHADER_FRAGMENT] = &fs_m; modules[MESA_SHADER_FRAGMENT] = &fs_m;
} }
@@ -2046,6 +2040,58 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
if (fs_m.nir) if (fs_m.nir)
ralloc_free(fs_m.nir); ralloc_free(fs_m.nir);
if (modules[MESA_SHADER_VERTEX]) {
bool as_es = false;
bool as_ls = false;
bool export_prim_id = false;
if (modules[MESA_SHADER_TESS_CTRL])
as_ls = true;
else if (modules[MESA_SHADER_GEOMETRY])
as_es = true;
else if (pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.prim_id_input)
export_prim_id = true;
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, as_es, as_ls, export_prim_id);
pipeline->shaders[MESA_SHADER_VERTEX] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_VERTEX],
pStages[MESA_SHADER_VERTEX]->pName,
MESA_SHADER_VERTEX,
pStages[MESA_SHADER_VERTEX]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_VERTEX);
}
if (modules[MESA_SHADER_GEOMETRY]) {
union ac_shader_variant_key key = radv_compute_vs_key(pCreateInfo, false, false, false);
pipeline->shaders[MESA_SHADER_GEOMETRY] =
radv_pipeline_compile(pipeline, cache, modules[MESA_SHADER_GEOMETRY],
pStages[MESA_SHADER_GEOMETRY]->pName,
MESA_SHADER_GEOMETRY,
pStages[MESA_SHADER_GEOMETRY]->pSpecializationInfo,
pipeline->layout, &key);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_GEOMETRY);
}
if (modules[MESA_SHADER_TESS_EVAL]) {
assert(modules[MESA_SHADER_TESS_CTRL]);
radv_tess_pipeline_compile(pipeline,
cache,
modules[MESA_SHADER_TESS_CTRL],
modules[MESA_SHADER_TESS_EVAL],
pStages[MESA_SHADER_TESS_CTRL]->pName,
pStages[MESA_SHADER_TESS_EVAL]->pName,
pStages[MESA_SHADER_TESS_CTRL]->pSpecializationInfo,
pStages[MESA_SHADER_TESS_EVAL]->pSpecializationInfo,
pipeline->layout,
pCreateInfo->pTessellationState->patchControlPoints);
pipeline->active_stages |= mesa_to_vk_shader_stage(MESA_SHADER_TESS_EVAL) |
mesa_to_vk_shader_stage(MESA_SHADER_TESS_CTRL);
}
radv_pipeline_init_depth_stencil_state(pipeline, pCreateInfo, extra); radv_pipeline_init_depth_stencil_state(pipeline, pCreateInfo, extra);
radv_pipeline_init_raster_state(pipeline, pCreateInfo); radv_pipeline_init_raster_state(pipeline, pCreateInfo);
radv_pipeline_init_multisample_state(pipeline, pCreateInfo); radv_pipeline_init_multisample_state(pipeline, pCreateInfo);
@@ -2109,9 +2155,16 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
ps->info.fs.writes_z ? V_028710_SPI_SHADER_32_R : ps->info.fs.writes_z ? V_028710_SPI_SHADER_32_R :
V_028710_SPI_SHADER_ZERO; V_028710_SPI_SHADER_ZERO;
calculate_vgt_gs_mode(pipeline);
calculate_pa_cl_vs_out_cntl(pipeline); calculate_pa_cl_vs_out_cntl(pipeline);
calculate_ps_inputs(pipeline); calculate_ps_inputs(pipeline);
for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
if (pipeline->shaders[i]) {
pipeline->need_indirect_descriptor_sets |= pipeline->shaders[i]->info.need_indirect_descriptor_sets;
}
}
uint32_t stages = 0; uint32_t stages = 0;
if (radv_pipeline_has_tess(pipeline)) { if (radv_pipeline_has_tess(pipeline)) {
stages |= S_028B54_LS_EN(V_028B54_LS_STAGE_ON) | stages |= S_028B54_LS_EN(V_028B54_LS_STAGE_ON) |
@@ -2123,10 +2176,15 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER); S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER);
else else
stages |= S_028B54_VS_EN(V_028B54_VS_STAGE_DS); stages |= S_028B54_VS_EN(V_028B54_VS_STAGE_DS);
} else if (radv_pipeline_has_gs(pipeline)) } else if (radv_pipeline_has_gs(pipeline))
stages |= S_028B54_ES_EN(V_028B54_ES_STAGE_REAL) | stages |= S_028B54_ES_EN(V_028B54_ES_STAGE_REAL) |
S_028B54_GS_EN(1) | S_028B54_GS_EN(1) |
S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER); S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER);
if (device->physical_device->rad_info.chip_class >= GFX9)
stages |= S_028B54_MAX_PRIMGRP_IN_WAVE(2);
pipeline->graphics.vgt_shader_stages_en = stages; pipeline->graphics.vgt_shader_stages_en = stages;
if (radv_pipeline_has_gs(pipeline)) if (radv_pipeline_has_gs(pipeline))
@@ -2174,6 +2232,16 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
pipeline->binding_stride[desc->binding] = desc->stride; pipeline->binding_stride[desc->binding] = desc->stride;
} }
struct ac_userdata_info *loc = radv_lookup_user_sgpr(pipeline, MESA_SHADER_VERTEX,
AC_UD_VS_BASE_VERTEX_START_INSTANCE);
if (loc->sgpr_idx != -1) {
pipeline->graphics.vtx_base_sgpr = radv_shader_stage_to_user_data_0(MESA_SHADER_VERTEX, radv_pipeline_has_gs(pipeline), radv_pipeline_has_tess(pipeline));
pipeline->graphics.vtx_base_sgpr += loc->sgpr_idx * 4;
if (pipeline->shaders[MESA_SHADER_VERTEX]->info.info.vs.needs_draw_id)
pipeline->graphics.vtx_emit_num = 3;
else
pipeline->graphics.vtx_emit_num = 2;
}
if (device->debug_flags & RADV_DEBUG_DUMP_SHADER_STATS) { if (device->debug_flags & RADV_DEBUG_DUMP_SHADER_STATS) {
radv_dump_pipeline_stats(device, pipeline); radv_dump_pipeline_stats(device, pipeline);
} }
@@ -2270,6 +2338,7 @@ static VkResult radv_compute_pipeline_create(
pipeline->layout, NULL); pipeline->layout, NULL);
pipeline->need_indirect_descriptor_sets |= pipeline->shaders[MESA_SHADER_COMPUTE]->info.need_indirect_descriptor_sets;
result = radv_pipeline_scratch_init(device, pipeline); result = radv_pipeline_scratch_init(device, pipeline);
if (result != VK_SUCCESS) { if (result != VK_SUCCESS) {
radv_pipeline_destroy(device, pipeline, pAllocator); radv_pipeline_destroy(device, pipeline, pAllocator);

View File

@@ -23,6 +23,7 @@
#include "util/mesa-sha1.h" #include "util/mesa-sha1.h"
#include "util/debug.h" #include "util/debug.h"
#include "util/u_atomic.h"
#include "radv_private.h" #include "radv_private.h"
#include "ac_nir_to_llvm.h" #include "ac_nir_to_llvm.h"
@@ -171,6 +172,7 @@ radv_create_shader_variant_from_pipeline_cache(struct radv_device *device,
variant->info = entry->variant_info; variant->info = entry->variant_info;
variant->rsrc1 = entry->rsrc1; variant->rsrc1 = entry->rsrc1;
variant->rsrc2 = entry->rsrc2; variant->rsrc2 = entry->rsrc2;
variant->code_size = entry->code_size;
variant->ref_count = 1; variant->ref_count = 1;
variant->bo = device->ws->buffer_create(device->ws, entry->code_size, 256, variant->bo = device->ws->buffer_create(device->ws, entry->code_size, 256,
@@ -183,7 +185,7 @@ radv_create_shader_variant_from_pipeline_cache(struct radv_device *device,
entry->variant = variant; entry->variant = variant;
} }
__sync_fetch_and_add(&entry->variant->ref_count, 1); p_atomic_inc(&entry->variant->ref_count);
return entry->variant; return entry->variant;
} }
@@ -275,7 +277,7 @@ radv_pipeline_cache_insert_shader(struct radv_pipeline_cache *cache,
} else { } else {
entry->variant = variant; entry->variant = variant;
} }
__sync_fetch_and_add(&variant->ref_count, 1); p_atomic_inc(&variant->ref_count);
pthread_mutex_unlock(&cache->mutex); pthread_mutex_unlock(&cache->mutex);
return variant; return variant;
} }
@@ -295,7 +297,7 @@ radv_pipeline_cache_insert_shader(struct radv_pipeline_cache *cache,
entry->rsrc2 = variant->rsrc2; entry->rsrc2 = variant->rsrc2;
entry->code_size = code_size; entry->code_size = code_size;
entry->variant = variant; entry->variant = variant;
__sync_fetch_and_add(&variant->ref_count, 1); p_atomic_inc(&variant->ref_count);
radv_pipeline_cache_add_entry(cache, entry); radv_pipeline_cache_add_entry(cache, entry);

View File

@@ -47,12 +47,14 @@
#include "compiler/shader_enums.h" #include "compiler/shader_enums.h"
#include "util/macros.h" #include "util/macros.h"
#include "util/list.h" #include "util/list.h"
#include "util/vk_alloc.h"
#include "main/macros.h" #include "main/macros.h"
#include "vk_alloc.h"
#include "radv_radeon_winsys.h" #include "radv_radeon_winsys.h"
#include "ac_binary.h" #include "ac_binary.h"
#include "ac_nir_to_llvm.h" #include "ac_nir_to_llvm.h"
#include "ac_gpu_info.h"
#include "ac_surface.h"
#include "radv_debug.h" #include "radv_debug.h"
#include "radv_descriptor_set.h" #include "radv_descriptor_set.h"
@@ -266,10 +268,14 @@ struct radv_physical_device {
char path[20]; char path[20];
const char * name; const char * name;
uint8_t uuid[VK_UUID_SIZE]; uint8_t uuid[VK_UUID_SIZE];
uint8_t device_uuid[VK_UUID_SIZE];
int local_fd; int local_fd;
struct wsi_device wsi_device; struct wsi_device wsi_device;
struct radv_extensions extensions; struct radv_extensions extensions;
bool has_rbplus; /* if RB+ register exist */
bool rbplus_allowed; /* if RB+ is allowed */
}; };
struct radv_instance { struct radv_instance {
@@ -282,6 +288,7 @@ struct radv_instance {
struct radv_physical_device physicalDevices[RADV_MAX_DRM_DEVICES]; struct radv_physical_device physicalDevices[RADV_MAX_DRM_DEVICES];
uint64_t debug_flags; uint64_t debug_flags;
uint64_t perftest_flags;
}; };
VkResult radv_init_wsi(struct radv_physical_device *physical_device); VkResult radv_init_wsi(struct radv_physical_device *physical_device);
@@ -343,6 +350,8 @@ struct radv_meta_state {
struct radv_pipeline *depthstencil_pipeline[NUM_DEPTH_CLEAR_PIPELINES]; struct radv_pipeline *depthstencil_pipeline[NUM_DEPTH_CLEAR_PIPELINES];
} clear[1 + MAX_SAMPLES_LOG2]; } clear[1 + MAX_SAMPLES_LOG2];
VkPipelineLayout clear_color_p_layout;
VkPipelineLayout clear_depth_p_layout;
struct { struct {
VkRenderPass render_pass[NUM_META_FS_KEYS]; VkRenderPass render_pass[NUM_META_FS_KEYS];
@@ -415,9 +424,22 @@ struct radv_meta_state {
struct { struct {
VkPipeline pipeline; VkPipeline pipeline;
VkPipeline i_pipeline; VkPipeline i_pipeline;
VkPipeline srgb_pipeline;
} rc[MAX_SAMPLES_LOG2]; } rc[MAX_SAMPLES_LOG2];
} resolve_compute; } resolve_compute;
struct {
VkDescriptorSetLayout ds_layout;
VkPipelineLayout p_layout;
struct {
VkRenderPass srgb_render_pass;
VkPipeline srgb_pipeline;
VkRenderPass render_pass[NUM_META_FS_KEYS];
VkPipeline pipeline[NUM_META_FS_KEYS];
} rc[MAX_SAMPLES_LOG2];
} resolve_fragment;
struct { struct {
VkPipeline decompress_pipeline; VkPipeline decompress_pipeline;
VkPipeline resummarize_pipeline; VkPipeline resummarize_pipeline;
@@ -495,7 +517,7 @@ struct radv_device {
int queue_count[RADV_MAX_QUEUE_FAMILIES]; int queue_count[RADV_MAX_QUEUE_FAMILIES];
struct radeon_winsys_cs *empty_cs[RADV_MAX_QUEUE_FAMILIES]; struct radeon_winsys_cs *empty_cs[RADV_MAX_QUEUE_FAMILIES];
struct radeon_winsys_cs *flush_cs[RADV_MAX_QUEUE_FAMILIES]; struct radeon_winsys_cs *flush_cs[RADV_MAX_QUEUE_FAMILIES];
struct radeon_winsys_cs *flush_shader_cs[RADV_MAX_QUEUE_FAMILIES];
uint64_t debug_flags; uint64_t debug_flags;
bool llvm_supports_spill; bool llvm_supports_spill;
@@ -570,6 +592,10 @@ struct radv_descriptor_pool {
uint64_t size; uint64_t size;
struct list_head vram_list; struct list_head vram_list;
uint8_t *host_memory_base;
uint8_t *host_memory_ptr;
uint8_t *host_memory_end;
}; };
struct radv_descriptor_update_template_entry { struct radv_descriptor_update_template_entry {
@@ -585,7 +611,6 @@ struct radv_descriptor_update_template_entry {
uint32_t dst_stride; uint32_t dst_stride;
uint32_t buffer_offset; uint32_t buffer_offset;
uint32_t buffer_count;
/* Only valid for combined image samplers and samplers */ /* Only valid for combined image samplers and samplers */
uint16_t has_sampler; uint16_t has_sampler;
@@ -726,7 +751,6 @@ struct radv_attachment_state {
struct radv_cmd_state { struct radv_cmd_state {
uint32_t vb_dirty; uint32_t vb_dirty;
radv_cmd_dirty_mask_t dirty; radv_cmd_dirty_mask_t dirty;
bool vertex_descriptors_dirty;
bool push_descriptors_dirty; bool push_descriptors_dirty;
struct radv_pipeline * pipeline; struct radv_pipeline * pipeline;
@@ -741,9 +765,9 @@ struct radv_cmd_state {
struct radv_descriptor_set * descriptors[MAX_SETS]; struct radv_descriptor_set * descriptors[MAX_SETS];
struct radv_attachment_state * attachments; struct radv_attachment_state * attachments;
VkRect2D render_area; VkRect2D render_area;
struct radv_buffer * index_buffer;
uint32_t index_type; uint32_t index_type;
uint32_t index_offset; uint64_t index_va;
uint32_t max_index_count;
int32_t last_primitive_reset_en; int32_t last_primitive_reset_en;
uint32_t last_primitive_reset_index; uint32_t last_primitive_reset_index;
enum radv_cmd_flush_bits flush_bits; enum radv_cmd_flush_bits flush_bits;
@@ -791,8 +815,6 @@ struct radv_cmd_buffer {
struct radv_cmd_buffer_upload upload; struct radv_cmd_buffer_upload upload;
bool record_fail;
uint32_t scratch_size_needed; uint32_t scratch_size_needed;
uint32_t compute_scratch_size_needed; uint32_t compute_scratch_size_needed;
uint32_t esgs_ring_size_needed; uint32_t esgs_ring_size_needed;
@@ -800,7 +822,12 @@ struct radv_cmd_buffer {
bool tess_rings_needed; bool tess_rings_needed;
bool sample_positions_needed; bool sample_positions_needed;
bool record_fail;
int ring_offsets_idx; /* just used for verification */ int ring_offsets_idx; /* just used for verification */
uint32_t gfx9_fence_offset;
struct radeon_winsys_bo *gfx9_fence_bo;
uint32_t gfx9_fence_idx;
}; };
struct radv_image; struct radv_image;
@@ -820,18 +847,29 @@ void si_write_scissors(struct radeon_winsys_cs *cs, int first,
uint32_t si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer, uint32_t si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
bool instanced_draw, bool indirect_draw, bool instanced_draw, bool indirect_draw,
uint32_t draw_vertex_count); uint32_t draw_vertex_count);
void si_cs_emit_write_event_eop(struct radeon_winsys_cs *cs,
enum chip_class chip_class,
bool is_mec,
unsigned event, unsigned event_flags,
unsigned data_sel,
uint64_t va,
uint32_t old_fence,
uint32_t new_fence);
void si_emit_wait_fence(struct radeon_winsys_cs *cs,
uint64_t va, uint32_t ref,
uint32_t mask);
void si_cs_emit_cache_flush(struct radeon_winsys_cs *cs, void si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
enum chip_class chip_class, enum chip_class chip_class,
bool is_mec, uint32_t *fence_ptr, uint64_t va,
enum radv_cmd_flush_bits flush_bits); bool is_mec,
void si_cs_emit_cache_flush(struct radeon_winsys_cs *cs, enum radv_cmd_flush_bits flush_bits);
enum chip_class chip_class,
bool is_mec,
enum radv_cmd_flush_bits flush_bits);
void si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer); void si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer);
void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer, void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
uint64_t src_va, uint64_t dest_va, uint64_t src_va, uint64_t dest_va,
uint64_t size); uint64_t size);
void si_cp_dma_prefetch(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
unsigned size);
void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va, void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
uint64_t size, unsigned value); uint64_t size, unsigned value);
void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer); void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer);
@@ -856,6 +894,8 @@ void
radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer); radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_clear_subpass(struct radv_cmd_buffer *cmd_buffer); void radv_cmd_buffer_clear_subpass(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer); void radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer *cmd_buffer);
void radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer);
void radv_cayman_emit_msaa_sample_locs(struct radeon_winsys_cs *cs, int nr_samples); void radv_cayman_emit_msaa_sample_locs(struct radeon_winsys_cs *cs, int nr_samples);
unsigned radv_cayman_get_maxdist(int log_samples); unsigned radv_cayman_get_maxdist(int log_samples);
void radv_device_init_msaa(struct radv_device *device); void radv_device_init_msaa(struct radv_device *device);
@@ -1007,7 +1047,7 @@ struct radv_pipeline {
struct radv_pipeline_layout * layout; struct radv_pipeline_layout * layout;
bool needs_data_cache; bool needs_data_cache;
bool need_indirect_descriptor_sets;
struct radv_shader_variant * shaders[MESA_SHADER_STAGES]; struct radv_shader_variant * shaders[MESA_SHADER_STAGES];
struct radv_shader_variant *gs_copy_shader; struct radv_shader_variant *gs_copy_shader;
VkShaderStageFlags active_stages; VkShaderStageFlags active_stages;
@@ -1031,6 +1071,7 @@ struct radv_pipeline {
unsigned prim; unsigned prim;
unsigned gs_out; unsigned gs_out;
uint32_t vgt_gs_mode; uint32_t vgt_gs_mode;
bool vgt_primitiveid_en;
bool prim_restart_enable; bool prim_restart_enable;
unsigned esgs_ring_size; unsigned esgs_ring_size;
unsigned gsvs_ring_size; unsigned gsvs_ring_size;
@@ -1038,6 +1079,8 @@ struct radv_pipeline {
uint32_t ps_input_cntl_num; uint32_t ps_input_cntl_num;
uint32_t pa_cl_vs_out_cntl; uint32_t pa_cl_vs_out_cntl;
uint32_t vgt_shader_stages_en; uint32_t vgt_shader_stages_en;
uint32_t vtx_base_sgpr;
uint8_t vtx_emit_num;
struct radv_prim_vertex_count prim_vertex_count; struct radv_prim_vertex_count prim_vertex_count;
bool can_use_guardband; bool can_use_guardband;
} graphics; } graphics;
@@ -1057,6 +1100,11 @@ static inline bool radv_pipeline_has_tess(struct radv_pipeline *pipeline)
return pipeline->shaders[MESA_SHADER_TESS_EVAL] ? true : false; return pipeline->shaders[MESA_SHADER_TESS_EVAL] ? true : false;
} }
uint32_t radv_shader_stage_to_user_data_0(gl_shader_stage stage, bool has_gs, bool has_tess);
struct ac_userdata_info *radv_lookup_user_sgpr(struct radv_pipeline *pipeline,
gl_shader_stage stage,
int idx);
struct radv_graphics_pipeline_create_info { struct radv_graphics_pipeline_create_info {
bool use_rectlist; bool use_rectlist;
bool db_depth_clear; bool db_depth_clear;
@@ -1141,10 +1189,7 @@ struct radv_image {
*/ */
VkFormat vk_format; VkFormat vk_format;
VkImageAspectFlags aspects; VkImageAspectFlags aspects;
VkExtent3D extent; struct ac_surf_info info;
uint32_t levels;
uint32_t array_size;
uint32_t samples; /**< VkImageCreateInfo::samples */
VkImageUsageFlags usage; /**< Superset of VkImageCreateInfo::usage. */ VkImageUsageFlags usage; /**< Superset of VkImageCreateInfo::usage. */
VkImageTiling tiling; /** VkImageCreateInfo::tiling */ VkImageTiling tiling; /** VkImageCreateInfo::tiling */
VkImageCreateFlags flags; /** VkImageCreateInfo::flags */ VkImageCreateFlags flags; /** VkImageCreateInfo::flags */
@@ -1167,12 +1212,22 @@ struct radv_image {
uint32_t clear_value_offset; uint32_t clear_value_offset;
}; };
/* Whether the image has a htile that is known consistent with the contents of
* the image. */
bool radv_layout_has_htile(const struct radv_image *image, bool radv_layout_has_htile(const struct radv_image *image,
VkImageLayout layout); VkImageLayout layout,
unsigned queue_mask);
/* Whether the image has a htile that is known consistent with the contents of
* the image and is allowed to be in compressed form.
*
* If this is false reads that don't use the htile should be able to return
* correct results.
*/
bool radv_layout_is_htile_compressed(const struct radv_image *image, bool radv_layout_is_htile_compressed(const struct radv_image *image,
VkImageLayout layout); VkImageLayout layout,
bool radv_layout_can_expclear(const struct radv_image *image, unsigned queue_mask);
VkImageLayout layout);
bool radv_layout_can_fast_clear(const struct radv_image *image, bool radv_layout_can_fast_clear(const struct radv_image *image,
VkImageLayout layout, VkImageLayout layout,
unsigned queue_mask); unsigned queue_mask);
@@ -1185,7 +1240,7 @@ radv_get_layerCount(const struct radv_image *image,
const VkImageSubresourceRange *range) const VkImageSubresourceRange *range)
{ {
return range->layerCount == VK_REMAINING_ARRAY_LAYERS ? return range->layerCount == VK_REMAINING_ARRAY_LAYERS ?
image->array_size - range->baseArrayLayer : range->layerCount; image->info.array_size - range->baseArrayLayer : range->layerCount;
} }
static inline uint32_t static inline uint32_t
@@ -1193,7 +1248,7 @@ radv_get_levelCount(const struct radv_image *image,
const VkImageSubresourceRange *range) const VkImageSubresourceRange *range)
{ {
return range->levelCount == VK_REMAINING_MIP_LEVELS ? return range->levelCount == VK_REMAINING_MIP_LEVELS ?
image->levels - range->baseMipLevel : range->levelCount; image->info.levels - range->baseMipLevel : range->levelCount;
} }
struct radeon_bo_metadata; struct radeon_bo_metadata;
@@ -1220,7 +1275,6 @@ struct radv_image_view {
struct radv_image_create_info { struct radv_image_create_info {
const VkImageCreateInfo *vk_info; const VkImageCreateInfo *vk_info;
uint32_t stride;
bool scanout; bool scanout;
}; };
@@ -1231,11 +1285,8 @@ VkResult radv_image_create(VkDevice _device,
void radv_image_view_init(struct radv_image_view *view, void radv_image_view_init(struct radv_image_view *view,
struct radv_device *device, struct radv_device *device,
const VkImageViewCreateInfo* pCreateInfo, const VkImageViewCreateInfo* pCreateInfo);
struct radv_cmd_buffer *cmd_buffer,
VkImageUsageFlags usage_mask);
void radv_image_set_optimal_micro_tile_mode(struct radv_device *device,
struct radv_image *image, uint32_t micro_tile_mode);
struct radv_buffer_view { struct radv_buffer_view {
struct radeon_winsys_bo *bo; struct radeon_winsys_bo *bo;
VkFormat vk_format; VkFormat vk_format;
@@ -1279,42 +1330,57 @@ radv_sanitize_image_offset(const VkImageType imageType,
} }
} }
static inline bool
radv_image_extent_compare(const struct radv_image *image,
const VkExtent3D *extent)
{
if (extent->width != image->info.width ||
extent->height != image->info.height ||
extent->depth != image->info.depth)
return false;
return true;
}
struct radv_sampler { struct radv_sampler {
uint32_t state[4]; uint32_t state[4];
}; };
struct radv_color_buffer_info { struct radv_color_buffer_info {
uint32_t cb_color_base; uint64_t cb_color_base;
uint64_t cb_color_cmask;
uint64_t cb_color_fmask;
uint64_t cb_dcc_base;
uint32_t cb_color_pitch; uint32_t cb_color_pitch;
uint32_t cb_color_slice; uint32_t cb_color_slice;
uint32_t cb_color_view; uint32_t cb_color_view;
uint32_t cb_color_info; uint32_t cb_color_info;
uint32_t cb_color_attrib; uint32_t cb_color_attrib;
uint32_t cb_color_attrib2;
uint32_t cb_dcc_control; uint32_t cb_dcc_control;
uint32_t cb_color_cmask;
uint32_t cb_color_cmask_slice; uint32_t cb_color_cmask_slice;
uint32_t cb_color_fmask;
uint32_t cb_color_fmask_slice; uint32_t cb_color_fmask_slice;
uint32_t cb_clear_value0; uint32_t cb_clear_value0;
uint32_t cb_clear_value1; uint32_t cb_clear_value1;
uint32_t cb_dcc_base;
uint32_t micro_tile_mode; uint32_t micro_tile_mode;
uint32_t gfx9_epitch;
}; };
struct radv_ds_buffer_info { struct radv_ds_buffer_info {
uint64_t db_z_read_base;
uint64_t db_stencil_read_base;
uint64_t db_z_write_base;
uint64_t db_stencil_write_base;
uint64_t db_htile_data_base;
uint32_t db_depth_info; uint32_t db_depth_info;
uint32_t db_z_info; uint32_t db_z_info;
uint32_t db_stencil_info; uint32_t db_stencil_info;
uint32_t db_z_read_base;
uint32_t db_stencil_read_base;
uint32_t db_z_write_base;
uint32_t db_stencil_write_base;
uint32_t db_depth_view; uint32_t db_depth_view;
uint32_t db_depth_size; uint32_t db_depth_size;
uint32_t db_depth_slice; uint32_t db_depth_slice;
uint32_t db_htile_surface; uint32_t db_htile_surface;
uint32_t db_htile_data_base;
uint32_t pa_su_poly_offset_db_fmt_cntl; uint32_t pa_su_poly_offset_db_fmt_cntl;
uint32_t db_z_info2;
uint32_t db_stencil_info2;
float offset_scale; float offset_scale;
}; };
@@ -1343,8 +1409,8 @@ struct radv_subpass_barrier {
struct radv_subpass { struct radv_subpass {
uint32_t input_count; uint32_t input_count;
VkAttachmentReference * input_attachments;
uint32_t color_count; uint32_t color_count;
VkAttachmentReference * input_attachments;
VkAttachmentReference * color_attachments; VkAttachmentReference * color_attachments;
VkAttachmentReference * resolve_attachments; VkAttachmentReference * resolve_attachments;
VkAttachmentReference depth_stencil_attachment; VkAttachmentReference depth_stencil_attachment;

View File

@@ -44,11 +44,6 @@ static unsigned get_max_db(struct radv_device *device)
unsigned num_db = device->physical_device->rad_info.num_render_backends; unsigned num_db = device->physical_device->rad_info.num_render_backends;
MAYBE_UNUSED unsigned rb_mask = device->physical_device->rad_info.enabled_rb_mask; MAYBE_UNUSED unsigned rb_mask = device->physical_device->rad_info.enabled_rb_mask;
if (device->physical_device->rad_info.chip_class == SI)
num_db = 8;
else
num_db = MAX2(8, num_db);
/* Otherwise we need to change the query reset procedure */ /* Otherwise we need to change the query reset procedure */
assert(rb_mask == ((1ull << num_db) - 1)); assert(rb_mask == ((1ull << num_db) - 1));
@@ -77,6 +72,8 @@ static struct nir_ssa_def *
radv_load_push_int(nir_builder *b, unsigned offset, const char *name) radv_load_push_int(nir_builder *b, unsigned offset, const char *name)
{ {
nir_intrinsic_instr *flags = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant); nir_intrinsic_instr *flags = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(flags, 0);
nir_intrinsic_set_range(flags, 16);
flags->src[0] = nir_src_for_ssa(nir_imm_int(b, offset)); flags->src[0] = nir_src_for_ssa(nir_imm_int(b, offset));
flags->num_components = 1; flags->num_components = 1;
nir_ssa_dest_init(&flags->instr, &flags->dest, 1, 32, name); nir_ssa_dest_init(&flags->instr, &flags->dest, 1, 32, name);
@@ -125,10 +122,10 @@ build_occlusion_query_shader(struct radv_device *device) {
*/ */
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "occlusion_query"); b.shader->info.name = ralloc_strdup(b.shader, "occlusion_query");
b.shader->info->cs.local_size[0] = 64; b.shader->info.cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1; b.shader->info.cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_variable *result = nir_local_variable_create(b.impl, glsl_uint64_t_type(), "result"); nir_variable *result = nir_local_variable_create(b.impl, glsl_uint64_t_type(), "result");
nir_variable *outer_counter = nir_local_variable_create(b.impl, glsl_int_type(), "outer_counter"); nir_variable *outer_counter = nir_local_variable_create(b.impl, glsl_int_type(), "outer_counter");
@@ -158,9 +155,9 @@ build_occlusion_query_shader(struct radv_device *device) {
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
global_id = nir_channel(&b, global_id, 0); // We only care about x here. global_id = nir_channel(&b, global_id, 0); // We only care about x here.
@@ -320,10 +317,10 @@ build_pipeline_statistics_query_shader(struct radv_device *device) {
*/ */
nir_builder b; nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL); nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
b.shader->info->name = ralloc_strdup(b.shader, "pipeline_statistics_query"); b.shader->info.name = ralloc_strdup(b.shader, "pipeline_statistics_query");
b.shader->info->cs.local_size[0] = 64; b.shader->info.cs.local_size[0] = 64;
b.shader->info->cs.local_size[1] = 1; b.shader->info.cs.local_size[1] = 1;
b.shader->info->cs.local_size[2] = 1; b.shader->info.cs.local_size[2] = 1;
nir_variable *output_offset = nir_local_variable_create(b.impl, glsl_int_type(), "output_offset"); nir_variable *output_offset = nir_local_variable_create(b.impl, glsl_int_type(), "output_offset");
@@ -350,9 +347,9 @@ build_pipeline_statistics_query_shader(struct radv_device *device) {
nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0); nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0); nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
nir_ssa_def *block_size = nir_imm_ivec4(&b, nir_ssa_def *block_size = nir_imm_ivec4(&b,
b.shader->info->cs.local_size[0], b.shader->info.cs.local_size[0],
b.shader->info->cs.local_size[1], b.shader->info.cs.local_size[1],
b.shader->info->cs.local_size[2], 0); b.shader->info.cs.local_size[2], 0);
nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id); nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
global_id = nir_channel(&b, global_id, 0); // We only care about x here. global_id = nir_channel(&b, global_id, 0); // We only care about x here.
@@ -612,12 +609,10 @@ VkResult radv_device_init_meta_query_state(struct radv_device *device)
radv_pipeline_cache_to_handle(&device->meta_state.cache), radv_pipeline_cache_to_handle(&device->meta_state.cache),
1, &pipeline_statistics_vk_pipeline_info, NULL, 1, &pipeline_statistics_vk_pipeline_info, NULL,
&device->meta_state.query.pipeline_statistics_query_pipeline); &device->meta_state.query.pipeline_statistics_query_pipeline);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
fail: fail:
radv_device_finish_meta_query_state(device); if (result != VK_SUCCESS)
radv_device_finish_meta_query_state(device);
ralloc_free(occlusion_cs.nir); ralloc_free(occlusion_cs.nir);
ralloc_free(pipeline_statistics_cs.nir); ralloc_free(pipeline_statistics_cs.nir);
return result; return result;
@@ -997,13 +992,7 @@ void radv_CmdCopyQueryPoolResults(
uint64_t avail_va = va + pool->availability_offset + 4 * query; uint64_t avail_va = va + pool->availability_offset + 4 * query;
/* This waits on the ME. All copies below are done on the ME */ /* This waits on the ME. All copies below are done on the ME */
radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0)); si_emit_wait_fence(cs, avail_va, 1, 0xffffffff);
radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1));
radeon_emit(cs, avail_va);
radeon_emit(cs, avail_va >> 32);
radeon_emit(cs, 1); /* reference value */
radeon_emit(cs, 0xffffffff); /* mask */
radeon_emit(cs, 4); /* poll interval */
} }
} }
radv_query_shader(cmd_buffer, cmd_buffer->device->meta_state.query.pipeline_statistics_query_pipeline, radv_query_shader(cmd_buffer, cmd_buffer->device->meta_state.query.pipeline_statistics_query_pipeline,
@@ -1026,13 +1015,7 @@ void radv_CmdCopyQueryPoolResults(
uint64_t avail_va = va + pool->availability_offset + 4 * query; uint64_t avail_va = va + pool->availability_offset + 4 * query;
/* This waits on the ME. All copies below are done on the ME */ /* This waits on the ME. All copies below are done on the ME */
radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0)); si_emit_wait_fence(cs, avail_va, 1, 0xffffffff);
radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1));
radeon_emit(cs, avail_va);
radeon_emit(cs, avail_va >> 32);
radeon_emit(cs, 1); /* reference value */
radeon_emit(cs, 0xffffffff); /* mask */
radeon_emit(cs, 4); /* poll interval */
} }
if (flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) { if (flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) {
uint64_t avail_va = va + pool->availability_offset + 4 * query; uint64_t avail_va = va + pool->availability_offset + 4 * query;
@@ -1156,7 +1139,7 @@ void radv_CmdEndQuery(
break; break;
case VK_QUERY_TYPE_PIPELINE_STATISTICS: case VK_QUERY_TYPE_PIPELINE_STATISTICS:
radeon_check_space(cmd_buffer->device->ws, cs, 10); radeon_check_space(cmd_buffer->device->ws, cs, 16);
va += pipelinestat_block_size; va += pipelinestat_block_size;
@@ -1165,13 +1148,11 @@ void radv_CmdEndQuery(
radeon_emit(cs, va); radeon_emit(cs, va);
radeon_emit(cs, va >> 32); radeon_emit(cs, va >> 32);
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0)); si_cs_emit_write_event_eop(cs,
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_BOTTOM_OF_PIPE_TS) | cmd_buffer->device->physical_device->rad_info.chip_class,
EVENT_INDEX(5)); false,
radeon_emit(cs, avail_va); EVENT_TYPE_BOTTOM_OF_PIPE_TS, 0,
radeon_emit(cs, (avail_va >> 32) | EOP_DATA_SEL(1)); 1, avail_va, 0, 1);
radeon_emit(cs, 1);
radeon_emit(cs, 0);
break; break;
default: default:
unreachable("ending unhandled query type"); unreachable("ending unhandled query type");
@@ -1194,32 +1175,40 @@ void radv_CmdWriteTimestamp(
cmd_buffer->device->ws->cs_add_buffer(cs, pool->bo, 5); cmd_buffer->device->ws->cs_add_buffer(cs, pool->bo, 5);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 12); MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 28);
if (mec) { switch(pipelineStage) {
radeon_emit(cs, PKT3(PKT3_RELEASE_MEM, 5, 0)); case VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT:
radeon_emit(cs, EVENT_TYPE(V_028A90_BOTTOM_OF_PIPE_TS) | EVENT_INDEX(5)); radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, 3 << 29); radeon_emit(cs, COPY_DATA_COUNT_SEL | COPY_DATA_WR_CONFIRM |
COPY_DATA_SRC_SEL(COPY_DATA_TIMESTAMP) |
COPY_DATA_DST_SEL(V_370_MEM_ASYNC));
radeon_emit(cs, 0);
radeon_emit(cs, 0);
radeon_emit(cs, query_va); radeon_emit(cs, query_va);
radeon_emit(cs, query_va >> 32); radeon_emit(cs, query_va >> 32);
radeon_emit(cs, 0);
radeon_emit(cs, 0);
} else {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_BOTTOM_OF_PIPE_TS) | EVENT_INDEX(5));
radeon_emit(cs, query_va);
radeon_emit(cs, (3 << 29) | ((query_va >> 32) & 0xFFFF));
radeon_emit(cs, 0);
radeon_emit(cs, 0);
}
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0)); radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
radeon_emit(cs, S_370_DST_SEL(mec ? V_370_MEM_ASYNC : V_370_MEMORY_SYNC) | radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
S_370_WR_CONFIRM(1) | S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_ME)); S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, avail_va); radeon_emit(cs, avail_va);
radeon_emit(cs, avail_va >> 32); radeon_emit(cs, avail_va >> 32);
radeon_emit(cs, 1); radeon_emit(cs, 1);
break;
default:
si_cs_emit_write_event_eop(cs,
cmd_buffer->device->physical_device->rad_info.chip_class,
mec,
V_028A90_BOTTOM_OF_PIPE_TS, 0,
3, query_va, 0, 0);
si_cs_emit_write_event_eop(cs,
cmd_buffer->device->physical_device->rad_info.chip_class,
mec,
V_028A90_BOTTOM_OF_PIPE_TS, 0,
1, avail_va, 0, 1);
break;
}
assert(cmd_buffer->cs->cdw <= cdw_max); assert(cmd_buffer->cs->cdw <= cdw_max);
} }

View File

@@ -35,6 +35,10 @@
#include "main/macros.h" #include "main/macros.h"
#include "amd_family.h" #include "amd_family.h"
struct radeon_info;
struct ac_surf_info;
struct radeon_surf;
#define FREE(x) free(x) #define FREE(x) free(x)
enum radeon_bo_domain { /* bitfield */ enum radeon_bo_domain { /* bitfield */
@@ -71,63 +75,6 @@ struct radeon_winsys_cs {
uint32_t *buf; /* The base pointer of the chunk. */ uint32_t *buf; /* The base pointer of the chunk. */
}; };
struct radeon_info {
/* PCI info: domain:bus:dev:func */
uint32_t pci_domain;
uint32_t pci_bus;
uint32_t pci_dev;
uint32_t pci_func;
/* Device info. */
uint32_t pci_id;
enum radeon_family family;
const char *name;
enum chip_class chip_class;
uint32_t gart_page_size;
uint64_t gart_size;
uint64_t vram_size;
uint64_t visible_vram_size;
bool has_dedicated_vram;
bool has_virtual_memory;
bool gfx_ib_pad_with_type2;
bool has_uvd;
uint32_t sdma_rings;
uint32_t compute_rings;
uint32_t vce_fw_version;
uint32_t vce_harvest_config;
uint32_t clock_crystal_freq; /* in kHz */
/* Kernel info. */
uint32_t drm_major; /* version */
uint32_t drm_minor;
uint32_t drm_patchlevel;
bool has_userptr;
/* Shader cores. */
uint32_t r600_max_quad_pipes; /* wave size / 16 */
uint32_t max_shader_clock;
uint32_t num_good_compute_units;
uint32_t max_se; /* shader engines */
uint32_t max_sh_per_se; /* shader arrays per shader engine */
/* Render backends (color + depth blocks). */
uint32_t r300_num_gb_pipes;
uint32_t r300_num_z_pipes;
uint32_t r600_gb_backend_map; /* R600 harvest config */
bool r600_gb_backend_map_valid;
uint32_t r600_num_banks;
uint32_t num_render_backends;
uint32_t num_tile_pipes; /* pipe count from PIPE_CONFIG */
uint32_t pipe_interleave_bytes;
uint32_t enabled_rb_mask; /* GCN harvest config */
/* Tile modes. */
uint32_t si_tile_mode_array[32];
uint32_t cik_macrotile_mode_array[16];
};
#define RADEON_SURF_MAX_LEVEL 32
#define RADEON_SURF_TYPE_MASK 0xFF #define RADEON_SURF_TYPE_MASK 0xFF
#define RADEON_SURF_TYPE_SHIFT 0 #define RADEON_SURF_TYPE_SHIFT 0
#define RADEON_SURF_TYPE_1D 0 #define RADEON_SURF_TYPE_1D 0
@@ -138,93 +85,11 @@ struct radeon_info {
#define RADEON_SURF_TYPE_2D_ARRAY 5 #define RADEON_SURF_TYPE_2D_ARRAY 5
#define RADEON_SURF_MODE_MASK 0xFF #define RADEON_SURF_MODE_MASK 0xFF
#define RADEON_SURF_MODE_SHIFT 8 #define RADEON_SURF_MODE_SHIFT 8
#define RADEON_SURF_MODE_LINEAR_ALIGNED 1
#define RADEON_SURF_MODE_1D 2
#define RADEON_SURF_MODE_2D 3
#define RADEON_SURF_SCANOUT (1 << 16)
#define RADEON_SURF_ZBUFFER (1 << 17)
#define RADEON_SURF_SBUFFER (1 << 18)
#define RADEON_SURF_Z_OR_SBUFFER (RADEON_SURF_ZBUFFER | RADEON_SURF_SBUFFER)
#define RADEON_SURF_HAS_SBUFFER_MIPTREE (1 << 19)
#define RADEON_SURF_HAS_TILE_MODE_INDEX (1 << 20)
#define RADEON_SURF_FMASK (1 << 21)
#define RADEON_SURF_DISABLE_DCC (1 << 22)
#define RADEON_SURF_TC_COMPATIBLE_HTILE (1 << 23)
#define RADEON_SURF_GET(v, field) (((v) >> RADEON_SURF_ ## field ## _SHIFT) & RADEON_SURF_ ## field ## _MASK) #define RADEON_SURF_GET(v, field) (((v) >> RADEON_SURF_ ## field ## _SHIFT) & RADEON_SURF_ ## field ## _MASK)
#define RADEON_SURF_SET(v, field) (((v) & RADEON_SURF_ ## field ## _MASK) << RADEON_SURF_ ## field ## _SHIFT) #define RADEON_SURF_SET(v, field) (((v) & RADEON_SURF_ ## field ## _MASK) << RADEON_SURF_ ## field ## _SHIFT)
#define RADEON_SURF_CLR(v, field) ((v) & ~(RADEON_SURF_ ## field ## _MASK << RADEON_SURF_ ## field ## _SHIFT)) #define RADEON_SURF_CLR(v, field) ((v) & ~(RADEON_SURF_ ## field ## _MASK << RADEON_SURF_ ## field ## _SHIFT))
struct radeon_surf_level {
uint64_t offset;
uint64_t slice_size;
uint32_t npix_x;
uint32_t npix_y;
uint32_t npix_z;
uint32_t nblk_x;
uint32_t nblk_y;
uint32_t nblk_z;
uint32_t pitch_bytes;
uint32_t mode;
uint64_t dcc_offset;
uint64_t dcc_fast_clear_size;
bool dcc_enabled;
};
/* surface defintions from the winsys */
struct radeon_surf {
/* These are inputs to the calculator. */
uint32_t npix_x;
uint32_t npix_y;
uint32_t npix_z;
uint32_t blk_w;
uint32_t blk_h;
uint32_t blk_d;
uint32_t array_size;
uint32_t last_level;
uint32_t bpe;
uint32_t nsamples;
uint32_t flags;
/* These are return values. Some of them can be set by the caller, but
* they will be treated as hints (e.g. bankw, bankh) and might be
* changed by the calculator.
*/
uint64_t bo_size;
uint64_t bo_alignment;
/* This applies to EG and later. */
uint32_t bankw;
uint32_t bankh;
uint32_t mtilea;
uint32_t tile_split;
uint32_t stencil_tile_split;
uint64_t stencil_offset;
struct radeon_surf_level level[RADEON_SURF_MAX_LEVEL];
struct radeon_surf_level stencil_level[RADEON_SURF_MAX_LEVEL];
uint32_t tiling_index[RADEON_SURF_MAX_LEVEL];
uint32_t stencil_tiling_index[RADEON_SURF_MAX_LEVEL];
uint32_t pipe_config;
uint32_t num_banks;
uint32_t macro_tile_index;
uint32_t micro_tile_mode; /* displayable, thin, depth, rotated */
/* Whether the depth miptree or stencil miptree as used by the DB are
* adjusted from their TC compatible form to ensure depth/stencil
* compatibility. If either is true, the corresponding plane cannot be
* sampled from.
*/
bool depth_adjusted;
bool stencil_adjusted;
uint64_t dcc_size;
uint64_t dcc_alignment;
uint64_t htile_size;
uint64_t htile_slice_size;
uint64_t htile_alignment;
};
enum radeon_bo_layout { enum radeon_bo_layout {
RADEON_LAYOUT_LINEAR = 0, RADEON_LAYOUT_LINEAR = 0,
RADEON_LAYOUT_TILED, RADEON_LAYOUT_TILED,
@@ -238,16 +103,25 @@ struct radeon_bo_metadata {
/* Tiling flags describing the texture layout for display code /* Tiling flags describing the texture layout for display code
* and DRI sharing. * and DRI sharing.
*/ */
enum radeon_bo_layout microtile; union {
enum radeon_bo_layout macrotile; struct {
unsigned pipe_config; enum radeon_bo_layout microtile;
unsigned bankw; enum radeon_bo_layout macrotile;
unsigned bankh; unsigned pipe_config;
unsigned tile_split; unsigned bankw;
unsigned mtilea; unsigned bankh;
unsigned num_banks; unsigned tile_split;
unsigned stride; unsigned mtilea;
bool scanout; unsigned num_banks;
unsigned stride;
bool scanout;
} legacy;
struct {
/* surface flags */
unsigned swizzle_mode:5;
} gfx9;
} u;
/* Additional metadata associated with the buffer, in bytes. /* Additional metadata associated with the buffer, in bytes.
* The maximum size is 64 * 4. This is opaque for the winsys & kernel. * The maximum size is 64 * 4. This is opaque for the winsys & kernel.
@@ -334,6 +208,7 @@ struct radeon_winsys {
void (*cs_dump)(struct radeon_winsys_cs *cs, FILE* file, uint32_t trace_id); void (*cs_dump)(struct radeon_winsys_cs *cs, FILE* file, uint32_t trace_id);
int (*surface_init)(struct radeon_winsys *ws, int (*surface_init)(struct radeon_winsys *ws,
const struct ac_surf_info *surf_info,
struct radeon_surf *surf); struct radeon_surf *surf);
int (*surface_best)(struct radeon_winsys *ws, int (*surface_best)(struct radeon_winsys *ws,

View File

@@ -26,7 +26,7 @@
#include "radv_private.h" #include "radv_private.h"
#include "radv_meta.h" #include "radv_meta.h"
#include "wsi_common.h" #include "wsi_common.h"
#include "util/vk_util.h" #include "vk_util.h"
static const struct wsi_callbacks wsi_cbs = { static const struct wsi_callbacks wsi_cbs = {
.get_phys_device_format_properties = radv_GetPhysicalDeviceFormatProperties, .get_phys_device_format_properties = radv_GetPhysicalDeviceFormatProperties,
@@ -224,7 +224,7 @@ radv_wsi_image_create(VkDevice device_h,
*memory_p = memory_h; *memory_p = memory_h;
*size = image->size; *size = image->size;
*offset = image->offset; *offset = image->offset;
*row_pitch = surface->level[0].pitch_bytes; *row_pitch = surface->u.legacy.level[0].nblk_x * surface->bpe;
return VK_SUCCESS; return VK_SUCCESS;
fail_alloc_memory: fail_alloc_memory:
radv_FreeMemory(device_h, memory_h, pAllocator); radv_FreeMemory(device_h, memory_h, pAllocator);
@@ -438,7 +438,7 @@ VkResult radv_AcquireNextImageKHR(
VkResult result = swapchain->acquire_next_image(swapchain, timeout, semaphore, VkResult result = swapchain->acquire_next_image(swapchain, timeout, semaphore,
pImageIndex); pImageIndex);
if (fence && result == VK_SUCCESS) { if (fence && (result == VK_SUCCESS || result == VK_SUBOPTIMAL_KHR)) {
fence->submitted = true; fence->submitted = true;
fence->signalled = true; fence->signalled = true;
} }
@@ -460,16 +460,20 @@ VkResult radv_QueuePresentKHR(
RADV_FROM_HANDLE(wsi_swapchain, swapchain, pPresentInfo->pSwapchains[i]); RADV_FROM_HANDLE(wsi_swapchain, swapchain, pPresentInfo->pSwapchains[i]);
struct radeon_winsys_cs *cs; struct radeon_winsys_cs *cs;
const VkPresentRegionKHR *region = NULL; const VkPresentRegionKHR *region = NULL;
VkResult item_result;
assert(radv_device_from_handle(swapchain->device) == queue->device); assert(radv_device_from_handle(swapchain->device) == queue->device);
if (swapchain->fences[0] == VK_NULL_HANDLE) { if (swapchain->fences[0] == VK_NULL_HANDLE) {
result = radv_CreateFence(radv_device_to_handle(queue->device), item_result = radv_CreateFence(radv_device_to_handle(queue->device),
&(VkFenceCreateInfo) { &(VkFenceCreateInfo) {
.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO, .sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,
.flags = 0, .flags = 0,
}, &swapchain->alloc, &swapchain->fences[0]); }, &swapchain->alloc, &swapchain->fences[0]);
if (result != VK_SUCCESS) if (pPresentInfo->pResults != NULL)
return result; pPresentInfo->pResults[i] = item_result;
result = result == VK_SUCCESS ? item_result : result;
if (item_result != VK_SUCCESS)
continue;
} else { } else {
radv_ResetFences(radv_device_to_handle(queue->device), radv_ResetFences(radv_device_to_handle(queue->device),
1, &swapchain->fences[0]); 1, &swapchain->fences[0]);
@@ -493,12 +497,15 @@ VkResult radv_QueuePresentKHR(
if (regions && regions->pRegions) if (regions && regions->pRegions)
region = &regions->pRegions[i]; region = &regions->pRegions[i];
result = swapchain->queue_present(swapchain, item_result = swapchain->queue_present(swapchain,
pPresentInfo->pImageIndices[i], pPresentInfo->pImageIndices[i],
region); region);
/* TODO: What if one of them returns OUT_OF_DATE? */ /* TODO: What if one of them returns OUT_OF_DATE? */
if (result != VK_SUCCESS) if (pPresentInfo->pResults != NULL)
return result; pPresentInfo->pResults[i] = item_result;
result = result == VK_SUCCESS ? item_result : result;
if (item_result != VK_SUCCESS)
continue;
VkFence last = swapchain->fences[2]; VkFence last = swapchain->fences[2];
swapchain->fences[2] = swapchain->fences[1]; swapchain->fences[2] = swapchain->fences[1];

View File

@@ -30,6 +30,7 @@
#include "radv_private.h" #include "radv_private.h"
#include "radv_cs.h" #include "radv_cs.h"
#include "sid.h" #include "sid.h"
#include "gfx9d.h"
#include "radv_util.h" #include "radv_util.h"
#include "main/macros.h" #include "main/macros.h"
@@ -241,6 +242,9 @@ si_emit_config(struct radv_physical_device *physical_device,
radeon_set_context_reg(cs, R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET, 0); radeon_set_context_reg(cs, R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET, 0);
radeon_set_context_reg(cs, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0); radeon_set_context_reg(cs, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0);
radeon_set_context_reg(cs, R_028AA0_VGT_INSTANCE_STEP_RATE_0, 1);
if (physical_device->rad_info.chip_class >= GFX9)
radeon_set_context_reg(cs, R_028AB4_VGT_REUSE_OFF, 0);
radeon_set_context_reg(cs, R_028AB8_VGT_VTX_CNT_EN, 0x0); radeon_set_context_reg(cs, R_028AB8_VGT_VTX_CNT_EN, 0x0);
if (physical_device->rad_info.chip_class < CIK) if (physical_device->rad_info.chip_class < CIK)
radeon_set_config_reg(cs, R_008A14_PA_CL_ENHANCE, S_008A14_NUM_CLIP_SEQ(3) | radeon_set_config_reg(cs, R_008A14_PA_CL_ENHANCE, S_008A14_NUM_CLIP_SEQ(3) |
@@ -297,6 +301,7 @@ si_emit_config(struct radv_physical_device *physical_device,
raster_config_1 = 0x0000002a; raster_config_1 = 0x0000002a;
break; break;
case CHIP_POLARIS11: case CHIP_POLARIS11:
case CHIP_POLARIS12:
raster_config = 0x16000012; raster_config = 0x16000012;
raster_config_1 = 0x00000000; raster_config_1 = 0x00000000;
break; break;
@@ -327,24 +332,28 @@ si_emit_config(struct radv_physical_device *physical_device,
raster_config_1 = 0x00000000; raster_config_1 = 0x00000000;
break; break;
default: default:
fprintf(stderr, if (physical_device->rad_info.chip_class <= VI) {
"radeonsi: Unknown GPU, using 0 for raster_config\n"); fprintf(stderr,
raster_config = 0x00000000; "radeonsi: Unknown GPU, using 0 for raster_config\n");
raster_config_1 = 0x00000000; raster_config = 0x00000000;
raster_config_1 = 0x00000000;
}
break; break;
} }
/* Always use the default config when all backends are enabled /* Always use the default config when all backends are enabled
* (or when we failed to determine the enabled backends). * (or when we failed to determine the enabled backends).
*/ */
if (!rb_mask || util_bitcount(rb_mask) >= num_rb) { if (physical_device->rad_info.chip_class <= VI) {
radeon_set_context_reg(cs, R_028350_PA_SC_RASTER_CONFIG, if (!rb_mask || util_bitcount(rb_mask) >= num_rb) {
raster_config); radeon_set_context_reg(cs, R_028350_PA_SC_RASTER_CONFIG,
if (physical_device->rad_info.chip_class >= CIK) raster_config);
radeon_set_context_reg(cs, R_028354_PA_SC_RASTER_CONFIG_1, if (physical_device->rad_info.chip_class >= CIK)
raster_config_1); radeon_set_context_reg(cs, R_028354_PA_SC_RASTER_CONFIG_1,
} else { raster_config_1);
si_write_harvested_raster_configs(physical_device, cs, raster_config, raster_config_1); } else {
si_write_harvested_raster_configs(physical_device, cs, raster_config, raster_config_1);
}
} }
radeon_set_context_reg(cs, R_028204_PA_SC_WINDOW_SCISSOR_TL, S_028204_WINDOW_OFFSET_DISABLE(1)); radeon_set_context_reg(cs, R_028204_PA_SC_WINDOW_SCISSOR_TL, S_028204_WINDOW_OFFSET_DISABLE(1));
@@ -368,22 +377,31 @@ si_emit_config(struct radv_physical_device *physical_device,
S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) | S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) |
S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE)); S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE));
radeon_set_context_reg(cs, R_028400_VGT_MAX_VTX_INDX, ~0); if (physical_device->rad_info.chip_class >= GFX9) {
radeon_set_context_reg(cs, R_028404_VGT_MIN_VTX_INDX, 0); radeon_set_uconfig_reg(cs, R_030920_VGT_MAX_VTX_INDX, ~0);
radeon_set_context_reg(cs, R_028408_VGT_INDX_OFFSET, 0); radeon_set_uconfig_reg(cs, R_030924_VGT_MIN_VTX_INDX, 0);
radeon_set_uconfig_reg(cs, R_030928_VGT_INDX_OFFSET, 0);
} else {
radeon_set_context_reg(cs, R_028400_VGT_MAX_VTX_INDX, ~0);
radeon_set_context_reg(cs, R_028404_VGT_MIN_VTX_INDX, 0);
radeon_set_context_reg(cs, R_028408_VGT_INDX_OFFSET, 0);
}
if (physical_device->rad_info.chip_class >= CIK) { if (physical_device->rad_info.chip_class >= CIK) {
/* If this is 0, Bonaire can hang even if GS isn't being used. if (physical_device->rad_info.chip_class >= GFX9) {
* Other chips are unaffected. These are suboptimal values, radeon_set_sh_reg(cs, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, S_00B41C_CU_EN(0xffff));
* but we don't use on-chip GS. } else {
*/ radeon_set_sh_reg(cs, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, S_00B51C_CU_EN(0xffff));
radeon_set_context_reg(cs, R_028A44_VGT_GS_ONCHIP_CNTL, radeon_set_sh_reg(cs, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0);
S_028A44_ES_VERTS_PER_SUBGRP(64) | radeon_set_sh_reg(cs, R_00B31C_SPI_SHADER_PGM_RSRC3_ES, S_00B31C_CU_EN(0xffff));
S_028A44_GS_PRIMS_PER_SUBGRP(4)); /* If this is 0, Bonaire can hang even if GS isn't being used.
* Other chips are unaffected. These are suboptimal values,
radeon_set_sh_reg(cs, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, S_00B51C_CU_EN(0xffff)); * but we don't use on-chip GS.
radeon_set_sh_reg(cs, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0); */
radeon_set_sh_reg(cs, R_00B31C_SPI_SHADER_PGM_RSRC3_ES, S_00B31C_CU_EN(0xffff)); radeon_set_context_reg(cs, R_028A44_VGT_GS_ONCHIP_CNTL,
S_028A44_ES_VERTS_PER_SUBGRP(64) |
S_028A44_GS_PRIMS_PER_SUBGRP(4));
}
radeon_set_sh_reg(cs, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, S_00B21C_CU_EN(0xffff)); radeon_set_sh_reg(cs, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, S_00B21C_CU_EN(0xffff));
if (physical_device->rad_info.num_good_compute_units / if (physical_device->rad_info.num_good_compute_units /
@@ -434,9 +452,41 @@ si_emit_config(struct radv_physical_device *physical_device,
radeon_set_context_reg(cs, R_028C5C_VGT_OUT_DEALLOC_CNTL, 16); radeon_set_context_reg(cs, R_028C5C_VGT_OUT_DEALLOC_CNTL, 16);
} }
if (physical_device->rad_info.family == CHIP_STONEY) if (physical_device->has_rbplus)
radeon_set_context_reg(cs, R_028C40_PA_SC_SHADER_CONTROL, 0); radeon_set_context_reg(cs, R_028C40_PA_SC_SHADER_CONTROL, 0);
if (physical_device->rad_info.chip_class >= GFX9) {
unsigned num_se = physical_device->rad_info.max_se;
unsigned pc_lines = 0;
switch (physical_device->rad_info.family) {
case CHIP_VEGA10:
pc_lines = 4096;
break;
case CHIP_RAVEN:
pc_lines = 1024;
break;
default:
assert(0);
}
radeon_set_context_reg(cs, R_028060_DB_DFSM_CONTROL,
S_028060_PUNCHOUT_MODE(V_028060_FORCE_OFF));
radeon_set_context_reg(cs, R_028064_DB_RENDER_FILTER, 0);
/* TODO: We can use this to disable RBs for rendering to GART: */
radeon_set_context_reg(cs, R_02835C_PA_SC_TILE_STEERING_OVERRIDE, 0);
radeon_set_context_reg(cs, R_02883C_PA_SU_OVER_RASTERIZATION_CNTL, 0);
/* TODO: Enable the binner: */
radeon_set_context_reg(cs, R_028C44_PA_SC_BINNER_CNTL_0,
S_028C44_BINNING_MODE(V_028C44_DISABLE_BINNING_USE_LEGACY_SC) |
S_028C44_DISABLE_START_OF_PRIM(1));
radeon_set_context_reg(cs, R_028C48_PA_SC_BINNER_CNTL_1,
S_028C48_MAX_ALLOC_COUNT(MIN2(128, pc_lines / (4 * num_se))) |
S_028C48_MAX_PRIM_PER_BATCH(1023));
radeon_set_context_reg(cs, R_028C4C_PA_SC_CONSERVATIVE_RASTERIZATION_CNTL,
S_028C4C_NULL_SQUAD_AA_MASK_ENABLE(1));
radeon_set_uconfig_reg(cs, R_030968_VGT_INSTANCE_BASE_ID, 0);
}
si_emit_compute(physical_device, cs); si_emit_compute(physical_device, cs);
} }
@@ -650,6 +700,9 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
multi_instances_smaller_than_primgroup = indirect_draw || (instanced_draw && multi_instances_smaller_than_primgroup = indirect_draw || (instanced_draw &&
num_prims < primgroup_size); num_prims < primgroup_size);
if (cmd_buffer->state.pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.prim_id_input)
ia_switch_on_eoi = true;
if (radv_pipeline_has_tess(cmd_buffer->state.pipeline)) { if (radv_pipeline_has_tess(cmd_buffer->state.pipeline)) {
/* SWITCH_ON_EOI must be set if PrimID is used. */ /* SWITCH_ON_EOI must be set if PrimID is used. */
if (cmd_buffer->state.pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.tcs.uses_prim_id || if (cmd_buffer->state.pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.tcs.uses_prim_id ||
@@ -666,12 +719,14 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
/* Needed for 028B6C_DISTRIBUTION_MODE != 0 */ /* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
if (cmd_buffer->device->has_distributed_tess) { if (cmd_buffer->device->has_distributed_tess) {
if (radv_pipeline_has_gs(cmd_buffer->state.pipeline)) { if (radv_pipeline_has_gs(cmd_buffer->state.pipeline)) {
partial_es_wave = true; if (chip_class <= VI)
partial_es_wave = true;
if (family == CHIP_TONGA || if (family == CHIP_TONGA ||
family == CHIP_FIJI || family == CHIP_FIJI ||
family == CHIP_POLARIS10 || family == CHIP_POLARIS10 ||
family == CHIP_POLARIS11) family == CHIP_POLARIS11 ||
family == CHIP_POLARIS12)
partial_vs_wave = true; partial_vs_wave = true;
} else { } else {
partial_vs_wave = true; partial_vs_wave = true;
@@ -733,10 +788,15 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
assert(wd_switch_on_eop || !ia_switch_on_eop); assert(wd_switch_on_eop || !ia_switch_on_eop);
} }
/* If SWITCH_ON_EOI is set, PARTIAL_ES_WAVE must be set too. */ /* If SWITCH_ON_EOI is set, PARTIAL_ES_WAVE must be set too. */
if (ia_switch_on_eoi) if (chip_class <= VI && ia_switch_on_eoi)
partial_es_wave = true; partial_es_wave = true;
if (radv_pipeline_has_gs(cmd_buffer->state.pipeline)) { if (radv_pipeline_has_gs(cmd_buffer->state.pipeline)) {
if (radv_pipeline_has_gs(cmd_buffer->state.pipeline) &&
cmd_buffer->state.pipeline->shaders[MESA_SHADER_GEOMETRY]->info.gs.uses_prim_id)
ia_switch_on_eoi = true;
/* GS requirement. */ /* GS requirement. */
if (SI_GS_PER_ES / primgroup_size >= cmd_buffer->device->gs_table_depth - 3) if (SI_GS_PER_ES / primgroup_size >= cmd_buffer->device->gs_table_depth - 3)
partial_es_wave = true; partial_es_wave = true;
@@ -755,22 +815,88 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
S_028AA8_PARTIAL_ES_WAVE_ON(partial_es_wave) | S_028AA8_PARTIAL_ES_WAVE_ON(partial_es_wave) |
S_028AA8_PRIMGROUP_SIZE(primgroup_size - 1) | S_028AA8_PRIMGROUP_SIZE(primgroup_size - 1) |
S_028AA8_WD_SWITCH_ON_EOP(chip_class >= CIK ? wd_switch_on_eop : 0) | S_028AA8_WD_SWITCH_ON_EOP(chip_class >= CIK ? wd_switch_on_eop : 0) |
S_028AA8_MAX_PRIMGRP_IN_WAVE(chip_class >= VI ? /* The following field was moved to VGT_SHADER_STAGES_EN in GFX9. */
max_primgroup_in_wave : 0); S_028AA8_MAX_PRIMGRP_IN_WAVE(chip_class == VI ?
max_primgroup_in_wave : 0) |
S_030960_EN_INST_OPT_BASIC(chip_class >= GFX9) |
S_030960_EN_INST_OPT_ADV(chip_class >= GFX9);
} }
void si_cs_emit_write_event_eop(struct radeon_winsys_cs *cs,
enum chip_class chip_class,
bool is_mec,
unsigned event, unsigned event_flags,
unsigned data_sel,
uint64_t va,
uint32_t old_fence,
uint32_t new_fence)
{
unsigned op = EVENT_TYPE(event) |
EVENT_INDEX(5) |
event_flags;
unsigned is_gfx8_mec = is_mec && chip_class < GFX9;
if (chip_class >= GFX9 || is_gfx8_mec) {
radeon_emit(cs, PKT3(PKT3_RELEASE_MEM, is_gfx8_mec ? 5 : 6, 0));
radeon_emit(cs, op);
radeon_emit(cs, EOP_DATA_SEL(data_sel));
radeon_emit(cs, va); /* address lo */
radeon_emit(cs, va >> 32); /* address hi */
radeon_emit(cs, new_fence); /* immediate data lo */
radeon_emit(cs, 0); /* immediate data hi */
if (!is_gfx8_mec)
radeon_emit(cs, 0); /* unused */
} else {
if (chip_class == CIK ||
chip_class == VI) {
/* Two EOP events are required to make all engines go idle
* (and optional cache flushes executed) before the timestamp
* is written.
*/
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, op);
radeon_emit(cs, va);
radeon_emit(cs, ((va >> 32) & 0xffff) | EOP_DATA_SEL(data_sel));
radeon_emit(cs, old_fence); /* immediate data */
radeon_emit(cs, 0); /* unused */
}
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, op);
radeon_emit(cs, va);
radeon_emit(cs, ((va >> 32) & 0xffff) | EOP_DATA_SEL(data_sel));
radeon_emit(cs, new_fence); /* immediate data */
radeon_emit(cs, 0); /* unused */
}
}
void
si_emit_wait_fence(struct radeon_winsys_cs *cs,
uint64_t va, uint32_t ref,
uint32_t mask)
{
radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0));
radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit(cs, ref); /* reference value */
radeon_emit(cs, mask); /* mask */
radeon_emit(cs, 4); /* poll interval */
}
static void static void
si_emit_acquire_mem(struct radeon_winsys_cs *cs, si_emit_acquire_mem(struct radeon_winsys_cs *cs,
bool is_mec, bool is_mec, bool is_gfx9,
unsigned cp_coher_cntl) unsigned cp_coher_cntl)
{ {
if (is_mec) { if (is_mec || is_gfx9) {
uint32_t hi_val = is_gfx9 ? 0xffffff : 0xff;
radeon_emit(cs, PKT3(PKT3_ACQUIRE_MEM, 5, 0) | radeon_emit(cs, PKT3(PKT3_ACQUIRE_MEM, 5, 0) |
PKT3_SHADER_TYPE_S(1)); PKT3_SHADER_TYPE_S(is_mec));
radeon_emit(cs, cp_coher_cntl); /* CP_COHER_CNTL */ radeon_emit(cs, cp_coher_cntl); /* CP_COHER_CNTL */
radeon_emit(cs, 0xffffffff); /* CP_COHER_SIZE */ radeon_emit(cs, 0xffffffff); /* CP_COHER_SIZE */
radeon_emit(cs, 0xff); /* CP_COHER_SIZE_HI */ radeon_emit(cs, hi_val); /* CP_COHER_SIZE_HI */
radeon_emit(cs, 0); /* CP_COHER_BASE */ radeon_emit(cs, 0); /* CP_COHER_BASE */
radeon_emit(cs, 0); /* CP_COHER_BASE_HI */ radeon_emit(cs, 0); /* CP_COHER_BASE_HI */
radeon_emit(cs, 0x0000000A); /* POLL_INTERVAL */ radeon_emit(cs, 0x0000000A); /* POLL_INTERVAL */
@@ -787,42 +913,45 @@ si_emit_acquire_mem(struct radeon_winsys_cs *cs,
void void
si_cs_emit_cache_flush(struct radeon_winsys_cs *cs, si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
enum chip_class chip_class, enum chip_class chip_class,
uint32_t *flush_cnt,
uint64_t flush_va,
bool is_mec, bool is_mec,
enum radv_cmd_flush_bits flush_bits) enum radv_cmd_flush_bits flush_bits)
{ {
unsigned cp_coher_cntl = 0; unsigned cp_coher_cntl = 0;
uint32_t flush_cb_db = flush_bits & (RADV_CMD_FLAG_FLUSH_AND_INV_CB |
RADV_CMD_FLAG_FLUSH_AND_INV_DB);
if (flush_bits & RADV_CMD_FLAG_INV_ICACHE) if (flush_bits & RADV_CMD_FLAG_INV_ICACHE)
cp_coher_cntl |= S_0085F0_SH_ICACHE_ACTION_ENA(1); cp_coher_cntl |= S_0085F0_SH_ICACHE_ACTION_ENA(1);
if (flush_bits & RADV_CMD_FLAG_INV_SMEM_L1) if (flush_bits & RADV_CMD_FLAG_INV_SMEM_L1)
cp_coher_cntl |= S_0085F0_SH_KCACHE_ACTION_ENA(1); cp_coher_cntl |= S_0085F0_SH_KCACHE_ACTION_ENA(1);
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_CB) { if (chip_class <= VI) {
cp_coher_cntl |= S_0085F0_CB_ACTION_ENA(1) | if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_CB) {
S_0085F0_CB0_DEST_BASE_ENA(1) | cp_coher_cntl |= S_0085F0_CB_ACTION_ENA(1) |
S_0085F0_CB1_DEST_BASE_ENA(1) | S_0085F0_CB0_DEST_BASE_ENA(1) |
S_0085F0_CB2_DEST_BASE_ENA(1) | S_0085F0_CB1_DEST_BASE_ENA(1) |
S_0085F0_CB3_DEST_BASE_ENA(1) | S_0085F0_CB2_DEST_BASE_ENA(1) |
S_0085F0_CB4_DEST_BASE_ENA(1) | S_0085F0_CB3_DEST_BASE_ENA(1) |
S_0085F0_CB5_DEST_BASE_ENA(1) | S_0085F0_CB4_DEST_BASE_ENA(1) |
S_0085F0_CB6_DEST_BASE_ENA(1) | S_0085F0_CB5_DEST_BASE_ENA(1) |
S_0085F0_CB7_DEST_BASE_ENA(1); S_0085F0_CB6_DEST_BASE_ENA(1) |
S_0085F0_CB7_DEST_BASE_ENA(1);
/* Necessary for DCC */ /* Necessary for DCC */
if (chip_class >= VI) { if (chip_class >= VI) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0)); si_cs_emit_write_event_eop(cs,
radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_CB_DATA_TS) | chip_class,
EVENT_INDEX(5)); is_mec,
radeon_emit(cs, 0); V_028A90_FLUSH_AND_INV_CB_DATA_TS,
radeon_emit(cs, 0); 0, 0, 0, 0, 0);
radeon_emit(cs, 0); }
radeon_emit(cs, 0); }
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_DB) {
cp_coher_cntl |= S_0085F0_DB_ACTION_ENA(1) |
S_0085F0_DB_DEST_BASE_ENA(1);
} }
}
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_DB) {
cp_coher_cntl |= S_0085F0_DB_ACTION_ENA(1) |
S_0085F0_DB_DEST_BASE_ENA(1);
} }
if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_CB_META) { if (flush_bits & RADV_CMD_FLAG_FLUSH_AND_INV_CB_META) {
@@ -835,8 +964,7 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_DB_META) | EVENT_INDEX(0)); radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_DB_META) | EVENT_INDEX(0));
} }
if (!(flush_bits & (RADV_CMD_FLAG_FLUSH_AND_INV_CB | if (!flush_cb_db) {
RADV_CMD_FLAG_FLUSH_AND_INV_DB))) {
if (flush_bits & RADV_CMD_FLAG_PS_PARTIAL_FLUSH) { if (flush_bits & RADV_CMD_FLAG_PS_PARTIAL_FLUSH) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_PS_PARTIAL_FLUSH) | EVENT_INDEX(4)); radeon_emit(cs, EVENT_TYPE(V_028A90_PS_PARTIAL_FLUSH) | EVENT_INDEX(4));
@@ -851,6 +979,54 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
radeon_emit(cs, EVENT_TYPE(V_028A90_CS_PARTIAL_FLUSH) | EVENT_INDEX(4)); radeon_emit(cs, EVENT_TYPE(V_028A90_CS_PARTIAL_FLUSH) | EVENT_INDEX(4));
} }
if (chip_class >= GFX9 && flush_cb_db) {
unsigned cb_db_event, tc_flags;
/* Set the CB/DB flush event. */
switch (flush_cb_db) {
case RADV_CMD_FLAG_FLUSH_AND_INV_CB:
cb_db_event = V_028A90_FLUSH_AND_INV_CB_DATA_TS;
break;
case RADV_CMD_FLAG_FLUSH_AND_INV_DB:
cb_db_event = V_028A90_FLUSH_AND_INV_DB_DATA_TS;
break;
default:
/* both CB & DB */
cb_db_event = V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT;
}
/* TC | TC_WB = invalidate L2 data
* TC_MD | TC_WB = invalidate L2 metadata
* TC | TC_WB | TC_MD = invalidate L2 data & metadata
*
* The metadata cache must always be invalidated for coherency
* between CB/DB and shaders. (metadata = HTILE, CMASK, DCC)
*
* TC must be invalidated on GFX9 only if the CB/DB surface is
* not pipe-aligned. If the surface is RB-aligned, it might not
* strictly be pipe-aligned since RB alignment takes precendence.
*/
tc_flags = EVENT_TC_WB_ACTION_ENA |
EVENT_TC_MD_ACTION_ENA;
/* Ideally flush TC together with CB/DB. */
if (flush_bits & RADV_CMD_FLAG_INV_GLOBAL_L2) {
tc_flags |= EVENT_TC_ACTION_ENA |
EVENT_TCL1_ACTION_ENA;
/* Clear the flags. */
flush_bits &= ~(RADV_CMD_FLAG_INV_GLOBAL_L2 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 |
RADV_CMD_FLAG_INV_VMEM_L1);
}
assert(flush_cnt);
uint32_t old_fence = (*flush_cnt)++;
si_cs_emit_write_event_eop(cs, chip_class, false, cb_db_event, tc_flags, 1,
flush_va, old_fence, *flush_cnt);
si_emit_wait_fence(cs, flush_va, *flush_cnt, 0xffffffff);
}
/* VGT state sync */ /* VGT state sync */
if (flush_bits & RADV_CMD_FLAG_VGT_FLUSH) { if (flush_bits & RADV_CMD_FLAG_VGT_FLUSH) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
@@ -860,7 +1036,11 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
/* Make sure ME is idle (it executes most packets) before continuing. /* Make sure ME is idle (it executes most packets) before continuing.
* This prevents read-after-write hazards between PFP and ME. * This prevents read-after-write hazards between PFP and ME.
*/ */
if ((cp_coher_cntl || (flush_bits & RADV_CMD_FLAG_CS_PARTIAL_FLUSH)) && if ((cp_coher_cntl ||
(flush_bits & (RADV_CMD_FLAG_CS_PARTIAL_FLUSH |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_INV_GLOBAL_L2 |
RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2))) &&
!is_mec) { !is_mec) {
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0)); radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0); radeon_emit(cs, 0);
@@ -868,27 +1048,39 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
if ((flush_bits & RADV_CMD_FLAG_INV_GLOBAL_L2) || if ((flush_bits & RADV_CMD_FLAG_INV_GLOBAL_L2) ||
(chip_class <= CIK && (flush_bits & RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2))) { (chip_class <= CIK && (flush_bits & RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2))) {
cp_coher_cntl |= S_0085F0_TC_ACTION_ENA(1); si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9,
if (chip_class >= VI) cp_coher_cntl |
cp_coher_cntl |= S_0301F0_TC_WB_ACTION_ENA(1); S_0085F0_TC_ACTION_ENA(1) |
} else if(flush_bits & RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2) { S_0085F0_TCL1_ACTION_ENA(1) |
cp_coher_cntl |= S_0301F0_TC_WB_ACTION_ENA(1) | S_0301F0_TC_WB_ACTION_ENA(chip_class >= VI));
S_0301F0_TC_NC_ACTION_ENA(1);
/* L2 writeback doesn't combine with L1 invalidate */
si_emit_acquire_mem(cs, is_mec, cp_coher_cntl);
cp_coher_cntl = 0; cp_coher_cntl = 0;
} else {
if(flush_bits & RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2) {
/* WB = write-back
* NC = apply to non-coherent MTYPEs
* (i.e. MTYPE <= 1, which is what we use everywhere)
*
* WB doesn't work without NC.
*/
si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9,
cp_coher_cntl |
S_0301F0_TC_WB_ACTION_ENA(1) |
S_0301F0_TC_NC_ACTION_ENA(1));
cp_coher_cntl = 0;
}
if (flush_bits & RADV_CMD_FLAG_INV_VMEM_L1) {
si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9,
cp_coher_cntl |
S_0085F0_TCL1_ACTION_ENA(1));
cp_coher_cntl = 0;
}
} }
if (flush_bits & RADV_CMD_FLAG_INV_VMEM_L1)
cp_coher_cntl |= S_0085F0_TCL1_ACTION_ENA(1);
/* When one of the DEST_BASE flags is set, SURFACE_SYNC waits for idle. /* When one of the DEST_BASE flags is set, SURFACE_SYNC waits for idle.
* Therefore, it should be last. Done in PFP. * Therefore, it should be last. Done in PFP.
*/ */
if (cp_coher_cntl) if (cp_coher_cntl)
si_emit_acquire_mem(cs, is_mec, cp_coher_cntl); si_emit_acquire_mem(cs, is_mec, chip_class >= GFX9, cp_coher_cntl);
} }
void void
@@ -905,67 +1097,118 @@ si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer)
RADV_CMD_FLAG_VS_PARTIAL_FLUSH | RADV_CMD_FLAG_VS_PARTIAL_FLUSH |
RADV_CMD_FLAG_VGT_FLUSH); RADV_CMD_FLAG_VGT_FLUSH);
if (!cmd_buffer->state.flush_bits)
return;
enum chip_class chip_class = cmd_buffer->device->physical_device->rad_info.chip_class;
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 128); radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 128);
uint32_t *ptr = NULL;
uint64_t va = 0;
if (chip_class == GFX9) {
va = cmd_buffer->device->ws->buffer_get_va(cmd_buffer->gfx9_fence_bo) + cmd_buffer->gfx9_fence_offset;
ptr = &cmd_buffer->gfx9_fence_idx;
}
si_cs_emit_cache_flush(cmd_buffer->cs, si_cs_emit_cache_flush(cmd_buffer->cs,
cmd_buffer->device->physical_device->rad_info.chip_class, cmd_buffer->device->physical_device->rad_info.chip_class,
ptr, va,
radv_cmd_buffer_uses_mec(cmd_buffer), radv_cmd_buffer_uses_mec(cmd_buffer),
cmd_buffer->state.flush_bits); cmd_buffer->state.flush_bits);
if (cmd_buffer->state.flush_bits) radv_cmd_buffer_trace_emit(cmd_buffer);
radv_cmd_buffer_trace_emit(cmd_buffer);
cmd_buffer->state.flush_bits = 0; cmd_buffer->state.flush_bits = 0;
} }
/* Set this if you want the 3D engine to wait until CP DMA is done. /* Set this if you want the 3D engine to wait until CP DMA is done.
* It should be set on the last CP DMA packet. */ * It should be set on the last CP DMA packet. */
#define R600_CP_DMA_SYNC (1 << 0) /* R600+ */ #define CP_DMA_SYNC (1 << 0)
/* Set this if the source data was used as a destination in a previous CP DMA /* Set this if the source data was used as a destination in a previous CP DMA
* packet. It's for preventing a read-after-write (RAW) hazard between two * packet. It's for preventing a read-after-write (RAW) hazard between two
* CP DMA packets. */ * CP DMA packets. */
#define SI_CP_DMA_RAW_WAIT (1 << 1) /* SI+ */ #define CP_DMA_RAW_WAIT (1 << 1)
#define CIK_CP_DMA_USE_L2 (1 << 2) #define CP_DMA_USE_L2 (1 << 2)
#define CP_DMA_CLEAR (1 << 3)
/* Alignment for optimal performance. */ /* Alignment for optimal performance. */
#define CP_DMA_ALIGNMENT 32 #define SI_CPDMA_ALIGNMENT 32
/* The max number of bytes to copy per packet. */
#define CP_DMA_MAX_BYTE_COUNT ((1 << 21) - CP_DMA_ALIGNMENT)
static void si_emit_cp_dma_copy_buffer(struct radv_cmd_buffer *cmd_buffer, /* The max number of bytes that can be copied per packet. */
uint64_t dst_va, uint64_t src_va, static inline unsigned cp_dma_max_byte_count(struct radv_cmd_buffer *cmd_buffer)
unsigned size, unsigned flags) {
unsigned max = cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9 ?
S_414_BYTE_COUNT_GFX9(~0u) :
S_414_BYTE_COUNT_GFX6(~0u);
/* make it aligned for optimal performance */
return max & ~(SI_CPDMA_ALIGNMENT - 1);
}
/* Emit a CP DMA packet to do a copy from one buffer to another, or to clear
* a buffer. The size must fit in bits [20:0]. If CP_DMA_CLEAR is set, src_va is a 32-bit
* clear value.
*/
static void si_emit_cp_dma(struct radv_cmd_buffer *cmd_buffer,
uint64_t dst_va, uint64_t src_va,
unsigned size, unsigned flags)
{ {
struct radeon_winsys_cs *cs = cmd_buffer->cs; struct radeon_winsys_cs *cs = cmd_buffer->cs;
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? S_411_CP_SYNC(1) : 0; uint32_t header = 0, command = 0;
uint32_t wr_confirm = !(flags & R600_CP_DMA_SYNC) ? S_414_DISABLE_WR_CONFIRM_GFX6(1) : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? S_414_RAW_WAIT(1) : 0;
uint32_t sel = flags & CIK_CP_DMA_USE_L2 ?
S_411_SRC_SEL(V_411_SRC_ADDR_TC_L2) |
S_411_DSL_SEL(V_411_DST_ADDR_TC_L2) : 0;
assert(size); assert(size);
assert((size & ((1<<21)-1)) == size); assert(size <= cp_dma_max_byte_count(cmd_buffer));
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 9); radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 9);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9)
command |= S_414_BYTE_COUNT_GFX9(size);
else
command |= S_414_BYTE_COUNT_GFX6(size);
/* Sync flags. */
if (flags & CP_DMA_SYNC)
header |= S_411_CP_SYNC(1);
else {
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9)
command |= S_414_DISABLE_WR_CONFIRM_GFX9(1);
else
command |= S_414_DISABLE_WR_CONFIRM_GFX6(1);
}
if (flags & CP_DMA_RAW_WAIT)
command |= S_414_RAW_WAIT(1);
/* Src and dst flags. */
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9 &&
!(flags & CP_DMA_CLEAR) &&
src_va == dst_va)
header |= S_411_DSL_SEL(V_411_NOWHERE); /* prefetch only */
else if (flags & CP_DMA_USE_L2)
header |= S_411_DSL_SEL(V_411_DST_ADDR_TC_L2);
if (flags & CP_DMA_CLEAR)
header |= S_411_SRC_SEL(V_411_DATA);
else if (flags & CP_DMA_USE_L2)
header |= S_411_SRC_SEL(V_411_SRC_ADDR_TC_L2);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) { if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) {
radeon_emit(cs, PKT3(PKT3_DMA_DATA, 5, 0)); radeon_emit(cs, PKT3(PKT3_DMA_DATA, 5, 0));
radeon_emit(cs, sync_flag | sel); /* CP_SYNC [31] */ radeon_emit(cs, header);
radeon_emit(cs, src_va); /* SRC_ADDR_LO [31:0] */ radeon_emit(cs, src_va); /* SRC_ADDR_LO [31:0] */
radeon_emit(cs, src_va >> 32); /* SRC_ADDR_HI [31:0] */ radeon_emit(cs, src_va >> 32); /* SRC_ADDR_HI [31:0] */
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */ radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, dst_va >> 32); /* DST_ADDR_HI [31:0] */ radeon_emit(cs, dst_va >> 32); /* DST_ADDR_HI [31:0] */
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */ radeon_emit(cs, command);
} else { } else {
assert(!(flags & CP_DMA_USE_L2));
header |= S_411_SRC_ADDR_HI(src_va >> 32);
radeon_emit(cs, PKT3(PKT3_CP_DMA, 4, 0)); radeon_emit(cs, PKT3(PKT3_CP_DMA, 4, 0));
radeon_emit(cs, src_va); /* SRC_ADDR_LO [31:0] */ radeon_emit(cs, src_va); /* SRC_ADDR_LO [31:0] */
radeon_emit(cs, sync_flag | ((src_va >> 32) & 0xffff)); /* CP_SYNC [31] | SRC_ADDR_HI [15:0] */ radeon_emit(cs, header); /* SRC_ADDR_HI [15:0] + flags. */
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */ radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, (dst_va >> 32) & 0xffff); /* DST_ADDR_HI [15:0] */ radeon_emit(cs, (dst_va >> 32) & 0xffff); /* DST_ADDR_HI [15:0] */
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */ radeon_emit(cs, command);
} }
/* CP DMA is executed in ME, but index buffers are read by PFP. /* CP DMA is executed in ME, but index buffers are read by PFP.
@@ -973,7 +1216,7 @@ static void si_emit_cp_dma_copy_buffer(struct radv_cmd_buffer *cmd_buffer,
* indices. If we wanted to execute CP DMA in PFP, this packet * indices. If we wanted to execute CP DMA in PFP, this packet
* should precede it. * should precede it.
*/ */
if (sync_flag && cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) { if ((flags & CP_DMA_SYNC) && cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0)); radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0); radeon_emit(cs, 0);
} }
@@ -981,45 +1224,14 @@ static void si_emit_cp_dma_copy_buffer(struct radv_cmd_buffer *cmd_buffer,
radv_cmd_buffer_trace_emit(cmd_buffer); radv_cmd_buffer_trace_emit(cmd_buffer);
} }
/* Emit a CP DMA packet to clear a buffer. The size must fit in bits [20:0]. */ void si_cp_dma_prefetch(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
static void si_emit_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, unsigned size)
uint64_t dst_va, unsigned size,
uint32_t clear_value, unsigned flags)
{ {
struct radeon_winsys_cs *cs = cmd_buffer->cs; uint64_t aligned_va = va & ~(SI_CPDMA_ALIGNMENT - 1);
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? S_411_CP_SYNC(1) : 0; uint64_t aligned_size = ((va + size + SI_CPDMA_ALIGNMENT -1) & ~(SI_CPDMA_ALIGNMENT - 1)) - aligned_va;
uint32_t wr_confirm = !(flags & R600_CP_DMA_SYNC) ? S_414_DISABLE_WR_CONFIRM_GFX6(1) : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? S_414_RAW_WAIT(1) : 0;
uint32_t dst_sel = flags & CIK_CP_DMA_USE_L2 ? S_411_DSL_SEL(V_411_DST_ADDR_TC_L2) : 0;
assert(size); si_emit_cp_dma(cmd_buffer, aligned_va, aligned_va,
assert((size & ((1<<21)-1)) == size); aligned_size, CP_DMA_USE_L2);
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 9);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) {
radeon_emit(cs, PKT3(PKT3_DMA_DATA, 5, 0));
radeon_emit(cs, sync_flag | dst_sel | S_411_SRC_SEL(V_411_DATA)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, clear_value); /* DATA [31:0] */
radeon_emit(cs, 0);
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, dst_va >> 32); /* DST_ADDR_HI [15:0] */
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */
} else {
radeon_emit(cs, PKT3(PKT3_CP_DMA, 4, 0));
radeon_emit(cs, clear_value); /* DATA [31:0] */
radeon_emit(cs, sync_flag | S_411_SRC_SEL(V_411_DATA)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, (dst_va >> 32) & 0xffff); /* DST_ADDR_HI [15:0] */
radeon_emit(cs, size | wr_confirm | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */
}
/* See "copy_buffer" for explanation. */
if (sync_flag && cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);
}
radv_cmd_buffer_trace_emit(cmd_buffer);
} }
static void si_cp_dma_prepare(struct radv_cmd_buffer *cmd_buffer, uint64_t byte_count, static void si_cp_dma_prepare(struct radv_cmd_buffer *cmd_buffer, uint64_t byte_count,
@@ -1031,14 +1243,14 @@ static void si_cp_dma_prepare(struct radv_cmd_buffer *cmd_buffer, uint64_t byte_
*/ */
if (cmd_buffer->state.flush_bits) { if (cmd_buffer->state.flush_bits) {
si_emit_cache_flush(cmd_buffer); si_emit_cache_flush(cmd_buffer);
*flags |= SI_CP_DMA_RAW_WAIT; *flags |= CP_DMA_RAW_WAIT;
} }
/* Do the synchronization after the last dma, so that all data /* Do the synchronization after the last dma, so that all data
* is written to memory. * is written to memory.
*/ */
if (byte_count == remaining_size) if (byte_count == remaining_size)
*flags |= R600_CP_DMA_SYNC; *flags |= CP_DMA_SYNC;
} }
static void si_cp_dma_realign_engine(struct radv_cmd_buffer *cmd_buffer, unsigned size) static void si_cp_dma_realign_engine(struct radv_cmd_buffer *cmd_buffer, unsigned size)
@@ -1046,20 +1258,20 @@ static void si_cp_dma_realign_engine(struct radv_cmd_buffer *cmd_buffer, unsigne
uint64_t va; uint64_t va;
uint32_t offset; uint32_t offset;
unsigned dma_flags = 0; unsigned dma_flags = 0;
unsigned buf_size = CP_DMA_ALIGNMENT * 2; unsigned buf_size = SI_CPDMA_ALIGNMENT * 2;
void *ptr; void *ptr;
assert(size < CP_DMA_ALIGNMENT); assert(size < SI_CPDMA_ALIGNMENT);
radv_cmd_buffer_upload_alloc(cmd_buffer, buf_size, CP_DMA_ALIGNMENT, &offset, &ptr); radv_cmd_buffer_upload_alloc(cmd_buffer, buf_size, SI_CPDMA_ALIGNMENT, &offset, &ptr);
va = cmd_buffer->device->ws->buffer_get_va(cmd_buffer->upload.upload_bo); va = cmd_buffer->device->ws->buffer_get_va(cmd_buffer->upload.upload_bo);
va += offset; va += offset;
si_cp_dma_prepare(cmd_buffer, size, size, &dma_flags); si_cp_dma_prepare(cmd_buffer, size, size, &dma_flags);
si_emit_cp_dma_copy_buffer(cmd_buffer, va, va + CP_DMA_ALIGNMENT, size, si_emit_cp_dma(cmd_buffer, va, va + SI_CPDMA_ALIGNMENT, size,
dma_flags); dma_flags);
} }
void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer, void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
@@ -1076,15 +1288,15 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
* just to align the internal counter. Otherwise, the DMA engine * just to align the internal counter. Otherwise, the DMA engine
* would slow down by an order of magnitude for following copies. * would slow down by an order of magnitude for following copies.
*/ */
if (size % CP_DMA_ALIGNMENT) if (size % SI_CPDMA_ALIGNMENT)
realign_size = CP_DMA_ALIGNMENT - (size % CP_DMA_ALIGNMENT); realign_size = SI_CPDMA_ALIGNMENT - (size % SI_CPDMA_ALIGNMENT);
/* If the copy begins unaligned, we must start copying from the next /* If the copy begins unaligned, we must start copying from the next
* aligned block and the skipped part should be copied after everything * aligned block and the skipped part should be copied after everything
* else has been copied. Only the src alignment matters, not dst. * else has been copied. Only the src alignment matters, not dst.
*/ */
if (src_va % CP_DMA_ALIGNMENT) { if (src_va % SI_CPDMA_ALIGNMENT) {
skipped_size = CP_DMA_ALIGNMENT - (src_va % CP_DMA_ALIGNMENT); skipped_size = SI_CPDMA_ALIGNMENT - (src_va % SI_CPDMA_ALIGNMENT);
/* The main part will be skipped if the size is too small. */ /* The main part will be skipped if the size is too small. */
skipped_size = MIN2(skipped_size, size); skipped_size = MIN2(skipped_size, size);
size -= skipped_size; size -= skipped_size;
@@ -1095,14 +1307,14 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
while (size) { while (size) {
unsigned dma_flags = 0; unsigned dma_flags = 0;
unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT); unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer));
si_cp_dma_prepare(cmd_buffer, byte_count, si_cp_dma_prepare(cmd_buffer, byte_count,
size + skipped_size + realign_size, size + skipped_size + realign_size,
&dma_flags); &dma_flags);
si_emit_cp_dma_copy_buffer(cmd_buffer, main_dest_va, main_src_va, si_emit_cp_dma(cmd_buffer, main_dest_va, main_src_va,
byte_count, dma_flags); byte_count, dma_flags);
size -= byte_count; size -= byte_count;
main_src_va += byte_count; main_src_va += byte_count;
@@ -1116,8 +1328,8 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
size + skipped_size + realign_size, size + skipped_size + realign_size,
&dma_flags); &dma_flags);
si_emit_cp_dma_copy_buffer(cmd_buffer, dest_va, src_va, si_emit_cp_dma(cmd_buffer, dest_va, src_va,
skipped_size, dma_flags); skipped_size, dma_flags);
} }
if (realign_size) if (realign_size)
si_cp_dma_realign_engine(cmd_buffer, realign_size); si_cp_dma_realign_engine(cmd_buffer, realign_size);
@@ -1133,14 +1345,14 @@ void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
assert(va % 4 == 0 && size % 4 == 0); assert(va % 4 == 0 && size % 4 == 0);
while (size) { while (size) {
unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT); unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer));
unsigned dma_flags = 0; unsigned dma_flags = CP_DMA_CLEAR;
si_cp_dma_prepare(cmd_buffer, byte_count, size, &dma_flags); si_cp_dma_prepare(cmd_buffer, byte_count, size, &dma_flags);
/* Emit the clear packet. */ /* Emit the clear packet. */
si_emit_cp_dma_clear_buffer(cmd_buffer, va, byte_count, value, si_emit_cp_dma(cmd_buffer, va, value, byte_count,
dma_flags); dma_flags);
size -= byte_count; size -= byte_count;
va += byte_count; va += byte_count;

View File

@@ -396,6 +396,13 @@ vk_format_is_int(VkFormat format)
return channel >= 0 && desc->channel[channel].pure_integer; return channel >= 0 && desc->channel[channel].pure_integer;
} }
static inline bool
vk_format_is_srgb(VkFormat format)
{
const struct vk_format_description *desc = vk_format_description(format);
return desc->colorspace == VK_FORMAT_COLORSPACE_SRGB;
}
static inline VkFormat static inline VkFormat
vk_format_stencil_only(VkFormat format) vk_format_stencil_only(VkFormat format)
{ {

View File

@@ -467,25 +467,29 @@ radv_amdgpu_winsys_bo_set_metadata(struct radeon_winsys_bo *_bo,
struct amdgpu_bo_metadata metadata = {0}; struct amdgpu_bo_metadata metadata = {0};
uint32_t tiling_flags = 0; uint32_t tiling_flags = 0;
if (md->macrotile == RADEON_LAYOUT_TILED) if (bo->ws->info.chip_class >= GFX9) {
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 4); /* 2D_TILED_THIN1 */ tiling_flags |= AMDGPU_TILING_SET(SWIZZLE_MODE, md->u.gfx9.swizzle_mode);
else if (md->microtile == RADEON_LAYOUT_TILED) } else {
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 2); /* 1D_TILED_THIN1 */ if (md->u.legacy.macrotile == RADEON_LAYOUT_TILED)
else tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 4); /* 2D_TILED_THIN1 */
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 1); /* LINEAR_ALIGNED */ else if (md->u.legacy.microtile == RADEON_LAYOUT_TILED)
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 2); /* 1D_TILED_THIN1 */
else
tiling_flags |= AMDGPU_TILING_SET(ARRAY_MODE, 1); /* LINEAR_ALIGNED */
tiling_flags |= AMDGPU_TILING_SET(PIPE_CONFIG, md->pipe_config); tiling_flags |= AMDGPU_TILING_SET(PIPE_CONFIG, md->u.legacy.pipe_config);
tiling_flags |= AMDGPU_TILING_SET(BANK_WIDTH, util_logbase2(md->bankw)); tiling_flags |= AMDGPU_TILING_SET(BANK_WIDTH, util_logbase2(md->u.legacy.bankw));
tiling_flags |= AMDGPU_TILING_SET(BANK_HEIGHT, util_logbase2(md->bankh)); tiling_flags |= AMDGPU_TILING_SET(BANK_HEIGHT, util_logbase2(md->u.legacy.bankh));
if (md->tile_split) if (md->u.legacy.tile_split)
tiling_flags |= AMDGPU_TILING_SET(TILE_SPLIT, radv_eg_tile_split_rev(md->tile_split)); tiling_flags |= AMDGPU_TILING_SET(TILE_SPLIT, radv_eg_tile_split_rev(md->u.legacy.tile_split));
tiling_flags |= AMDGPU_TILING_SET(MACRO_TILE_ASPECT, util_logbase2(md->mtilea)); tiling_flags |= AMDGPU_TILING_SET(MACRO_TILE_ASPECT, util_logbase2(md->u.legacy.mtilea));
tiling_flags |= AMDGPU_TILING_SET(NUM_BANKS, util_logbase2(md->num_banks)-1); tiling_flags |= AMDGPU_TILING_SET(NUM_BANKS, util_logbase2(md->u.legacy.num_banks)-1);
if (md->scanout) if (md->u.legacy.scanout)
tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 0); /* DISPLAY_MICRO_TILING */ tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 0); /* DISPLAY_MICRO_TILING */
else else
tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 1); /* THIN_MICRO_TILING */ tiling_flags |= AMDGPU_TILING_SET(MICRO_TILE_MODE, 1); /* THIN_MICRO_TILING */
}
metadata.tiling_info = tiling_flags; metadata.tiling_info = tiling_flags;
metadata.size_metadata = md->size_metadata; metadata.size_metadata = md->size_metadata;

View File

@@ -90,25 +90,26 @@ static int ring_to_hw_ip(enum ring_type ring)
} }
static void radv_amdgpu_request_to_fence(struct radv_amdgpu_ctx *ctx, static void radv_amdgpu_request_to_fence(struct radv_amdgpu_ctx *ctx,
struct amdgpu_cs_fence *fence, struct radv_amdgpu_fence *fence,
struct amdgpu_cs_request *req) struct amdgpu_cs_request *req)
{ {
fence->context = ctx->ctx; fence->fence.context = ctx->ctx;
fence->ip_type = req->ip_type; fence->fence.ip_type = req->ip_type;
fence->ip_instance = req->ip_instance; fence->fence.ip_instance = req->ip_instance;
fence->ring = req->ring; fence->fence.ring = req->ring;
fence->fence = req->seq_no; fence->fence.fence = req->seq_no;
fence->user_ptr = (volatile uint64_t*)(ctx->fence_map + (req->ip_type * MAX_RINGS_PER_TYPE + req->ring) * sizeof(uint64_t));
} }
static struct radeon_winsys_fence *radv_amdgpu_create_fence() static struct radeon_winsys_fence *radv_amdgpu_create_fence()
{ {
struct radv_amdgpu_cs_fence *fence = calloc(1, sizeof(struct amdgpu_cs_fence)); struct radv_amdgpu_fence *fence = calloc(1, sizeof(struct radv_amdgpu_fence));
return (struct radeon_winsys_fence*)fence; return (struct radeon_winsys_fence*)fence;
} }
static void radv_amdgpu_destroy_fence(struct radeon_winsys_fence *_fence) static void radv_amdgpu_destroy_fence(struct radeon_winsys_fence *_fence)
{ {
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence; struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
free(fence); free(fence);
} }
@@ -117,16 +118,23 @@ static bool radv_amdgpu_fence_wait(struct radeon_winsys *_ws,
bool absolute, bool absolute,
uint64_t timeout) uint64_t timeout)
{ {
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence; struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
unsigned flags = absolute ? AMDGPU_QUERY_FENCE_TIMEOUT_IS_ABSOLUTE : 0; unsigned flags = absolute ? AMDGPU_QUERY_FENCE_TIMEOUT_IS_ABSOLUTE : 0;
int r; int r;
uint32_t expired = 0; uint32_t expired = 0;
if (fence->user_ptr) {
if (*fence->user_ptr >= fence->fence.fence)
return true;
if (!absolute && !timeout)
return false;
}
/* Now use the libdrm query. */ /* Now use the libdrm query. */
r = amdgpu_cs_query_fence_status(fence, r = amdgpu_cs_query_fence_status(&fence->fence,
timeout, timeout,
flags, flags,
&expired); &expired);
if (r) { if (r) {
fprintf(stderr, "amdgpu: radv_amdgpu_cs_query_fence_status failed.\n"); fprintf(stderr, "amdgpu: radv_amdgpu_cs_query_fence_status failed.\n");
@@ -619,6 +627,16 @@ static int radv_amdgpu_create_bo_list(struct radv_amdgpu_winsys *ws,
return r; return r;
} }
static struct amdgpu_cs_fence_info radv_set_cs_fence(struct radv_amdgpu_ctx *ctx, int ip_type, int ring)
{
struct amdgpu_cs_fence_info ret = {0};
if (ctx->fence_map) {
ret.handle = radv_amdgpu_winsys_bo(ctx->fence_bo)->bo;
ret.offset = (ip_type * MAX_RINGS_PER_TYPE + ring) * sizeof(uint64_t);
}
return ret;
}
static void radv_assign_last_submit(struct radv_amdgpu_ctx *ctx, static void radv_assign_last_submit(struct radv_amdgpu_ctx *ctx,
struct amdgpu_cs_request *request) struct amdgpu_cs_request *request)
{ {
@@ -637,7 +655,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
{ {
int r; int r;
struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx); struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx);
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence; struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
struct radv_amdgpu_cs *cs0 = radv_amdgpu_cs(cs_array[0]); struct radv_amdgpu_cs *cs0 = radv_amdgpu_cs(cs_array[0]);
amdgpu_bo_list_handle bo_list; amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request = {0}; struct amdgpu_cs_request request = {0};
@@ -676,6 +694,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
request.number_of_ibs = 1; request.number_of_ibs = 1;
request.ibs = &cs0->ib; request.ibs = &cs0->ib;
request.resources = bo_list; request.resources = bo_list;
request.fence_info = radv_set_cs_fence(ctx, cs0->hw_ip, queue_idx);
if (initial_preamble_cs) { if (initial_preamble_cs) {
request.ibs = ibs; request.ibs = ibs;
@@ -713,7 +732,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
{ {
int r; int r;
struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx); struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx);
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence; struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
amdgpu_bo_list_handle bo_list; amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request; struct amdgpu_cs_request request;
@@ -740,6 +759,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
request.resources = bo_list; request.resources = bo_list;
request.number_of_ibs = cnt + !!preamble_cs; request.number_of_ibs = cnt + !!preamble_cs;
request.ibs = ibs; request.ibs = ibs;
request.fence_info = radv_set_cs_fence(ctx, cs0->hw_ip, queue_idx);
if (preamble_cs) { if (preamble_cs) {
ibs[0] = radv_amdgpu_cs(preamble_cs)->ib; ibs[0] = radv_amdgpu_cs(preamble_cs)->ib;
@@ -789,14 +809,14 @@ static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
{ {
int r; int r;
struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx); struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx);
struct amdgpu_cs_fence *fence = (struct amdgpu_cs_fence *)_fence; struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
struct radv_amdgpu_cs *cs0 = radv_amdgpu_cs(cs_array[0]); struct radv_amdgpu_cs *cs0 = radv_amdgpu_cs(cs_array[0]);
struct radeon_winsys *ws = (struct radeon_winsys*)cs0->ws; struct radeon_winsys *ws = (struct radeon_winsys*)cs0->ws;
amdgpu_bo_list_handle bo_list; amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request; struct amdgpu_cs_request request;
uint32_t pad_word = 0xffff1000U; uint32_t pad_word = 0xffff1000U;
if (radv_amdgpu_winsys(ws)->family == FAMILY_SI) if (radv_amdgpu_winsys(ws)->info.chip_class == SI)
pad_word = 0x80000000; pad_word = 0x80000000;
assert(cs_count); assert(cs_count);
@@ -858,6 +878,7 @@ static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
request.resources = bo_list; request.resources = bo_list;
request.number_of_ibs = 1; request.number_of_ibs = 1;
request.ibs = &ib; request.ibs = &ib;
request.fence_info = radv_set_cs_fence(ctx, cs0->hw_ip, queue_idx);
r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1); r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
if (r) { if (r) {
@@ -910,7 +931,7 @@ static int radv_amdgpu_winsys_cs_submit(struct radeon_winsys_ctx *_ctx,
if (!cs->ws->use_ib_bos) { if (!cs->ws->use_ib_bos) {
ret = radv_amdgpu_winsys_cs_submit_sysmem(_ctx, queue_idx, cs_array, ret = radv_amdgpu_winsys_cs_submit_sysmem(_ctx, queue_idx, cs_array,
cs_count, initial_preamble_cs, continue_preamble_cs, _fence); cs_count, initial_preamble_cs, continue_preamble_cs, _fence);
} else if (can_patch && cs_count > AMDGPU_CS_MAX_IBS_PER_SUBMIT && false) { } else if (can_patch && cs_count > AMDGPU_CS_MAX_IBS_PER_SUBMIT && cs->ws->batchchain) {
ret = radv_amdgpu_winsys_cs_submit_chained(_ctx, queue_idx, cs_array, ret = radv_amdgpu_winsys_cs_submit_chained(_ctx, queue_idx, cs_array,
cs_count, initial_preamble_cs, continue_preamble_cs, _fence); cs_count, initial_preamble_cs, continue_preamble_cs, _fence);
} else { } else {
@@ -931,6 +952,9 @@ static void *radv_amdgpu_winsys_get_cpu_addr(void *_cs, uint64_t addr)
{ {
struct radv_amdgpu_cs *cs = (struct radv_amdgpu_cs *)_cs; struct radv_amdgpu_cs *cs = (struct radv_amdgpu_cs *)_cs;
void *ret = NULL; void *ret = NULL;
if (!cs->ib_buffer)
return NULL;
for (unsigned i = 0; i <= cs->num_old_ib_buffers; ++i) { for (unsigned i = 0; i <= cs->num_old_ib_buffers; ++i) {
struct radv_amdgpu_winsys_bo *bo; struct radv_amdgpu_winsys_bo *bo;
@@ -949,10 +973,15 @@ static void radv_amdgpu_winsys_cs_dump(struct radeon_winsys_cs *_cs,
uint32_t trace_id) uint32_t trace_id)
{ {
struct radv_amdgpu_cs *cs = (struct radv_amdgpu_cs *)_cs; struct radv_amdgpu_cs *cs = (struct radv_amdgpu_cs *)_cs;
void *ib = cs->base.buf;
int num_dw = cs->base.cdw;
ac_parse_ib(file, if (cs->ws->use_ib_bos) {
radv_amdgpu_winsys_get_cpu_addr(cs, cs->ib.ib_mc_address), ib = radv_amdgpu_winsys_get_cpu_addr(cs, cs->ib.ib_mc_address);
cs->ib.size, trace_id, "main IB", cs->ws->info.chip_class, num_dw = cs->ib.size;
}
assert(ib);
ac_parse_ib(file, ib, num_dw, trace_id, "main IB", cs->ws->info.chip_class,
radv_amdgpu_winsys_get_cpu_addr, cs); radv_amdgpu_winsys_get_cpu_addr, cs);
} }
@@ -970,6 +999,15 @@ static struct radeon_winsys_ctx *radv_amdgpu_ctx_create(struct radeon_winsys *_w
goto error_create; goto error_create;
} }
ctx->ws = ws; ctx->ws = ws;
assert(AMDGPU_HW_IP_NUM * MAX_RINGS_PER_TYPE * sizeof(uint64_t) <= 4096);
ctx->fence_bo = ws->base.buffer_create(&ws->base, 4096, 8,
RADEON_DOMAIN_GTT,
RADEON_FLAG_CPU_ACCESS);
if (ctx->fence_bo)
ctx->fence_map = (uint64_t*)ws->base.buffer_map(ctx->fence_bo);
if (ctx->fence_map)
memset(ctx->fence_map, 0, 4096);
return (struct radeon_winsys_ctx *)ctx; return (struct radeon_winsys_ctx *)ctx;
error_create: error_create:
FREE(ctx); FREE(ctx);
@@ -979,6 +1017,7 @@ error_create:
static void radv_amdgpu_ctx_destroy(struct radeon_winsys_ctx *rwctx) static void radv_amdgpu_ctx_destroy(struct radeon_winsys_ctx *rwctx)
{ {
struct radv_amdgpu_ctx *ctx = (struct radv_amdgpu_ctx *)rwctx; struct radv_amdgpu_ctx *ctx = (struct radv_amdgpu_ctx *)rwctx;
ctx->ws->base.buffer_destroy(ctx->fence_bo);
amdgpu_cs_ctx_free(ctx->ctx); amdgpu_cs_ctx_free(ctx->ctx);
FREE(ctx); FREE(ctx);
} }
@@ -989,9 +1028,9 @@ static bool radv_amdgpu_ctx_wait_idle(struct radeon_winsys_ctx *rwctx,
struct radv_amdgpu_ctx *ctx = (struct radv_amdgpu_ctx *)rwctx; struct radv_amdgpu_ctx *ctx = (struct radv_amdgpu_ctx *)rwctx;
int ip_type = ring_to_hw_ip(ring_type); int ip_type = ring_to_hw_ip(ring_type);
if (ctx->last_submission[ip_type][ring_index].fence) { if (ctx->last_submission[ip_type][ring_index].fence.fence) {
uint32_t expired; uint32_t expired;
int ret = amdgpu_cs_query_fence_status(&ctx->last_submission[ip_type][ring_index], int ret = amdgpu_cs_query_fence_status(&ctx->last_submission[ip_type][ring_index].fence,
1000000000ull, 0, &expired); 1000000000ull, 0, &expired);
if (ret || !expired) if (ret || !expired)

View File

@@ -42,10 +42,19 @@ enum {
MAX_RINGS_PER_TYPE = 8 MAX_RINGS_PER_TYPE = 8
}; };
struct radv_amdgpu_fence {
struct amdgpu_cs_fence fence;
volatile uint64_t *user_ptr;
};
struct radv_amdgpu_ctx { struct radv_amdgpu_ctx {
struct radv_amdgpu_winsys *ws; struct radv_amdgpu_winsys *ws;
amdgpu_context_handle ctx; amdgpu_context_handle ctx;
struct amdgpu_cs_fence last_submission[AMDGPU_HW_IP_DMA + 1][MAX_RINGS_PER_TYPE]; struct radv_amdgpu_fence last_submission[AMDGPU_HW_IP_DMA + 1][MAX_RINGS_PER_TYPE];
struct radeon_winsys_bo *fence_bo;
uint64_t *fence_map;
}; };
static inline struct radv_amdgpu_ctx * static inline struct radv_amdgpu_ctx *

View File

@@ -35,63 +35,39 @@
#include "radv_amdgpu_surface.h" #include "radv_amdgpu_surface.h"
#include "sid.h" #include "sid.h"
#ifndef NO_ENTRIES #include "ac_surface.h"
#define NO_ENTRIES 32
#endif
#ifndef NO_MACRO_ENTRIES static int radv_amdgpu_surface_sanity(const struct ac_surf_info *surf_info,
#define NO_MACRO_ENTRIES 16 const struct radeon_surf *surf)
#endif
#ifndef CIASICIDGFXENGINE_SOUTHERNISLAND
#define CIASICIDGFXENGINE_SOUTHERNISLAND 0x0000000A
#endif
static int radv_amdgpu_surface_sanity(const struct radeon_surf *surf)
{ {
unsigned type = RADEON_SURF_GET(surf->flags, TYPE); unsigned type = RADEON_SURF_GET(surf->flags, TYPE);
if (!(surf->flags & RADEON_SURF_HAS_TILE_MODE_INDEX)) if (!(surf->flags & RADEON_SURF_HAS_TILE_MODE_INDEX))
return -EINVAL; return -EINVAL;
/* all dimension must be at least 1 ! */ if (!surf->blk_w || !surf->blk_h)
if (!surf->npix_x || !surf->npix_y || !surf->npix_z ||
!surf->array_size)
return -EINVAL; return -EINVAL;
if (!surf->blk_w || !surf->blk_h || !surf->blk_d)
return -EINVAL;
switch (surf->nsamples) {
case 1:
case 2:
case 4:
case 8:
break;
default:
return -EINVAL;
}
switch (type) { switch (type) {
case RADEON_SURF_TYPE_1D: case RADEON_SURF_TYPE_1D:
if (surf->npix_y > 1) if (surf_info->height > 1)
return -EINVAL; return -EINVAL;
/* fall through */ /* fall through */
case RADEON_SURF_TYPE_2D: case RADEON_SURF_TYPE_2D:
case RADEON_SURF_TYPE_CUBEMAP: case RADEON_SURF_TYPE_CUBEMAP:
if (surf->npix_z > 1 || surf->array_size > 1) if (surf_info->depth > 1 || surf_info->array_size > 1)
return -EINVAL; return -EINVAL;
break; break;
case RADEON_SURF_TYPE_3D: case RADEON_SURF_TYPE_3D:
if (surf->array_size > 1) if (surf_info->array_size > 1)
return -EINVAL; return -EINVAL;
break; break;
case RADEON_SURF_TYPE_1D_ARRAY: case RADEON_SURF_TYPE_1D_ARRAY:
if (surf->npix_y > 1) if (surf_info->height > 1)
return -EINVAL; return -EINVAL;
/* fall through */ /* fall through */
case RADEON_SURF_TYPE_2D_ARRAY: case RADEON_SURF_TYPE_2D_ARRAY:
if (surf->npix_z > 1) if (surf_info->depth > 1)
return -EINVAL; return -EINVAL;
break; break;
default: default:
@@ -100,453 +76,28 @@ static int radv_amdgpu_surface_sanity(const struct radeon_surf *surf)
return 0; return 0;
} }
static void *ADDR_API radv_allocSysMem(const ADDR_ALLOCSYSMEM_INPUT * pInput)
{
return malloc(pInput->sizeInBytes);
}
static ADDR_E_RETURNCODE ADDR_API radv_freeSysMem(const ADDR_FREESYSMEM_INPUT * pInput)
{
free(pInput->pVirtAddr);
return ADDR_OK;
}
ADDR_HANDLE radv_amdgpu_addr_create(struct amdgpu_gpu_info *amdinfo, int family, int rev_id,
enum chip_class chip_class)
{
ADDR_CREATE_INPUT addrCreateInput = {0};
ADDR_CREATE_OUTPUT addrCreateOutput = {0};
ADDR_REGISTER_VALUE regValue = {0};
ADDR_CREATE_FLAGS createFlags = {{0}};
ADDR_E_RETURNCODE addrRet;
addrCreateInput.size = sizeof(ADDR_CREATE_INPUT);
addrCreateOutput.size = sizeof(ADDR_CREATE_OUTPUT);
regValue.noOfBanks = amdinfo->mc_arb_ramcfg & 0x3;
regValue.gbAddrConfig = amdinfo->gb_addr_cfg;
regValue.noOfRanks = (amdinfo->mc_arb_ramcfg & 0x4) >> 2;
regValue.backendDisables = amdinfo->backend_disable[0];
regValue.pTileConfig = amdinfo->gb_tile_mode;
regValue.noOfEntries = ARRAY_SIZE(amdinfo->gb_tile_mode);
if (chip_class == SI) {
regValue.pMacroTileConfig = NULL;
regValue.noOfMacroEntries = 0;
} else {
regValue.pMacroTileConfig = amdinfo->gb_macro_tile_mode;
regValue.noOfMacroEntries = ARRAY_SIZE(amdinfo->gb_macro_tile_mode);
}
createFlags.value = 0;
createFlags.useTileIndex = 1;
addrCreateInput.chipEngine = CIASICIDGFXENGINE_SOUTHERNISLAND;
addrCreateInput.chipFamily = family;
addrCreateInput.chipRevision = rev_id;
addrCreateInput.createFlags = createFlags;
addrCreateInput.callbacks.allocSysMem = radv_allocSysMem;
addrCreateInput.callbacks.freeSysMem = radv_freeSysMem;
addrCreateInput.callbacks.debugPrint = 0;
addrCreateInput.regValue = regValue;
addrRet = AddrCreate(&addrCreateInput, &addrCreateOutput);
if (addrRet != ADDR_OK)
return NULL;
return addrCreateOutput.hLib;
}
static int radv_compute_level(ADDR_HANDLE addrlib,
struct radeon_surf *surf, bool is_stencil,
unsigned level, unsigned type, bool compressed,
ADDR_COMPUTE_SURFACE_INFO_INPUT *AddrSurfInfoIn,
ADDR_COMPUTE_SURFACE_INFO_OUTPUT *AddrSurfInfoOut,
ADDR_COMPUTE_DCCINFO_INPUT *AddrDccIn,
ADDR_COMPUTE_DCCINFO_OUTPUT *AddrDccOut)
{
struct radeon_surf_level *surf_level;
ADDR_E_RETURNCODE ret;
AddrSurfInfoIn->mipLevel = level;
AddrSurfInfoIn->width = u_minify(surf->npix_x, level);
AddrSurfInfoIn->height = u_minify(surf->npix_y, level);
if (type == RADEON_SURF_TYPE_3D)
AddrSurfInfoIn->numSlices = u_minify(surf->npix_z, level);
else if (type == RADEON_SURF_TYPE_CUBEMAP)
AddrSurfInfoIn->numSlices = 6;
else
AddrSurfInfoIn->numSlices = surf->array_size;
if (level > 0) {
/* Set the base level pitch. This is needed for calculation
* of non-zero levels. */
if (is_stencil)
AddrSurfInfoIn->basePitch = surf->stencil_level[0].nblk_x;
else
AddrSurfInfoIn->basePitch = surf->level[0].nblk_x;
/* Convert blocks to pixels for compressed formats. */
if (compressed)
AddrSurfInfoIn->basePitch *= surf->blk_w;
}
ret = AddrComputeSurfaceInfo(addrlib,
AddrSurfInfoIn,
AddrSurfInfoOut);
if (ret != ADDR_OK)
return ret;
surf_level = is_stencil ? &surf->stencil_level[level] : &surf->level[level];
surf_level->offset = align64(surf->bo_size, AddrSurfInfoOut->baseAlign);
surf_level->slice_size = AddrSurfInfoOut->sliceSize;
surf_level->pitch_bytes = AddrSurfInfoOut->pitch * (is_stencil ? 1 : surf->bpe);
surf_level->npix_x = u_minify(surf->npix_x, level);
surf_level->npix_y = u_minify(surf->npix_y, level);
surf_level->npix_z = u_minify(surf->npix_z, level);
surf_level->nblk_x = AddrSurfInfoOut->pitch;
surf_level->nblk_y = AddrSurfInfoOut->height;
if (type == RADEON_SURF_TYPE_3D)
surf_level->nblk_z = AddrSurfInfoOut->depth;
else
surf_level->nblk_z = 1;
switch (AddrSurfInfoOut->tileMode) {
case ADDR_TM_LINEAR_ALIGNED:
surf_level->mode = RADEON_SURF_MODE_LINEAR_ALIGNED;
break;
case ADDR_TM_1D_TILED_THIN1:
surf_level->mode = RADEON_SURF_MODE_1D;
break;
case ADDR_TM_2D_TILED_THIN1:
surf_level->mode = RADEON_SURF_MODE_2D;
break;
default:
assert(0);
}
if (is_stencil)
surf->stencil_tiling_index[level] = AddrSurfInfoOut->tileIndex;
else
surf->tiling_index[level] = AddrSurfInfoOut->tileIndex;
surf->bo_size = surf_level->offset + AddrSurfInfoOut->surfSize;
/* Clear DCC fields at the beginning. */
surf_level->dcc_offset = 0;
surf_level->dcc_enabled = false;
/* The previous level's flag tells us if we can use DCC for this level. */
if (AddrSurfInfoIn->flags.dccCompatible &&
(level == 0 || AddrDccOut->subLvlCompressible)) {
AddrDccIn->colorSurfSize = AddrSurfInfoOut->surfSize;
AddrDccIn->tileMode = AddrSurfInfoOut->tileMode;
AddrDccIn->tileInfo = *AddrSurfInfoOut->pTileInfo;
AddrDccIn->tileIndex = AddrSurfInfoOut->tileIndex;
AddrDccIn->macroModeIndex = AddrSurfInfoOut->macroModeIndex;
ret = AddrComputeDccInfo(addrlib,
AddrDccIn,
AddrDccOut);
if (ret == ADDR_OK) {
surf_level->dcc_offset = surf->dcc_size;
surf_level->dcc_fast_clear_size = AddrDccOut->dccFastClearSize;
surf_level->dcc_enabled = true;
surf->dcc_size = surf_level->dcc_offset + AddrDccOut->dccRamSize;
surf->dcc_alignment = MAX2(surf->dcc_alignment, AddrDccOut->dccRamBaseAlign);
}
}
if (!is_stencil && AddrSurfInfoIn->flags.depth &&
surf_level->mode == RADEON_SURF_MODE_2D && level == 0) {
ADDR_COMPUTE_HTILE_INFO_INPUT AddrHtileIn = {0};
ADDR_COMPUTE_HTILE_INFO_OUTPUT AddrHtileOut = {0};
AddrHtileIn.flags.tcCompatible = AddrSurfInfoIn->flags.tcCompatible;
AddrHtileIn.pitch = AddrSurfInfoOut->pitch;
AddrHtileIn.height = AddrSurfInfoOut->height;
AddrHtileIn.numSlices = AddrSurfInfoOut->depth;
AddrHtileIn.blockWidth = ADDR_HTILE_BLOCKSIZE_8;
AddrHtileIn.blockHeight = ADDR_HTILE_BLOCKSIZE_8;
AddrHtileIn.pTileInfo = AddrSurfInfoOut->pTileInfo;
AddrHtileIn.tileIndex = AddrSurfInfoOut->tileIndex;
AddrHtileIn.macroModeIndex = AddrSurfInfoOut->macroModeIndex;
ret = AddrComputeHtileInfo(addrlib,
&AddrHtileIn,
&AddrHtileOut);
if (ret == ADDR_OK) {
surf->htile_size = AddrHtileOut.htileBytes;
surf->htile_slice_size = AddrHtileOut.sliceSize;
surf->htile_alignment = AddrHtileOut.baseAlign;
}
}
return 0;
}
static void radv_set_micro_tile_mode(struct radeon_surf *surf,
struct radeon_info *info)
{
uint32_t tile_mode = info->si_tile_mode_array[surf->tiling_index[0]];
if (info->chip_class >= CIK)
surf->micro_tile_mode = G_009910_MICRO_TILE_MODE_NEW(tile_mode);
else
surf->micro_tile_mode = G_009910_MICRO_TILE_MODE(tile_mode);
}
static unsigned cik_get_macro_tile_index(struct radeon_surf *surf)
{
unsigned index, tileb;
tileb = 8 * 8 * surf->bpe;
tileb = MIN2(surf->tile_split, tileb);
for (index = 0; tileb > 64; index++)
tileb >>= 1;
assert(index < 16);
return index;
}
static int radv_amdgpu_winsys_surface_init(struct radeon_winsys *_ws, static int radv_amdgpu_winsys_surface_init(struct radeon_winsys *_ws,
const struct ac_surf_info *surf_info,
struct radeon_surf *surf) struct radeon_surf *surf)
{ {
struct radv_amdgpu_winsys *ws = radv_amdgpu_winsys(_ws); struct radv_amdgpu_winsys *ws = radv_amdgpu_winsys(_ws);
unsigned level, mode, type; unsigned mode, type;
bool compressed;
ADDR_COMPUTE_SURFACE_INFO_INPUT AddrSurfInfoIn = {0};
ADDR_COMPUTE_SURFACE_INFO_OUTPUT AddrSurfInfoOut = {0};
ADDR_COMPUTE_DCCINFO_INPUT AddrDccIn = {0};
ADDR_COMPUTE_DCCINFO_OUTPUT AddrDccOut = {0};
ADDR_TILEINFO AddrTileInfoIn = {0};
ADDR_TILEINFO AddrTileInfoOut = {0};
int r; int r;
r = radv_amdgpu_surface_sanity(surf); r = radv_amdgpu_surface_sanity(surf_info, surf);
if (r) if (r)
return r; return r;
AddrSurfInfoIn.size = sizeof(ADDR_COMPUTE_SURFACE_INFO_INPUT);
AddrSurfInfoOut.size = sizeof(ADDR_COMPUTE_SURFACE_INFO_OUTPUT);
AddrDccIn.size = sizeof(ADDR_COMPUTE_DCCINFO_INPUT);
AddrDccOut.size = sizeof(ADDR_COMPUTE_DCCINFO_OUTPUT);
AddrSurfInfoOut.pTileInfo = &AddrTileInfoOut;
type = RADEON_SURF_GET(surf->flags, TYPE); type = RADEON_SURF_GET(surf->flags, TYPE);
mode = RADEON_SURF_GET(surf->flags, MODE); mode = RADEON_SURF_GET(surf->flags, MODE);
compressed = surf->blk_w == 4 && surf->blk_h == 4;
/* MSAA and FMASK require 2D tiling. */ struct ac_surf_config config;
if (surf->nsamples > 1 ||
(surf->flags & RADEON_SURF_FMASK))
mode = RADEON_SURF_MODE_2D;
/* DB doesn't support linear layouts. */ memcpy(&config.info, surf_info, sizeof(config.info));
if (surf->flags & (RADEON_SURF_Z_OR_SBUFFER) && config.is_3d = !!(type == RADEON_SURF_TYPE_3D);
mode < RADEON_SURF_MODE_1D) config.is_cube = !!(type == RADEON_SURF_TYPE_CUBEMAP);
mode = RADEON_SURF_MODE_1D;
/* Set the requested tiling mode. */ return ac_compute_surface(ws->addrlib, &ws->info, &config, mode, surf);
switch (mode) {
case RADEON_SURF_MODE_LINEAR_ALIGNED:
AddrSurfInfoIn.tileMode = ADDR_TM_LINEAR_ALIGNED;
break;
case RADEON_SURF_MODE_1D:
AddrSurfInfoIn.tileMode = ADDR_TM_1D_TILED_THIN1;
break;
case RADEON_SURF_MODE_2D:
AddrSurfInfoIn.tileMode = ADDR_TM_2D_TILED_THIN1;
break;
default:
assert(0);
}
/* The format must be set correctly for the allocation of compressed
* textures to work. In other cases, setting the bpp is sufficient. */
if (compressed) {
switch (surf->bpe) {
case 8:
AddrSurfInfoIn.format = ADDR_FMT_BC1;
break;
case 16:
AddrSurfInfoIn.format = ADDR_FMT_BC3;
break;
default:
assert(0);
}
} else {
AddrDccIn.bpp = AddrSurfInfoIn.bpp = surf->bpe * 8;
}
AddrDccIn.numSamples = AddrSurfInfoIn.numSamples = surf->nsamples;
AddrSurfInfoIn.tileIndex = -1;
/* Set the micro tile type. */
if (surf->flags & RADEON_SURF_SCANOUT)
AddrSurfInfoIn.tileType = ADDR_DISPLAYABLE;
else if (surf->flags & RADEON_SURF_Z_OR_SBUFFER)
AddrSurfInfoIn.tileType = ADDR_DEPTH_SAMPLE_ORDER;
else
AddrSurfInfoIn.tileType = ADDR_NON_DISPLAYABLE;
AddrSurfInfoIn.flags.color = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
AddrSurfInfoIn.flags.depth = (surf->flags & RADEON_SURF_ZBUFFER) != 0;
AddrSurfInfoIn.flags.cube = type == RADEON_SURF_TYPE_CUBEMAP;
AddrSurfInfoIn.flags.display = (surf->flags & RADEON_SURF_SCANOUT) != 0;
AddrSurfInfoIn.flags.pow2Pad = surf->last_level > 0;
AddrSurfInfoIn.flags.opt4Space = 1;
/* DCC notes:
* - If we add MSAA support, keep in mind that CB can't decompress 8bpp
* with samples >= 4.
* - Mipmapped array textures have low performance (discovered by a closed
* driver team).
*/
AddrSurfInfoIn.flags.dccCompatible = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER) &&
!(surf->flags & RADEON_SURF_DISABLE_DCC) &&
!compressed && AddrDccIn.numSamples <= 1 &&
((surf->array_size == 1 && surf->npix_z == 1) ||
surf->last_level == 0);
AddrSurfInfoIn.flags.noStencil = (surf->flags & RADEON_SURF_SBUFFER) == 0;
AddrSurfInfoIn.flags.compressZ = AddrSurfInfoIn.flags.depth;
/* noStencil = 0 can result in a depth part that is incompatible with
* mipmapped texturing. So set noStencil = 1 when mipmaps are requested (in
* this case, we may end up setting stencil_adjusted).
*
* TODO: update addrlib to a newer version, remove this, and
* use flags.matchStencilTileCfg = 1 as an alternative fix.
*/
if (surf->last_level > 0)
AddrSurfInfoIn.flags.noStencil = 1;
/* Set preferred macrotile parameters. This is usually required
* for shared resources. This is for 2D tiling only. */
if (AddrSurfInfoIn.tileMode >= ADDR_TM_2D_TILED_THIN1 &&
surf->bankw && surf->bankh && surf->mtilea && surf->tile_split) {
/* If any of these parameters are incorrect, the calculation
* will fail. */
AddrTileInfoIn.banks = surf->num_banks;
AddrTileInfoIn.bankWidth = surf->bankw;
AddrTileInfoIn.bankHeight = surf->bankh;
AddrTileInfoIn.macroAspectRatio = surf->mtilea;
AddrTileInfoIn.tileSplitBytes = surf->tile_split;
AddrTileInfoIn.pipeConfig = surf->pipe_config + 1; /* +1 compared to GB_TILE_MODE */
AddrSurfInfoIn.flags.opt4Space = 0;
AddrSurfInfoIn.pTileInfo = &AddrTileInfoIn;
/* If AddrSurfInfoIn.pTileInfo is set, Addrlib doesn't set
* the tile index, because we are expected to know it if
* we know the other parameters.
*
* This is something that can easily be fixed in Addrlib.
* For now, just figure it out here.
* Note that only 2D_TILE_THIN1 is handled here.
*/
assert(!(surf->flags & RADEON_SURF_Z_OR_SBUFFER));
assert(AddrSurfInfoIn.tileMode == ADDR_TM_2D_TILED_THIN1);
if (ws->info.chip_class == SI) {
if (AddrSurfInfoIn.tileType == ADDR_DISPLAYABLE) {
if (surf->bpe == 2)
AddrSurfInfoIn.tileIndex = 11; /* 16bpp */
else
AddrSurfInfoIn.tileIndex = 12; /* 32bpp */
} else {
if (surf->bpe == 1)
AddrSurfInfoIn.tileIndex = 14; /* 8bpp */
else if (surf->bpe == 2)
AddrSurfInfoIn.tileIndex = 15; /* 16bpp */
else if (surf->bpe == 4)
AddrSurfInfoIn.tileIndex = 16; /* 32bpp */
else
AddrSurfInfoIn.tileIndex = 17; /* 64bpp (and 128bpp) */
}
} else {
if (AddrSurfInfoIn.tileType == ADDR_DISPLAYABLE)
AddrSurfInfoIn.tileIndex = 10; /* 2D displayable */
else
AddrSurfInfoIn.tileIndex = 14; /* 2D non-displayable */
AddrSurfInfoOut.macroModeIndex = cik_get_macro_tile_index(surf);
}
}
surf->bo_size = 0;
surf->dcc_size = 0;
surf->dcc_alignment = 1;
surf->htile_size = surf->htile_slice_size = 0;
surf->htile_alignment = 1;
/* Calculate texture layout information. */
for (level = 0; level <= surf->last_level; level++) {
r = radv_compute_level(ws->addrlib, surf, false, level, type, compressed,
&AddrSurfInfoIn, &AddrSurfInfoOut, &AddrDccIn, &AddrDccOut);
if (r)
break;
if (level == 0) {
surf->bo_alignment = AddrSurfInfoOut.baseAlign;
surf->pipe_config = AddrSurfInfoOut.pTileInfo->pipeConfig - 1;
radv_set_micro_tile_mode(surf, &ws->info);
/* For 2D modes only. */
if (AddrSurfInfoOut.tileMode >= ADDR_TM_2D_TILED_THIN1) {
surf->bankw = AddrSurfInfoOut.pTileInfo->bankWidth;
surf->bankh = AddrSurfInfoOut.pTileInfo->bankHeight;
surf->mtilea = AddrSurfInfoOut.pTileInfo->macroAspectRatio;
surf->tile_split = AddrSurfInfoOut.pTileInfo->tileSplitBytes;
surf->num_banks = AddrSurfInfoOut.pTileInfo->banks;
surf->macro_tile_index = AddrSurfInfoOut.macroModeIndex;
} else {
surf->macro_tile_index = 0;
}
}
}
/* Calculate texture layout information for stencil. */
if (surf->flags & RADEON_SURF_SBUFFER) {
AddrSurfInfoIn.bpp = 8;
AddrSurfInfoIn.flags.depth = 0;
AddrSurfInfoIn.flags.stencil = 1;
/* This will be ignored if AddrSurfInfoIn.pTileInfo is NULL. */
AddrTileInfoIn.tileSplitBytes = surf->stencil_tile_split;
for (level = 0; level <= surf->last_level; level++) {
r = radv_compute_level(ws->addrlib, surf, true, level, type, compressed,
&AddrSurfInfoIn, &AddrSurfInfoOut, &AddrDccIn, &AddrDccOut);
if (r)
return r;
/* DB uses the depth pitch for both stencil and depth. */
if (surf->stencil_level[level].nblk_x != surf->level[level].nblk_x)
surf->stencil_adjusted = true;
if (level == 0) {
/* For 2D modes only. */
if (AddrSurfInfoOut.tileMode >= ADDR_TM_2D_TILED_THIN1) {
surf->stencil_tile_split =
AddrSurfInfoOut.pTileInfo->tileSplitBytes;
}
}
}
}
/* Recalculate the whole DCC miptree size including disabled levels.
* This is what addrlib does, but calling addrlib would be a lot more
* complicated.
*/
#if 0
if (surf->dcc_size && surf->last_level > 0) {
surf->dcc_size = align64(surf->bo_size >> 8,
ws->info.pipe_interleave_bytes *
ws->info.num_tile_pipes);
}
#endif
return 0;
} }
static int radv_amdgpu_winsys_surface_best(struct radeon_winsys *rws, static int radv_amdgpu_winsys_surface_best(struct radeon_winsys *rws,

Some files were not shown because too many files have changed in this diff Show More